Long-term component architecture

As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark, Apache HBase, and Apache Parquet) that are eventually adopted by the community at large. Because these components are standards, you can build long-term architecture on them with confidence.

Thank you for choosing CDH. Your download instructions are below.

Installation

This section introduces options for installing Cloudera Manager, CDH, and managed services. You can install:

  • Cloudera Manager, CDH, and managed services in a Cloudera Manager deployment. This is the recommended method for installing CDH and managed services.
  • CDH 5 into an unmanaged deployment.

Cloudera Manager Deployment

A Cloudera Manager deployment consists of the following software components:

  • Oracle JDK
  • Cloudera Manager Server and Agent packages
  • Supporting database software
  • CDH and managed service software
This section describes the three main installation paths for creating a new Cloudera Manager deployment and the criteria for choosing an installation path. If your cluster already has an installation of a previous version of Cloudera Manager, follow the instructions in Upgrading Cloudera Manager.

The Cloudera Manager installation paths share some common phases, but they differ in ways that accommodate different user and cluster host requirements:

  • Demonstration and proof of concept deployments - There are two installation options:
    • Installation Path A - Automated Installation by Cloudera Manager - Cloudera Manager automates the installation of the Oracle JDK, Cloudera Manager Server, and embedded PostgreSQL database, and of the Cloudera Manager Agent, CDH, and managed service software on cluster hosts. It also configures databases for the Cloudera Manager Server and Hive Metastore and, optionally, for Cloudera Management Service roles. This path is recommended for demonstration and proof of concept deployments, but not for production deployments, because it is not intended to scale and may require database migration as your cluster grows. To use this method, server and cluster hosts must satisfy the following requirements:
      • Provide the ability to log in to the Cloudera Manager Server host using a root account or an account that has password-less sudo permission.
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the necessary installation files.
    • Installation Path B - Manual Installation Using Cloudera Manager Packages - You install the Oracle JDK, Cloudera Manager Server, and embedded PostgreSQL database packages on the Cloudera Manager Server host. You have two options for installing the Oracle JDK, Cloudera Manager Agent, CDH, and managed service software on cluster hosts: install it manually or use Cloudera Manager to automate installation. However, for Cloudera Manager to automate installation of Cloudera Manager Agent packages or CDH and managed service software, cluster hosts must satisfy the following requirements:
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the necessary installation files.
  • Production deployments - require you to first manually install and configure a production database for the Cloudera Manager Server and Hive Metastore. There are two installation options:
    • Installation Path B - Manual Installation Using Cloudera Manager Packages - You install the Oracle JDK and Cloudera Manager Server packages on the Cloudera Manager Server host. You have two options for installing the Oracle JDK, Cloudera Manager Agent, CDH, and managed service software on cluster hosts: install it manually or use Cloudera Manager to automate installation. However, for Cloudera Manager to automate installation of Cloudera Manager Agent packages or CDH and managed service software, cluster hosts must satisfy the following requirements:
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the necessary installation files.
    • Installation Path C - Manual Installation Using Cloudera Manager Tarballs - You install the Oracle JDK, Cloudera Manager Server, and Cloudera Manager Agent software as tarballs and use Cloudera Manager to automate installation of CDH and managed service software as parcels.
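
Paths A and B require the Cloudera Manager Server host to reach every cluster host over SSH on the same port. The following is a minimal pre-flight sketch of that check, not a Cloudera tool: the function name `unreachable_hosts` and the default port 22 are assumptions made for illustration. It only confirms that a TCP connection can be opened; it does not verify SSH login, sudo rights, or package repository access.

```python
import socket


def unreachable_hosts(hosts, port=22, timeout=3.0):
    """Return the hosts that do not accept a TCP connection on `port`.

    Hypothetical helper: a successful connect shows only that the port is
    open, not that SSH authentication or password-less sudo will work.
    """
    failed = []
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # Connection opened; close it immediately.
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            failed.append(host)
    return failed
```

Run it from the Cloudera Manager Server host with your own host list, for example `unreachable_hosts(["node1.example.com", "node2.example.com"], port=22)`, where the host names are placeholders. An empty result suggests the port is uniformly reachable.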

 

Unmanaged Deployment

In an unmanaged deployment, you are responsible for managing all phases of the life cycle of CDH and managed service components on each host: installation, configuration, and service life cycle operations such as start and stop. This section describes alternatives for installing CDH 5 software in an unmanaged deployment.

  • Command-line methods:
    • Download and install the CDH 5 "1-click Install" package
    • Add the CDH 5 repository
    • Build your own CDH 5 repository
    If you use one of these command-line methods, the first (downloading and installing the "1-click Install" package) is recommended in most cases because it is simpler than building or adding a repository. See Installing the Latest CDH 5 Release for detailed instructions for each of these options.
  • Tarball - You can download a tarball from CDH downloads. Keep the following points in mind:
    • Installing CDH 5 from a tarball installs YARN.
    • In CDH 5, there is no separate tarball for MRv1. Instead, the MRv1 binaries, examples, etc., are delivered in the Hadoop tarball. The scripts for running MRv1 are in the bin-mapreduce1 directory in the tarball, and the MRv1 examples are in the examples-mapreduce1 directory.

CDH 5 provides packages for Red-Hat-compatible, SLES, Ubuntu, and Debian systems as described below.

Operating System                                                      Version                                     Packages
Red Hat Enterprise Linux (RHEL)-compatible
  Red Hat Enterprise Linux                                            5.7                                         64-bit
                                                                      6.2                                         64-bit
                                                                      6.4                                         64-bit
                                                                      6.4 in SELinux mode                         64-bit
                                                                      6.5                                         64-bit
  CentOS                                                              5.7                                         64-bit
                                                                      6.2                                         64-bit
                                                                      6.4                                         64-bit
                                                                      6.4 in SELinux mode                         64-bit
                                                                      6.5                                         64-bit
  Oracle Linux with default kernel and Unbreakable Enterprise Kernel  5.6 (UEK R2)                                64-bit
                                                                      6.4 (UEK R2)                                64-bit
                                                                      6.5 (UEK R2, UEK R3)                        64-bit
SLES
  SUSE Linux Enterprise Server (SLES)                                 11 with Service Pack 2 or later             64-bit
Ubuntu/Debian
  Ubuntu                                                              Precise (12.04) - Long-Term Support (LTS)   64-bit
                                                                      Trusty (14.04) - Long-Term Support (LTS)    64-bit
  Debian                                                              Wheezy (7.0, 7.1)                           64-bit

Note:

  • CDH 5 provides only 64-bit packages.
  • Cloudera has received reports that our RPMs work well on Fedora, but we have not tested this.
  • If you are using an operating system that is not supported by Cloudera packages, you can also download source tarballs from Downloads.
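
The operating system table above can be turned into a simple lookup for pre-installation checks. This is a sketch under stated assumptions: the distribution keys (`rhel`, `centos`, `oraclelinux`, `ubuntu`, `debian`) and the function name `is_supported_os` are invented for illustration, and SLES is left out because its rule ("11 with Service Pack 2 or later") needs a service-pack comparison rather than an exact match.

```python
# Versions copied from the table above. The dictionary keys are
# hypothetical labels, not identifiers used by any Cloudera tool.
SUPPORTED_VERSIONS = {
    "rhel": {"5.7", "6.2", "6.4", "6.5"},
    "centos": {"5.7", "6.2", "6.4", "6.5"},
    "oraclelinux": {"5.6", "6.4", "6.5"},
    "ubuntu": {"12.04", "14.04"},
    "debian": {"7.0", "7.1"},
}


def is_supported_os(distro, version, is_64bit=True):
    """Return True if the distro/version pair appears in the table."""
    if not is_64bit:
        # CDH 5 provides only 64-bit packages.
        return False
    return version in SUPPORTED_VERSIONS.get(distro.lower(), set())
```

For example, `is_supported_os("centos", "6.5")` is true, while `is_supported_os("centos", "6.3")` is false because 6.3 is not listed.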

 

Component    MySQL                   SQLite   PostgreSQL                         Oracle      Derby (see Note 5)
Oozie        5.5, 5.6                -        8.4, 9.1, 9.2, 9.3 (see Note 2)    11gR2       Default
Flume        -                       -        -                                  -           Default (for the JDBC Channel only)
Hue          5.5, 5.6 (see Note 1)   Default  8.4, 9.1, 9.2, 9.3 (see Note 2)    11gR2       -
Hive/Impala  5.5, 5.6 (see Note 1)   -        8.4, 9.1, 9.2, 9.3 (see Note 2)    11gR2       Default
Sentry       5.5, 5.6 (see Note 1)   -        8.4, 9.1, 9.2, 9.3 (see Note 2)    11gR2       -
Sqoop 1      See Note 3              -        See Note 3                         See Note 3  -
Sqoop 2      See Note 4              -        See Note 4                         See Note 4  Default

Note:

  1. MySQL 5.5 is supported on CDH 5.1. MySQL 5.6 is supported on CDH 5.1 and later.
  2. PostgreSQL 9.2 is supported on CDH 5.1 and later. PostgreSQL 9.3 is supported on CDH 5.2 and later.
  3. For the purposes of transferring data only, Sqoop 1 supports MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, Teradata 13.10 and above, and Netezza TwinFin 5.0 and above. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
  4. Sqoop 2 can transfer data to and from MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, and Microsoft SQL Server 2012 and above. The Sqoop 2 repository database is supported only on Derby.
  5. Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the Cloudera Installation and Upgrade guide for recommendations.
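
Note 2 ties PostgreSQL versions to minimum CDH releases, which is easy to get wrong by hand. A minimal sketch of that rule follows. The names `MIN_CDH_FOR_POSTGRES` and `postgres_supported` are hypothetical, and the assumption that PostgreSQL 8.4 and 9.1 are valid for every CDH 5 release is inferred from the table (no minimum is stated for them), not confirmed by the notes.

```python
# Minimum CDH release per PostgreSQL version, from Note 2.
# 8.4 and 9.1 have no stated minimum; (5, 0) is an assumption.
MIN_CDH_FOR_POSTGRES = {
    "8.4": (5, 0),
    "9.1": (5, 0),
    "9.2": (5, 1),
    "9.3": (5, 2),
}


def parse_version(text):
    """Turn a dotted version such as '5.3.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def postgres_supported(pg_version, cdh_version):
    """Return True if `pg_version` is listed for the given CDH release."""
    minimum = MIN_CDH_FOR_POSTGRES.get(pg_version)
    if minimum is None:
        return False  # Version does not appear in the table.
    # Compare only major.minor, since the notes state minimums that way.
    return parse_version(cdh_version)[:2] >= minimum
```

With this sketch, `postgres_supported("9.3", "5.3.2")` is true and `postgres_supported("9.3", "5.1.0")` is false.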

 

 

 


CDH 5 is supported with the JDK versions shown in the table that follows.

Table 1. Supported JDK Versions

Latest Certified Version   Minimum Supported Version   Exceptions
1.7.0_67                   1.7.0_67                    None
1.8.0_11                   1.8.0_11                    None
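
JDK version strings such as `1.7.0_67` do not compare correctly as plain text, so a check against the minimums in Table 1 should parse them first. The sketch below assumes the `major.minor.patch_update` format shown in the table; the function names are hypothetical. It tests only the minimum supported version and says nothing about whether a newer update has been certified.

```python
import re

# Minimum supported update per JDK line, from Table 1.
MINIMUM_JDK = {
    (1, 7): (1, 7, 0, 67),
    (1, 8): (1, 8, 0, 11),
}


def parse_jdk_version(text):
    """Parse a string like '1.7.0_67' into the tuple (1, 7, 0, 67)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)_(\d+)", text)
    if match is None:
        raise ValueError("unrecognized JDK version: " + text)
    return tuple(int(group) for group in match.groups())


def meets_minimum_jdk(text):
    """Return True if the version is at or above the table's minimum."""
    version = parse_jdk_version(text)
    minimum = MINIMUM_JDK.get(version[:2])
    return minimum is not None and version >= minimum
```

For example, `meets_minimum_jdk("1.7.0_55")` is false, and any 1.6 version is rejected because that line is not in the table.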


CDH requires IPv4. IPv6 is not supported.

See also Configuring Network Names.
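
Because CDH requires IPv4, it can help to scan a host inventory for addresses that are not IPv4 before installation. This is a small sketch using Python's standard `ipaddress` module; the function name `non_ipv4_addresses` is hypothetical, and the check works on address literals only (it does not resolve host names).

```python
import ipaddress


def non_ipv4_addresses(addresses):
    """Return the entries that are not valid IPv4 address literals."""
    rejected = []
    for addr in addresses:
        try:
            if ipaddress.ip_address(addr).version != 4:
                rejected.append(addr)  # Valid address, but IPv6.
        except ValueError:
            rejected.append(addr)  # Not an IP address literal at all.
    return rejected
```

For example, `non_ipv4_addresses(["10.0.0.5", "::1"])` returns `["::1"]`.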


Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.3.2:

  • AVRO-1630 - Creating Builder from instance loses data
  • AVRO-1628 - Add Schema.createUnion(Schema... type)
  • AVRO-1539 - Add FileSystem-based FsInput Constructor
  • AVRO-1623 - GenericData#validate() of enum: IndexOutOfBoundsException
  • AVRO-1614 - Always getting a value...
  • AVRO-1592 - Java keyword as an enum constant in Avro schema file causes deserialization to fail.
  • AVRO-1619 - Generate better JavaDoc
  • AVRO-1622 - Add missing license headers
  • AVRO-1604 - ReflectData.AllowNull fails to generate schemas when @Nullable is present.
  • AVRO-1407 - NettyTransceiver can cause a infinite loop when slow to connect
  • AVRO-834 - Data File corruption recovery tool
  • AVRO-1596 - Cannot read past corrupted block in Avro data file
  • HADOOP-11350 - The size of header buffer of HttpServer is too small when HTTPS is enabled
  • HDFS-7707 - Edit log corruption due to delayed block removal again
  • HDFS-7718 - Store KeyProvider in ClientContext to avoid leaking key provider threads when using FileContext
  • HDFS-6425 - Large postponedMisreplicatedBlocks has impact on blockReport latency
  • HDFS-7560 - ACLs removed by removeDefaultAcl() will be back after NameNode restart/failover
  • HDFS-7513 - HDFS inotify: add defaultBlockSize to CreateEvent
  • HDFS-7158 - Reduce the memory usage of WebImageViewer
  • HDFS-7497 - Inconsistent report of decommissioning DataNodes between dfsadmin and NameNode webui
  • HDFS-6917 - Add an hdfs debug command to validate blocks, call recoverlease, etc.
  • HDFS-6779 - Add missing version subcommand for hdfs
  • YARN-2697 - RMAuthenticationHandler is no longer useful
  • YARN-2656 - RM web services authentication filter should add support for proxy user
  • YARN-3082 - Non thread safe access to systemCredentials in NodeHeartbeatResponse processing
  • YARN-3079 - Scheduler should also update maximumAllocation when updateNodeResource.
  • YARN-2992 - ZKRMStateStore crashes due to session expiry
  • YARN-2675 - containersKilled metrics is not updated when the container is killed during localization
  • YARN-2715 - Proxy user is problem for RPC interface if yarn.resourcemanager.webapp.proxyuser is not set.
  • MAPREDUCE-6198 - NPE from JobTracker#resolveAndAddToTopology in MR1 cause initJob and heartbeat failure.
  • MAPREDUCE-6196 - Fix BigDecimal ArithmeticException in PiEstimator
  • HBASE-12540 - TestRegionServerMetrics#testMobMetrics test failure
  • HBASE-12533 - staging directories are not deleted after secure bulk load
  • HBASE-12077 - FilterLists create many ArrayList$Itr objects per row.
  • HBASE-12386 - Replication gets stuck following a transient zookeeper error to remote peer cluster
  • HBASE-11979 - Compaction progress reporting is wrong
  • HBASE-12445 - hbase is removing all remaining cells immediately after the cell marked with marker = KeyValue.Type.DeleteColumn via PUT
  • HBASE-12837 - ReplicationAdmin leaks zk connections
  • HIVE-7647 - Beeline does not honor --headerInterval and --color when executing with "-e"
  • HIVE-7733 - Ambiguous column reference error on query
  • HIVE-9303 - Parquet files are written with incorrect definition levels
  • HIVE-8444 - update pom to junit 4.11
  • HIVE-9474 - truncate table changes permissions on the target
  • HIVE-9462 - HIVE-8577 breaks type evolution
  • HIVE-9482 - Hive parquet timestamp compatibility
  • HIVE-6308 - COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.
  • HIVE-9502 - Parquet cannot read Map types from files written with Hive 0.12 or earlier
  • HIVE-9445 - Revert HIVE-5700 - enforce single date format for partition column storage
  • HIVE-9393 - reduce noisy log level of ColumnarSerDe.java:116 from INFO to DEBUG
  • HIVE-7800 - Parquet Column Index Access Schema Size Checking
  • HIVE-9330 - DummyTxnManager will throw NPE if WriteEntity writeType has not been set
  • HIVE-9265 - Hive with encryption throws NPE to fs path without schema
  • HIVE-9199 - Excessive exclusive lock used in some DDLs with DummyTxnManager
  • HIVE-6978 - beeline always exits with 0 status, should exit with non-zero status on error
  • HUE-2556 - [core] Cannot update project tags of a document
  • HUE-2528 - Partitions limit gets capped to 1000 despite configuration
  • HUE-2548 - [metastore] Create table then load data does redirect to the table page
  • HUE-2525 - [core] Fix manual install of samples
  • HUE-2501 - [metastore] Creating a table with header files bigger than 64MB truncates it
  • HUE-2484 - [beeswax] Configure support for Hive Server2 LDAP authentication
  • HUE-2532 - [search] Fix share URL on Internet Explorer
  • HUE-2531 - [impala] Autogrow missing result list
  • HUE-2524 - [impala] Sort numerically recent queries tab
  • HUE-2495 - [oozie] Improve dashboards sorting mechanism
  • HUE-2511 - [impala] Infinite scroll keeps fetching results even if finished
  • HUE-2102 - [oozie] Workflow with credentials can't be used with Coordinator
  • HUE-2152 - [pig] Credentials support in editor
  • OOZIE-2131 - Add flag to sqoop action to skip hbase delegation token generation
  • OOZIE-2047 - Oozie does not support Hive tables that use datatypes introduced since Hive 0.8
  • OOZIE-2102 - Streaming actions are broken cause of incorrect method signature
  • PARQUET-173 - StatisticsFilter doesn't handle And properly
  • PARQUET-157 - Divide by zero in logging code
  • PARQUET-142 - parquet-tools doesn't filter _SUCCESS file
  • PARQUET-124 - parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
  • PARQUET-136 - NPE thrown in StatisticsFilter when all values in a string/binary column trunk are null
  • PARQUET-168 - Wrong command line option description in parquet-tools
  • PARQUET-145 - InternalParquetRecordReader.close() should not throw an exception if initialization has failed
  • PARQUET-140 - Allow clients to control the GenericData object that is used to read Avro records
  • SOLR-7033 - RecoveryStrategy should not publish any state when closed / cancelled.
  • SOLR-5961 - Solr gets crazy on /overseer/queue state change
  • SOLR-6640 - Replication can cause index corruption
  • SOLR-5875 - QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
  • SOLR-6919 - Log REST info before executing
  • SOLR-6969 - When opening an HDFSTransactionLog for append we must first attempt to recover it's lease to prevent data loss.
  • SOLR-5515 - NPE when getting stats on date field with empty result on solrcloud
  • SPARK-3778 - newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn
  • SPARK-4835 - Streaming saveAs*HadoopFiles() methods may throw FileAlreadyExistsException during checkpoint recovery
  • SQOOP-2057 - Skip delegation token generation flag during hbase import
  • SQOOP-1779 - Add support for --hive-database when importing Parquet files into Hive
  • IMPALA-1622 - Fix overflow in StringParser::StringToFloatInternal()
  • IMPALA-1614 - Compute stats fails if table name starts with number
  • IMPALA-1623 - unix_timestamp() does not return correct time
  • IMPALA-1535 - Partition pruning with NULL
  • IMPALA-1606 - Impala does not always give short name to Llama
  • IMPALA-1120 - Fetch column statistics using Hive 0.13 bulk API

In addition, CDH 5.3.2 reverts YARN-2713, which has caused problems since its inclusion in CDH 5.3.0.

 

 

 

