Long-Term Component Architecture

As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark, Apache HBase, and Apache Parquet) that are eventually adopted by the community at large. Because these components become standards, you can build long-term architecture on them with confidence.

Thank you for choosing CDH. Your download instructions are below:

Installation

This section introduces options for installing Cloudera Manager, CDH, and managed services. You can install:

  • Cloudera Manager, CDH, and managed services in a Cloudera Manager deployment. This is the recommended method for installing CDH and managed services.
  • CDH 5 into an unmanaged deployment.

Cloudera Manager Deployment

A Cloudera Manager deployment consists of the following software components:

  • Oracle JDK
  • Cloudera Manager Server and Agent packages
  • Supporting database software
  • CDH and managed service software

This section describes the three main installation paths for creating a new Cloudera Manager deployment and the criteria for choosing an installation path. If your cluster already has an installation of a previous version of Cloudera Manager, follow the instructions in Upgrading Cloudera Manager.

The Cloudera Manager installation paths share some common phases, but each path supports different user and cluster host requirements:

  • Demonstration and proof-of-concept deployments - There are two installation options:
    • Installation Path A - Automated Installation by Cloudera Manager - Cloudera Manager automates the installation of the Oracle JDK, Cloudera Manager Server, embedded PostgreSQL database, Cloudera Manager Agent, CDH, and managed service software on cluster hosts, and configures databases for the Cloudera Manager Server and Hive Metastore, and optionally for Cloudera Management Service roles. This path is recommended for demonstration and proof-of-concept deployments, but not for production deployments, because it is not intended to scale and may require database migration as your cluster grows. To use this method, server and cluster hosts must satisfy the following requirements (a connectivity check is sketched after this list):
      • Provide the ability to log in to the Cloudera Manager Server host using a root account or an account that has password-less sudo permission.
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the necessary installation files.
    • Installation Path B - Manual Installation Using Cloudera Manager Packages - you install the Oracle JDK, Cloudera Manager Server, and embedded PostgreSQL database packages on the Cloudera Manager Server host. You have two options for installing the Oracle JDK, Cloudera Manager Agent, CDH, and managed service software on cluster hosts: manually install them yourself or use Cloudera Manager to automate installation. However, for Cloudera Manager to automate installation of Cloudera Manager Agent packages or CDH and managed service software, cluster hosts must satisfy the following requirements:
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the necessary installation files.
  • Production deployments - These require you to first manually install and configure a production database for the Cloudera Manager Server and Hive Metastore. There are two installation options:
    • Installation Path B - Manual Installation Using Cloudera Manager Packages - you install the Oracle JDK and Cloudera Manager Server packages on the Cloudera Manager Server host. You have two options for installing the Oracle JDK, Cloudera Manager Agent, CDH, and managed service software on cluster hosts: manually install them yourself or use Cloudera Manager to automate installation. However, for Cloudera Manager to automate installation of Cloudera Manager Agent packages or CDH and managed service software, cluster hosts must satisfy the following requirements:
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the necessary installation files.
    • Installation Path C - Manual Installation Using Cloudera Manager Tarballs - you install the Oracle JDK, Cloudera Manager Server, and Cloudera Manager Agent software as tarballs and use Cloudera Manager to automate installation of CDH and managed service software as parcels.
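
The automated options above depend on the same two prerequisites: password-less sudo and uniform SSH access on a single port. It can be worth sanity-checking both from the Cloudera Manager Server host before running the wizard. The following is a minimal sketch, not part of the Cloudera installer; the hosts.txt file name, port, and user are assumptions to adapt to your environment.

    # Minimal sketch: verify the Path A/B prerequisites (uniform SSH access on
    # one port, password-less sudo) before running the Cloudera Manager wizard.
    # Assumes a hypothetical hosts.txt with one cluster hostname per line and
    # an OpenSSH client on the PATH; PORT and USER are placeholders.
    import subprocess

    PORT = 22      # Cloudera Manager needs the SAME SSH port on every host
    USER = "root"  # or an account with password-less sudo

    def check_host(host):
        """Return True if `host` accepts SSH on PORT and runs sudo without a password."""
        result = subprocess.run(
            ["ssh", "-p", str(PORT), "-o", "BatchMode=yes",
             "-o", "ConnectTimeout=5", f"{USER}@{host}", "sudo", "-n", "true"],
            capture_output=True, text=True,
        )
        return result.returncode == 0

    with open("hosts.txt") as f:
        hosts = [line.strip() for line in f if line.strip()]

    failures = [h for h in hosts if not check_host(h)]
    if failures:
        print("Fix SSH/sudo on:", ", ".join(failures))
    else:
        print(f"All {len(hosts)} hosts pass on port {PORT} with password-less sudo")

BatchMode=yes makes ssh fail rather than prompt, so a hang-free pass here is a reasonable proxy for what the automated installer will encounter.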

Unmanaged Deployment

In an unmanaged deployment, you are responsible for managing all phases of the life cycle of CDH and managed service components on each host: installation, configuration, and service life cycle operations such as start and stop. This section describes alternatives for installing CDH 5 software in an unmanaged deployment.

  • Command-line methods:
    • Download and install the CDH 5 "1-click Install" package
    • Add the CDH 5 repository
    • Build your own CDH 5 repository
    If you use one of these command-line methods, the first (downloading and installing the "1-click Install" package) is recommended in most cases because it is simpler than building or adding a repository. See Installing the Latest CDH 5 Release for detailed instructions for each of these options.
  • Tarball - You can download a tarball from CDH downloads. Keep the following points in mind (a layout check is sketched after this list):
    • Installing CDH 5 from a tarball installs YARN.
    • In CDH 5, there is no separate tarball for MRv1. Instead, the MRv1 binaries, examples, etc., are delivered in the Hadoop tarball. The scripts for running MRv1 are in the bin-mapreduce1 directory in the tarball, and the MRv1 examples are in the examples-mapreduce1 directory.
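
To make the tarball layout concrete, here is a minimal sketch that checks a downloaded Hadoop tarball for the MRv1 directories mentioned above. The tarball file name is an assumption; substitute the file you actually downloaded from CDH downloads.

    # Minimal sketch: confirm a CDH 5 Hadoop tarball carries the MRv1 pieces
    # (bin-mapreduce1/ and examples-mapreduce1/) alongside the default YARN
    # layout. The file name below is hypothetical.
    import tarfile

    TARBALL = "hadoop-2.6.0-cdh5.4.11.tar.gz"  # assumed name; use your download

    with tarfile.open(TARBALL) as tar:
        names = tar.getnames()

    for marker in ("bin-mapreduce1", "examples-mapreduce1"):
        hits = [n for n in names if f"/{marker}/" in n or n.endswith(f"/{marker}")]
        print(f"{marker}: {'present' if hits else 'missing'} ({len(hits)} entries)")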

Supported Operating Systems

CDH 5 provides packages for RHEL-compatible, SLES, Ubuntu, and Debian systems, as described below.

Operating systems and supported versions (64-bit packages only):

Red Hat Enterprise Linux (RHEL)-compatible:
  • Red Hat Enterprise Linux 5.7, 5.10, 6.4, 6.5, 6.5 in SELinux mode, and 6.6
  • CentOS 5.7, 5.10, 6.4, 6.5, 6.5 in SELinux mode, and 6.6
  • Oracle Linux with default kernel and Unbreakable Enterprise Kernel: 5.6 (UEK R2), 6.4 (UEK R2), 6.5 (UEK R2, UEK R3), and 6.6 (UEK R3)

SLES:
  • SUSE Linux Enterprise Server (SLES) 11 with Service Pack 2 or Service Pack 3

Ubuntu/Debian:
  • Ubuntu Precise (12.04) - Long-Term Support (LTS)
  • Ubuntu Trusty (14.04) - Long-Term Support (LTS)
  • Debian Wheezy (7.0)

Note:

  • CDH 5 provides only 64-bit packages.
  • Cloudera has received reports that our RPMs work well on Fedora, but we have not tested this.
  • If you are using an operating system that is not supported by Cloudera packages, you can also download source tarballs from Downloads.

Supported Databases

Supported database versions by component:

  • Oozie - MySQL 5.5, 5.6; PostgreSQL 8.4, 9.2, 9.3 (see Note 2); Oracle 11gR2; Derby (default; see Note 5)
  • Flume - Derby (default; for the JDBC Channel only)
  • Hue - MySQL 5.5, 5.6 (see Note 1); SQLite (default); PostgreSQL 8.4, 9.2, 9.3 (see Note 2); Oracle 11gR2
  • Hive/Impala - MySQL 5.5, 5.6 (see Note 1); PostgreSQL 8.4, 9.2, 9.3 (see Note 2); Oracle 11gR2; Derby (default; see Note 5)
  • Sentry - MySQL 5.5, 5.6 (see Note 1); PostgreSQL 8.4, 9.2, 9.3 (see Note 2); Oracle 11gR2
  • Sqoop 1 - MySQL, PostgreSQL, and Oracle (see Note 3)
  • Sqoop 2 - MySQL, PostgreSQL, and Oracle (see Note 4); Derby (default; see Note 5)

Note:

  1. MySQL 5.5 is supported on CDH 5.1. MySQL 5.6 is supported on CDH 5.1 and later. The InnoDB storage engine must be enabled in the MySQL server (a quick check is sketched after these notes).
  2. PostgreSQL 9.2 is supported on CDH 5.1 and later. PostgreSQL 9.3 is supported on CDH 5.2 and later.
  3. For the purposes of transferring data only, Sqoop 1 supports MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, Teradata 13.10 and above, and Netezza TwinFin 5.0 and above. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
  4. Sqoop 2 can transfer data to and from MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, and Microsoft SQL Server 2012 and above. The Sqoop 2 repository database is supported only on Derby and PostgreSQL.
  5. Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the Cloudera Installation and Upgrade guide for recommendations.
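
As a practical example of Note 1, the sketch below asks a MySQL server whether InnoDB is enabled before you point the Hive Metastore (or Hue, Sentry, Oozie) at it. It shells out to the standard mysql client; the host name and account are placeholders for your environment.

    # Minimal sketch: check that the InnoDB storage engine is enabled on the
    # MySQL server backing the metastore (Note 1). Uses the standard `mysql`
    # client; host and user below are placeholders. `-p` prompts for a password.
    import subprocess

    QUERY = "SELECT SUPPORT FROM information_schema.ENGINES WHERE ENGINE = 'InnoDB';"

    result = subprocess.run(
        ["mysql", "--host=metastore-db.example.com", "--user=root", "-p",
         "--batch", "--skip-column-names", "-e", QUERY],
        capture_output=True, text=True,
    )

    support = result.stdout.strip()
    if support in ("YES", "DEFAULT"):
        print("InnoDB is enabled:", support)
    else:
        print("InnoDB unavailable; got:", support or result.stderr.strip())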

Supported JDK Versions

CDH 5.4.x is supported with the JDK versions shown in the following table:

  Minimum Supported Version    Recommended Version       Notes
  1.7.0_55                     1.7.0_67 or 1.7.0_75      None
  1.8.0_60                     1.8.0_60                  None
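
A quick way to confirm a host meets these minimums is to parse the `java -version` banner, which the JVM prints on stderr. The sketch below is an illustration under that assumption, not a Cloudera tool, and only covers the JDK 7/8 version format shown in the table.

    # Minimal sketch: check the local JDK against the CDH 5.4.x minimums
    # (1.7.0_55 for JDK 7, 1.8.0_60 for JDK 8). Assumes `java` is on the PATH.
    import re
    import subprocess

    MINIMUMS = {7: (7, 0, 55), 8: (8, 0, 60)}  # major -> (major, minor, update)

    banner = subprocess.run(["java", "-version"],
                            capture_output=True, text=True).stderr
    match = re.search(r'version "1\.(\d+)\.(\d+)_(\d+)"', banner)
    if not match:
        raise SystemExit(f"Could not parse `java -version` output: {banner!r}")

    major, minor, update = (int(g) for g in match.groups())
    minimum = MINIMUMS.get(major)
    if minimum is None:
        print(f"JDK 1.{major} is not covered by this check")
    elif (major, minor, update) >= minimum:
        print(f"JDK 1.{major}.{minor}_{update} meets the CDH 5.4.x minimum")
    else:
        print(f"JDK 1.{major}.{minor}_{update} is below the minimum "
              "1.%d.%d_%d" % minimum)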

Supported Internet Protocol

CDH requires IPv4. IPv6 is not supported.

See also Configuring Network Names.
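
A small way to check this on a host is to confirm that its hostname resolves to an IPv4 address. The sketch below uses only the Python standard library and is purely illustrative; see Configuring Network Names for the actual guidance.

    # Minimal sketch: confirm this host's name resolves over IPv4, since CDH
    # requires IPv4 and does not support IPv6.
    import socket

    hostname = socket.getfqdn()
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    except socket.gaierror as exc:
        raise SystemExit(f"{hostname} does not resolve over IPv4: {exc}")

    addrs = sorted({info[4][0] for info in infos})
    print(f"{hostname} resolves to IPv4 address(es): {', '.join(addrs)}")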


Issues Fixed in CDH 5.4.11

  • FLUME-2891 - Revert FLUME-2712 and FLUME-2886
  • FLUME-2908 - NetcatSource - SocketChannel not closed when session is broken
  • HADOOP-8436 - NPE In getLocalPathForWrite ( path, conf ) when the required context item is not configured
  • HADOOP-8437 - getLocalPathForWrite should throw IOException for invalid paths
  • HADOOP-8751 - NPE in Token.toString() when Token is constructed using null identifier
  • HADOOP-8934 - Shell command ls should include sort options
  • HADOOP-10048 - LocalDirAllocator should avoid holding locks while accessing the filesystem
  • HADOOP-10971 - Add -C flag to make `hadoop fs -ls` print filenames only
  • HADOOP-11901 - BytesWritable fails to support 2G chunks due to integer overflow
  • HADOOP-12252 - LocalDirAllocator should not throw NPE with empty string configuration
  • HADOOP-12269 - Update aws-sdk dependency to 1.10.6
  • HADOOP-12787 - KMS SPNEGO sequence does not work with WEBHDFS
  • HADOOP-12841 - Update s3-related properties in core-default.xml.
  • HADOOP-12901 - Add warning log when KMSClientProvider cannot create a connection to the KMS server.
  • HADOOP-12972 - Lz4Compressor#getLibraryName returns the wrong version number
  • HADOOP-13079 - Add -q option to Ls to print ? instead of non-printable characters
  • HADOOP-13132 - Handle ClassCastException on AuthenticationException in LoadBalancingKMSClientProvider
  • HADOOP-13155 - Implement TokenRenewer to renew and cancel delegation tokens in KMS
  • HADOOP-13251 - Authenticate with Kerberos credentials when renewing KMS delegation token
  • HADOOP-13255 - KMSClientProvider should check and renew tgt when doing delegation token operations
  • HADOOP-13263 - Reload cached groups in background after expiry.
  • HADOOP-13457 - Remove hardcoded absolute path for shell executable.
  • HDFS-4660 - Block corruption can happen during pipeline recovery
  • HDFS-8211 - DataNode UUID is always null in the JMX counter.
  • HDFS-8451 - DFSClient probe for encryption testing interprets empty URI property for enabled
  • HDFS-8496 - Calling stopWriter() with FSDatasetImpl lock held may block other threads
  • HDFS-8576 - Lease recovery should return true if the lease can be released and the file can be closed
  • HDFS-8722 - Optimize datanode writes for small writes and flushes
  • HDFS-9085 - Show renewer information in DelegationTokenIdentifier#toString
  • HDFS-9220 - Reading small file (< 512 bytes) that is open for append fails due to incorrect checksum
  • HDFS-9276 - Failed to Update HDFS Delegation Token for long running application in HA mode
  • HDFS-9466 - TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
  • HDFS-9589 - Block files which have been hardlinked should be duplicated before the DataNode appends to them
  • HDFS-9700 - DFSClient and DFSOutputStream should set TCP_NODELAY on sockets for DataTransferProtocol
  • HDFS-9732 - Improve DelegationTokenIdentifier.toString() for better logging
  • HDFS-9939 - Increase DecompressorStream skip buffer size
  • HDFS-9949 - Add a test case to ensure that the DataNode does not regenerate its UUID when a storage directory is cleared
  • HDFS-10267 - Extra "synchronized" on FsDatasetImpl#recoverAppend and FsDatasetImpl#recoverClose
  • HDFS-10360 - DataNode might format directory and lose blocks if current/VERSION is missing.
  • HDFS-10381 - DataStreamer DataNode exclusion log message should be warning.
  • MAPREDUCE-4785 - TestMRApp occasionally fails
  • MAPREDUCE-6580 - Test failure: TestMRJobsWithProfiler
  • YARN-2871 - TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk
  • YARN-3727 - Check if the directory exists before using it for localization
  • YARN-4168 - Fixed a failing test TestLogAggregationService.testLocalFileDeletionOnDiskFull
  • YARN-4354 - Public resource localization fails with NPE
  • YARN-4717 - TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails intermittently due to IllegalArgumentException from cleanup
  • YARN-5048 - DelegationTokenRenewer#skipTokenRenewal might throw NPE
  • HBASE-6617 - ReplicationSourceManager should be able to track multiple WAL paths (ADDENDUM)
  • HBASE-11625 - Verify data before building HFileBlock; adds an HFileBlock.Header class that records the location of fields, and adds CorruptedFSReaderImpl to TestChecksum for testing.
  • HBASE-11927 - Use Native Hadoop Library for HFile checksum.
  • HBASE-14155 - StackOverflowError in reverse scan
  • HBASE-14359 - HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction
  • HBASE-14730 - region server needs to log warnings when there are attributes configured for cells with hfile v2
  • HBASE-14759 - Avoid using Math.abs when selecting SyncRunner in FSHLog
  • HBASE-15234 - Don't abort ReplicationLogCleaner on ZooKeeper errors
  • HBASE-15456 - CreateTableProcedure/ModifyTableProcedure needs to fail when there is no family in table descriptor
  • HBASE-15479 - No more garbage or beware of autoboxing
  • HBASE-15582 - SnapshotManifestV1 too verbose when there are no regions
  • HBASE-15707 - ImportTSV bulk output does not support tags with hfile.format.version=3
  • HBASE-15746 - Remove extra RegionCoprocessor preClose() in RSRpcServices#closeRegion
  • HBASE-15811 - Batch Get after batch Put does not fetch all Cells. We were not waiting on all executors in a batch to complete. The test for no-more-executors was damaged by the 0.99/0.98.4 fix "HBASE-11403 Fix race conditions around Object#notify"
  • HBASE-15925 - provide default values for hadoop compat module related properties that match default hadoop profile.
  • HBASE-16207 - can't restore snapshot without "Admin" permission
  • HIVE-9499 - hive.limit.query.max.table.partition makes queries fail on non-partitioned tables
  • HIVE-10048 - JDBC - Support SSL encryption regardless of Authentication mechanism
  • HIVE-10303 - HIVE-9471 broke forward compatibility of ORC files
  • HIVE-10685 - Alter table concatenate operator will cause duplicate data
  • HIVE-10925 - Non-static threadlocals in metastore code can potentially cause memory leak
  • HIVE-11031 - ORC concatenation of old files can fail while merging column statistics
  • HIVE-11054 - Handle varchar/char partition columns in vectorization
  • HIVE-11243 - Changing log level in Utilities.getBaseWork
  • HIVE-11408 - HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils
  • HIVE-11427 - Location of temporary table for CREATE TABLE SELECT broken by HIVE-7079.
  • HIVE-11488 - Support sessionId and queryId logging: add sessionId and queryId info to the HS2 log; combined with HIVE-12456 (QueryId can't be stored in the configuration of the SessionState since multiple queries can run in a single session)
  • HIVE-11583 - When PTF is used over a large partitions result could be corrupted
  • HIVE-11747 - Unnecessary error log is shown when executing an "INSERT OVERWRITE LOCAL DIRECTORY" command in embedded mode
  • HIVE-11827 - STORED AS AVRO fails SELECT COUNT(*) when empty
  • HIVE-11919 - Hive Union Type Mismatch
  • HIVE-12354 - MapJoin with double keys is slow on MR
  • HIVE-12431 - Support timeout for compile lock
  • HIVE-12481 - Occasionally "Request is a replay" will be thrown from HS2
  • HIVE-12635 - Hive should return the latest hbase cell timestamp as the row timestamp value
  • HIVE-12958 - Make embedded Jetty server more configurable
  • HIVE-13200 - Aggregation functions returning empty rows on partitioned columns
  • HIVE-13251 - hive can't read the decimal in AVRO file generated from previous version
  • HIVE-13285 - Orc concatenation may drop old files from moving to final path
  • HIVE-13286 - Query ID is being reused across queries
  • HIVE-13462 - HiveResultSetMetaData.getPrecision() fails for NULL columns
  • HIVE-13527 - Using deprecated APIs in HBase client causes zookeeper connection leaks
  • HIVE-13570 - Some queries with Union all fail when CBO is off
  • HIVE-13932 - Hive SMB Map Join with small set of LIMIT failed with NPE
  • HIVE-13953 - Issues in HiveLockObject equals method
  • HIVE-13991 - Union All on view fail with no valid permission on underneath table
  • HIVE-14118 - Make the alter partition exception more meaningful
  • HUE-3185 - [oozie] Avoid extra API calls for parent information in workflow dashboard
  • HUE-3185 - Revert "[oozie] Avoid extra API calls for parent information in workflow dashboard"
  • HUE-3185 - [oozie] Avoid extra API calls for parent information in workflow dashboard
  • HUE-3437 - [core] PamBackend does not honor ignore_username_case
  • IMPALA-2378 - check proc mem limit before preparing fragment
  • IMPALA-2612 - Free local allocations once for every row batch when building hash tables.
  • IMPALA-2711 - Fix memory leak in Rand().
  • IMPALA-2722 - Free local allocations per row batch in non-partitioned AGG and HJ
  • OOZIE-2429 - TestEventGeneration test is unreliable
  • OOZIE-2466 - Repeated failure of TestMetricsInstrumentation.testSamplers
  • OOZIE-2486 - TestSLAEventsGetForFilterJPAExecutor is unreliable
  • SENTRY-780 - HDFS Plugin should not execute path callbacks for views
  • SENTRY-1184 - Clean up HMSPaths.renameAuthzObject
  • SENTRY-1292 - Reorder DBModelAction EnumSet
  • SENTRY-1293 - Avoid converting string permission to Privilege object
  • SOLR-6631 - DistributedQueue spinning on calling zookeeper getChildren()
  • SOLR-6820 - Make the number of version buckets used by the UpdateLog configurable, as increasing beyond the default 256 has been shown to help with high-volume indexing performance in SolrCloud; increase the default number of buckets from 256 to 65536, and fix the numVersionBuckets name attribute in configsets
  • SOLR-7332 - Initialize the highest value for all version buckets with the max value from the index or recent updates to avoid unnecessary lookups to the index to check for reordered updates when processing new documents.
  • SOLR-7587 - TestSpellCheckResponse stalled and never timed out
  • SOLR-7625 - Version bucket seed not updated after new index is installed on a replica
  • SOLR-8152 - Overseer Task Processor/Queue can miss responses, leading to timeouts
  • SOLR-8451 - Fix backport
  • SOLR-8451 - We should not call method.abort in HttpSolrClient or HttpSolrCall#remoteQuery and HttpSolrCall#remoteQuery should not close streams.
  • SOLR-8453 - Solr should attempt to consume the request inputstream on errors as we cannot count on the container to do it.
  • SOLR-8578 - Successful or not, requests are not always fully consumed by Solrj clients and we count on HttpClient or the JVM.
  • SOLR-8633 - DistributedUpdateProcess processCommit/deleteByQuery calls finish on DUP and SolrCmdDistributor, which violates the lifecycle and can cause bugs.
  • SOLR-8683 - Tune down stream closed logging
  • SOLR-8683 - Always consume the full request on the server, not just in the case of an error.
  • SOLR-8855 - The HDFS BlockDirectory should not clean up its cache on shutdown.
  • SOLR-8856 - Do not cache merge or 'read once' contexts in the hdfs block cache.
  • SOLR-8857 - HdfsUpdateLog does not use configured or new default number of version buckets and is hard coded to 256.
  • SOLR-8869 - Optionally disable printing field cache entries in SolrFieldCacheMBean
  • SPARK-12087 - Create new JobConf for every batch in saveAsHadoopFiles


Want to Get Involved or Learn More?

Check out our other resources

Cloudera Community

Collaborate with your peers, industry experts, and Clouderans to make the most of your investment in Hadoop.

Cloudera University

Receive expert Hadoop training through Cloudera University, the industry's only truly dynamic Hadoop training curriculum that’s updated regularly to reflect the state of the art in big data.