Long-term component architecture

As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark, Apache HBase, and Apache Parquet) that are eventually adopted by the community at large. Because these components become standards, you can build long-term architectures on them with confidence.

Thank you for choosing CDH. Your download instructions are below.

Installation

 

This section introduces options for installing Cloudera Manager, CDH, and managed services. You can install:

  • Cloudera Manager, CDH, and managed services in a Cloudera Manager deployment. This is the recommended method for installing CDH and managed services.
  • CDH 5 into an unmanaged deployment.


Cloudera Manager Deployment

 

A Cloudera Manager deployment consists of the following software components:

  • Oracle JDK
  • Cloudera Manager Server and Agent packages
  • Supporting database software
  • CDH and managed service software

This section describes the three main installation paths for creating a new Cloudera Manager deployment and the criteria for choosing an installation path. If your cluster already has an installation of a previous version of Cloudera Manager, follow the instructions in Upgrading Cloudera Manager.

 

The Cloudera Manager installation paths share some common phases, but each path differs in the user and cluster host requirements it can accommodate:

  • Demonstration and proof-of-concept deployments - There are two installation options:
    • Installation Path A - Automated Installation by Cloudera Manager - Cloudera Manager automates the installation of the Oracle JDK, Cloudera Manager Server, embedded PostgreSQL database, Cloudera Manager Agent, CDH, and managed service software on cluster hosts, and configures databases for the Cloudera Manager Server and Hive Metastore, and optionally for Cloudera Management Service roles. This path is recommended for demonstration and proof-of-concept deployments, but not for production deployments, because it is not intended to scale and may require database migration as your cluster grows (see the sketch after this list). To use this method, server and cluster hosts must satisfy the following requirements:
      • Provide the ability to log in to the Cloudera Manager Server host using a root account or an account that has password-less sudo permission.
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the necessary installation files.
    • Installation Path B - Manual Installation Using Cloudera Manager Packages - you install the Oracle JDK, Cloudera Manager Server, and embedded PostgreSQL database packages on the Cloudera Manager Server host. You have two options for installing the Oracle JDK, Cloudera Manager Agent, CDH, and managed service software on cluster hosts: install it manually yourself, or use Cloudera Manager to automate installation. However, for Cloudera Manager to automate installation of Cloudera Manager Agent packages or CDH and managed service software, cluster hosts must satisfy the following requirements:
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the necessary installation files.
  • Production deployments - require you to first manually install and configure a production database for the Cloudera Manager Server and Hive Metastore. There are two installation options:
    • Installation Path B - Manual Installation Using Cloudera Manager Packages - you install the Oracle JDK and Cloudera Manager Server packages on the Cloudera Manager Server host. You have two options for installing the Oracle JDK, Cloudera Manager Agent, CDH, and managed service software on cluster hosts: install it manually yourself, or use Cloudera Manager to automate installation. However, for Cloudera Manager to automate installation of Cloudera Manager Agent packages or CDH and managed service software, cluster hosts must satisfy the following requirements:
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the necessary installation files.
    • Installation Path C - Manual Installation Using Cloudera Manager Tarballs - you install the Oracle JDK, Cloudera Manager Server, and Cloudera Manager Agent software as tarballs and use Cloudera Manager to automate installation of CDH and managed service software as parcels.
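For illustration, a Path A installation typically begins by downloading and running the Cloudera Manager installer on the Cloudera Manager Server host. A minimal sketch, assuming the host can reach archive.cloudera.com and your account has root or password-less sudo access:

  # Download the Cloudera Manager installer to the Cloudera Manager Server host
  wget https://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin

  # Make the installer executable and run it with root privileges
  chmod u+x cloudera-manager-installer.bin
  sudo ./cloudera-manager-installer.bin

From there the installer walks you through installing the Oracle JDK, the embedded PostgreSQL database, and the Cloudera Manager Server, and the remaining hosts are provisioned from the Cloudera Manager Admin Console.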


Unmanaged Deployment

 

In an unmanaged deployment, you are responsible for managing all phases of the life cycle of CDH and managed service components on each host: installation, configuration, and service life cycle operations such as start and stop. This section describes alternatives for installing CDH 5 software in an unmanaged deployment.

  • Command-line methods:
    • Download and install the CDH 5 "1-click Install" package
    • Add the CDH 5 repository
    • Build your own CDH 5 repository
    If you use one of these command-line methods, the first (downloading and installing the "1-click Install" package) is recommended in most cases because it is simpler than building or adding a repository; see the example after this list. See Installing the Latest CDH 5 Release for detailed instructions for each of these options.
  • Tarball - You can download a tarball from CDH downloads. Keep the following points in mind:
    • Installing CDH 5 from a tarball installs YARN.
    • In CDH 5, there is no separate tarball for MRv1. Instead, the MRv1 binaries, examples, etc., are delivered in the Hadoop tarball. The scripts for running MRv1 are in the bin-mapreduce1 directory in the tarball, and the MRv1 examples are in the examples-mapreduce1 directory.
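For example, the "1-click Install" method on a RHEL-compatible 6 system comes down to downloading a small repository package and then installing CDH components from it. A sketch, assuming the x86_64 package for your distribution (see Installing the Latest CDH 5 Release for the exact URLs):

  # Download and install the CDH 5 "1-click Install" repository package (RHEL/CentOS 6 shown)
  wget https://archive.cloudera.com/cdh5/one-click-install/redhat/6/x86_64/cloudera-cdh-5-0.x86_64.rpm
  sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm

  # The CDH 5 repository is now registered; install components with the package manager,
  # for example on the host that will run the YARN ResourceManager:
  sudo yum clean all
  sudo yum install hadoop-yarn-resourcemanager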



Supported Operating Systems

CDH 5 provides packages for RHEL-compatible, SLES, Ubuntu, and Debian systems as described below.

Operating System / Version (all packages are 64-bit)

Red Hat Enterprise Linux (RHEL)-compatible:
  • Red Hat Enterprise Linux: 5.7, 5.10, 6.4, 6.5, 6.5 in SELinux mode, 6.6
  • CentOS: 5.7, 5.10, 6.4, 6.5, 6.5 in SELinux mode, 6.6
  • Oracle Linux (default kernel and Unbreakable Enterprise Kernel): 5.6 (UEK R2), 6.4 (UEK R2), 6.5 (UEK R2, UEK R3), 6.6 (UEK R3)

SLES:
  • SUSE Linux Enterprise Server (SLES): 11 with Service Pack 2, 11 with Service Pack 3

Ubuntu/Debian:
  • Ubuntu: Precise (12.04) - Long-Term Support (LTS), Trusty (14.04) - Long-Term Support (LTS)
  • Debian: Wheezy (7.0)
Note:

  • CDH 5 provides only 64-bit packages.
  • Cloudera has received reports that our RPMs work well on Fedora, but we have not tested this.
  • If you are using an operating system that is not supported by Cloudera packages, you can also download source tarballs from Downloads.


Supported Databases

Component     MySQL                   SQLite    PostgreSQL                   Oracle       Derby (see Note 5)
Oozie         5.5, 5.6                -         8.4, 9.2, 9.3 (see Note 2)   11gR2        Default
Flume         -                       -         -                            -            Default (for the JDBC Channel only)
Hue           5.5, 5.6 (see Note 1)   Default   8.4, 9.2, 9.3 (see Note 2)   11gR2        -
Hive/Impala   5.5, 5.6 (see Note 1)   -         8.4, 9.2, 9.3 (see Note 2)   11gR2        Default
Sentry        5.5, 5.6 (see Note 1)   -         8.4, 9.2, 9.3 (see Note 2)   11gR2        -
Sqoop 1       See Note 3              -         See Note 3                   See Note 3   -
Sqoop 2       See Note 4              -         See Note 4                   See Note 4   Default

Note:

  1. MySQL 5.5 is supported on CDH 5.1. MySQL 5.6 is supported on CDH 5.1 and later. The InnoDB storage engine must be enabled in the MySQL server (see the check after these notes).
  2. PostgreSQL 9.2 is supported on CDH 5.1 and later. PostgreSQL 9.3 is supported on CDH 5.2 and later.
  3. For the purposes of transferring data only, Sqoop 1 supports MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, Teradata 13.10 and above, and Netezza TwinFin 5.0 and above. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
  4. Sqoop 2 can transfer data to and from MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, and Microsoft SQL Server 2012 and above. The Sqoop 2 repository database is supported only on Derby and PostgreSQL.
  5. Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the Cloudera Installation and Upgrade guide for recommendations.
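As a quick check of Note 1's InnoDB requirement, you can ask the MySQL server which storage engines it has enabled before pointing any CDH component at it (a sketch, assuming command-line access to the MySQL host):

  # Confirm that the InnoDB storage engine is enabled (see Note 1)
  mysql -u root -p -e "SHOW ENGINES;" | grep -i innodb
  # A healthy server reports InnoDB with Support = YES or DEFAULT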

 

Supported JDK Versions

CDH 5.4.x is supported with the JDK versions shown in the following table:
Minimum Supported Version   Recommended Version     Notes
1.7.0_55                    1.7.0_67 or 1.7.0_75    None
1.8.0_60                    1.8.0_60                None
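
Before installing, you can confirm that each host reports a JDK from the table above (Cloudera Manager can otherwise install the Oracle JDK for you during a Path A or Path B installation):

  # Check the JDK version reported on a host
  java -version
  # Example output: java version "1.7.0_67"
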
Supported Internet Protocol

CDH requires IPv4. IPv6 is not supported.

See also Configuring Network Names.
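
As a quick sanity check of the IPv4 requirement, each host's name should resolve to an IPv4 address, along the lines described in Configuring Network Names:

  # The hostname should resolve to an IPv4 (A) record
  host -v -t A $(hostname)

  # The fully qualified name should be reachable over IPv4
  ping -c 1 "$(hostname -f)"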


Issues Fixed in CDH 5.4.9

The following topics describe known issues fixed in CDH 5.4.9.

Apache Commons Collections Deserialization Vulnerability

Cloudera has learned of a potential security vulnerability in a third-party library called the Apache Commons Collections. This library is used in products distributed and supported by Cloudera (“Cloudera Products”), including core Apache Hadoop. The Apache Commons Collections library is also in widespread use beyond the Hadoop ecosystem. At this time, no specific attack vector for this vulnerability has been identified as present in Cloudera Products.

In an abundance of caution, we are currently in the process of incorporating a version of the Apache Commons Collections library with a fix into the Cloudera Products. In most cases, this will require coordination with the projects in the Apache community. One example of this is tracked by HADOOP-12577.

The Apache Commons Collections potential security vulnerability is titled “Arbitrary remote code execution with InvokerTransformer” and is tracked by COLLECTIONS-580. MITRE has not issued a CVE, but related CVE-2015-4852 has been filed for the vulnerability. CERT has issued Vulnerability Note #576313 for this issue.

Releases affected: CDH 5.5.0, CDH 5.4.8 and lower, Cloudera Manager 5.5.0, Cloudera Manager 5.4.8 and lower, Cloudera Navigator 2.4.0, Cloudera Navigator 2.3.8 and lower

Users affected: All

Severity (Low/Medium/High): High

Impact: This potential vulnerability may enable an attacker to execute arbitrary code from a remote machine without requiring authentication.

Immediate action required: Upgrade to Cloudera Manager 5.5.1 and CDH 5.5.1.

Apache HBase

Data may not be replicated to the slave cluster if the multiwal multiplicity is set to greater than 1.

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.4.9:

  • FLUME-2841 - Upgrade commons-collections to 3.2.2
  • HADOOP-7713 - dfs -count -q should label output column
  • HADOOP-11171 - Enable using a proxy server to connect to S3a
  • HADOOP-12568 - Update core-default.xml to describe posixGroups support
  • HADOOP-12577 - Bumped up commons-collections version to 3.2.2 to address a security flaw
  • HDFS-7785 - Improve diagnostics information for HttpPutFailedException
  • HDFS-7798 - Checkpointing failure caused by shared KerberosAuthenticator
  • HDFS-7871 - NameNodeEditLogRoller can keep printing 'Swallowing exception' message
  • HDFS-7990 - IBR delete ack should not be delayed
  • HDFS-8646 - Prune cached replicas from DatanodeDescriptor state on replica invalidation
  • HDFS-9123 - Copying from the root to a subdirectory should be forbidden
  • HDFS-9250 - Add Precondition check to LocatedBlock#addCachedLoc
  • HDFS-9273 - ACLs on root directory may be lost after NN restart
  • HDFS-9332 - Fix Precondition failures from NameNodeEditLogRoller while saving namespace
  • HDFS-9364 - Unnecessary DNS resolution attempts when creating NameNodeProxies
  • HDFS-9470 - Encryption zone on root not loaded from fsimage after NN restart
  • MAPREDUCE-6191 - Improve clearing stale state of Java serialization
  • MAPREDUCE-6549 - Multibyte delimiters with LineRecordReader cause duplicate records
  • YARN-4235 - FairScheduler PrimaryGroup does not handle empty groups returned for a user
  • HBASE-6617 - ReplicationSourceManager should be able to track multiple WAL paths
  • HBASE-12865 - WALs may be deleted before they are replicated to peers
  • HBASE-13134 - mutateRow and checkAndMutate APIs don't throw region level exceptions
  • HBASE-13618 - ReplicationSource is too eager to remove sinks.
  • HBASE-13703 - ReplicateContext should not be a member of ReplicationSource.
  • HBASE-14003 - Work around JDK-8044053
  • HBASE-14283 - Reverse scan doesn’t work with HFile inline index/bloom blocks
  • HBASE-14374 - Backport parent 'HBASE-14317 Stuck FSHLog' issue to 1.1
  • HBASE-14501 - NPE in replication with TDE
  • HBASE-14533 - Connection Idle time 1 second is too short and the connection is closed too quickly by the ChoreService
  • HBASE-14547 - Add more debug/trace to zk-procedure
  • HBASE-14799 - Commons-collections object deserialization remote command execution vulnerability
  • HBASE-14809 - Grant / revoke Namespace admin permission to group
  • HIVE-7575 - Revert "GetTables thrift call is very slow"
  • HIVE-7575 - GetTables thrift call is very slow
  • HIVE-10265 - Hive CLI crashes on != inequality
  • HIVE-11149 - Sometimes HashMap in PerfLogger.java hangs
  • HIVE-11616 - DelegationTokenSecretManager reuses the same objectstore, which has concurrency issues
  • HIVE-12058 - Change hive script to record errors when calling hbase fails
  • HIVE-12188 - DoAs does not work properly in non-Kerberos secured HS2
  • HIVE-12189 - The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
  • HIVE-12250 - ZooKeeper connection leaks in Hive's HBaseHandler
  • HIVE-12365 - Added resource path is sent to cluster as an empty string when externally removed
  • HIVE-12378 - Exception on HBaseSerDe.serialize binary field
  • HIVE-12406 - HIVE-9500 introduced incompatible change to LazySimpleSerDe public interface
  • HIVE-12418 - HiveHBaseTableInputFormat.getRecordReader() causes ZooKeeper connection leak
  • HUE-2941 - [hadoop] Cache the active RM HA
  • HUE-3035 - [beeswax] Optimize sample data query for partitioned tables
  • IMPALA-1459 - Fix migration/assignment of On-clause predicates inside inline views.
  • IMPALA-1675 - Avoid overflow when adding large intervals to TIMESTAMPs
  • IMPALA-1746 - QueryExecState doesn't check for query cancellation or errors
  • IMPALA-1949 - Analysis exception when a binary operator contain an IN operator with values
  • IMPALA-2086/IMPALA-2090 - Avoid boost year/month interval logic
  • IMPALA-2141 - UnionNode::GetNext() doesn't check for query errors
  • IMPALA-2252 - Crash (likely race) tearing down BufferedBlockMgr on query failure
  • IMPALA-2260 - Adding a large hour interval caused an interval overflow
  • IMPALA-2265 - Sorter was not checking the returned Status of PrepareRead
  • IMPALA-2273 - Make MAX_PAGE_HEADER_SIZE configurable
  • IMPALA-2286 - Fix race between ~BufferedBlockMgr() and BufferedBlockMgr::Create()
  • IMPALA-2344 - Work-around IMPALA-2344 Fail query with OOM in case block->Pin() fails
  • IMPALA-2357 - Fix spilling sorts with var-len slots that are NULL or empty.
  • IMPALA-2446 - Fix wrong predicate assignment in outer joins
  • IMPALA-2533 - Impala throws IllegalStateException when inserting data into a partition
  • IMPALA-2559 - Fix check failed: sorter_runs_.back()->is_pinned_
  • IMPALA-2664 - Avoid sending large partition stats objects over thrift
  • KITE-1089 - readAvroContainer morphline command should work even if the Avro writer schema of each input file is different
  • PIG-3641 - Split "otherwise" producing incorrect output when combined with ColumnPruning
  • SENTRY-565 - Improve performance of filtering Hive SHOW commands
  • SENTRY-702 - Hive binding should support RELOAD command
  • SENTRY-936 - getGroup and getUser should always return original hdfs values for paths in prefixes which are not Sentry managed
  • SENTRY-960 - Blacklist reflect, java_method using hive.server2.builtin.udf.blacklist
  • SOLR-6443 - Backport: Disable test that fails on Jenkins with SolrCore.getOpenCount()==2
  • SOLR-7049 - LIST Collections API call should be processed directly by the CollectionsHandler instead of the OverseerCollectionProcessor
  • SOLR-7552 - Support using ZkCredentialsProvider/ZkACLProvider in custom filter
  • SOLR-7989 - After a new leader is elected, it should ensure its state is ACTIVE if it has already registered with ZK
  • SOLR-8075 - Leader Initiated Recovery should not stop a leader that participated in an election with all of its replicas from becoming a valid leader
  • SOLR-8223 - Avoid accidentally swallowing OutOfMemoryError
  • SOLR-8288 - DistributedUpdateProcessor#doFinish should explicitly check and ensure it does not try to put itself into LIR
  • SPARK-11484 - [WEBUI] Using proxyBase set by spark AM
  • SPARK-11652 - [CORE] Remote code execution with InvokerTransformer

