
Long-term component architecture

As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark, Apache HBase, and Apache Parquet) that are eventually adopted by the community at large. Because these components become standards, you can build long-term architectures on them with confidence.

Thank you for choosing CDH. Your download instructions are below.

Installation

 

This section introduces options for installing Cloudera Manager, CDH, and managed services. You can install:

  • Cloudera Manager, CDH, and managed services in a Cloudera Manager deployment. This is the recommended method for installing CDH and managed services.
  • CDH 5 into an unmanaged deployment.


Cloudera Manager Deployment

 

A Cloudera Manager deployment consists of the following software components:

  • Oracle JDK
  • Cloudera Manager Server and Agent packages
  • Supporting database software
  • CDH and managed service software

This section describes the three main installation paths for creating a new Cloudera Manager deployment and the criteria for choosing an installation path. If your cluster already has an installation of a previous version of Cloudera Manager, follow the instructions in Upgrading Cloudera Manager.

 

The Cloudera Manager installation paths share some common phases, but each path differs in how it meets user and cluster host requirements:

  • Demonstration and proof-of-concept deployments - There are two installation options:
    • Installation Path A - Automated Installation by Cloudera Manager - Cloudera Manager automates the installation of the Oracle JDK, Cloudera Manager Server, embedded PostgreSQL database, Cloudera Manager Agent, CDH, and managed service software on cluster hosts, and configures databases for the Cloudera Manager Server and Hive Metastore and, optionally, for Cloudera Management Service roles. This path is recommended for demonstration and proof-of-concept deployments, but not for production deployments, because it is not intended to scale and may require database migration as your cluster grows. To use this method, server and cluster hosts must satisfy the following requirements:
      • Provide the ability to log in to the Cloudera Manager Server host using a root account or an account that has password-less sudo permission.
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the necessary installation files.
    • Installation Path B - Manual Installation Using Cloudera Manager Packages - you install the Oracle JDK and Cloudera Manager Server, and embedded PostgreSQL database packages on the Cloudera Manager Server host. You have two options for installing Oracle JDK, Cloudera Manager Agent, CDH, and managed service software on cluster hosts: manually install it yourself or use Cloudera Manager to automate installation. However, in order for Cloudera Manager to automate installation of Cloudera Manager Agent packages or CDH and managed service software, cluster hosts must satisfy the following requirements:
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the necessary installation files.
  • Production deployments - These require you to first manually install and configure a production database for the Cloudera Manager Server and Hive Metastore. There are two installation options:
    • Installation Path B - Manual Installation Using Cloudera Manager Packages - you install the Oracle JDK and Cloudera Manager Server packages on the Cloudera Manager Server host. You have two options for installing Oracle JDK, Cloudera Manager Agent, CDH, and managed service software on cluster hosts: manually install it yourself or use Cloudera Manager to automate installation. However, in order for Cloudera Manager to automate installation of Cloudera Manager Agent packages or CDH and managed service software, cluster hosts must satisfy the following requirements:
      • Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
      • All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the necessary installation files.
    • Installation Path C - Manual Installation Using Cloudera Manager Tarballs - you install the Oracle JDK, Cloudera Manager Server, and Cloudera Manager Agent software as tarballs and use Cloudera Manager to automate installation of CDH and managed service software as parcels.
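Before choosing Path A, or before letting Cloudera Manager automate agent installation in Path B, the sudo and SSH requirements above can be sanity-checked from the intended Cloudera Manager Server host. The script below is a sketch, not part of Cloudera's tooling; the host names and helper functions are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the automation requirements above.
SSH_PORT="${SSH_PORT:-22}"

check_sudo() {
  # Cloudera Manager needs root or password-less sudo on the Server host.
  if [ "$(id -u)" -eq 0 ] || sudo -n true 2>/dev/null; then
    echo "sudo: ok"
  else
    echo "sudo: password-less sudo not available"
  fi
}

check_ssh() {
  # All cluster hosts must accept SSH on the same port from this host.
  for h in "$@"; do
    if ssh -p "$SSH_PORT" -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
      echo "ssh $h: ok"
    else
      echo "ssh $h: not reachable on port $SSH_PORT"
    fi
  done
}

check_sudo
# Example (host names are placeholders):
# check_ssh node1.example.com node2.example.com
```

If either check fails, fall back to the manual installation steps of Path B or Path C rather than relying on automated installation.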


Unmanaged Deployment

 

In an unmanaged deployment, you are responsible for managing all phases of the life cycle of CDH and managed service components on each host: installation, configuration, and service life cycle operations such as start and stop. This section describes alternatives for installing CDH 5 software in an unmanaged deployment.

  • Command-line methods:
    • Download and install the CDH 5 "1-click Install" package
    • Add the CDH 5 repository
    • Build your own CDH 5 repository
    If you use one of these command-line methods, the first (downloading and installing the "1-click Install" package) is recommended in most cases because it is simpler than building or adding a repository. See Installing the Latest CDH 5 Release for detailed instructions for each of these options.
  • Tarball - You can download a tarball from CDH downloads. Keep the following points in mind:
    • Installing CDH 5 from a tarball installs YARN.
    • In CDH 5, there is no separate tarball for MRv1. Instead, the MRv1 binaries, examples, etc., are delivered in the Hadoop tarball. The scripts for running MRv1 are in the bin-mapreduce1 directory in the tarball, and the MRv1 examples are in the examples-mapreduce1 directory.
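As an illustration of the "add the CDH 5 repository" command-line option above, a small helper might map an OS to its repo file URL. This is a sketch: the URL pattern is an assumption modeled on archive.cloudera.com's layout and should be verified against the actual archive before use.

```shell
# Hypothetical helper: map an OS label to a CDH 5 .repo file URL.
# The URL pattern is an assumption -- verify it on archive.cloudera.com.
cdh5_repo_url() {
  case "$1" in
    rhel5) echo "https://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/cloudera-cdh5.repo" ;;
    rhel6) echo "https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo" ;;
    rhel7) echo "https://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/cloudera-cdh5.repo" ;;
    *)     echo "unsupported OS label: $1" >&2; return 1 ;;
  esac
}

# On a RHEL 7-compatible host you would then install the repo file, e.g.:
#   sudo curl -o /etc/yum.repos.d/cloudera-cdh5.repo "$(cdh5_repo_url rhel7)"
cdh5_repo_url rhel7
```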


Supported Operating Systems

CDH 5 provides packages for RHEL-compatible, SLES, Ubuntu, and Debian systems as described below.

Operating System                              Versions (all 64-bit)
Red Hat Enterprise Linux (RHEL)-compatible
  RHEL                                        5.7, 5.10, 6.4, 6.5, 6.5 (SELinux mode), 6.6, 6.6 (SELinux mode), 6.7, 7.1
  CentOS                                      5.7, 5.10, 6.4, 6.5, 6.5 (SELinux mode), 6.6, 6.6 (SELinux mode), 6.7, 7.1
  Oracle Linux (default kernel and
  Unbreakable Enterprise Kernel, UEK)         5.6 (UEK R2), 6.4 (UEK R2), 6.5 (UEK R2, UEK R3), 6.6 (UEK R3), 7.1
SLES
  SUSE Linux Enterprise Server (SLES)         11 SP2, 11 SP3
Ubuntu/Debian
  Ubuntu                                      12.04 (Precise) LTS, 14.04 (Trusty) LTS
  Debian                                      7.0, 7.1 (Wheezy)

Note:

  • CDH 5 provides only 64-bit packages.
  • Cloudera has received reports that RPMs work well on Fedora, but this has not been tested.
  • If you are using an operating system that is not supported by Cloudera packages, you can also download source tarballs from Downloads.

 

Important: Cloudera Enterprise is supported on platforms with Security-Enhanced Linux (SELinux) enabled. However, policies must be provided by other parties or created by the administrator of the cluster deployment. Cloudera is not responsible for policy support, policy enforcement, or any issues arising from them. If you experience issues with SELinux, contact your OS support provider.

Important: Cloudera supports RHEL 7 with the following limitations:


Supported Databases

Component    MariaDB   MySQL                        SQLite    PostgreSQL                    Oracle       Derby (see Note 6)
Oozie        5.5       5.5, 5.6                     -         9.2, 9.3, 9.4 (see Note 3)    11gR2, 12c   Default
Flume        -         -                            -         -                             -            Default (JDBC Channel only)
Hue          5.5       5.1, 5.5, 5.6 (see Note 7)   Default   9.2, 9.3, 9.4 (see Note 3)    11gR2, 12c   -
Hive/Impala  5.5       5.5, 5.6 (see Note 1)        -         9.2, 9.3, 9.4 (see Note 3)    11gR2, 12c   Default
Sentry       5.5       5.5, 5.6 (see Note 1)        -         9.2, 9.3, 9.4 (see Note 3)    11gR2, 12c   -
Sqoop 1      5.5       See Note 4                   -         See Note 4                    See Note 4   -
Sqoop 2      5.5       See Note 5                   -         See Note 5                    See Note 5   Default

Note:

  1. MySQL 5.5 is supported on CDH 5.1. MySQL 5.6 is supported on CDH 5.1 and higher. The InnoDB storage engine must be enabled in the MySQL server.
  2. Cloudera Manager installation fails if GTID-based replication is enabled in MySQL.
  3. PostgreSQL 9.2 is supported on CDH 5.1 and higher. PostgreSQL 9.3 is supported on CDH 5.2 and higher. PostgreSQL 9.4 is supported on CDH 5.5 and higher.
  4. For purposes of transferring data only, Sqoop 1 supports MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, Teradata 13.10 and above, and Netezza TwinFin 5.0 and above. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
  5. Sqoop 2 can transfer data to and from MySQL 5.0 and above, PostgreSQL 8.4 and above, Oracle 10.2 and above, and Microsoft SQL Server 2012 and above. The Sqoop 2 repository database is supported only on Derby and PostgreSQL.
  6. Derby is supported as shown in the table, but not always recommended. See the pages for individual components in the Cloudera Installation and Upgrade guide for recommendations.
  7. CDH 5 Hue requires the default MySQL version of the operating system on which it is being installed, which is usually MySQL 5.1, 5.5, or 5.6.



Supported JDK Versions

Important: There is one exception to the minimum supported and recommended JDK versions in the following table: if Oracle releases a security patch that affects server-side Java before the next minor release of Cloudera products, the Cloudera support policy covers customers using that patch.

CDH 5.5.x is supported with the versions shown in the following table:

 

Minimum Supported Version Recommended Version Exceptions
1.7.0_25 1.7.0_80 None
1.8.0_31 1.8.0_60 Cloudera recommends that you not use JDK 1.8.0_40.
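As a quick sanity check, the minimums in the table above can be encoded in a small shell helper. This is a sketch: the `jdk_ok` function is hypothetical, version parsing is simplified to "1.7.0_25"-style strings, and it treats the not-recommended 1.8.0_40 as failing (a simplification; per the table it is discouraged, not unsupported).

```shell
# Hypothetical check of a JDK version string against the CDH 5.5.x table.
jdk_ok() {
  major=$(echo "$1" | cut -d. -f2)    # "1.7.0_80" -> "7"
  update=$(echo "$1" | cut -d_ -f2)   # "1.7.0_80" -> "80"
  case "$major" in
    7) [ "$update" -ge 25 ] ;;                           # minimum 1.7.0_25
    8) [ "$update" -ge 31 ] && [ "$update" -ne 40 ] ;;   # minimum 1.8.0_31; avoid _40
    *) return 1 ;;
  esac
}

jdk_ok "1.7.0_80" && echo "1.7.0_80 meets the minimum"
jdk_ok "1.8.0_40" || echo "1.8.0_40 should be avoided"
```

Compare the output of `java -version` on each host against these minimums before installing.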

 


Supported Browsers

Hue

Hue works with the two most recent versions of the following browsers. Cookies and JavaScript must be enabled.

  • Chrome
  • Firefox
  • Safari (not supported on Windows)
  • Internet Explorer
Hue might display in older versions and in other browsers, but you might not have access to all of its features.


Supported Internet Protocol

CDH requires IPv4. IPv6 is not supported.

See also Configuring Network Names.


The following components support the indicated versions of Transport Layer Security (TLS):

Table 1. Components Supported by TLS

Component   Role   Name   Port   Version
Flume   Avro Source/Sink 9099 TLS 1.2
HBase Master HBase Master Web UI Port 60010 TLS 1.2
HDFS NameNode Secure NameNode Web UI Port 50470 TLS 1.2
HDFS Secondary NameNode Secure Secondary NameNode Web UI Port 50495 TLS 1.2
HDFS HttpFS REST Port 14000 TLS 1.0
Hive HiveServer2 HiveServer2 Port 10000 TLS 1.2
Hue Hue Server Hue HTTP Port 8888 TLS 1.2
Cloudera Impala Impala Daemon Impala Daemon Beeswax Port 21000 TLS 1.2
Cloudera Impala Impala Daemon Impala Daemon HiveServer2 Port 21050 TLS 1.2
Cloudera Impala Impala Daemon Impala Daemon Backend Port 22000 TLS 1.2
Cloudera Impala Impala Daemon Impala Daemon HTTP Server Port 25000 TLS 1.2
Cloudera Impala Impala StateStore StateStore Service Port 24000 TLS 1.2
Cloudera Impala Impala StateStore StateStore HTTP Server Port 25010 TLS 1.2
Cloudera Impala Impala Catalog Server Catalog Server HTTP Server Port 25020 TLS 1.2
Cloudera Impala Impala Catalog Server Catalog Server Service Port 26000 TLS 1.2
Oozie Oozie Server Oozie HTTPS Port 11443 TLS 1.1, TLS 1.2
Solr Solr Server Solr HTTP Port 8983 TLS 1.1, TLS 1.2
Solr Solr Server Solr HTTPS Port 8985 TLS 1.1, TLS 1.2
YARN ResourceManager ResourceManager Web Application HTTP Port 8090 TLS 1.2
YARN JobHistory Server MRv1 JobHistory Web Application HTTP Port 19890 TLS 1.2
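To verify which TLS versions a given port actually accepts, you could probe it with `openssl s_client`. This is a sketch: the `probe_tls` helper is hypothetical, the host name is a placeholder, and the ports come from the table above.

```shell
# Hypothetical probe: does host:port accept a given TLS version?
probe_tls() {
  host="$1"; port="$2"; ver="$3"   # ver: tls1, tls1_1, or tls1_2
  if echo | openssl s_client -connect "$host:$port" "-$ver" >/dev/null 2>&1; then
    echo "$host:$port accepts $ver"
  else
    echo "$host:$port rejects $ver"
  fi
}

# Example: per the table, HiveServer2 on port 10000 should accept TLS 1.2:
# probe_tls hs2.example.com 10000 tls1_2
```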

 


Issues Fixed in CDH 5.5.4

CDH 5.5.4 fixes the following issues.

 

Apache Hadoop

FSImage may get corrupted after deleting snapshot

Bug: HDFS-9406

When deleting a snapshot that contains the last record of a given INode, the fsimage may become corrupt because the create list of the snapshot diff in the previous snapshot and the child list of the parent INodeDirectory are not cleaned.

 

Apache HBase

 

The ReplicationCleaner process can abort if its connection to ZooKeeper is inconsistent

Bug: HBASE-15234

If the connection with ZooKeeper is inconsistent, the ReplicationCleaner may abort, and the following event is logged by the HMaster:

WARN org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner: Aborting ReplicationLogCleaner
because Failed to get list of replicators

Unprocessed WALs accumulate.

Workaround: Restart the HMaster occasionally. The ReplicationCleaner restarts if necessary and processes the accumulated WALs.

Reverse scans do not work when Bloom or leaf-level INode blocks are present

The seekBefore() method calculates the size of the previous data block by assuming that data blocks are contiguous, but HFile v2 and higher store Bloom blocks and leaf-level INode blocks interleaved with the data. As a result, reverse scans do not work when Bloom blocks or leaf-level INode blocks are present and HFile v2 or higher is used.

 

Upstream Issues Fixed

The following upstream issues are fixed in CDH 5.5.4:

  • FLUME-2632 - High CPU on KafkaSink
  • FLUME-2712 - Optional channel errors slows down the Source to Main channel event rate
  • FLUME-2781 - Kafka Channel with parseAsFlumeEvent=true should write data as is, not as flume events
  • FLUME-2886 - Optional Channels can cause OOMs
  • FLUME-2891 - Revert FLUME-2712 and FLUME-2886
  • FLUME-2897 - AsyncHBase sink NPE when Channel.getTransaction() fails
  • HADOOP-7139 - Allow appending to existing SequenceFiles
  • HADOOP-7817 - RawLocalFileSystem.append() should give FSDataOutputStream with accurate .getPos()
  • HADOOP-11321 - copyToLocal cannot save a file to an SMB share unless the user has Full Control permissions
  • HADOOP-11687 - Ignore x-* and response headers when copying an Amazon S3 object
  • HADOOP-11722 - Some Instances of Services using ZKDelegationTokenSecretManager go down when old token cannot be deleted
  • HADOOP-12240 - Fix tests requiring native library to be skipped in non-native profile
  • HADOOP-12280 - Skip unit tests based on maven profile rather than NativeCodeLoader.isNativeCodeLoaded
  • HADOOP-12559 - KMS connection failures should trigger TGT renewal
  • HADOOP-12605 - Fix intermittent failure of TestIPC.testIpcWithReaderQueuing
  • HADOOP-12668 - Support excluding weak Ciphers in HttpServer2 through ssl-server.conf
  • HADOOP-12682 - Fix TestKMS#testKMSRestart* failure
  • HADOOP-12699 - TestKMS#testKMSProvider intermittently fails during 'test rollover draining'
  • HADOOP-12715 - TestValueQueue#testgetAtMostPolicyALL fails intermittently
  • HADOOP-12718 - Incorrect error message by fs -put local dir without permission
  • HADOOP-12736 - TestTimedOutTestsListener#testThreadDumpAndDeadlocks sometimes times out
  • HADOOP-12788 - OpensslAesCtrCryptoCodec should log which random number generator is used
  • HADOOP-12825 - Log slow name resolutions
  • HADOOP-12954 - Add a way to change hadoop.security.token.service.use_ip
  • HADOOP-12972 - Lz4Compressor#getLibraryName returns the wrong version number
  • HDFS-6520 - hdfs fsck passes invalid length value when creating BlockReader
  • HDFS-7373 - Clean up temporary files after fsimage transfer failures
  • HDFS-7758 - Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
  • HDFS-8211 - DataNode UUID is always null in the JMX counter
  • HDFS-8496 - Calling stopWriter() with FSDatasetImpl lock held may block other threads
  • HDFS-8576 - Lease recovery should return true if the lease can be released and the file can be closed
  • HDFS-8785 - TestDistributedFileSystem is failing in trunk
  • HDFS-8855 - Webhdfs client leaks active NameNode connections
  • HDFS-9264 - Minor cleanup of operations on FsVolumeList#volumes
  • HDFS-9289 - Make DataStreamer#block thread safe and verify genStamp in commitBlock
  • HDFS-9347 - Invariant assumption in TestQuorumJournalManager.shutdown() is wrong
  • HDFS-9350 - Avoid creating temporary strings in Block.toString() and getBlockName()
  • HDFS-9358 - TestNodeCount#testNodeCount timed out
  • HDFS-9406 - FSImage may get corrupted after deleting snapshot
  • HDFS-9514 - TestDistributedFileSystem.testDFSClientPeerWriteTimeout failing; exception being swallowed
  • HDFS-9576 - HTrace: collect position/length information on read operations
  • HDFS-9589 - Block files which have been hardlinked should be duplicated before the DataNode appends to them
  • HDFS-9612 - DistCp worker threads are not terminated after jobs are done
  • HDFS-9655 - NN should start JVM pause monitor before loading fsimage.
  • HDFS-9688 - Test the effect of nested encryption zones in HDFS downgrade
  • HDFS-9701 - DN may deadlock when hot-swapping under load
  • HDFS-9721 - Allow Delimited PB OIV tool to run upon fsimage that contains INodeReference
  • HDFS-9949 - Add a test case to ensure that the DataNode does not regenerate its UUID when a storage directory is cleared
  • HDFS-10223 - peerFromSocketAndKey performs SASL exchange before setting connection timeouts
  • HDFS-10267 - Extra "synchronized" on FsDatasetImpl#recoverAppend and FsDatasetImpl#recoverClose
  • MAPREDUCE-4785 - TestMRApp occasionally fails
  • MAPREDUCE-6460 - TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
  • MAPREDUCE-6528 - Memory leak for HistoryFileManager.getJobSummary()
  • MAPREDUCE-6580 - Test failure: TestMRJobsWithProfiler
  • MAPREDUCE-6620 - Jobs that did not start are shown as starting in 1969 in the JHS web UI
  • YARN-2749 - Fix some testcases from TestLogAggregationService fails in trunk
  • YARN-2871 - TestRMRestart#testRMRestartGetApplicationList sometime fails in trunk
  • YARN-2902 - Killing a container that is localizing can orphan resources in the DOWNLOADING state
  • YARN-3446 - FairScheduler headroom calculation should exclude nodes in the blacklist
  • YARN-3727 - For better error recovery, check if the directory exists before using it for localization
  • YARN-4155 - TestLogAggregationService.testLogAggregationServiceWithInterval failing
  • YARN-4168 - Fixed a failing test TestLogAggregationService.testLocalFileDeletionOnDiskFull
  • YARN-4354 - Public resource localization fails with NPE
  • YARN-4380 - TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
  • YARN-4393 - Fix intermittent test failure for TestResourceLocalizationService#testFailedDirsResourceRelease
  • YARN-4546 - ResourceManager crash due to scheduling opportunity overflow
  • YARN-4573 - Fix test failure in TestRMAppTransitions#testAppRunningKill and testAppKilledKilled
  • YARN-4613 - Fix test failure in TestClientRMService#testGetClusterNodes
  • YARN-4704 - TestResourceManager#testResourceAllocation() fails when using FairScheduler
  • YARN-4717 - TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup
  • HBASE-6617 - ReplicationSourceManager should be able to track multiple WAL paths (ADDENDUM)
  • HBASE-14586 - Use a maven profile to run Jacoco analysis
  • HBASE-14587 - Attach a test-sources.jar for hbase-server
  • HBASE-14588 - Stop accessing test resources from within src folder
  • HBASE-14759 - Avoid using Math.abs when selecting SyncRunner in FSHLog
  • HBASE-15019 - Replication stuck when HDFS is restarted
  • HBASE-15052 - Use EnvironmentEdgeManager in ReplicationSource
  • HBASE-15152 - Automatically include prefix-tree module in MR jobs if present
  • HBASE-15157 - Add *PerformanceTest for Append, CheckAnd*
  • HBASE-15206 - Fix flaky testSplitDaughtersNotInMeta
  • HBASE-15213 - Fix increment performance regression caused by HBASE-8763 on branch-1.0
  • HBASE-15234 - Don't abort ReplicationLogCleaner on ZooKeeper errors
  • HBASE-15456 - CreateTableProcedure/ModifyTableProcedure needs to fail when there is no family in table descriptor
  • HBASE-15479 - No more garbage or beware of autoboxing
  • HBASE-15582 - SnapshotManifestV1 too verbose when there are no regions
  • HIVE-9617 - UDF from_utc_timestamp throws NPE if the second argument is null
  • HIVE-9743 - Revert "(Tests portion only)Incorrect result set for vectorized left outer join (Matt McCline, reviewed by Vikram Dixit)"
  • HIVE-10115 - HS2 running on a Kerberized cluster should offer Kerberos(GSSAPI) and Delegation token(DIGEST) when alternate authentication is enabled
  • HIVE-10213 - MapReduce jobs using dynamic-partitioning fail on commit
  • HIVE-10303 - HIVE-9471 broke forward compatibility of ORC files
  • HIVE-11054 - Handle varchar/char partition columns in vectorization
  • HIVE-11097 - HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
  • HIVE-11135 - Fix the Beeline set and save command in order to avoid the NullPointerException
  • HIVE-11285 - ObjectInspector for partition columns in FetchOperator in SMBJoin causes exception
  • HIVE-11488 - Need to add support for sessionId and queryId logging, QueryId can't be stored in the configuration of the SessionState since multiple queries can run in a single session
  • HIVE-11583 - When PTF is used over a large partitions result could be corrupted
  • HIVE-11590 - AvroDeserializer is very chatty
  • HIVE-11828 - beeline -f fails on scripts with tabs between column type and comment
  • HIVE-11866 - Add framework to enable testing using LDAPServer using LDAP protocol
  • HIVE-11919 - Hive Union Type Mismatch
  • HIVE-12315 - Fix Vectorized double divide by zero
  • HIVE-12354 - MapJoin with double keys is slow on MR
  • HIVE-12431 - Support timeout for compile lock
  • HIVE-12506 - SHOW CREATE TABLE command creates a table that does not work for RCFile format
  • HIVE-12706 - Incorrect output from from_utc_timestamp()/to_utc_timestamp when local timezone has DST
  • HIVE-12782 - Update the golden files for some tests that fail
  • HIVE-12790 - Metastore connection leaks in HiveServer2
  • HIVE-12885 - LDAP Authenticator improvements
  • HIVE-12909 - Some encryption q-tests fail because trash is disabled in encryption_with_trash.q
  • HIVE-12941 - Unexpected result when using MIN() on struct with NULL in first field
  • HIVE-12946 - Alter table should also add default scheme and authority for the location similar to create table
  • HIVE-13039 - BETWEEN predicate is not functioning correctly with predicate pushdown on Parquet table
  • HIVE-13055 - Add unit tests for HIVE-11512
  • HIVE-13065 - Hive throws NPE when writing map type data to a HBase backed table
  • HIVE-13082 - Enable constant propagation optimization in query with left semi join
  • HIVE-13200 - Aggregation functions returning empty rows on partitioned columns
  • HIVE-13243 - Hive drop table on encryption zone fails for external tables
  • HIVE-13251 - Hive can't read the decimal in AVRO file generated from previous version
  • HIVE-13286 - Query ID is being reused across queries
  • HIVE-13295 - Improvement to LDAP search queries in HS2 LDAP Authenticator
  • HIVE-13401 - Kerberized HS2 with LDAP auth enabled fails kerberos/delegation token authentication
  • HUE-3106 - [filebrowser] Add support for full paths in zip file uploads
  • HUE-3110 - [oozie] Fix bundle submission when coordinator points to multiple bundles
  • HUE-3132 - [core] Fix Sync Ldap users and groups for anonymous binds
  • HUE-3180 - [useradmin] Override duplicate username validation message
  • HUE-3185 - [oozie] Avoid extra API calls for parent information in workflow dashboard
  • HUE-3303 - [core] PostgreSQL requires data update and alter table operations in separate transactions
  • HUE-3310 - [jobsub] Prevent browsing job designs by API
  • HUE-3334 - [editor] Skip checking for multi queries if there is no semi colon, send empty query instead of error
  • HUE-3398 - [beeswax] Filter out sessions with empty guid or secret key
  • HUE-3436 - [oozie] Retain old dependencies when saving a workflow
  • HUE-3437 - [core] PamBackend does not honor ignore_username_case
  • HUE-3523 - [oozie] Modify find_jobs_with_no_doc method to exclude jobs with no name
  • HUE-3528 - [oozie] Call correct metrics api to avoid 500 error
  • HUE-3594 - [fb] Smarter DOM based XSS filter on hashes
  • IMPALA-852, IMPALA-2215 - Analyze HAVING clause before aggregation
  • IMPALA-1092 - Fix estimates for trivial coord-only queries
  • IMPALA-1170 - Fix URL parsing when path contains '@'
  • IMPALA-1934 - Allow shell to retrieve LDAP password from shell cmd
  • IMPALA-2093 - Disallow NOT IN aggregate subqueries with a constant lhs expr
  • IMPALA-2184 - Don't inline timestamp methods with try/catch blocks in IR
  • IMPALA-2425 - Broadcast join hint not enforced when low memory limit is set
  • IMPALA-2503 - Add missing String.format() arg in error message
  • IMPALA-2539 - Unmark collections slots of empty union operands
  • IMPALA-2554 - Change default buffer size for RPC servers and clients
  • IMPALA-2565 - Planner tests are flaky due to file size mismatches
  • IMPALA-2592 - DataStreamSender::Channel::CloseInternal() does not close the channel on an error
  • IMPALA-2599 - Pseudo-random sleep before acquiring kerberos ticket possibly not really pseudo-random
  • IMPALA-2711 - Fix memory leak in Rand()
  • IMPALA-2719 - test_parquet_max_page_header fails on Isilon
  • IMPALA-2732 - Timestamp formats with non-padded values
  • IMPALA-2734 - Correlated EXISTS subqueries with HAVING clause return wrong results
  • IMPALA-2742 - Avoid unbounded MemPool growth with AcquireData()
  • IMPALA-2749 - Fix decimal multiplication overflow
  • IMPALA-2765 - Preserve return type of subexpressions substituted in isTrueWithNullSlots()
  • IMPALA-2788 - conv(bigint num, int from_base, int to_base) returns wrong result
  • IMPALA-2798 - Bring in AVRO-1617 fix and add test case for it
  • IMPALA-2818 - Fix cancellation crashes/hangs due to BlockOnWait() race
  • IMPALA-2820 - Support unquoted keywords as struct-field names
  • IMPALA-2832 - Fix cloning of FunctionCallExpr
  • IMPALA-2844 - Allow count(*) on RC files with complex types
  • IMPALA-2870 - Fix failing metadata.test_ddl.TestDdlStatements.test_create_table test
  • IMPALA-2894 - Move regression test into a different .test file
  • IMPALA-2906 - Fix an edge case with materializing TupleIsNullPredicates in analytic sorts
  • IMPALA-2914 - Fix DCHECK Check failed: HasDateOrTime()
  • IMPALA-2926 - Fix off-by-one bug in SelectNode::CopyRows()
  • IMPALA-2940 - Fix leak of dictionaries in Parquet scanner
  • IMPALA-3000 - Fix BitReader::Reset()
  • IMPALA-3034 - Verify all consumed memory of a MemTracker is always released at destruction time
  • IMPALA-3047 - Separate create table test with nested types
  • IMPALA-3054 - Disable probe side filters when spilling
  • IMPALA-3071 - Fix assignment of On-clause predicates belonging to an inner join
  • IMPALA-3085 - Unregister data sinks' MemTrackers at their Close() functions
  • IMPALA-3093 - ReopenClient() could NULL out 'client_key' causing a crash
  • IMPALA-3095 - Add configurable whitelist of authorized internal principals
  • IMPALA-3151 - Impala crash for avro table when casting to char data type
  • IMPALA-3194 - Allow queries materializing scalar type columns in RC/sequence files
  • KITE-1114 - Kite CLI json-import HDFS temp file path not multiuser safe, fix missing license header
  • OOZIE-2419 - HBase credentials are not correctly proxied
  • OOZIE-2428 - TestSLAService, TestSLAEventGeneration flaky tests
  • OOZIE-2429 - TestEventGeneration test is flaky
  • OOZIE-2432 - TestPurgeXCommand fails
  • OOZIE-2435 - TestCoordChangeXCommand is flaky
  • OOZIE-2466 - Repeated failure of TestMetricsInstrumentation.testSamplers
  • OOZIE-2486 - TestSLAEventsGetForFilterJPAExecutor is flaky
  • OOZIE-2490 - Oozie can't set hadoop.security.token.service.use_ip
  • SENTRY-922 - Backport: INSERT OVERWRITE DIRECTORY permission not working correctly
  • SENTRY-972 - Backport: Include sentry-tests-hive hadoop test script in maven project
  • SENTRY-991 - Backport: Roles of Sentry Permission needs to be case insensitive
  • SENTRY-1002 - PathsUpdate.parsePath(path) will throw an NPE when parsing relative paths
  • SENTRY-1003 - Support "reload" by updating the classpath of Sentry function aux jar path during runtime
  • SENTRY-1007 - Backport: Sentry column-level performance for wide tables
  • SENTRY-1008 - Path should be not be updated if the create/drop table/partition event fails
  • SENTRY-1015 - Backport: Improve Sentry + Hive error message when user has insufficient privileges
  • SENTRY-1044 - Tables with non-hdfs locations breaks HMS startup
  • SENTRY-1169 - MetastorePlugin#renameAuthzObject log message prints oldpathname as newpathname
  • SENTRY-1184 - Clean up HMSPaths.renameAuthzObject
  • SOLR-6820 - Make the number of version buckets used by the UpdateLog configurable, as increasing beyond the default 256 has been shown to help with high-volume indexing performance in SolrCloud; increase the default number of buckets from 256 to 65536, and fix the numVersionBuckets name attribute in configsets
  • SOLR-7281 - Add an overseer action to publish an entire node as 'down'
  • SOLR-7332 - Initialize the highest value for all version buckets with the max value from the index or recent updates to avoid unnecessary lookups to the index to check for reordered updates when processing new documents
  • SOLR-7493 - Requests aren't distributed evenly if the collection isn't present locally. Merges r1683946 and r1683948 from trunk
  • SOLR-7587 - TestSpellCheckResponse stalled and never timed out -- possible VersionBucket bug?
  • SOLR-7625 - Version bucket seed not updated after new index is installed on a replica
  • SOLR-8215 - Only active replicas should handle incoming requests against a collection
  • SOLR-8371 - Try and prevent too many recovery requests from stacking up and clean up some faulty cancel recovery logic
  • SOLR-8451 - We should not call method.abort in HttpSolrClient or HttpSolrCall#remoteQuery and HttpSolrCall#remoteQuery should not close streams
  • SOLR-8453 - Solr should attempt to consume the request inputstream on errors as we cannot count on the container to do it
  • SOLR-8575 - Fix HDFSLogReader replay status numbers and a performance bug where we can reopen FSDataInputStream too often
  • SOLR-8578 - Successful or not, requests are not always fully consumed by Solrj clients and we count on HttpClient or the JVM
  • SOLR-8615 - Just like creating cores, we should use multiple threads when closing cores
  • SOLR-8633 - DistributedUpdateProcess processCommit/deleteByQuery calls finish on DUP and SolrCmdDistributor, which violates the lifecycle and can cause bugs
  • SOLR-8720 - ZkController#publishAndWaitForDownStates should use #publishNodeAsDown
  • SOLR-8771 - Multi-threaded core shutdown creates executor per core
  • SOLR-8855 - The HDFS BlockDirectory should not clean up its cache on shutdown
  • SOLR-8856 - Do not cache merge or 'read once' contexts in the hdfs block cache
  • SOLR-8857 - HdfsUpdateLog does not use configured or new default number of version buckets and is hard coded to 256
  • SOLR-8869 - Optionally disable printing field cache entries in SolrFieldCacheMBean
  • SPARK-10859 - [SQL] Fix stats of StringType in columnar cache
  • SPARK-10914 - UnsafeRow serialization breaks when two machines have different Oops size
  • SPARK-11009 - [SQL] Fix wrong result of Window function in cluster mode
  • SPARK-11537 - [SQL] Fix negative hours/minutes/seconds
  • SPARK-11737 - [SQL] Fix serialization of UTF8String with Kyro
  • SPARK-12617 - [PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming, clean up the leak sockets of Py4J
  • SPARK-14477 - [BUILD] Allow custom mirrors for downloading artifacts in build/mvn
  • SQOOP-2847 - Sqoop --incremental + missing parent --target-dir reports success with no data


Want to Get Involved or Learn More?

Check out our other resources

Cloudera Community

Collaborate with your peers, industry experts, and Clouderans to make the most of your investment in Hadoop.

Cloudera University

Receive expert Hadoop training through Cloudera University, the industry's only truly dynamic Hadoop training curriculum that’s updated regularly to reflect the state of the art in big data.