This is the documentation for Cloudera 5.2.x.
Documentation for other versions is available at Cloudera Documentation.

Cloudera Introduction

Cloudera provides a scalable, flexible, integrated platform that makes it easy to manage rapidly increasing volumes and varieties of data in your enterprise. Industry-leading Cloudera products and solutions enable you to deploy and manage Apache Hadoop and related projects, manipulate and analyze your data, and keep that data secure and protected.

Cloudera provides the following products and tools:
  • CDH—The Cloudera distribution of Apache Hadoop and other related open-source projects, including Impala and Cloudera Search. CDH also provides security and integration with numerous hardware and software solutions.
  • Cloudera Manager—A sophisticated application used to deploy, manage, monitor, and diagnose issues with your CDH deployments. Cloudera Manager provides the Admin Console, a web-based user interface that makes administration of your enterprise data simple and straightforward. It also includes the Cloudera Manager API, which you can use to obtain cluster health information and metrics, as well as configure Cloudera Manager.
  • Cloudera Navigator—An end-to-end governance tool for the CDH platform. Cloudera Navigator enables administrators, data managers, and analysts to explore the large amounts of data in Hadoop. The robust auditing, governance, lineage management, and life cycle management in Cloudera Navigator allow enterprises to adhere to stringent compliance and regulatory requirements.
  • Cloudera Impala—A massively parallel processing SQL engine for interactive analytics and business intelligence. Its highly optimized architecture makes it ideally suited for traditional BI-style queries with joins, aggregations, and subqueries. It can query Hadoop data files from a variety of sources, including those produced by MapReduce jobs or loaded into Hive tables. The YARN and Llama resource management components let Impala coexist on clusters running batch workloads concurrently with Impala SQL queries. You can manage Impala alongside other Hadoop components through the Cloudera Manager user interface, and secure its data through the Sentry authorization framework.
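As an illustration of the Cloudera Manager API mentioned in the list above, the following is a minimal sketch of how a script might request per-service health for one cluster and summarize the response. It is a sketch under stated assumptions, not a reference client: the host name, port 7180, the API version string "v8", the `/clusters/{name}/services` path, the `healthSummary` field, and the sample payload are all assumptions made for illustration and should be checked against the API documentation for your Cloudera Manager release. The helper names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_services_request(host, cluster, user, password, port=7180, api_version="v8"):
    """Build (but do not send) a GET request for the services of one cluster.

    The port, API version, and URL layout are assumptions for illustration.
    """
    path = "/api/{}/clusters/{}/services".format(api_version, urllib.parse.quote(cluster))
    url = "http://{}:{}{}".format(host, port, path)
    # HTTP Basic authentication: base64 of "user:password".
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


def summarize_health(payload_text):
    """Map each service name to its health summary from a JSON response body.

    Assumes a body shaped like {"items": [{"name": ..., "healthSummary": ...}]}.
    """
    payload = json.loads(payload_text)
    return {
        item["name"]: item.get("healthSummary", "UNKNOWN")
        for item in payload.get("items", [])
    }


def fetch_service_health(host, cluster, user, password):
    """Send the request to a live Cloudera Manager server and summarize the reply."""
    request = build_services_request(host, cluster, user, password)
    with urllib.request.urlopen(request) as response:
        return summarize_health(response.read().decode("utf-8"))


# Hypothetical response body, used here so the parsing step can be tried offline.
SAMPLE_RESPONSE = json.dumps({
    "items": [
        {"name": "hdfs1", "type": "HDFS", "healthSummary": "GOOD"},
        {"name": "impala1", "type": "IMPALA", "healthSummary": "CONCERNING"},
    ]
})
```

Calling `summarize_health(SAMPLE_RESPONSE)` exercises the parsing step without a server. `fetch_service_health` needs a reachable Cloudera Manager host and valid credentials.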

This introductory guide provides a general overview of Cloudera Manager, CDH, and Cloudera Navigator. This guide also includes frequently asked questions about Cloudera products and describes how to get support, report issues, and receive information about updates and new releases.

The following guides are included in the Cloudera documentation set:

Guide

Description

Cloudera Introduction

This guide provides a general overview of Cloudera Manager, CDH, and Cloudera Navigator, as well as answers to frequently asked questions. It also describes how to get support, find information about new releases, and report any issues that you encounter.

Cloudera Release Guide

This guide contains release and download information for installers and administrators. It includes release notes as well as information about versions and downloads. The guide also provides a release matrix that shows which major and minor versions of each product are supported with which versions of Cloudera Manager, CDH, and, where applicable, Cloudera Search and Cloudera Impala.

Cloudera QuickStart

This guide describes how to quickly install Cloudera software and create initial deployments for proof of concept (POC) or development. It describes how to download and use the QuickStart virtual machines, which provide everything you need to start a basic installation. It also shows you how to create a new installation of Cloudera Manager 5, CDH 5, and managed services on a cluster of four hosts. QuickStart installations should be used for demonstrations and POC applications only and are not recommended for production.

Cloudera Installation and Upgrade

This guide provides Cloudera software requirements and installation information for production deployments, as well as upgrade procedures. This guide also provides specific port information for Cloudera software.

Cloudera Administration

This guide describes how to configure and administer a Cloudera deployment. Administrators manage resources, availability, and backup and recovery configurations. In addition, this guide shows how to implement high availability and discusses integration.

Cloudera Governance

This guide describes how to perform governance using Cloudera Navigator. Governance activities include auditing access to data residing in HDFS and Hive metastores, reviewing and updating metadata, and discovering the lineage of data objects.

Cloudera Operation

This guide shows how to monitor the health of a Cloudera deployment and diagnose issues. You can obtain metrics and usage information and view processing activities. This guide also describes how to examine logs and reports to troubleshoot issues with cluster configuration and operation, and to monitor compliance.

Cloudera Security

This guide is intended for system administrators who want to secure a cluster using data encryption, user authentication, and authorization techniques. This guide also provides information about Hadoop security programs and shows you how to set up a gateway to restrict access.

Cloudera Impala Guide

This guide describes Cloudera Impala, its features and benefits, and how it works with CDH. It introduces Impala concepts, describes how to plan your Impala deployment, and provides tutorials for first-time users as well as more advanced tutorials that describe scenarios and specialized features. You will also find a language reference, performance tuning information, instructions for using the Impala shell, troubleshooting information, and frequently asked questions.

Cloudera Search Guide

This guide provides Cloudera Search prerequisites, shows how to load and index data in Search, and shows how to use Search to query data. In addition, this guide provides a tutorial, various indexing references, and troubleshooting information.

Cloudera Glossaries

This guide contains glossaries of terms for Cloudera components.