HDFS, MapReduce, and YARN (Core Hadoop)
Apache Hadoop's core components, which are integrated parts of CDH and supported with Cloudera Enterprise, allow you to store and process unlimited amounts of data of any type, all within a single platform.
HDFS Key Features
HDFS is designed for massive scalability, so you can store unlimited amounts of data in a single platform. As your data needs grow, you can simply add more servers to linearly scale with your business.
Store data of any type — structured, semi-structured, unstructured — without any upfront modeling. Flexible storage means you always have access to full-fidelity data for a wide range of analytics and use cases.
Automatic, tunable replication means multiple copies of your data are always available for access and protection from data loss. Built-in fault tolerance means servers can fail but your system will remain available for all workloads.
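As an illustrative sketch of how replication is tuned (the value shown is the stock Hadoop default, not a Cloudera recommendation), the cluster-wide copy count is set in hdfs-site.xml:

```xml
<!-- hdfs-site.xml: illustrative replication setting -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- number of copies kept for each block; 3 is the Hadoop default -->
    <value>3</value>
  </property>
</configuration>
```

Replication can also be adjusted per file after the fact, e.g. `hdfs dfs -setrep -w 2 /path/to/file`, so critical and scratch data can carry different protection levels on the same cluster.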
MapReduce Key Features
Supports a wide range of languages for developers, including C++, Java, and Python, as well as high-level languages through Apache Hive and Apache Pig.
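As one sketch of that language flexibility, a word count can be written as plain Python functions and run under Hadoop Streaming, which pipes records through any executable via stdin/stdout. The helper names below (`map_lines`, `reduce_lines`) are our own illustration, not part of the Hadoop API:

```python
# Illustrative word-count logic for Hadoop Streaming.
# Under Streaming, the mapper and reducer would each read sys.stdin
# and print their output; here they are plain generators so the
# logic is easy to follow and test locally.

def map_lines(lines):
    """Map phase: emit 'word<TAB>1' for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reduce_lines(lines):
    """Reduce phase: sum counts per word. Input must arrive sorted
    by key, which Hadoop's shuffle/sort phase guarantees."""
    current, total = None, 0
    for line in lines:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"
```

A real job would wrap each function in a small script and submit it with the `hadoop-streaming` jar; Hive and Pig sit a level above this, generating the map and reduce stages from SQL-like or dataflow scripts.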
Process any and all data, regardless of type or format — whether structured, semi-structured, or unstructured. Original data remains available even after batch processing for further analytics, all in the same platform.
Built-in job and task tracking allows individual tasks to fail and restart without affecting other processes or workloads. Additional scheduling lets you prioritize processes based on needs such as SLAs.
MapReduce is designed to match the massive scale of HDFS and Hadoop, so you can process unlimited amounts of data, fast, all within the same platform where it’s stored.
YARN Key Features
YARN is designed to handle scheduling for the massive scale of Hadoop so you can continue to add new and larger workloads, all within the same platform.
Dynamic resource management provided by YARN supports multiple engines and workloads sharing the same cluster resources. Open up your data to users across the entire business through batch, interactive, advanced, or real-time processing, all within the same platform, so you get the most value from Hadoop.
Optimal Workload Management
Fine-grained configurations allow for better cluster utilization so you can enable workload SLAs for priority workloads and group-based policies across the business. Process more data in more ways, without disrupting your most critical operations.
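As an illustrative sketch of those group-based policies (the queue names and weights are hypothetical, not Cloudera defaults), a YARN Fair Scheduler allocation file can carve cluster capacity into queues with different priorities:

```xml
<!-- fair-scheduler.xml: hypothetical queue layout -->
<allocations>
  <queue name="production">
    <!-- roughly 3x the cluster share of the dev queue -->
    <weight>3.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
  <queue name="dev">
    <weight>1.0</weight>
    <!-- cap concurrent apps so experiments cannot starve production -->
    <maxRunningApps>10</maxRunningApps>
  </queue>
</allocations>
```

Jobs submitted to the production queue keep their guaranteed share even when the dev queue is busy, which is how SLA workloads and ad hoc analysis can coexist on one cluster.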
Integrated across the platform
Core Hadoop, including HDFS, MapReduce, and YARN, is part of the foundation of Cloudera’s platform. All platform components have access to the same data stored in HDFS and participate in shared resource management via YARN. Hadoop, as part of Cloudera’s platform, also benefits from simple deployment and administration (through Cloudera Manager) and shared compliance-ready security and governance (through Apache Sentry and Cloudera Navigator) — all critical for running in production.
Cloudera’s commitment to Hadoop
Cloudera is actively involved with the Hadoop community, including having Doug Cutting, one of the initial creators of Hadoop, as Cloudera’s Chief Architect. As the most popular Hadoop distribution platform, Cloudera has the most experience with Hadoop — with large-scale production customers across industries — and also has the committers on staff to continuously drive innovation and improvements for our customers and the community.
Expert support for Core Hadoop
Cloudera has Hadoop experts available across the globe ready to deliver world-class support 24/7. With more experience across more customers, for more use cases, Cloudera is the leader in Hadoop support so you can focus on results.