Resource Library

Cloudera offers a variety of materials on big data consolidation, storage and processing. The library includes high-level overviews as well as detailed information on Apache Hadoop and the surrounding ecosystem.

  1. Real-time HBase: Lessons from the Cloud - Operations Session 3
    • Monday, Jun 16 2014
    • Category: HBaseCon, Presentation, Video
    Running HBase in real time in the cloud provides an interesting and ever-changing set of challenges -- instance types are not ideal, neighbors can degrade your performance, and instances can randomly die in unanticipated ways. This talk will cover what HubSpot has learned about running in production on Amazon EC2, how to handle DR and redundancy, and the tooling the team has found to be the most helpful.
  2. Intel and Cloudera: Accelerating Enterprise Big Data Success
    • Thursday, Jun 12 2014
    • Category: Video, Recorded Webinars, Big Data, Data hub
    Learn how Cloudera and Intel are jointly innovating through open source software to enable Hadoop to run best on IA (Intel Architecture) and to foster the evolution of a vibrant Big Data ecosystem.
  3. Intel and Cloudera: Accelerating Enterprise Big Data Success
    • Thursday, Jun 12 2014
    • Category: Data hub, Business process optimization, Big Data, Presentation, Presentation Slides
    Learn how Cloudera and Intel are jointly innovating through open source software to enable Hadoop to run best on IA (Intel Architecture) and to foster the evolution of a vibrant Big Data ecosystem.
  4. HBaseCon 2014 | Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity - Operations Session 1
    • Thursday, Jun 05 2014
    • Category: HBaseCon, Video, Presentation
    In early 2013, Yahoo! introduced multi-tenancy to HBase to offer it as a platform service for all Hadoop users. A certain degree of customization per tenant (a user or a project) was achieved through RegionServer groups, namespaces, and customized configs for each tenant. This talk covers how to accommodate the diverse needs of individual tenants on the cluster, as well as operational tips and techniques that allow Yahoo! to automate the management of multi-tenant clusters at petabyte scale without errors.
  5. Merkle Delivers Connected Consumer Recognition with Its Enterprise Data Hub
    • Wednesday, Jun 04 2014
    • File Type: .PDF
    • Category: Case Studies, Document, Data warehousing offload, Data processing ETL offload, Data hub
    Merkle employs an analytically led, data-driven methodology and an enterprise data hub (EDH) from Cloudera to help large consumer brand clients build and sustain profitable customer relationships through smarter marketing.
  6. Cloudera and ZoomData Solution Brief
    • Friday, May 30 2014
    • File Type: .PDF
    • Category: Document, Solution Briefs
    Zoomdata's Next Generation Data Analytics and Reporting platform integrates with Cloudera's Impala and Search products to support big data implementations with streaming analytics and unstructured search.
  7. Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
    • Thursday, May 29 2014
    • Category: Recorded Webinars, Video, Why Consolidation Data Platform, Data processing ETL offload
    Dr. Ralph Kimball and Eli Collins describe standard data warehouse best practices and how to implement them within a Hadoop environment. This includes identifying dimensions and facts, managing primary keys, and handling slowly changing dimensions (SCDs) and conformed dimensions.
  8. Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
    • Thursday, May 29 2014
    • Category: Video, Why Consolidation Data Platform, Data processing ETL offload, Presentation Slides
    Dr. Ralph Kimball and Eli Collins describe standard data warehouse best practices and how to implement them within a Hadoop environment. This includes identifying dimensions and facts, managing primary keys, and handling slowly changing dimensions (SCDs) and conformed dimensions.
  9. Large Scale Machine Learning with Apache Spark
    • Wednesday, May 21 2014
    • Category: Recorded Webinars, Video, CDH, Predictive modeling, Cyber security, Fraud detection
    Spark offers a number of advantages over its predecessor MapReduce that make it ideal for large-scale machine learning. For example, Spark includes MLlib, a library of machine learning algorithms for large data sets. The presentation will cover the state of MLlib and the details of some of the scalable algorithms it includes, mainly K-means.
  10. Large Scale Machine Learning with Apache Spark
    • Wednesday, May 21 2014
    • Category: CDH, Predictive modeling, Cyber security, Fraud detection, Presentation Slides, Presentation
    Spark offers a number of advantages over its predecessor MapReduce that make it ideal for large-scale machine learning. For example, Spark includes MLlib, a library of machine learning algorithms for large data sets. The presentation will cover the state of MLlib and the details of some of the scalable algorithms it includes, mainly K-means.