CDP resources

Build skills to deliver innovation with Cloudera Data Platform

Keep abreast of the latest Cloudera technology with content and tools tailored for developers, analysts, data scientists, architects, and admins.

WEEKLY DEMOS

Join our experts for live weekly demos 

Find out about CDP's latest features and capabilities, and get answers to pressing questions, by joining Cloudera product experts in live weekly demos.

Build AI-based Web Applications with Cloudera Machine Learning

Thursday, August 04, 2022

Learn how Cloudera Machine Learning (CML) enables data science practitioners to quickly deliver an ML-based web application to business users.

The demo shows how users can discover and ingest data sets, train ML models with whichever library or language they are most comfortable with, quickly deploy the ML model with an API, and build a web application that lets business users interact with the ML model’s API.


CDP Demos
  • Exploratory data analytics to uncover answers to burning business questions [Complete Recording]
  • Exploratory Data Science to discover and visualize data for building machine learning models [Complete Recording]
  • Universal Data Distribution to connect data from any source to any destination [Complete Recording]
  • Multistage Data Pipelines with Cloudera Data Platform (CDP) [highlight]
  • Security & Governance with Cloudera Shared Data Experience (SDX)  [highlight]
  • Streaming Data with Cloudera DataFlow (CDF) [highlight]
  • Enterprise Machine Learning with Cloudera Machine Learning (CML) [highlight]
  • Analytics with Cloudera Data Warehouse (CDW) [highlight]
  • Application Development with Cloudera Operational Database (COD) [highlight]
VIDEOS

See the benefits of CDP through how-to videos. 

Understand the use cases that CDP solves and learn how to successfully deploy and use the full range of the Cloudera Data Platform.


 

TOURS

Experience CDP for yourself

Click below to begin an interactive CDP product tour

 

TUTORIALS

Tutorials to help build, deploy and scale 

Optimize your time with detailed tutorials that clearly explain the best way to deploy, use, and manage Cloudera products.

How to Create a CDP Private Cloud Base Development Cluster

Walk through the installation process for CDP Private Cloud Base (trial version).

Create a Simple Web Application using Cloudera Operational Database

Use Cloudera Operational Database (COD) and Machine Learning (CML) to create a simple web application.

Processing DICOM Files with Spark on CDP

Use Cloudera Data Engineering (CDE) on Cloudera Data Platform (CDP) to transform the DICOM files produced by an MRI into PNG images.

Using NVIDIA RAPIDS to Accelerate AI Training in CDP Hybrid Cloud

Explore how you can leverage NVIDIA's RAPIDS framework using Cloudera Machine Learning (CML), on the Cloudera Data Platform (CDP).

 

EVENTS
Meetup

Special Hybrid Event – Apache Iceberg: Looking Below the Waterline


Thursday, December 8, 2022

In the very short time since its advent, Apache Iceberg has become the most popular, fastest-growing, and most widely adopted open table format in the big data space. It addresses known big data pain points around data consistency, scalability, performance, and schema and partition evolution. In this meetup, you'll hear from the key partners in the open source community who are leading and driving Iceberg enhancements and the roadmap. We have a full agenda; here's a summary of the four talks we plan to deliver:

Apache Iceberg for BI use cases

This talk covers the integration of the Iceberg open table format with the Apache Hive and Impala compute engines, support for Iceberg v1 and v2 capabilities, customer use cases, and future Iceberg enhancements and innovations in the works at Cloudera. We'll take a detailed look at the following capabilities supported in Hive and Impala:

  • Critical functional and performance enhancements
  • Materialized views support
  • In-place migration of Hive external tables to Iceberg tables
  • Row-level update/delete
  • Table rollback
  • Table maintenance

Learn how Teranet keeps up with the growth and changing requirements of its business by using Apache Iceberg for its change data capture use case, leveraging Spark and Impala.

 

Multi-function Analytics with Apache Iceberg

This session demonstrates using Spark with Iceberg tables, highlighting key Iceberg features. We'll show the interoperability of Spark with Hive and Impala. Along the way, we'll cover Cloudera's contributions to improving Spark and Impala support for Iceberg.

 

Apache Iceberg's REST Catalog - Real and Potential Uses Beyond Data Workflows

Iceberg's new REST catalog provides a friendly access point for the rich metadata and functionality that come with an Iceberg-powered data warehouse. This makes Iceberg even easier to integrate into compute engines and makes catalog operations available from pretty much any client you can imagine. However, the power of the REST catalog doesn't stop there. A myriad of tools and features that sit on the edge of the data platform benefit greatly from the REST catalog design. In this talk, we cover a few creative uses that exist today as well as some imaginative uses that could exist.


Incremental compaction using Apache Iceberg

At LinkedIn, streaming data in the form of Kafka topics is ingested into the data lake by low-latency ingestion pipelines powered by Apache Gobblin. This often produces small files that can contain duplicate records due to at-least-once delivery semantics, which in turn requires another set of pipelines that deduplicate the data for correctness and compact it into larger files for storage and query efficiency.

Those compaction pipelines are bursty and compute-intensive, and they have higher latency due to their batch processing nature. As data volume grows, it becomes increasingly important to process data incrementally for optimal resource utilization and lower latency. In this talk, we present how LinkedIn leverages Iceberg to migrate its compaction pipelines from batch to incremental processing models and solve these latency and compute problems. We also show how that improves overall cluster resource utilization and leads to a more uniform workload distribution. Finally, we focus on how we optimize compaction and data deduplication in light of late data.
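The idea of incremental, deduplicating compaction can be illustrated with a minimal Python sketch. This is not LinkedIn's pipeline and it does not use any Iceberg or Gobblin API; the `CompactedTable` class, the `(key, event_time, payload)` record shape, and the "newest event time wins" rule are all assumptions made for illustration. Each newly ingested small file is merged into the compacted state on its own, so redelivered duplicates and stale late records are dropped without rescanning the whole table.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

# (key, event_time, payload): a hypothetical record shape chosen for this sketch.
Record = Tuple[str, int, str]


@dataclass
class CompactedTable:
    """In-memory stand-in for a compacted table; not an Iceberg API."""

    rows: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def apply_batch(self, batch: Iterable[Record]) -> int:
        """Merge one newly ingested small file and return how many rows changed."""
        changed = 0
        for key, event_time, payload in batch:
            current = self.rows.get(key)
            # Skip redelivered duplicates and late records older than what is stored.
            if current is None or event_time > current[0]:
                self.rows[key] = (event_time, payload)
                changed += 1
        return changed


table = CompactedTable()
# First small file: "a" is delivered twice (at-least-once semantics).
first = table.apply_batch([("a", 1, "v1"), ("a", 1, "v1"), ("b", 5, "w1")])
# Second small file: a newer "a" plus a late, stale record for "b".
second = table.apply_batch([("a", 2, "v2"), ("b", 3, "stale")])
```

Because only the new batch is read on each call, the work per run is proportional to the newly arrived data rather than to the full table, which is the property that makes incremental processing less bursty than a periodic batch rewrite.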

The registration link below will prompt you to log in to your LinkedIn account:


Past Virtual Meetups

Watch YouTube recordings of recent virtual meetups held by our Future of Data network of local meetup groups, and see why more than 49,000 of the world's data practitioners choose to work with Cloudera products and services.

CDP TECHNICAL BLOGS

Switching from CPUs to GPUs for NYC Taxi Fare Predictions with NVIDIA RAPIDS

By Jacob Bengtson

This blog demonstrates how easy it is to adapt a script built with popular CPU-based Python libraries, like pandas and scikit-learn, to run instead with GPU-based Python libraries, like cuDF and cuML.

Next Stop – Predicting on Data with Cloudera Machine Learning

By Robert Hryniewicz

This blog series follows the manufacturing and operations data lifecycle stages (Predictive Analytics) of an electric car manufacturer, stages typically experienced in large, data-driven manufacturing companies.

Next Stop - Building a Data Pipeline from Edge to Insight

By Tui Leauanae and Nicolas Pelaez

This blog series follows the manufacturing, operations and sales data for a connected vehicle manufacturer as the data goes through stages and transformations typically experienced in a large manufacturing company on the leading edge of current technology.

Digital Transformation is a Data Journey From Edge to Insight

By Tui Leauanae, David LeGrand, and Nicolas Pelaez

This is the first in a six-part blog series that outlines the data journey from edge to AI and the business value data produces along the journey. The data journey is not linear; it is an infinite-loop data lifecycle, initiating at the edge, weaving through a data platform, and resulting in business-imperative insights applied to real business-critical problems, which in turn lead to new data-led initiatives.

COMMUNITY

Explore the Cloudera Community

Join the Cloudera Community and connect with more than 69,000 of your peers, discussing more than 18,000 solutions.

How to Connect Go Applications to Cloudera Operational Database

The Cloudera Operational Database (COD) experience is a managed dbPaaS solution. It can auto-scale based on the workload utilization of the cluster and will be adding the ability to auto-tune (better performance within the existing infrastructure footprint) and auto-heal (resolve operational problems automatically) later this year.

See article

Spark Structured Streaming Example with CDE

This demo pulls from the Twitter API using NiFi and writes the payload to a Kafka topic named "twitter".

See article

How to Configure K9s for Cloudera Data Engineering

Following the tutorial on how to use K9s to fetch metrics and logs for Cloudera Data Warehouse Experience, I decided to create the same tutorial for Cloudera Data Engineering. The process is very similar, as you can see below.

See article

CLOUDERA EDUCATIONAL SERVICES

CDP training

Hone your big data skills with the world’s leading experts through Cloudera Educational Services’ curriculum.

Get certified. Stand out.

PROFESSIONAL SERVICES

Accelerate success with Cloudera SmartServices expertise

Move from pilot to production quickly, cost-effectively, and securely with hands-on technical insight from Cloudera experts. Our comprehensive portfolio of services helps you shorten time to value from CDP by providing the right offerings and support for everything from launching to accelerating and expanding your deployment.

CloudSmart: CDP Public Cloud adoption service

Evaluate cloud options, optimize data, and scale analytics, moving workloads to the public cloud with confidence and minimal risk.

Get CloudSmart

SmartMigrate: Services for moving to Cloudera Data Platform

Upgrade existing CDH and HDP deployments and migrate to CDP Data Center while minimizing risk, business disruptions, and SLA violations. 

Get SmartMigrate

SmartHealth: Platform health check for optimal performance

Ensure peak performance with a comprehensive health check of your platform deployment and use case implementation.

Get SmartHealth

DOCUMENTATION
 

Central repository for technical content on all Cloudera products.

Find guides, quick starts, manuals, and best practices broken down by product and by task.
