Join our experts for live weekly demos
Find out about CDP's latest features and capabilities, and get answers to pressing questions by joining Cloudera product experts in live weekly demos.
Build AI-based web applications with Cloudera Machine Learning
Thursday, August 04, 2022
Learn how Cloudera Machine Learning (CML) enables data science practitioners to quickly deliver an ML-based web application to business users.
The demo will show how users can discover and ingest data sets, train ML models with whatever library or language they are most comfortable with, quickly deploy the ML model behind an API, and build a web application that lets business users interact with the model's API.
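The "deploy the model behind an API" step can be sketched with nothing but the Python standard library. This is not the demo's actual code: the linear `predict` function, the `/predict` route, and the request shape are all assumptions standing in for whatever model and serving framework the practitioner actually uses.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy "model": a linear scorer standing in for whatever library the
# practitioner trained with (scikit-learn, XGBoost, etc.).
def predict(features):
    weights = [0.4, 0.6]
    return sum(w * x for w, x in zip(weights, features))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and score it with the model
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        score = predict(payload["features"])
        body = json.dumps({"score": score}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the sketch quiet
        pass

def serve_model():
    # Port 0 lets the OS pick a free port
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    server = serve_model()
    port = server.server_address[1]
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/predict",
        data=json.dumps({"features": [1.0, 2.0]}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["score"])  # 0.4*1.0 + 0.6*2.0 = 1.6
    server.shutdown()
```

A web application for business users would issue the same kind of POST request from its frontend, which is what makes the API-first deployment pattern convenient.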
See the benefits of CDP through how-to videos.
Understand the use cases that CDP solves and learn how to successfully deploy and use the full range of the Cloudera Data Platform.
Special Hybrid Event – Apache Iceberg: Looking Below the Waterline
Thursday, December 8, 2022
In a very short span of time since its advent, Apache Iceberg has become the most popular, fastest-growing, and most widely adopted open table format in the big data space. It addresses some of the known big data pain points around data consistency, scalability, performance, and schema and partition evolution. In this meetup, you'll hear from key partners in the open source community who are leading and driving Iceberg enhancements and its roadmap. We have a full agenda; here's a summary of the four talks we plan to deliver:
Apache Iceberg for BI use cases
This talk will cover the integration of the Iceberg open table format with the Apache Hive and Impala compute engines, Iceberg v1 and v2 capability support, customer use cases, and future Iceberg enhancements and innovations in the works at Cloudera. We'll take a detailed look into the following capabilities supported in Hive and Impala:
- Critical functional and performance enhancements
- Materialized views support
- In-place migration of Hive external tables to Iceberg tables
- Row level update/delete
- Table rollback
- Table maintenance
Learn how Teranet keeps up with the changing growth and requirements of its business, using Apache Iceberg with Spark and Impala for its change data capture use case.
Multi-function Analytics with Apache Iceberg
This session will present a demonstration of using Spark with Iceberg tables, highlighting key Iceberg features. We'll show the interoperability of Spark with Hive and Impala. Along the way, we'll cover Cloudera's contributions for improving Spark and Impala support on Iceberg.
Apache Iceberg's REST Catalog - Real and Potential Uses Beyond Data Workflows
Iceberg's new REST catalog provides a friendly access point for the rich metadata and functionality that comes with an Iceberg-powered data warehouse. This makes Iceberg even easier to integrate into compute engines and makes catalog operations available from practically any client you can imagine. However, the power of the REST catalog doesn't stop there: myriad tools and features that sit on the edge of the data platform benefit greatly from the REST catalog design. In this talk, we'll cover a few creative uses that currently exist as well as some imaginative uses that could exist.
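Because the REST catalog is just HTTP, "any client you can imagine" can be as small as a few lines that compose the spec's routes. The sketch below only builds URLs per the REST catalog's OpenAPI spec; the host, port, and namespace names are assumptions, and an actual request would still need authentication against a live catalog.

```python
from urllib.parse import quote

# Base URL is an assumption for illustration; a real deployment would
# also apply the prefix returned by GET /v1/config.
BASE = "http://localhost:8181/v1"

def namespace_path(*levels):
    # Per the REST spec, multi-level namespaces are joined with the
    # 0x1F unit separator, which URL-encodes as %1F.
    return quote("\x1f".join(levels), safe="")

def list_tables_url(*namespace):
    return f"{BASE}/namespaces/{namespace_path(*namespace)}/tables"

def load_table_url(table, *namespace):
    return f"{BASE}/namespaces/{namespace_path(*namespace)}/tables/{quote(table, safe='')}"

if __name__ == "__main__":
    print(list_tables_url("db"))
    # http://localhost:8181/v1/namespaces/db/tables
    print(load_table_url("events", "prod", "web"))
    # http://localhost:8181/v1/namespaces/prod%1Fweb/tables/events
```

A GET against the second URL returns the table's full metadata (schema, snapshots, partition spec), which is exactly what makes the catalog useful to edge tools that are not full compute engines.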
Incremental compaction using Apache Iceberg
At LinkedIn, streaming data in the form of Kafka topics is ingested into the data lake by low-latency ingestion pipelines powered by Apache Gobblin. This often produces smaller files that can contain duplicate records due to at-least-once delivery semantics, which led to the creation of another set of pipelines that deduplicate data for correctness and compact it into larger files for storage and query efficiency.
These compaction pipelines are bursty, compute-intensive, and higher-latency due to their batch-processing nature. As data volume increases, it becomes increasingly important to process data incrementally for optimal resource utilization and lower latency. In this talk, we present how LinkedIn leverages Iceberg to migrate its compaction pipelines from batch to incremental processing models and solve these latency and compute problems. We also show how this leads to improved overall cluster resource utilization and more uniform workload distribution. Furthermore, we will focus on how we optimize compaction and data deduplication in light of late-arriving data.
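The deduplication step at the heart of such a pipeline can be sketched in a few lines: at-least-once delivery can replay records, so only the copy with the highest offset per key is kept. The `(key, offset, payload)` record shape here is an assumption for illustration, not LinkedIn's actual schema.

```python
def deduplicate(records):
    """Keep the latest copy of each record.

    records: iterable of (key, offset, payload) tuples, where a replayed
    or updated record for the same key arrives with a higher offset.
    """
    latest = {}
    for key, offset, payload in records:
        # First sighting of a key, or a newer copy, wins.
        if key not in latest or offset > latest[key][0]:
            latest[key] = (offset, payload)
    return {key: payload for key, (offset, payload) in latest.items()}

if __name__ == "__main__":
    batch = [
        ("user:1", 10, "v1"),
        ("user:2", 11, "v1"),
        ("user:1", 12, "v2"),  # replayed/updated record supersedes offset 10
    ]
    print(deduplicate(batch))  # {'user:1': 'v2', 'user:2': 'v1'}
```

The incremental model described in the talk applies this kind of logic only to data added since the last run (for example, the Iceberg snapshots created since a checkpoint) instead of reprocessing whole partitions in bursty batch jobs.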
The registration link below will prompt you to log into your LinkedIn account:
CDP TECHNICAL BLOGS
Switching from CPUs to GPUs for NYC Taxi Fare Predictions with NVIDIA RAPIDS
By Jacob Bengtson
This blog demonstrates how easy it is to adapt a script built with popular CPU-based Python libraries, like pandas and scikit-learn, to instead run with GPU-based Python libraries, like cuDF and cuML.
By Tui Leauanae, David LeGrand, and Nicolas Pelaez
This is the first in a six-part blog series that outlines the data journey from edge to AI and the business value data produces along the way. The data journey is not linear; it is an infinite data lifecycle loop, initiating at the edge, weaving through a data platform, and resulting in business-imperative insights applied to real business-critical problems that spark new data-led initiatives.
Explore the Cloudera Community
Join the Cloudera Community and connect with more than 69,000 of your peers, discussing more than 18,000 solutions.
How to Connect Go Applications to Cloudera Operational Database
The Cloudera Operational Database (COD) experience is a managed dbPaaS solution. It can auto-scale based on the workload utilization of the cluster, and later this year it will add the ability to auto-tune (better performance within the existing infrastructure footprint) and auto-heal (resolve operational problems automatically).
Spark Structured Streaming Example with CDE
This demo will pull from the Twitter API using NiFi and write the payload to a Kafka topic named "twitter".
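The core transformation a downstream Spark Structured Streaming job would apply to each Kafka message on the "twitter" topic can be sketched in plain Python. This is not the demo's code: the payload shape and field names below mimic the Twitter API and are assumptions, and the real job would express the same parsing with Spark's `from_json` over the Kafka value column.

```python
import json

def parse_tweet(value: bytes) -> dict:
    """Extract the fields a downstream job keeps from one Kafka message value."""
    tweet = json.loads(value)
    return {
        "id": tweet["id"],
        "text": tweet["text"],
        # Nested user object may be absent in some payloads
        "user": tweet.get("user", {}).get("screen_name"),
    }

if __name__ == "__main__":
    raw = json.dumps(
        {"id": 1, "text": "hello", "user": {"screen_name": "alice"}}
    ).encode()
    print(parse_tweet(raw))  # {'id': 1, 'text': 'hello', 'user': 'alice'}
```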
How to Configure K9s for Cloudera Data Engineering
After writing a tutorial on how to use K9s to fetch metrics and logs for the Cloudera Data Warehouse experience, I decided to create the same tutorial for Cloudera Data Engineering. The process is very similar, as you can see below.
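The configuration boils down to pointing K9s at the cluster's kubeconfig. The file path and namespace below are placeholders, not values from the tutorial; you would substitute the kubeconfig obtained for your CDE environment and the namespace you want to inspect.

```shell
# Point K9s at the CDE cluster via a downloaded kubeconfig
# (path and namespace are illustrative placeholders)
export KUBECONFIG=~/Downloads/cde-kubeconfig.yaml

# Launch K9s scoped to the namespace whose pods and logs you want to inspect
k9s --kubeconfig "$KUBECONFIG" -n my-cde-namespace
```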
Accelerate success with Cloudera SmartServices expertise
Move from pilot to production quickly, cost-effectively, and securely with hands-on technical insight from Cloudera experts. Our comprehensive portfolio of services helps you shorten time to value from CDP by providing the right offerings and support for everything from launching to accelerating and expanding your deployment.
SmartMigrate: Services for moving to Cloudera Data Platform
Upgrade existing CDH and HDP deployments and migrate to CDP Data Center while minimizing risk, business disruptions, and SLA violations.
Central repository for technical content on all Cloudera products.
Find guides, quick starts, manuals, and best practices broken down by product and by task.