Overview

This course introduces Apache Iceberg, a high-performance open table format for organizing petabyte-scale analytic datasets on a file system or object store. Iceberg is available on Cloudera Data Warehouse and Cloudera Data Engineering on both Private and Public Cloud. Combined with Cloudera Data Platform, Iceberg lets users build an open data lakehouse architecture for multi-function analytics and deploy large-scale end-to-end pipelines. The course covers Iceberg's benefits, architecture, internal operation, read and write operations, and advanced functions, drawing comparisons to Hive throughout and building on students' existing knowledge and experience.

What you'll learn

  • Gain a deep understanding of Iceberg's benefits and of how snapshots work.
  • Build external and managed tables, and configure copy-on-write and merge-on-read for optimized data management.
  • Perform rollbacks and time travel, navigate schema and partition evolution, and use hidden partitions.
  • Create and merge table branches, mastering Iceberg's write-audit-publish procedure.
  • Efficiently perform table maintenance tasks and tackle data migration challenges.

Who Should Take This Course?

This course is for new and existing customers using Cloudera Data Warehouse or Cloudera Data Engineering on Private or Public Cloud who want to take advantage of Apache Iceberg. It is designed for Data Engineers, Hive SQL Developers, Kafka Streaming Engineers, Data Scientists, and CDP Admins. General knowledge of HDFS and experience with Hive and Spark are required.

Other Training That Might Interest You

  • Developing Applications with Apache Spark
  • Analyzing with Cloudera Data Warehouse

Course Details

Introduction

  • Apache Hive
  • Why Iceberg?
  • Data Lakehouses
  • What is Iceberg?

Catalogs

  • Review Iceberg Catalog Configuration

Iceberg Concepts

  • Snapshots
  • Metadata Layer: Manifest List, Manifest Files
  • Time Travel
  • Schema Evolution
  • Hidden Partition
  • Write-Audit-Publish (WAP)
  • Branches, Tags, Zero-Copy-Clone

Iceberg Table Design

  • Managed & External Tables
  • Table Properties Review
  • Copy-On-Write (COW) vs Merge-On-Read (MOR)
  • Hidden Partitions
  • Compare Hive vs Iceberg Partition Design
  • Table Metadata
  • Table Maintenance

Data-As-Code

  • Iceberg Personas
  • Write-Audit-Publish (WAP)
  • Branches & Tagging

Hive-to-Iceberg Table Migration

  • In-place Migration
  • Shallow Migration

Administrator Certification

Upon completion of the course, attendees are encouraged to continue their study and register for the Cloudera Certified Administrator (CCA) exam. Certification is a great differentiator. It helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.

Advance your career

Cloudera Administrators are among the most in-demand professionals. Browse the currently listed job opportunities that match this professional profile, many of which ask for CCA qualification.

Private training

We also provide private training at your site, at your pace, and tailored to your needs.
