Apache Iceberg is now the de facto open standard for managing large-scale structured, semi-structured, and evolving data. It was originally developed in 2017 at Netflix to address the challenges of delivering reliable, petabyte (PB)-scale analytics on Apache Hive and Spark, and has since grown into a robust, open-table format suited to run multiple workloads concurrently.
Iceberg unifies your data and provides SQL behavior to easily access that data. As it continues to evolve with richer SQL capabilities and simplified data operations, Iceberg is increasingly favored by users of varying technical expertise, not just data engineers but also data consumers (data scientists, analysts, and application developers) seeking fast, reliable access to any data.
With Iceberg, organizations gain true separation of compute and storage, enabling unparalleled flexibility. If you're looking for multifunction analytics, AI readiness, and vendor freedom, no other table format comes close.
In less than 10 years, Iceberg has evolved from emerging tech to enterprise standard. Iceberg’s momentum can be credited to its architectural strengths as well as the vibrant, open community behind it.
Importantly, the Iceberg community is led by its users, not just a single vendor. This user-driven governance model helps ensure the project evolves in ways that serve broad, real-world needs—a major reason why it has gained so much traction.
Iceberg’s mainstream adoption was evident at the 2025 Iceberg Summit in San Francisco. The event brought together startups, Fortune 500 companies, and the three major cloud providers (AWS, Microsoft, and Google). Attendees joined from across the globe, both in person and virtually, eager to learn, contribute, and grow the ecosystem.
Two themes in particular dominated conversations at the summit: interoperability, and Iceberg's growing prominence as its ecosystem and capabilities (including automation) expand.
From Netflix to Apple to Bloomberg, many organizations shared how Iceberg enables them to manage a single source of truth that powers multiple workloads—eliminating redundant data copies and reducing data movement across systems. They discussed the various types of workloads that rely on Iceberg’s trusted data layer to deliver segmentation, personalization, churn/relapse predictions, recommendations, optimized customer experience, and more.
Another highlight was the emergence of new open-source tools such as Comet, Polaris, and Lance in the Iceberg ecosystem, designed to enhance performance and support multi-modal analytics and AI.
There was a lot of excitement around the capabilities coming in Iceberg V3 and V4. V3 will significantly bolster data governance, performance optimization, and support for more complex data types like Variant and Geospatial. By leveraging the principles of columnar format, Variant enables advanced querying capabilities, such as filtering and aggregations, on semi-structured data without requiring extensive transformations. Support for Geospatial will allow organizations to manage location-based data, unlocking new use cases. The new adaptive metadata layout proposed in V4 promises to improve performance for small files.
Another hot topic was automating routine maintenance (partitioning, sorting, compaction) via policy-driven, DevOps-style interfaces to reduce manual toil. As organizations bring more data into Iceberg tables, manual maintenance becomes a major bottleneck because it requires hiring experts for these tasks.
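To make the idea of policy-driven compaction concrete, here is a minimal sketch in plain Python. It is not Iceberg's API or any vendor's implementation: the `CompactionPolicy` class, its field names, and `plan_compaction` are hypothetical, and the first-fit-decreasing bin packing is simply one common way to group small files into rewrite batches under a declared target size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactionPolicy:
    """Hypothetical policy object; names are illustrative, not an Iceberg API."""
    target_file_bytes: int   # desired size of each rewritten file
    small_file_bytes: int    # files below this size are compaction candidates
    min_input_files: int = 2 # skip groups that would rewrite a single file


def plan_compaction(file_sizes, policy):
    """Group small files into rewrite batches (first-fit-decreasing bin packing)."""
    candidates = sorted(
        (size for size in file_sizes if size < policy.small_file_bytes),
        reverse=True,
    )
    batches = []  # each batch is a list of file sizes merged into one output file
    for size in candidates:
        for batch in batches:
            if sum(batch) + size <= policy.target_file_bytes:
                batch.append(size)
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([size])
    # Drop batches that are too small to be worth rewriting.
    return [b for b in batches if len(b) >= policy.min_input_files]


MB = 1024 * 1024
policy = CompactionPolicy(target_file_bytes=128 * MB, small_file_bytes=64 * MB)
sizes = [10 * MB, 60 * MB, 300 * MB, 50 * MB, 5 * MB, 40 * MB]
plan = plan_compaction(sizes, policy)
```

In this sketch the policy is data rather than code, so an operator could change the target size without touching the planner. The 300 MB file is left alone because it is already above the small-file threshold, and the lone 40 MB file is skipped because rewriting a single file gains nothing.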
As more and more engines access the data in these Iceberg tables, governance, security, and lineage become high priority. Visibility into data flows and data transformations becomes critical to trust the data. This led to discussions around the need for catalog federation and governance to improve visibility across Iceberg tables.
Cloudera featured native integration of Apache Iceberg in its public cloud Lakehouse platform in 2021, followed by on-premises support in 2022. Today, a majority of our customers are either running or testing new workloads on Iceberg; in total, our customers manage petabytes of data on Iceberg.
“Iceberg is a growth vector for Cloudera. We’re seeing a surge in customers migrating Hive workloads to Iceberg to modernize and future-proof their data platforms.” - Venkat Rajaji, SVP of Product Management, Cloudera
Once a company starts its Iceberg journey, the benefits compound, resulting in growing volumes of data on Iceberg tables, expansion of workloads, and emergence of new use cases. Faster performance is often the first motivator, followed by interoperability and workload flexibility for agility. Moving to Iceberg reduces storage, ETL, and operational costs by up to 75%. Capabilities like time travel, snapshots, write-audit-publish, and hidden partitioning further improve efficiency, making it the right choice to deploy new use cases.
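For readers new to snapshots and time travel, the toy model below may help. It is a simplified sketch in plain Python, not Iceberg's metadata format or API: the `ToyTable` class and its methods are hypothetical, and it assumes commits arrive in increasing timestamp order. The point it illustrates is that each commit records a new immutable snapshot of the table's file list, so reading "as of" a past time just means picking an older snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy snapshot log; a conceptual stand-in, not Iceberg's real metadata."""
    # Each entry is (snapshot_id, timestamp_ms, frozenset of data file names).
    snapshots: list = field(default_factory=list)

    def commit(self, timestamp_ms, added=(), removed=()):
        """Record a new snapshot; earlier snapshots are never modified."""
        current = self.snapshots[-1][2] if self.snapshots else frozenset()
        files = (current - frozenset(removed)) | frozenset(added)
        snapshot_id = len(self.snapshots) + 1
        self.snapshots.append((snapshot_id, timestamp_ms, files))
        return snapshot_id

    def files_as_of(self, timestamp_ms):
        """Time travel: return the file set of the latest snapshot at or before the given time."""
        visible = [s for s in self.snapshots if s[1] <= timestamp_ms]
        if not visible:
            raise ValueError("no snapshot at or before the requested time")
        return visible[-1][2]


table = ToyTable()
table.commit(1000, added=["a.parquet"])
table.commit(2000, added=["b.parquet"])
table.commit(3000, added=["c.parquet"], removed=["a.parquet"])

before_rewrite = table.files_as_of(2500)  # state after the second commit
latest = table.files_as_of(3000)          # state after the third commit
```

Because old snapshots are kept intact, a query against an earlier point in time still sees `a.parquet` even after a later commit has removed it from the current table state.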
Some of the most popular use cases for Iceberg at Cloudera are:
Listen to Illumina and LY Corporation’s journey with Apache Iceberg and how they are overcoming their data and analytic challenges at scale.
While Lakehouse and Iceberg offer significant benefits, including converging all your data and accelerating analytics, our customers have shared a few challenges related to adopting Iceberg. First, their data lies in multiple clouds, on premises, and in edge systems, and moving all that data to the cloud to leverage Iceberg is almost impossible. Hence, they need the same Iceberg support on premises and in the cloud. Second, they need integration with multiple vendor engines so they can easily share data across systems with confidence, lineage, and traceability. Third, as the data grows, manually and continuously tuning Iceberg tables for optimal performance becomes very expensive, requiring experts and compute resources. Lastly, while Iceberg increases the usage of data, the freedom to bring in any tool introduces risks. It requires effective governance and security tools to control access and provide metadata management for auditability, lineage, and visibility, so that users can better understand the data and drive usability.
We’re always innovating to solve customer challenges and have made several platform enhancements to address these common pain points, including:
“As we envision a future where Apache Iceberg is the foundation and linchpin, empowering cross-platform data and AI, we relentlessly enhance Iceberg's capabilities to unlock unprecedented agility and intelligence for every enterprise.” - Bill Zhang, VP of Product Strategies at Cloudera
We believe that Iceberg will continue to dominate as the enterprise standard for open-table formats. The new innovations in automated optimizations, multi-modal support, metadata management, and Python integration will only further drive adoption. Other open-table formats will likely take a more specialized approach suited to run specific workloads or in specific environments to complement Iceberg.
Cloudera’s goal is to help customers build an open data lakehouse powered by Iceberg with lower complexity, greater flexibility, and higher impact. We’re focused on delivering enterprise-grade security and governance, additional optimizations, tiered storage mechanisms, and a “catalog of catalogs” to enhance interoperability and collaboration. You can get started today with the Cloudera Lakehouse 5-day trial or by reading our how-to guides.