When data from disparate sources and in different formats is dumped into a data lake, you end up with a “data swamp”: unmanageable, un-navigable, and overwhelming for users.
In this highly requested session, we will dive into the “data swamp” problem and introduce the modern lakehouse paradigm. You will learn:
What core components make up a metadata catalog, and how to populate it with automated data lineage and metadata harvesting
How to use end-to-end search and discovery workflows to find new datasets, understand schema evolution, and assess data quality
Best practices for integrating metadata into your CI/CD pipelines to keep your catalog fresh
Ways to optimize resource usage while increasing operational efficiency
Keep your data from getting bogged down in a data swamp—explore how a robust metadata catalog can turn an ungoverned data lake into a trusted lakehouse.