Why Cloudera + dbt
Cloudera offers a broad, powerful platform with end-to-end security. Integrating dbt gives our customers a way to manage SQL-based transformations with a software development lifecycle (SDLC) approach that analysts can easily take part in.
dbt has been gaining a lot of traction and allows teams to quickly and collaboratively deploy structured transformation pipelines following software engineering best practices like modularity, portability, CI/CD, and documentation. Now, anyone who knows SQL can build production-grade data pipelines, backed by the power and security of Cloudera Data Platform (CDP).
About dbt Labs
dbt Labs is on a mission to empower data practitioners to create and disseminate organizational knowledge. Having pioneered the practice of analytics engineering, we're now fortunate to support a community of over 32,000 data practitioners committed to changing how data teams work together.
Joint Solution Overview
Cloudera is proud to bring dbt to the open data lakehouse with adapters for the SQL engines supported by CDP. With these adapters, Impala and Hive users no longer need separate transformation tools and data quality frameworks.
Data teams and business functions build and manage the business logic of their transformation pipelines with their own processes and different engines, all on the same data lakehouse. There is a growing need for a central, transparent, version-controlled repository with a consistent SDLC experience to manage these pipelines across teams and functions. Streamlining the SDLC has been shown to speed up delivery of data projects while increasing transparency and auditability, leading to a more data-driven organization.
dbt offers this consistent SDLC experience for transformation pipelines. It has become an industry-wide movement, with companies big and small using it to streamline how they manage transformation pipelines.
dbt is a popular tool for building and running SQL-based data transformations against a data warehouse. Building on the existing Cloudera platform, we have created a seamless data transformation experience in which data engineers and data analysts collaborate on data pipelines, bringing business and data engineering teams together to enrich structured data that feeds downstream applications, BI, and ML.
dbt provides functionality to design, develop, and deploy SQL-based data models, and it works through an adapter with an underlying SQL engine to carry out those transformations. Cloudera has integrated dbt with the engines provided in CDP, including Impala, Spark, and Hive. Data practitioners can now simply install and configure our adapter packages along with dbt-core and begin transforming their data with dbt.
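As a rough illustration of the install step, the sketch below checks whether dbt-core and the adapter packages are present in the current Python environment. The package names `dbt-impala` and `dbt-hive` are assumptions about how the adapters are published, and `installed_versions` is a hypothetical helper rather than part of dbt or CDP; adjust the list to match the adapters you actually use.

```python
from importlib import metadata

# Assumed distribution names; edit to match the adapters you intend to use.
PACKAGES = ["dbt-core", "dbt-impala", "dbt-hive"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        status = version if version is not None else "not installed"
        print(f"{name}: {status}")
```

Any package reported as not installed would typically be added with pip before configuring a dbt profile for the chosen engine.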
We have sample projects and tutorials to get you started with dbt adapters, and guidelines for how to leverage Cloudera Machine Learning (CML) to provide a flexible GUI to build and deploy dbt models from within the Cloudera Data Platform (CDP).
Software Vendor (ISV)
- With this partnership, Cloudera customers with only SQL skills can build and manage data transformation pipelines.
- Cloudera customers will be able to define data quality tests, write documentation, use version control, and collaborate on data models by leveraging dbt within our platform.
- We are proud to offer these capabilities to all Cloudera customers across form factors: on-premises, public cloud, and SaaS.