Why Cloudera + StreamSets?

You can’t analyze what you can’t ingest. To realize the full value of Cloudera, you need to continuously land consumption-ready data in your data platforms for use by a growing number of applications. The StreamSets DataOps Platform allows you to build, execute and operate data flow pipelines for streaming and batch movement of data into Cloudera Enterprise Data Hub.

  • Data engineers and data scientists can easily design and test complex pipelines. Unique data previews provide a view into what’s happening at every stage of a pipeline, simplifying the development process and reducing errors.
  • Schema-less design means you focus only on the data attributes you care about. StreamSets intelligent pipelines detect changes in your data and automatically propagate them downstream, minimizing the impact of the constant change that occurs in normal business operations.
  • StreamSets also adds monitoring for performance, data quality and sensitive data, so you can deploy operations-ready pipelines at scale.
  • StreamSets deploys either on the edge or into your Cloudera cluster via Cloudera Manager.

Unlocking the potential of big data requires getting consumption-ready data into the enterprise data hub while dealing with constantly-changing sources, consuming applications and business requirements. Our partnership and technical integrations with Cloudera make it easy for our customers to build and operate continuous data flows into Cloudera Enterprise Data Hub that improve both the speed and quality of downstream analysis.

Girish Pancha, CEO, StreamSets

Joint Solution Overview

Accelerate Data Hub Time to Value with StreamSets + Cloudera Enterprise

A key step in modernizing your data processing architecture is to upgrade how you move data from logs, IoT sensors, and other sources to your enterprise data hub.  An integrated solution combining StreamSets with Cloudera Enterprise makes it possible to continually feed your analytics applications consumption-ready data with efficiency, operational control, and agility.

StreamSets deploys via a Cloudera Manager parcel onto your cluster. It provides a full-featured integrated development environment (IDE) that lets you build, execute and operate any-to-any ingest pipelines that mesh stream and batch data and include a variety of in-stream transformations, all without having to write custom code. StreamSets lets you build data flows with direct integration to numerous Cloudera Enterprise components, including HDFS, Kafka, Solr, Hive, HBase, Impala, CDSW, Kudu, and Cloudera Navigator.

Once StreamSets is running, you get real-time monitoring for both data anomalies and data flow operations, including threshold-based alerting, anomaly detection, and automatic remediation of error records.  Because it is architected to logically isolate each stage in a pipeline, you can meet new business requirements by dropping in new processors and connectors without code and with minimal downtime.

StreamSets Data Collector in the Partner Solutions Gallery

About StreamSets

The StreamSets DataOps platform enables companies to build, execute, operate and protect continuous dataflows that unleash pervasive analytics. It combines award-winning open source software featuring Intelligent Pipelines with a cloud-native control plane that helps enterprises manage their data movement as a continuous ingestion practice.

Founded by Girish Pancha, former chief product officer of Informatica, and Arvind Prabhakar, a former engineering leader at Cloudera, StreamSets is backed by top-tier Silicon Valley venture capital firms, including Battery Ventures, New Enterprise Associates (NEA), and Accel Partners. For more information, visit

Key highlights

Data Integration & Processing


Partnership highlights

  • 2017 Cloudera Partner Impact Award Winner for Modernizing IT
  • Best-in-class data ingest infrastructure for use with Cloudera Enterprise
  • Build ingest pipelines for Cloudera in minutes, not hours or days
  • Schema-less intelligent pipelines adapt to change to enable continuous execution
  • Continuously deliver consumption-ready data to accelerate time-to-insight
  • Includes connectors for many Cloudera components, including HDFS, Kafka, Solr, Hive, HBase and Kudu
  • Publish metadata directly from StreamSets into Cloudera Navigator
  • Deploys on the edge or into Cloudera Enterprise Data Hub as a Cloudera Manager parcel
