
Why Cloudera + StreamSets?

You can’t analyze what you can’t ingest. To realize the full value of Hadoop, you need to continuously land consumption-ready data in your data management systems for use by a growing number of applications. The open source StreamSets Data Collector allows you to design, test, deploy, operate, and maintain pipelines that flow streaming and batch data into Cloudera Enterprise. You can design complex data flows that cleanse streaming and batch data, all without writing code, and then monitor data flow performance and data quality via KPIs with threshold-based alerts. StreamSets Data Collector deploys either at the edge or into your Cloudera cluster via Cloudera Manager.

Unlocking the potential of Big Data requires getting consumption-ready data into the enterprise data hub while dealing with constantly-changing sources, consuming applications and business requirements. Our partnership and technical integrations with Cloudera make it easy for our customers to build and operate continuous data flows into Cloudera Enterprise Hub that improve both the speed and quality of downstream analysis.

Girish Pancha, CEO, StreamSets

Joint Solution Overview

Performance Management for Data Flows with StreamSets Data Collector + Cloudera Enterprise

A key step in modernizing your data processing architecture is to upgrade how you move big data from logs, IoT sensors, and other sources to your enterprise data hub. An integrated solution combining StreamSets Data Collector with Cloudera Enterprise makes it possible to continually feed your analytics applications consumption-ready data with efficiency, operational control, and agility.

StreamSets Data Collector deploys via a Cloudera Manager parcel onto your cluster. It provides a full-featured integrated development environment (IDE) that lets you design, test, deploy, and manage any-to-any ingest pipelines that mesh stream and batch data and include a variety of in-stream transformations, all without having to write custom code. StreamSets Data Collector lets you build data flows that include numerous Cloudera Enterprise components such as HDFS, Kafka, Solr, Hive, HBase, and Kudu.

Once StreamSets Data Collector is running at the edge or in your Hadoop cluster, you get real-time monitoring of both data anomalies and data flow operations, including threshold-based alerting, anomaly detection, and automatic remediation of error records. Because it is architected to logically isolate each stage in a pipeline, you can meet new business requirements by dropping in new processors and connectors without code and with minimal downtime.

StreamSets Data Collector in the Partner Solutions Gallery

StreamSets Modern Ingest for Network Threat Detection in the Partner Solutions Gallery

Learn More

About StreamSets

StreamSets software delivers performance management for data flows that feed the next generation of big data applications. Its mission is to bring operational excellence to the management of data in motion, so that data arrives on time and with quality, accelerating analysis and decision making. StreamSets Data Collector is in use at hundreds of companies where it brings unprecedented visibility into and control over data as it moves between an expanding variety of sources and destinations.

Founded in 2014 by Girish Pancha, former chief product officer of Informatica, and Arvind Prabhakar, an early employee and engineering leader at Cloudera, StreamSets is headquartered in San Francisco and is backed by Accel Partners, Battery Ventures and New Enterprise Associates (NEA). For more information, visit
