

StreamSets Data Collector


Solution overview

StreamSets Data Collector can be installed in several ways: as a Cloudera Manager parcel via a Custom Service Descriptor (CSD) file, from an RPM bundle, or as a tarball. Source code for the CSD is available on GitHub, and a Docker image is available on Docker Hub.

A graphical IDE lets you design, test and debug ingest flows without having to specify a schema up front.

  • Built-in transformations help you sanitize, sample and route your data as needed.
  • Intelligent monitoring gives you runtime visibility into data flow performance, including stage-specific early warnings about anomalies and outliers.
  • Deep integration with the Hadoop ecosystem, including connectors for HDFS, HBase, Kafka and Solr.
  • Flexible deployment of pipelines to edge servers, or to the Enterprise Data Hub as a Spark Streaming application or MapReduce job.
  • Seamless management of infrastructure via Cloudera Manager and parcels.


About StreamSets 
StreamSets software delivers performance management for data flows that feed the next generation of big data applications. Its mission is to bring operational excellence to the management of data in motion, so that data arrives on time and with quality, accelerating analysis and decision making. StreamSets Data Collector is in use at hundreds of companies where it brings unprecedented visibility into and control over data as it moves between an expanding variety of sources and destinations.

Founded in 2014 by Girish Pancha, former chief product officer of Informatica, and Arvind Prabhakar, an early employee and engineering leader at Cloudera, StreamSets is headquartered in San Francisco and is backed by Accel Partners, Battery Ventures and New Enterprise Associates (NEA).


