Download StreamSets Data Collector

Continuous Big Data Ingest Made Simple

StreamSets Data Collector is open-source, in-memory big data ingest infrastructure that lets you develop and operate highly adaptable ingest pipelines for CDH with minimal coding.

StreamSets Data Collector can be installed as a Cloudera Manager parcel via a Custom Service Descriptor (CSD) file, via an RPM bundle or as a tarball. Source code for the CSD is available on GitHub. A Docker image is available on Docker Hub.

A graphical IDE lets you design, test and debug ingest flows without requiring schema specification.

Built-in transformations help you sanitize, sample and route your data as needed.

Intelligent monitoring gives you runtime visibility into data flow performance, including stage-specific early warnings about anomalies and outliers.

Deep integration with the Hadoop ecosystem, including connectors for HDFS, HBase, Kafka and Solr.

Flexible deployment of pipelines to edge servers or to the Enterprise Data Hub as a Spark Streaming application or MapReduce job.

Seamless management of infrastructure via Cloudera Manager and parcels.

System Requirements

Resources

Install StreamSets

Cloudera Manager Installation

Documentation

Integrations

Product Brief

Want to Get Involved or Learn More?

Check out our other resources

Cloudera Community

Collaborate with your peers, industry experts, and Clouderans to make the most of your investment in Hadoop.

Check it out now

Cloudera Educational Services

Receive expert Hadoop training through Cloudera Educational Services, the industry's only truly dynamic Hadoop training curriculum that’s updated regularly to reflect the state of the art in big data.

Check it out now
