Why Cloudera + StreamSets?
You can’t analyze what you can’t ingest. To realize the full value of Hadoop, you need to continuously land consumption-ready data in your data management systems for use by a growing number of applications. The open source StreamSets Data Collector lets you design, test, deploy, operate, and maintain pipelines that flow streaming and batch data into Cloudera Enterprise. You can design complex flows of streaming and batch data and cleanse the data in flight, all without writing code, and then monitor data flow performance and data quality via KPIs with threshold-based alerts. StreamSets Data Collector deploys either on the edge or into your Cloudera cluster via Cloudera Manager.
Joint Solution Overview
Performance Management for Data Flows with StreamSets Data Collector + Cloudera Enterprise
A key step in modernizing your data processing architecture is to upgrade how you move big data from logs, IoT sensors, and other sources through to your enterprise data hub. An integrated solution combining StreamSets Data Collector with Cloudera Enterprise makes it possible to continually feed your analytics applications consumption-ready data with efficiency, operational control, and agility.
StreamSets Data Collector deploys via a Cloudera Manager parcel onto your cluster. It provides a full-featured integrated development environment (IDE) that lets you design, test, deploy, and manage any-to-any ingest pipelines that mesh stream and batch data and include a variety of in-stream transformations, all without having to write custom code. StreamSets Data Collector lets you build data flows that include numerous Cloudera Enterprise components such as HDFS, Kafka, Solr, Hive, HBase, and Kudu.
Once StreamSets Data Collector is running on the edge or in your Hadoop cluster, you get real-time monitoring of both data anomalies and data flow operations, including threshold-based alerting, anomaly detection, and automatic remediation of error records. Because it is architected to logically isolate each stage in a pipeline, you can meet new business requirements by dropping in new processors and connectors without code and with minimal downtime.
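As a rough illustration of what threshold-based alerting on a data flow involves, the sketch below flags batches whose share of error records exceeds a limit. It is plain Python and does not use the StreamSets Data Collector API; the names BatchStats and error_rate_alerts, and the 5% default threshold, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BatchStats:
    """Record counts for one batch (hypothetical structure, not an SDC type)."""
    batch_id: int
    total_records: int
    error_records: int


def error_rate_alerts(batches: Iterable[BatchStats], threshold: float = 0.05) -> List[str]:
    """Return one alert message per batch whose error rate exceeds the threshold."""
    alerts = []
    for b in batches:
        if b.total_records == 0:
            continue  # an empty batch has no error rate to measure
        rate = b.error_records / b.total_records
        if rate > threshold:
            alerts.append(
                f"batch {b.batch_id}: error rate {rate:.1%} exceeds {threshold:.1%}"
            )
    return alerts


if __name__ == "__main__":
    # Assumed sample data: only the second batch crosses the 5% threshold.
    sample = [BatchStats(1, 1000, 10), BatchStats(2, 1000, 80), BatchStats(3, 0, 0)]
    for message in error_rate_alerts(sample):
        print(message)
```

In a real deployment the counts would come from the pipeline's own metrics and the alert would go to an operator rather than to standard output; the sketch only shows the comparison at the heart of a threshold rule.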
StreamSets Data Collector in the Partner Solutions Gallery
StreamSets Modern Ingest for Network Threat Detection in the Partner Solutions Gallery
- Download the Cloudera Manager parcel for StreamSets Data Collector
- Read the whitepaper: True Performance Management for Multiple Data Flows
- Watch the Real-Time IoT Ingest into Cloudera Using StreamSets video
- Watch the Continuous Ingest for IoT recorded webinar
- Download the IoT Reference Architecture for Hadoop White Paper
- Read Continuous Ingest in the Face of Data Drift on the Cloudera Vision blog
- Read How to Build a Real-Time Search System on the Cloudera Engineering blog
- Read the white paper: Accelerating Threat Detection and Investigation with Modern Data Ingest
StreamSets software delivers performance management for data flows that feed the next generation of big data applications. Its mission is to bring operational excellence to the management of data in motion, so that data arrives on time and with quality, accelerating analysis and decision making. StreamSets Data Collector is in use at hundreds of companies where it brings unprecedented visibility into and control over data as it moves between an expanding variety of sources and destinations.
Founded in 2014 by Girish Pancha, former chief product officer of Informatica, and Arvind Prabhakar, an early employee and engineering leader at Cloudera, StreamSets is headquartered in San Francisco and is backed by Accel Partners, Battery Ventures and New Enterprise Associates (NEA). For more information, visit streamsets.com.
Big Data Ingest Infrastructure
- Best-in-class big data ingest infrastructure for use with Cloudera Enterprise
- Accelerates time-to-insights through continuous delivery of consumption-ready data
- Includes connectors for HDFS, Kafka, Solr, Hive, HBase, and Kudu
- Deploys on the edge or into Cloudera Enterprise Hub as a Cloudera Manager parcel