Why Cloudera + StreamSets?
You can’t analyze what you can’t ingest. To realize the full value of Hadoop, you need to continuously land consumption-ready data in your data management systems for use by a growing number of applications. The open source StreamSets Data Collector lets you design, test, deploy, operate, and maintain pipelines that move streaming and batch data into Cloudera Enterprise. You can design complex data flows and cleanse the data in them without writing code, then monitor data flow performance and data quality through KPIs with threshold-based alerts. StreamSets Data Collector deploys either at the edge or into your Cloudera cluster via Cloudera Manager.
Unlocking the potential of Big Data requires getting consumption-ready data into the enterprise data hub while dealing with constantly-changing sources, consuming applications and business requirements. Our partnership and technical integrations with Cloudera make it easy for our customers to build and operate continuous data flows into Cloudera Enterprise Hub that improve both the speed and quality of downstream analysis.
–Girish Pancha, CEO, StreamSets
Joint Solution Overview
Performance Management for Data Flows with StreamSets Data Collector + Cloudera Enterprise
A key step in modernizing your data processing architecture is to upgrade how you move big data from logs, IoT sensors, and other sources through to your enterprise data hub. An integrated solution combining StreamSets Data Collector with Cloudera Enterprise makes it possible to continually feed your analytics applications consumption-ready data with efficiency, operational control, and agility.
StreamSets Data Collector deploys via a Cloudera Manager parcel onto your cluster. It provides a full-featured integrated development environment (IDE) that lets you design, test, deploy, and manage any-to-any ingest pipelines that combine stream and batch data and include a variety of in-stream transformations, all without having to write custom code. The data flows you build can read from and write to numerous Cloudera Enterprise components, such as HDFS, Kafka, Solr, Hive, HBase, and Kudu.
Once StreamSets Data Collector is running at the edge or in your Hadoop cluster, you get real-time monitoring of both data anomalies and data flow operations, including threshold-based alerting, anomaly detection, and automatic remediation of error records. Because it is architected to logically isolate each stage in a pipeline, you can meet new business requirements by dropping in new processors and connectors without code and with minimal downtime.
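To make the ideas of isolated stages, error-record handling, and threshold-based alerts concrete, here is a minimal, purely illustrative Python sketch. It does not use or represent the StreamSets Data Collector API, which is configured without code. The names `run_pipeline`, `parse_temperature`, `add_fahrenheit`, and `error_rate_alert`, along with the sample records, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

Record = dict
Stage = Callable[[Record], Record]  # each stage takes one record and returns one


@dataclass
class PipelineResult:
    good: List[Record] = field(default_factory=list)
    errors: List[Tuple[Record, str]] = field(default_factory=list)

    @property
    def error_rate(self) -> float:
        total = len(self.good) + len(self.errors)
        return len(self.errors) / total if total else 0.0


def run_pipeline(records: Iterable[Record], stages: List[Stage]) -> PipelineResult:
    """Pass each record through every stage; set failing records aside."""
    result = PipelineResult()
    for record in records:
        try:
            out = dict(record)  # copy so the original record stays untouched
            for stage in stages:
                out = stage(out)
            result.good.append(out)
        except (KeyError, ValueError) as exc:
            # Error records are kept with the reason instead of stopping the flow.
            result.errors.append((record, repr(exc)))
    return result


# Hypothetical stages: each one is independent, so stages can be added or swapped.
def parse_temperature(record: Record) -> Record:
    record["temp_c"] = float(record["temp_c"])
    return record


def add_fahrenheit(record: Record) -> Record:
    record["temp_f"] = record["temp_c"] * 9 / 5 + 32
    return record


def error_rate_alert(result: PipelineResult, threshold: float) -> bool:
    """Return True when the share of error records exceeds the threshold."""
    return result.error_rate > threshold


# Made-up sample data: two valid readings, one unparseable, one missing the field.
records = [{"temp_c": "21.5"}, {"temp_c": "n/a"}, {"temp_c": "30"}, {}]
result = run_pipeline(records, [parse_temperature, add_fahrenheit])

if error_rate_alert(result, threshold=0.25):
    print(f"ALERT: error rate {result.error_rate:.0%} exceeds threshold")
```

Because each stage is an independent function over a single record, adding a new transformation in this sketch means appending one more function to the list passed to `run_pipeline`, which loosely mirrors the drop-in processors described above.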
- Download the Cloudera Manager parcel for StreamSets Data Collector
- Read the whitepaper: True Performance Management for Multiple Data Flows
- Watch the Real-Time IoT Ingest Into Cloudera Using StreamSets video
- Watch the Continuous Ingest for IoT recorded webinar
- Download the IoT Reference Architecture for Hadoop white paper
- Read Continuous Ingest in the Face of Data Drift on the Cloudera Vision blog
- Read How to Build a Real-Time Search System on the Cloudera Engineering blog
StreamSets software delivers performance management for data flows that feed the next generation of big data applications. Its mission is to bring operational excellence to the management of data in motion, so that data arrives on time and with quality, accelerating analysis and decision making. StreamSets Data Collector is in use at hundreds of companies where it brings unprecedented visibility into and control over data as it moves between an expanding variety of sources and destinations.
Founded in 2014 by Girish Pancha, former chief product officer of Informatica, and Arvind Prabhakar, an early employee and engineering leader at Cloudera, StreamSets is headquartered in San Francisco and is backed by Accel Partners, Battery Ventures and New Enterprise Associates (NEA). For more information, visit streamsets.com.
Big Data Ingest Infrastructure
- Best-in-class big data ingest infrastructure for use with Cloudera Enterprise
- Accelerates time-to-insights through continuous delivery of consumption-ready data
- Includes connectors for HDFS, Kafka, Solr, Hive, HBase, and Kudu
- Deploys at the edge or into Cloudera Enterprise Hub as a Cloudera Manager parcel