Why Cloudera + StreamSets?
You can’t analyze what you can’t ingest. To realize the full value of Cloudera, you need to continuously land consumption-ready data in your data platforms for use by a growing number of applications. The StreamSets DataOps Platform allows you to build, execute and operate data flow pipelines for streaming and batch movement of data into Cloudera Enterprise Data Hub.
- Data engineers and data scientists can easily design and test complex pipelines. Unique data previews provide a view into what’s happening at every stage of a pipeline, simplifying the development process and reducing errors.
- Schema-less design means you focus only on the data attributes you care about. StreamSets intelligent pipelines detect changes in your data and automatically propagate them downstream, minimizing the impact of the constant change that comes with normal business operations.
- StreamSets also adds monitoring for performance, data quality and sensitive data, so you can deploy operations-ready pipelines at scale.
- StreamSets deploys at the edge or into your Cloudera cluster via Cloudera Manager.
Joint Solution Overview
Accelerate Data Hub Time to Value with StreamSets + Cloudera Enterprise
A key step in modernizing your data processing architecture is to upgrade how you move data from logs, IoT sensors, and other sources to your enterprise data hub. An integrated solution combining StreamSets with Cloudera Enterprise makes it possible to continually feed your analytics applications consumption-ready data with efficiency, operational control, and agility.
StreamSets deploys via a Cloudera Manager parcel onto your cluster. It provides a full-featured, integrated development environment (IDE) that lets you build, execute and operate any-to-any ingest pipelines that mesh stream and batch data and include a variety of in-stream transformations, all without having to write custom code. StreamSets lets you build data flows with direct integration to numerous Cloudera Enterprise components including HDFS, Kafka, Solr, Hive, HBase, Impala, CDSW, Kudu, and Cloudera Navigator.
Once StreamSets is running, you get real-time monitoring for both data anomalies and data flow operations, including threshold-based alerting, anomaly detection, and automatic remediation of error records. Because StreamSets is architected to logically isolate each stage in a pipeline, you can meet new business requirements by dropping in new processors and connectors without code and with minimal downtime.
StreamSets Data Collector in the Partner Solutions Gallery
- Download the Cloudera Manager parcel for StreamSets Data Collector
- Read the Solution Brief: Simplify Ingest and Accelerate Time to Value
- Watch the video: Building Dataflow Pipelines to Cloudera with StreamSets Data Collector
- Watch the video: Real-Time IoT Ingest into Cloudera Using StreamSets
- Watch the recorded webinar: Continuous Ingest for IoT
- Download the white paper: IoT Reference Architecture for Hadoop
- Read Continuous Ingest in the Face of Data Drift on the Cloudera Vision blog
- Read How to Build a Real-Time Search System on the Cloudera Engineering blog
- Read the white paper: Accelerating Threat Detection and Investigation with Modern Data Ingest
The StreamSets DataOps Platform enables companies to build, execute, operate and protect continuous dataflows that unleash pervasive analytics. It combines award-winning open source software featuring Intelligent Pipelines with a cloud-native control plane that helps enterprises manage their data movement as a continuous ingestion practice.
Founded by Girish Pancha, former chief product officer of Informatica, and Arvind Prabhakar, a former engineering leader at Cloudera, StreamSets is backed by top-tier Silicon Valley venture capital firms, including Battery Ventures, New Enterprise Associates (NEA), and Accel Partners. For more information, visit www.streamsets.com.