Hortonworks DataFlow (HDF) is now Cloudera DataFlow (CDF).

Cloudera DataFlow (CDF)

The answer to all your real-time streaming data problems.

Manage your data from edge to enterprise with a no-code approach to developing sophisticated streaming applications easily.

The biggest challenge in getting insights from streaming data is acquiring the data quickly and securely, prioritized for analysis and with clear traceability.

Cloudera DataFlow (CDF), formerly Hortonworks DataFlow (HDF), is a scalable, real-time streaming analytics platform that ingests, curates, and analyzes data for key insights and immediate actionable intelligence.

DataFlow addresses the key challenges enterprises face with data-in-motion:

  • Processing real-time streaming data at high volume and scale
  • Tracking data provenance and lineage of streaming data
  • Managing and monitoring edge applications and streaming sources

[Figure: CDF architecture diagram]

Key benefits

Reduce data integration development time

Imagine a no-code approach to building complex data pipelines with minimal effort. CDF offers a simple visual UI for building sophisticated data flows that handle data ingestion, transformation, and enrichment from a variety of streaming sources. Powered by Apache NiFi, CDF ingests real-time streaming data from devices, enterprise applications, partner systems, and edge applications.

Manage & secure your data from edge to enterprise

CDF enables high-volume data collection at the edge, including from edge devices running MiNiFi. You can set up widely distributed IoT deployment models for regional data collection, using NiFi with MiNiFi to stream data from the edge. Tight integration with Apache Ranger gives CDF consistent security across all your data-in-motion and data-at-rest.

Get real-time insights faster than ever

Real-time insights and actionable intelligence mean you can act sooner. Using the powerful streaming platform Apache Kafka, CDF can process several million transactions per second, identify key patterns, compare against machine learning models, and offer predictive or prescriptive analytics to help business leadership make key decisions and seize opportunities.

Out-of-the-box compliance

CDF is the only product in the industry offering data provenance and edge-to-enterprise data governance out of the box. In the age of GDPR and other regulatory requirements, it’s important to track data lineage, even for streaming data. NiFi within CDF offers data provenance tracking without any extra configuration or setup. With tight integration with Apache Atlas, you have complete governance of data from the edge to the enterprise.

Build a data architecture that adapts to IoT scale

Capitalize on the wealth of IoT data insights

CDF is 100 percent open source technology, so you can design a future-proof architecture without any vendor lock-in. Implement IoT solutions for mission-critical use cases in industries such as automotive, manufacturing, transportation, utilities, retail, and the public sector. You can adopt a data strategy that handles highly diverse, large data volumes at high velocity.

“We use HDF as the foundation for our data ingestion, specifically Apache NiFi, Kafka, and Spark, to process messages that originated from our health systems and hospital partners.”

—Jimmy Hurff, CTO, Clearsense


Introducing Cloudera DataFlow


Apache NiFi for Dummies
