Cloudera DataFlow (CDF), formerly Hortonworks DataFlow (HDF), is a scalable, real-time streaming analytics platform that ingests, curates, and analyzes data for key insights and immediate actionable intelligence.
DataFlow addresses the key challenges enterprises face with data-in-motion:
- Processing real-time streaming data at high volume and scale
- Tracking data provenance and lineage of streaming data
- Managing and monitoring edge applications and streaming sources
Get real-time insights faster than ever
Real-time insights and actionable intelligence mean you can act sooner. With Apache Kafka as its streaming platform, CDF can process several million transactions per second, identify key patterns, compare them against machine learning models, and offer predictive or prescriptive analytics that help business leadership make key decisions and seize opportunities.
CDF is the only product in the industry offering data provenance and edge-to-enterprise data governance out of the box. In the age of GDPR and other regulatory compliance requirements, it’s important to track data lineage, even for streaming data. NiFi within CDF offers data provenance tracking without any extra configuration or setup. Through tight integration with Apache Atlas, you have complete governance of data from the edge to the enterprise.
Build a data architecture that adapts to IoT scale
Capitalize on the wealth of IoT data insights
CDF is 100 percent open source technology, so you can design a future-proof architecture without vendor lock-in. Implement IoT solutions for mission-critical use cases in industries such as automotive, manufacturing, transportation, utilities, retail, and the public sector. You can adopt a data strategy that handles large, highly diverse data volumes arriving at high velocity.