In the first blog of the Universal Data Distribution blog series, we discussed the emerging need within enterprise organizations to take control of their data flows. From origin through all points of consumption both on-prem and in the cloud, all data flows need to be controlled in a simple, secure, universal, scalable, and cost-effective way. With the rapid increase of cloud services where data needs to be delivered (data lakes, lakehouses, cloud warehouses, cloud streaming systems, cloud business processes, etc.), controlling distribution while also allowing the freedom and flexibility to deliver the data to different services is more critical than ever.
Cloudera DataFlow for the Public Cloud (CDF-PC), a cloud-native universal data distribution service powered by Apache NiFi, was built to solve data collection and distribution challenges with four key capabilities: connectivity and application accessibility, indiscriminate data delivery, prioritized streaming data pipelines, and developer accessibility.
In this second installment of the Universal Data Distribution blog series, we will discuss a few different data distribution use cases and deep dive into one of them.
Companies use CDF-PC for diverse data distribution use cases. These range from cybersecurity analytics and SIEM optimization via streaming data collection from hundreds of thousands of edge devices, to self-service analytics workspace provisioning and hydrating data into lakehouses (e.g., Databricks, Dremio), to ingesting data into cloud providers’ data lakes backed by their cloud object storage (AWS, Azure, Google Cloud) and cloud warehouses (Snowflake, Redshift, Google BigQuery).
There are three common classes of data distribution use cases that we often see:
Let’s double-click on the IoT and streaming data collection category with a specific use case from a global retail company and see how CDF-PC was used to solve the customer’s data distribution needs. The customer is a multinational retail company that wants to collect data from point of sale (POS) systems across the globe and distribute it to multiple cloud services, with six key requirements.
The solution was implemented using the latest release of Cloudera DataFlow for the Public Cloud (CDF-PC) and Cloudera Edge Management (CEM):
Check out the following video to see how CDF-PC and CEM were used to solve these six requirements for their data distribution use case:
The diagram below describes how the solution was implemented to address the above requirements.
To learn more about implementing your own IoT use cases, ingesting data into your data lakes and lakehouses, or delivering data to various cloud services, take our interactive product tour or sign up for a free trial.