Since its initial release in 2021, Cloudera DataFlow for Public Cloud (CDF-PC) has been helping customers solve data distribution use cases that demand high throughput and low latency, and therefore always-running clusters. CDF-PC's DataFlow Deployments provides a cloud-native runtime that runs your Apache NiFi flows on auto-scaling Kubernetes clusters, along with centralized monitoring and alerting and an improved SDLC for developers. The DataFlow Deployments model is an ideal fit for use cases with streaming data sources whose streams need to be delivered to destinations with low latency, such as collecting and distributing streaming POS data in real time.
However, customers also have a class of use cases that do not require always-running NiFi flows. These use cases include event-driven object store processing, microservices that power serverless web applications, IoT data processing, asynchronous API gateway request processing, batch file processing, and job automation with cron/timer scheduling. For these use cases, the NiFi flows need to be treated as jobs with a distinct start and end. The start is based on a trigger event such as a file landing in an object store, a cron event firing, or a gateway endpoint being invoked. Once the job completes, the associated compute resources need to shut down. This video demonstrates the new feature with a use case where files landing in AWS S3 need to be processed, filtered, enriched, and converted. This is done using DataFlow Functions and AWS Lambda.
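To make the trigger pattern concrete, the sketch below shows what an S3 object-created event looks like to an AWS Lambda handler. This is a minimal illustrative handler in Python, not the actual DataFlow Functions runtime (which wraps a full NiFi flow); the `process_file` call is a hypothetical stand-in for whatever processing, filtering, enrichment, and conversion the flow performs.

```python
import json
import urllib.parse


def process_file(bucket: str, key: str) -> dict:
    """Hypothetical stand-in for the flow logic (process, filter,
    enrich, convert) that DataFlow Functions would run on the file."""
    return {"bucket": bucket, "key": key, "status": "processed"}


def lambda_handler(event, context):
    """Entry point AWS Lambda invokes when an S3 object-created
    event fires. The function starts on the trigger, handles the
    new file(s), returns, and its compute resources shut down --
    the job-style lifecycle described above."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        results.append(process_file(bucket, key))
    return {"statusCode": 200, "body": json.dumps(results)}
```

In the demoed setup, the S3 bucket's event notification is wired to the Lambda function, so each new object landing in the bucket triggers one short-lived invocation with no always-running cluster in between.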