What is Stream Processing?
Cloudera Stream Processing (CSP) enables customers to turn streams into data products by providing capabilities to analyze streaming data for complex patterns and surface actionable intelligence.
CSP is powered by Apache Flink and Kafka and provides a complete, enterprise-grade stream management and stateful processing solution. The combination of Kafka as the streaming storage substrate, Flink as the core in-stream processing engine, and first-class support for industry-standard interfaces like SQL and REST allows developers, data analysts, and data scientists to easily build hybrid streaming data pipelines that power real-time data products, dashboards, business intelligence apps, microservices, and data science notebooks.
Use cases like fraud detection, network threat analysis, manufacturing intelligence, commerce optimization, real-time offers, instantaneous loan approvals, and more are now possible by moving the data processing components up the stream to address these real-time needs.
Hybrid streaming data pipelines powered by Cloudera Stream Processing
Cloudera Stream Processing (CSP) use cases
Prevent millions of dollars in loss due to financial fraud by detecting it proactively.
Enterprises across retail, financial services, and other sectors struggle to protect customer data and prevent financial fraud. Cloudera Stream Processing's capabilities can process real-time streams of customer transactions, identify patterns, create predictive alerts, and uncover actionable intelligence to prevent potential fraud.
Real-time customer analytics improves engagement, retention, and satisfaction.
Every organization needs real-time analytics to improve customer engagement but struggles to implement it due to an excessive volume of data. Cloudera Stream Processing enables customer analytics by processing massive amounts of data with subsecond latencies while detecting customer interactions and recommending better offerings in real time.
Handle millions of trades a second and scale to petabytes of financial information.
Stock exchanges face customer demands for real-time reporting and ever-tighter SLA requirements, yet petabytes of data must be processed to deliver these services. Cloudera Streams Messaging can easily stream high volumes of data so stock exchanges can quickly create market-driven real-time analytics and meet increasingly demanding SLAs.
Modernize your logging infrastructure to get real-time analytics.
Log data is increasingly valuable to enterprises. But IT organizations are struggling with effective log collection processes, distributing relevant information upstream, and generating key metrics. Cloudera Stream Processing's capabilities help scale up log processing, deliver real-time insights across the firm, and significantly reduce operating costs.
Cloudera Stream Processing (CSP) capabilities
Streaming Analytics powered by Apache Flink
Streams Messaging powered by Apache Kafka
Powered by Apache Flink with SQL Stream Builder, Cloudera Streaming Analytics provides:
- Low-latency stream processing capabilities
- Simplified development, enabling users to write streaming applications with industry-standard SQL and APIs via REST endpoints
- Advanced windowing techniques to build sophisticated event-driven analytics
- Support for multi-cloud and hybrid cloud models
SQL Stream Builder (SSB) is a comprehensive interactive user interface for creating stateful stream processing jobs using SQL, which it converts into optimized Flink jobs. With SQL, you can simply declare expressions that filter, aggregate, route, and otherwise mutate streams of data. SSB is also a job management interface that you can use to compose and run SQL on streams, as well as to create durable data APIs for the results.
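To make this concrete, here is an illustrative SSB-style streaming SQL statement in the Flink SQL dialect. The table and column names (`payments`, `card_id`, `amount`, `event_time`) are hypothetical, not from an actual CSP deployment; the query counts and sums high-value payments per card over one-minute tumbling windows:

```sql
-- Illustrative only: table/column names are assumed, not taken from a real schema.
SELECT
  card_id,
  TUMBLE_END(event_time, INTERVAL '1' MINUTE) AS window_end,
  COUNT(*)    AS txn_count,
  SUM(amount) AS total_amount
FROM payments
WHERE amount > 1000
GROUP BY card_id, TUMBLE(event_time, INTERVAL '1' MINUTE);
```

A statement like this, submitted through SSB, would run continuously as a Flink job, emitting one row per card per completed window.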
Ensure that data is processed exactly once at all times, even during errors and retries. For example, a financial services company needs to use stream processing to coordinate hundreds of back-office transaction systems when consumers pay their home mortgage.
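One building block of exactly-once behavior is idempotent processing: if every event carries a unique ID, a retry or duplicate delivery never applies the same change twice. The sketch below is a minimal, illustrative version of that idea in Python (not CSP's actual mechanism, which relies on Flink checkpoints and Kafka transactions); the account data and function names are assumptions:

```python
# Minimal sketch: effectively-once processing via idempotent handling keyed by
# a transaction ID, so a redelivered event is a no-op instead of a double-debit.
processed_ids = set()            # in production this would be durable state
balances = {"acct-1": 500.0}     # illustrative account data

def apply_payment(txn_id, account, amount):
    """Apply a payment exactly once; replaying the same txn_id has no effect."""
    if txn_id in processed_ids:
        return balances[account]         # duplicate delivery: skip
    balances[account] -= amount
    processed_ids.add(txn_id)
    return balances[account]

apply_payment("t-100", "acct-1", 50.0)   # first delivery applies the debit
apply_payment("t-100", "acct-1", 50.0)   # retry after an error is ignored
```

The balance ends at 450.0 no matter how many times the same transaction is redelivered, which is the observable guarantee exactly-once processing provides.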
Detect and deal with streaming events that come out of order. For example, real-time fraudulent services need to ensure data is processed in the right order even if data arrives late.
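The standard technique for out-of-order events is event-time processing with watermarks: buffer events briefly and only release them once a watermark (the highest timestamp seen, minus an allowed lateness) has passed them. The sketch below illustrates just that core idea in Python; Flink's real watermarking is far richer, and all names here are illustrative:

```python
# Minimal sketch of event-time reordering with a watermark: events are buffered
# in a min-heap and emitted in timestamp order once the watermark passes them.
import heapq

class Reorderer:
    def __init__(self, max_lateness):
        self.max_lateness = max_lateness
        self.buffer = []                 # min-heap ordered by event timestamp
        self.max_seen = float("-inf")

    def add(self, ts, event):
        """Buffer one event; return any events now safe to emit, in order."""
        self.max_seen = max(self.max_seen, ts)
        heapq.heappush(self.buffer, (ts, event))
        watermark = self.max_seen - self.max_lateness
        out = []
        while self.buffer and self.buffer[0][0] <= watermark:
            out.append(heapq.heappop(self.buffer))
        return out

r = Reorderer(max_lateness=5)
r.add(10, "a")                 # buffered: watermark is only 5
r.add(8, "b")                  # late arrival, buffered behind "a"
emitted = r.add(16, "c")       # watermark advances to 11, releasing b then a
```

Even though "b" arrived after "a", it is emitted first, restoring event-time order at the cost of a bounded delay.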
Achieve in-memory, one-at-a-time stream processing performance. For example, process requests of 30 million active users making credit card payments, transfers, and balance lookups with millisecond latency.
Trigger events when dealing with hundreds of streaming sources and millions of events per second per stream. For example, when a patient checks into the ER, the system reaches out to external systems to pull patient-specific data from hundreds of sources so it’s available in an EMR by the time the patient arrives in the exam room.
Streaming data has little value unless it can easily integrate, join, and mesh those streams with other at-rest data sources including warehouses, relational databases, and data lakes. Configure data providers using out-of-the-box connectors or your own connector to any data source. Once the data providers are created, the user can easily create virtual tables using DDL. Complex integration between multiple streams and batch data sources becomes easier with well-known SQL constructs such as joins and aggregations.
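The effect of joining a stream with an at-rest source can be pictured as an enrichment lookup: each streaming record is matched against a reference table. The Python sketch below shows that shape under assumed data (the `customers` table and field names are illustrative, not a CSP API):

```python
# Minimal sketch of a stream-table join: each transaction on the stream is
# enriched with fields from an at-rest "customers" table.
customers = {                     # stands in for a relational table or lake data
    "c-1": {"name": "Ada", "tier": "gold"},
    "c-2": {"name": "Max", "tier": "basic"},
}

def enrich(txn_stream):
    """Join each streaming transaction with its customer record."""
    for txn in txn_stream:
        cust = customers.get(txn["customer_id"], {})
        yield {**txn, "tier": cust.get("tier", "unknown")}

stream = [{"customer_id": "c-1", "amount": 42.0},
          {"customer_id": "c-2", "amount": 7.5}]
enriched = list(enrich(stream))
```

In SSB the same result would come from a plain SQL join between a virtual table over the stream and a table backed by a configured data provider.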
Powered by Apache Kafka, Cloudera Streams Messaging provides:
- Streams Messaging Manager to monitor/operate clusters
- Streams Replication Manager for HA/DR deployments
- Schema Registry for centralized schema management
- Kafka Connect for simple data movement and change data capture
- Cruise Control for intelligent rebalancing and self-healing
- Support for multi-cloud and hybrid cloud models
Supports millions of messages per second with low latency and high throughput, scaling elastically and transparently without downtime. Addresses a wide range of streaming data initiatives, enabling enterprises to keep up with customer demand, provide better services, and proactively manage risk.
Streams Messaging Manager provides a single pane of glass view with end-to-end visibility into how data moves across Kafka clusters—among producers, brokers, topics, and consumers—allowing you to track data lineage and governance from edge to cloud. It also simplifies troubleshooting of Kafka environments with intelligent filtering and sorting.
Streams Replication Manager, based on MirrorMaker 2, offers fault-tolerant, scalable, and robust cross-cluster Kafka topic replication, as well as replication monitoring and metrics at the cluster and topic levels. It supports high availability, disaster recovery, cloud migration, geo-proximity, and many other use cases.
Schema Registry lets you manage, share, and support the evolution of all producer and consumer schemas in a shared schema repository that allows applications to flexibly interact with each other across the Kafka landscape. Safely mitigate interruptions that occur due to schema mismatches.
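The mismatch protection comes from a compatibility check run before any new schema version is registered. The sketch below shows a simplified backward-compatibility rule in the spirit of Avro schema resolution (a new reader schema must be able to decode data written with the old one); the field model is an illustrative assumption, not Schema Registry's actual API:

```python
# Simplified backward-compatibility check: the new (reader) schema must decode
# data written with the old schema. Fields are modeled as name -> (type, has_default).
def backward_compatible(old_fields, new_fields):
    for name, (ftype, has_default) in new_fields.items():
        if name in old_fields:
            if old_fields[name][0] != ftype:
                return False      # re-typing a field breaks decoding of old data
        elif not has_default:
            return False          # a new field must carry a default for old data
    return True                   # dropped old fields are simply ignored

v1 = {"id": ("string", False), "amount": ("double", False)}
v2 = {**v1, "currency": ("string", True)}    # added field with default: accepted
v3 = {"id": ("string", False), "amount": ("string", False)}  # re-typed: rejected
```

Registering `v2` would succeed while `v3` would be refused, so consumers never encounter data they cannot decode.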
Cruise Control lets you manage and load-balance large Kafka installations, as well as automatically detect and remediate anomalies. Address hard problems such as frequent hardware/virtual machine failures, cluster expansion/reduction, and load skew among brokers.
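The core of load balancing is an assignment problem: spread partition load evenly across brokers. Cruise Control optimizes many goals at once (rack awareness, network, disk, CPU); the Python sketch below shows only the load-skew goal with a simple greedy heuristic, and all partition/broker names and load numbers are illustrative:

```python
# Minimal sketch of one rebalancing goal: greedily place the heaviest
# partitions on the currently least-loaded broker to minimize load skew.
def rebalance(partition_loads, brokers):
    """partition_loads: dict partition -> load. Returns (assignment, totals)."""
    assignment = {b: [] for b in brokers}
    totals = {b: 0.0 for b in brokers}
    for part, load in sorted(partition_loads.items(),
                             key=lambda kv: kv[1], reverse=True):
        target = min(brokers, key=lambda b: totals[b])   # least-loaded broker
        assignment[target].append(part)
        totals[target] += load
    return assignment, totals

loads = {"p0": 9.0, "p1": 7.0, "p2": 4.0, "p3": 3.0, "p4": 1.0}
assignment, totals = rebalance(loads, ["b1", "b2"])      # both brokers end at 12.0
```

A real rebalancer would also weigh the cost of moving replicas; this sketch only shows why an automated placement decision removes skew that accumulates as brokers fail or clusters grow.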
Cloudera SDX offers centralized security, control policies, governance, and data lineage across all components. Policies are set once, automatically enforced, and vendor-agnostic, allowing you to confidently embrace multi-cloud and hybrid cloud strategies. SDX supports the four main pillars of security: identity, access, data protection, and visibility.
Stream Processing in the cloud
Eliminate the complexity of cloud configuration and infrastructure setup with fully secure, governed, elastic clusters that spin up in less than 10 minutes on AWS, Azure, and GCP.
Streaming Analytics for Data Hub
Streaming Analytics for Data Hub spins up Apache Flink and SQL Stream Builder in public cloud, bringing stream processing of real-time data via SQL or application code into hybrid cloud environments.
Streams Messaging for Data Hub
Streams Messaging for Data Hub extends your on-premises Apache Kafka investment by spinning up Kafka clusters in the public cloud with a comprehensive set of enterprise management capabilities addressing schema governance, monitoring, disaster recovery, intelligent rebalancing, and robust access control and audit.