Overview

What is Streaming?

Cloudera Streaming (formerly known as Cloudera Stream Processing) enables customers to turn streams into data products by providing capabilities to analyze streaming data for complex patterns and gain actionable intelligence.

Streaming is powered by Apache Flink and Kafka and provides a complete, enterprise-grade stream management and stateful processing solution. The combination of Kafka as the streaming storage substrate, Flink as the core in-stream processing engine, and first-class support for industry-standard interfaces like SQL and REST allows developers, data analysts, and data scientists to easily build hybrid streaming data pipelines that power real-time data products, dashboards, business intelligence apps, microservices, and data science notebooks.

Use cases like fraud detection, network threat analysis, manufacturing intelligence, commerce optimization, real-time offers, instantaneous loan approvals, and more are now possible by moving the data processing components up the stream to address these real-time needs.

HYBRID STREAMING DATA PIPELINES POWERED BY CLOUDERA STREAMING

[Figure: a diagram of hybrid streaming pipelines]

Cloudera Streaming use cases

  • Fraud Detection
  • Customer Analytics
  • Market Monitoring
  • Log Analytics

Fraud detection


Prevent millions of dollars in loss due to financial fraud by detecting it proactively. 

Enterprises across retail, financial services, and other sectors struggle to protect customer data and prevent financial fraud. Cloudera Streaming's capabilities can process real-time streams of customer transactions, identify patterns, create predictive alerts, and uncover actionable intelligence to prevent potential fraud.

PT Bank Rakyat Indonesia: Using big data, AI, and ML to better understand customers

Achieved a 40 percent reduction in fraud.

Read the case study

Customer analytics


Real-time customer analytics improves engagement, retention, and satisfaction.

Every organization needs real-time analytics to improve customer engagement but struggles to implement it due to an excessive volume of data. Cloudera Streaming enables customer analytics by processing massive amounts of data with subsecond latencies while detecting customer interactions and recommending better offerings in real time.

Major airline: Enhancing customer experience with data-driven automation

Achieved a 50 percent decrease in data volume by paying to transmit each data stream only once.

Read the case study

Market monitoring


Handle millions of trades a second and scale to petabytes of financial information.

Financial stock exchanges face challenges with customer demands for real-time reporting and faster SLA requirements. Yet, petabytes of data must be processed to deliver these services. Cloudera Streams Messaging can easily stream high volumes of data so stock exchanges can quickly create market-driven real-time analytics and meet the increasingly demanding SLAs.

Bombay Stock Exchange: World’s fastest stock exchange leverages real-time analytics to improve data governance and industry SLAs

Achieved a 95 percent reduction in operational costs.

Read the case study

Log analytics


Modernize your logging infrastructure to get real-time analytics.

Log data is increasingly valuable to enterprises. But IT organizations are struggling with effective log collection processes, distributing relevant information upstream, and generating key metrics. Cloudera Streaming's capabilities help scale up log processing, deliver real-time insights across the firm, and significantly reduce operating costs.

Globe Telecom: Enabling the digital lifestyle of mobile customers with a modern analytic environment

600PB of mobile data volume managed.

Read the case study

Get the details on the Data-in-Motion Kubernetes release

Cloudera Streaming capabilities

  • Streaming Analytics powered by Apache Flink
  • Streams Messaging powered by Apache Kafka

Streaming Analytics

Powered by Apache Flink with SQL Stream Builder, Cloudera Streaming Analytics provides:

  • Low-latency stream processing capabilities
  • Simplified development, with streaming applications written in industry-standard SQL and APIs exposed via REST endpoints
  • Advanced windowing techniques to build sophisticated event-driven analytics
  • Support for multi-cloud and hybrid cloud models

Key features

Cloudera SQL Stream Builder is a comprehensive interactive UI for creating stateful stream processing jobs using SQL, which is converted into optimized Flink jobs. By using SQL, you can simply and easily declare expressions that filter, aggregate, route, and otherwise mutate streams of data. SQL Stream Builder is also a job management interface that you can use to compose and run SQL on streams as well as to create durable data APIs for the results.
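
To make the filter-and-aggregate idea described above concrete, here is a minimal sketch of what a tumbling-window aggregation computes. It is plain Python, not SQL Stream Builder or Flink code, and the event fields (`card_id`, `amount`, `ts`) and the function name are hypothetical.

```python
from collections import defaultdict

def tumbling_window_totals(events, window_seconds, min_amount=0.0):
    """Filter events, then sum amounts per (window start, card_id).

    Hypothetical event shape: {"card_id": str, "amount": float, "ts": int},
    where ts is the event time in whole seconds.
    """
    totals = defaultdict(float)
    for event in events:
        # Filter step: drop events below the threshold.
        if event["amount"] < min_amount:
            continue
        # Assign the event to the fixed-size window containing its timestamp.
        window_start = event["ts"] - (event["ts"] % window_seconds)
        # Aggregate step: running sum per window and key.
        totals[(window_start, event["card_id"])] += event["amount"]
    return dict(totals)

events = [
    {"card_id": "A", "amount": 20.0, "ts": 3},
    {"card_id": "A", "amount": 5.0, "ts": 7},    # filtered out below
    {"card_id": "B", "amount": 50.0, "ts": 12},
    {"card_id": "A", "amount": 30.0, "ts": 14},
]
result = tumbling_window_totals(events, window_seconds=10, min_amount=10.0)
```

In a SQL-based tool the same logic would typically be a single windowed GROUP BY query; the loop only spells out the semantics.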

Ensure that data is processed exactly once at all times, even during errors and retries. For example, a financial services company needs to use stream processing to coordinate hundreds of back-office transaction systems when consumers pay their home mortgage.
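
The sketch below illustrates why retries are the hard part of exactly-once processing. It does not reproduce Flink's own mechanism; it shows one common, simpler way to get an exactly-once effect, namely an idempotent sink that remembers which transaction IDs it has already applied. The class, method, and ID names are hypothetical.

```python
class IdempotentLedger:
    """Toy sink: applying the same transaction twice has no extra effect."""

    def __init__(self):
        self.balances = {}        # account -> total amount paid
        self.applied_ids = set()  # transaction IDs already applied

    def apply(self, txn_id, account, amount):
        # A retried delivery carries the same txn_id, so it is skipped.
        if txn_id in self.applied_ids:
            return False
        self.balances[account] = self.balances.get(account, 0) + amount
        self.applied_ids.add(txn_id)
        return True

ledger = IdempotentLedger()
ledger.apply("txn-1", "mortgage-42", 1500)
ledger.apply("txn-1", "mortgage-42", 1500)  # retry after a simulated failure
ledger.apply("txn-2", "mortgage-42", 1500)
```

Without the ID check, the retried `txn-1` would be counted twice and the account would show three payments instead of two.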

Detect and handle streaming events that arrive out of order. For example, real-time fraud detection services need to ensure data is processed in the right order even if data arrives late.
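
One widely used way to handle out-of-order events is to buffer them and release them in event-time order once a watermark has passed. The following is a minimal sketch of that idea in plain Python, under the assumption of a fixed allowed lateness; the function name and parameters are hypothetical, and this is not Flink's implementation.

```python
import heapq

def reorder_by_event_time(arrivals, max_lateness):
    """Return (ordered, late) for (event_time, payload) pairs in arrival order.

    The watermark trails the newest event time seen by max_lateness.
    Events older than the watermark on arrival are set aside as late.
    """
    buffer = []               # min-heap keyed by event time
    ordered, late = [], []
    watermark = float("-inf")
    for seq, (event_time, payload) in enumerate(arrivals):
        if event_time < watermark:
            late.append((event_time, payload))
            continue
        # seq breaks ties so payloads are never compared.
        heapq.heappush(buffer, (event_time, seq, payload))
        watermark = max(watermark, event_time - max_lateness)
        # Release everything the watermark has passed, oldest first.
        while buffer and buffer[0][0] <= watermark:
            t, _, p = heapq.heappop(buffer)
            ordered.append((t, p))
    # End of input: flush whatever is still buffered.
    while buffer:
        t, _, p = heapq.heappop(buffer)
        ordered.append((t, p))
    return ordered, late

arrivals = [(1, "a"), (3, "b"), (2, "c"), (6, "d"), (4, "e"), (1, "f")]
ordered, late = reorder_by_event_time(arrivals, max_lateness=2)
```

A larger `max_lateness` tolerates more disorder at the cost of holding events longer before they are released.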

Achieve in-memory, one-at-a-time stream processing performance. For example, process requests of 30 million active users making credit card payments, transfers, and balance lookups with millisecond latency.

Trigger events when dealing with hundreds of streaming sources and millions of events per second per stream. For example, when a patient checks into the ER, the system reaches out to external systems to pull patient-specific data from hundreds of sources so it’s available in an EMR by the time the patient arrives in the exam room.

Streaming data has little value unless you can easily integrate, join, and mesh streams with at-rest data sources, including warehouses, relational databases, and data lakes. Configure data providers using out-of-the-box connectors or your own connector to any data source. Once the data providers are created, you can easily create virtual tables using DDL. Complex integration between multiple streams and batch data sources becomes easier with well-known SQL constructs such as joins and aggregations.
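
As an illustration of joining a stream with at-rest data, the sketch below enriches each streaming event with a row from a static lookup table. It is plain Python standing in for what a SQL join would express; the table contents, field names, and function name are hypothetical.

```python
def enrich(stream, customers):
    """Inner-join each stream event with a lookup table on customer_id."""
    for event in stream:
        profile = customers.get(event["customer_id"])
        if profile is None:
            continue  # inner join: skip events with no matching row
        # Merge the event fields with the at-rest profile fields.
        yield {**event, **profile}

# At-rest data, for example rows loaded from a relational database.
customers = {"c1": {"tier": "gold"}, "c2": {"tier": "silver"}}

# Streaming data, for example click events read from a topic.
clicks = [
    {"customer_id": "c1", "page": "/offers"},
    {"customer_id": "c9", "page": "/home"},  # no matching customer row
    {"customer_id": "c2", "page": "/cart"},
]
enriched = list(enrich(clicks, customers))
```

Because `enrich` is a generator, events are joined one at a time as they arrive rather than after the whole stream has been collected.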

Streams Messaging

Powered by Apache Kafka, Cloudera Streams Messaging provides:

  • Streams Messaging Manager to monitor/operate clusters
  • Streams Replication Manager for HA/DR deployments
  • Schema Registry for centralized schema management
  • Kafka Connect for simple data movement and change data capture, and Cruise Control for intelligent rebalancing and self-healing
  • Support for multi-cloud and hybrid cloud models

Key features

Supports millions of messages per second with low latency and high throughput, scaling elastically and transparently without downtime. Addresses a wide range of streaming data initiatives, enabling enterprises to keep up with customer demand, provide better services, and proactively manage risk.

Streams Messaging Manager provides a single pane of glass view with end-to-end visibility into how data moves across Kafka clusters—among producers, brokers, topics, and consumers—allowing you to track data lineage and governance from edge to cloud. It also simplifies troubleshooting of Kafka environments with intelligent filtering and sorting.

Streams Replication Manager, based on MirrorMaker 2, offers fault-tolerant, scalable, and robust cross-cluster Kafka topic replication, as well as replication monitoring and metrics at the cluster and topic levels. It supports high availability, disaster recovery, cloud migration, geo-proximity, and many other use cases.

Schema Registry lets you manage, share, and support the evolution of all producer and consumer schemas in a shared schema repository that allows applications to flexibly interact with each other across the Kafka landscape. Safely mitigate interruptions that occur due to schema mismatches.
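
To show what a schema mismatch check can look like, here is a minimal sketch of a simplified backward-compatibility rule: a new schema can read old data if shared fields keep their type and every added field has a default. This is an assumption made for illustration, not the actual set of checks that Schema Registry performs, and the schema representation is hypothetical.

```python
def is_backward_compatible(old_fields, new_fields):
    """Each argument maps a field name to {"type": str, optional "default": ...}."""
    for name, spec in new_fields.items():
        if name in old_fields:
            # A shared field must keep its type.
            if old_fields[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            # An added field needs a default so old records remain readable.
            return False
    return True

v1 = {"id": {"type": "long"}, "amount": {"type": "double"}}
v2_ok = {**v1, "currency": {"type": "string", "default": "USD"}}
v2_missing_default = {**v1, "currency": {"type": "string"}}
v2_type_change = {"id": {"type": "string"}, "amount": {"type": "double"}}

checks = [
    is_backward_compatible(v1, v2_ok),
    is_backward_compatible(v1, v2_missing_default),
    is_backward_compatible(v1, v2_type_change),
]
```

Running a check like this before a producer or consumer is deployed is what lets a mismatch be rejected up front instead of surfacing as a runtime failure.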

Cruise Control lets you manage and load-balance large Kafka installations, as well as automatically detect and remediate anomalies. Address hard problems such as frequent hardware/virtual machine failures, cluster expansion/reduction, and load skew among brokers.
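
The load-skew problem can be pictured with a toy greedy heuristic: repeatedly move one partition from the most loaded broker to the least loaded one while doing so narrows the gap. This is only a sketch under simplified assumptions (a single load number per partition, no replicas or rack constraints); Cruise Control's real goals and optimizer are considerably more elaborate, and all names here are hypothetical.

```python
def rebalance(assignment, max_moves=100):
    """assignment maps broker -> list of (partition, load), with load > 0.

    Returns (new_assignment, moves), where each move is
    (partition, from_broker, to_broker). The input is not modified.
    """
    state = {broker: list(parts) for broker, parts in assignment.items()}
    moves = []

    def load(broker):
        return sum(l for _, l in state[broker])

    for _ in range(max_moves):
        hot = max(state, key=load)
        cold = min(state, key=load)
        gap = load(hot) - load(cold)
        # Only partitions lighter than the gap reduce the imbalance when moved.
        candidates = [p for p in state[hot] if p[1] < gap]
        if not candidates:
            break
        # Prefer the partition whose load is closest to half the gap.
        part = min(candidates, key=lambda p: abs(p[1] - gap / 2))
        state[hot].remove(part)
        state[cold].append(part)
        moves.append((part[0], hot, cold))
    return state, moves

before = {"b1": [("t-0", 5), ("t-1", 4), ("t-2", 3)], "b2": [("t-3", 2)], "b3": []}
after, moves = rebalance(before)
final_loads = {b: sum(l for _, l in parts) for b, parts in after.items()}
```

Each move strictly reduces the imbalance between the two brokers involved, so the loop stops on its own once no single move helps.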

Cloudera SDX offers centralized security, control policies, governance, and data lineage across all components. Policies are set once and automatically enforced, and they are vendor-agnostic, allowing you to confidently embrace multi-cloud and hybrid cloud strategies. SDX supports the four main pillars of security: identity, access, data protection, and visibility.

Any data, anywhere, with flexible deployment options.


Cloudera Streaming in the cloud

Cloudera features a complete set of integrated stream processing capabilities that can be deployed in the public cloud to scale efficiently.

Cloudera Streaming is built on Apache Kafka and Apache Flink engines with enterprise-grade tooling to simplify deployment and management.

Streams Messaging Manager extends Apache Kafka with a set of capabilities to address schema governance and monitoring, disaster recovery, intelligent rebalancing, and robust access control and audit.

SQL Stream Builder extends Apache Flink with a powerful SQL Console that lets SQL analysts query streaming data as well as collaborate and version control processing logic for downstream applications.


Cloudera Streaming on premises

Cloudera can be deployed on premises with streaming data to control costs and minimize latency for real-time pipelines and applications. Cloudera Streaming integrates Apache Kafka and Apache Flink with enterprise tooling needed to manage these deployments. 


Cloudera Streaming - Kubernetes Operators

Cloudera Streaming capabilities are also available as Kubernetes Operators that can be deployed independently on existing Kubernetes clusters, making it even easier to deploy and scale Kafka across the enterprise. The Kubernetes operator ships with Kafka, Cruise Control, and ZooKeeper, enabling streaming use cases on Kubernetes with a robust message broker service. It also ships with Flink and SQL Stream Builder, providing a modern distributed stream processing engine for building real-time streaming applications that run natively on containers.

Take a Cloudera Streaming product tour

Cloudera Streaming Community Edition


Cloudera Streaming Community Edition makes it easy to develop stream processors right from your desktop or any other development node.


Analysts, data scientists, and developers can now evaluate new features, develop SQL-based stream processors, and build Kafka Consumers/Producers and Kafka Connect Connectors, all locally before moving to production.


Get up and running in 5 minutes with the Streaming Community Edition.

GigaOm Radar for Streaming Data Platforms

Cloudera named a 2024 market leader for streaming data platforms.
 

Download the report

Webinar

Accelerate Streaming Pipeline Deployments with New Kubernetes Operators

Datasheet

Cloudera Streaming Datasheet

Whitepaper

Cloudera delivers the best Kafka ecosystem today

Whitepaper

Manage, monitor, and replicate Apache Kafka with Cloudera
