Popular data ingestion tool is now ready for production within Cloudera’s platform
PALO ALTO, Calif. – February 18, 2015 – Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop™, today announced that Apache Kafka is fully integrated into Cloudera’s Big Data platforms, CDH and Cloudera Enterprise. Kafka is a highly scalable, fault-tolerant publish-subscribe messaging system originally developed at LinkedIn. It is designed to handle fast data ingestion at scale and has the flexibility to open that data up to diverse tools and use cases, including real-time streaming workloads. Kafka is an ideal addition to the Hadoop ecosystem, and support within Cloudera’s platform makes this emerging open standard ready for production.
“Cloudera has always been committed to researching and developing promising open source initiatives. Kafka was one of the more exciting projects we wanted to dive into when we started Cloudera Labs last year,” said Charles Zedlewski, vice president, Products at Cloudera. “Now that Kafka is fully integrated into Cloudera’s platform, users can build complete end-to-end workloads, such as real-time streaming, together with components like Apache Spark Streaming and Apache HBase - all within a single system. With this integration, users have greater flexibility and performance for ingesting new and varied streams of data and exploring new use cases, for both faster processing and faster insights.”
Kafka has become increasingly popular among the open source developer community. As a distributed system, it can broker terabytes of data from thousands of users across a single cluster, serving as the critical data backbone for any large organization. Its unique design makes it particularly well equipped to solve a wide range of architectural challenges – including the ingestion of new data that is streaming in from web, mobile, and social applications, as well as Internet of Things (IoT) devices.
According to IDC, the worldwide market for IoT solutions will reach $3.04 trillion by 2020. All the data being generated needs a place to be stored and processed in order to provide meaningful insights. Hadoop is an ideal platform to store and process data coming from many sources; however, it was not originally built for real-time analytics. Integrating Kafka into Cloudera’s platform opens up Hadoop to production-ready, real-time analytics. Kafka supports high-concurrency access with extremely low latencies, a key requirement for real-time workloads. It can also scale elastically to handle the high volumes of data flowing through these types of systems, even at Hadoop scale. Finally, Kafka’s flexibility makes these high volumes of data available for a variety of use cases.
“There are dozens and dozens of projects within the Hadoop ecosystem, all working in isolation to solve various problems, with varying degrees of maturity,” said Charles Zedlewski, vice president, Products, Cloudera. “Running Hadoop in production requires more than dozens of isolated tools. It requires bringing the best of these tools together to build out complete solutions for all of your users, all within a single system, and must be integrated within an enterprise-grade platform for the security, governance, administration, and support needed to run effectively in production environments.”
“Cloudera is extremely agile at moving open source initiatives into production-ready components. The speed at which we’ve been able to do that with Impala, Spark, and now Kafka not only showcases our deep community involvement in driving powerful open standard technologies, but also our dedication to our customers - allowing them to proceed with confidence in their investments in Hadoop.”
Partners Support Kafka
“Kafka reimagined queuing for high speed applications; VoltDB reimagined the database for high speed applications. The combination of Kafka, VoltDB and Cloudera is a complete – and powerful – fast data-big data stack for real-time decisions, analytics and big data management.”
-- Ryan Betts, chief technology officer, VoltDB
“Apache Kafka is quickly becoming a key component of the Hadoop ecosystem and is increasingly popular with DataTorrent customers. Having Kafka fully integrated in Cloudera’s Big Data platform will continue to accelerate its adoption, enabling organizations to jointly leverage production-ready Hadoop and Kafka.”
-- John Fanelli, vice president, Products and Marketing, DataTorrent
About Cloudera
Cloudera is revolutionizing enterprise data management by offering the first unified platform for big data, an enterprise data hub built on Apache Hadoop. Cloudera offers enterprises one place to store, access, process, secure, and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Cloudera's open source big data platform is the most widely adopted in the world, and Cloudera is the most prolific contributor to the open source Hadoop ecosystem. As the leading educator of Hadoop professionals, Cloudera has trained over 40,000 individuals worldwide. Over 1,700 partners and a seasoned professional services team help deliver faster time to value. Leading organizations in every industry plus top public sector organizations globally run Cloudera in production.
Connect With Cloudera
Follow us on Twitter: http://twitter.com/cloudera
Visit us on Facebook: http://www.facebook.com/cloudera
Join the Cloudera Community: http://cloudera.com/community
Cloudera, Cloudera's Platform for Big Data, Cloudera Enterprise Data Hub Edition, Cloudera Enterprise Flex Edition, Cloudera Enterprise Basic Edition and CDH are trademarks or registered trademarks of Cloudera Inc. in the United States, and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.