Latest Offering from Leading Big Data Vendor Extends the Capabilities of Hadoop, Offering Easy and Familiar Access to Data for Increased Visibility and Quicker Time to Insight

PALO ALTO and SAN FRANCISCO, CA – June 4, 2013 – From The Economist Information Forum in San Francisco, Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop™, today announced the public beta of Cloudera Search, the industry’s first fully integrated search engine for interactive exploration of data stored in the Hadoop Distributed File System (HDFS) and Apache HBase™. Cloudera Search is the latest in a series of innovations from Cloudera designed to make Hadoop simpler and usable by more departments of an organization. Powered by the leading open source search engine, Apache Solr™, it enables anyone within an organization to perform interactive, natural language keyword searches and faceted navigation on data stored in Hadoop, without additional training or advanced programming knowledge.

Cloudera Search was developed to address a rapidly emerging need, as enterprises’ Hadoop deployments mature and advance to become the primary repositories for more and more kinds of data: how to better and more quickly combine and refine data into a single, integrated platform. At its core, Cloudera Search incorporates Apache Solr and other search-related open source projects to support a comprehensive big data infrastructure, and to alleviate the significant costs of maintaining the disparate systems that many enterprises currently depend on to execute search queries.

The arrival of Cloudera Search provides the enterprise with breakthrough simplicity and exploration capabilities, so users can drill down deeper into data using full-text and faceted search to solve critical business problems in real time. Cloudera’s search solution combines the established, feature-rich, open source search platform of Solr and its extensible APIs for easy integration with production legacy systems, offering valuable integration with CDH that addresses many of the common pain points of standalone search solutions for Hadoop. Through the new, robust failover features available in SolrCloud (Solr4), Cloudera Search delivers the same feature set as the search platform with more scalable indexing and query serving than was previously possible.

Like Cloudera Impala, the industry’s first open source, interactive SQL query engine for Hadoop, Cloudera Search extends the reach and capability of Cloudera Enterprise, the definitive Platform for Big Data. Cloudera is now making it possible for enterprises to “unaccept the status quo” imposed by closed source solutions vendors and benefit from the superior economics and unparalleled opportunity of Hadoop as a central, enterprise data platform that addresses the challenges and opportunities presented by big data.

Beyond SQL: Now Everyone Can Benefit from Hadoop

As enterprises increasingly look for ways to derive greater value from all their data, a pervasive challenge has emerged: how to make all data available and consumable beyond IT departments, so it can be more widely leveraged across an entire organization. Cloudera’s search solution expands the data exploration capabilities of Hadoop with faceted navigation and full-text search to more quickly find data for processing and analysis. Cloudera Search puts the power of data discovery into the hands of non-technical teams, enabling line-of-business and everyday users to interact with and uncover relevant correlations from data in a familiar, easy-to-use search interface. Companies can provide secure access to a centralized data repository and make it accessible to anyone who wants to derive valuable insight. They can also consolidate search and Hadoop cluster investments into one complete solution with unified management and control through Cloudera Manager.

“Data is one of the most valuable assets we have when it comes to preventative mental and physical healthcare,” said Chris Poulin, managing partner of Patterns and Predictions. “With next generation predictive analytics tools powered by Hadoop, healthcare providers can now address healthcare issues proactively and hope to solve even the most intractable challenges, like suicide prevention for military veterans. With the power to correlate medical reports, patient records, care provider notes, and social media data along with other relevant data sources, we can cultivate a deeper, more holistic understanding of patients and disease to support better treatment plans and optimize patient care. By giving non-technical individuals the power to perform real-time search and queries on data stored in Hadoop, Cloudera is providing critical tools to advance healthcare innovation and discovery.”

Beyond Batch: Real-Time Interaction with Data in Hadoop

Cloudera Search provides enterprises with scalable indexing options for big data and extends the Apache Solr project to offer near real-time document processing and indexing of data in transit to Hadoop and other storage endpoints. Data is immediately available to Search and other Hadoop computing frameworks, like Apache Hive™ and Cloudera Impala. Cloudera Search also provides linearly scalable, on-demand batch indexing for large data stores within Hadoop and, with the introduction of an innovative GoLive feature, can now incorporate incremental index changes while avoiding costly downtime.

“We have been leveraging Cloudera Search for OpenStack log exploration with great success. It delivers an open source solution for near real-time operational insights stored in Hadoop, and supports faster analytics and time to insight through applications like Cloudera Impala and other workloads,” said Joseph George, director of product strategy in Dell's Revolutionary Solutions Team. “With Cloudera Search, Hadoop has become the master data hub, where search indexes can be easily built on demand, executed, stored and easily managed.”

“It's exciting to see Lucene, a project I started 15 years ago, be included in CDH,” said Doug Cutting, Chief Architect, Cloudera. “Search is an incredibly powerful tool – now it's scalable and integrated with the Hadoop platform.”

Cloudera Search Feature Highlights

Cloudera Search is specifically designed to help business users locate relevant data in Hadoop quickly and efficiently, for further processing and analysis. Cloudera Search is fully integrated with the CDH platform. Key features include:

  • Scalable, Reliable Index Storage in HDFS: integrates index storage and serving directly into HDFS.
  • Batch Indexing via MapReduce: creates indexes over data stored in HDFS and HBase with the scalability and robustness of MapReduce.
  • Real-time Indexing at Collection: makes an event searchable as it is stored into Hadoop through near real-time indexing features powered by Apache Flume™.
  • Easy Interaction and Data Exploration via Cloudera Hue: provides a plug-in application for Hue and easy-to-install capabilities for standard Hue servers to query data and view result files, and enables faceted exploration.
  • Simplified Field Extraction and Cross-Platform Data Processing: allows for quick and easy field extraction of any data that is stored into HDFS using optimized Hadoop file formats, such as Apache Avro™, avoiding the pain that many standalone search solutions might impose, and promotes reusable configurations and processing activities with the new processing framework, Cloudera Morphlines.
  • Unified Management and Monitoring with Cloudera Manager: provides a centralized management and monitoring experience that makes it as easy to deploy, configure, and monitor search services as it is to manage CDH deployments and other services on the Hadoop cluster.


“We’re bringing the band back together with Cloudera Search,” said Mike Olson, chief executive officer, Cloudera. “Based on 100% open source Apache Solr, a Lucene project and another Doug Cutting original, Cloudera Search is now fully integrated into our industry leading CDH big data platform. After a successful private beta, it’s the latest in a series of major innovations that we’ve brought to market designed to speed up and simplify an organization’s ability to get the most out of their data. We are further democratizing access to mission-critical information stored in Hadoop by ensuring those without programming expertise can gain insight, find patterns and derive true value from their information assets. Year after year we continue to push the boundaries of what is possible with Hadoop; we have the best minds in data management focused on advancing business transformation.”

Product Availability

The first in the market to ship code, Cloudera Search is immediately available as a supplemental module for Cloudera Enterprise subscribers.

Additional Information

  • Visit the Cloudera Search page for partner support
  • View the Dell customer video
  • Cloudera is launching the first training course for data analysts to perform real-time analytics and use business intelligence tools directly on petabyte-scale data in Hadoop. Cloudera Data Analyst Training enables users to take advantage of Hadoop's massive scalability and flexibility benefits via SQL and familiar scripting languages. Participants will learn how to use tools like Apache Pig, Apache Hive and Cloudera Impala to achieve breakthrough insights more quickly and for less money, without the pain of migrating data or jumping between silos. Registration is available now via Cloudera University for public and private Data Analyst Training beginning in July.

About Cloudera

Cloudera is revolutionizing enterprise data management by offering the first unified Platform for big data, an enterprise data hub built on Apache Hadoop. Cloudera offers enterprises one place to store, access, process, secure, and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Cloudera's open source big data platform is the most widely adopted in the world, and Cloudera is the most prolific contributor to the open source Hadoop ecosystem. As the leading educator of Hadoop professionals, Cloudera has trained over 40,000 individuals worldwide. Over 1,700 partners and a seasoned professional services team help deliver faster time to value. Leading organizations in every industry plus top public sector organizations globally run Cloudera in production.

Cloudera, Cloudera's Platform for Big Data, Cloudera Enterprise Data Hub Edition, Cloudera Enterprise Flex Edition, Cloudera Enterprise Basic Edition and CDH are trademarks or registered trademarks of Cloudera Inc. in the United States, and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

Press Inquiries

Deborah Wiltshire
