Cloudera Search Guide
Cloudera Search integrates with CDH and uses Apache Solr to provide scalable and reliable search services. Search makes these services available to end users through tools that use familiar access and querying models.
- Search integrates with the existing CDH ecosystem, so data can be stored, shared, and accessed using the various CDH components. This helps to prevent data silos and minimizes expensive data movement.
- Search provides access to data stored in CDH without requiring the Java skills needed for MapReduce jobs or the SQL skills needed for Impala queries.
- Search typically returns results within seconds, rather than the minutes or more that MapReduce jobs often take to complete.
- Search allows you to select the information you want to index. You can optimize indexes for completeness, size, data types, and so on.
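As a rough illustration of the "familiar access and querying models" mentioned above, the following minimal sketch builds a query URL for Solr's standard `/select` request handler using the common `q`, `rows`, and `wt` parameters. The host name, the collection name `tweets`, and the field name `text` are assumptions made for this example only; substitute the values from your own deployment. The helper `build_select_url` is hypothetical and not part of Cloudera Search or Solr.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, rows=10):
    """Return a URL for the /select handler of one Solr collection.

    base_url   -- root of the Solr web application (assumed, e.g. http://host:8983/solr)
    collection -- name of the collection to search (hypothetical here)
    query      -- query string in Solr query syntax, such as "field:term"
    rows       -- maximum number of documents to return
    """
    # urlencode percent-escapes characters such as ":" in the query string.
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


# Hypothetical host, collection, and field names, for illustration only.
url = build_select_url("http://search01.example.com:8983/solr", "tweets", "text:hadoop")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client, which is why no Java or SQL knowledge is needed to run a basic search. A secured cluster may additionally require authentication, which this sketch does not cover.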
This guide describes Cloudera Search prerequisites, demonstrates how to load and index data, and shows how to query data. In addition, this guide includes a tutorial, various reference guides, and troubleshooting information.
This guide covers the following topics:
- Deployment Planning for Cloudera Search
- Deploying Cloudera Search
- Managing Solr Using solrctl
- Extracting, Transforming, and Loading Data With Cloudera Morphlines
- Spark Indexing Reference (CDH 5.2 and higher only)
- MapReduce Batch Indexing Reference
- Flume Near Real-Time Indexing Reference
- Using the Lily HBase Batch Indexer for Indexing
- Configuring the Lily HBase NRT Indexer Service for Use with Cloudera Search
- Cloudera Search Tutorial
- Backing Up and Restoring Cloudera Search
- Schemaless Mode Overview and Best Practices
- Using Search through a Proxy for High Availability
- Migrating Solr Replicas
- Using Custom JAR Files with Search
- Troubleshooting Cloudera Search