Installing the Spark Indexer

The Spark indexer moves data from HDFS files into Apache Solr through a batch ETL job that runs on either Spark or MapReduce. As part of this process, the indexer uses Morphlines to extract and transform the data.
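A Morphlines configuration is an external file passed to the indexing job. The fragment below is a minimal sketch only, assuming the Kite SDK command library; the file name, column names, and command choices are illustrative, not part of this guide.

```
# morphline.conf -- illustrative sketch, not a complete configuration
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**"]
    commands : [
      # Parse each input line as CSV (the column names here are hypothetical)
      { readCSV { separator : ",", columns : [id, text] } }
      # Load the resulting records into Solr; SOLR_LOCATOR is supplied by the job
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]
```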

To use the Spark indexer, you must install the solr-crunch package on each host from which you want to submit a batch indexing job.

To install solr-crunch on RHEL systems:
$ sudo yum install solr-crunch
To install solr-crunch on Ubuntu and Debian systems:
$ sudo apt-get install solr-crunch
To install solr-crunch on SLES systems:
$ sudo zypper install solr-crunch
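Once the package is installed, a batch indexing job is submitted with spark-submit. The sketch below only assembles and prints the command rather than running it, since execution requires a live cluster; the jar path, morphline file name, and HDFS input path are assumptions for illustration (CrunchIndexerTool is the driver class shipped in solr-crunch).

```shell
# Sketch only: assemble the spark-submit invocation for a batch indexing job.
# The jar path, morphline file, and HDFS input path below are assumptions.
TOOL_JAR=/usr/lib/solr/contrib/crunch/search-crunch-with-dependencies.jar
CMD="spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.solr.crunch.CrunchIndexerTool \
  $TOOL_JAR \
  --morphline-file morphline.conf \
  hdfs:///user/example/indir"
echo "$CMD"   # print rather than run, since it needs a live cluster
```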

For information on using Spark to batch index documents, see the Spark Indexing Reference (CDH 5.2 and higher only).