This is the documentation for Cloudera Search CDH 5 Beta 2 and 1.2.0 for CDH 4.
Documentation for other versions is available at Cloudera Documentation.

Using the Lily HBase NRT Indexer Service

Configuring indexing for column families of tables in an HBase cluster requires:

  • Enabling replication on HBase column families
  • Creating collections and configurations
  • Registering a Lily HBase Indexer configuration with the Lily HBase Indexer Service
  • Verifying the indexing is working

Enabling replication on HBase column families

Ensure that cluster-wide HBase replication is enabled. Use the HBase shell to define column-family replication settings.

For every existing table, set the REPLICATION_SCOPE on every column family that needs to be indexed. Do this by issuing a command of the form:

$ hbase shell
hbase shell> disable 'record'
hbase shell> alter 'record', {NAME => 'data', REPLICATION_SCOPE => 1}
hbase shell> enable 'record'
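
When several column families of an existing table need indexing, the disable/alter/enable sequence above can be generated rather than typed by hand. The following is a minimal sketch, not part of the product: the function name gen_replication_cmds is hypothetical, and it only prints HBase shell commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: print the HBase shell commands that set
# REPLICATION_SCOPE => 1 on each given column family of an existing table.
# Nothing is executed against HBase here.
# Usage: gen_replication_cmds TABLE FAMILY [FAMILY...]
gen_replication_cmds() {
  table=$1
  shift
  printf "disable '%s'\n" "$table"
  for family in "$@"; do
    printf "alter '%s', {NAME => '%s', REPLICATION_SCOPE => 1}\n" "$table" "$family"
  done
  printf "enable '%s'\n" "$table"
}

# Review the generated commands for the table and family used above.
gen_replication_cmds record data

# On a host with the HBase client installed, the output could then be
# piped into the shell:
#   gen_replication_cmds record data | hbase shell
```

Printing the commands first keeps the table from being disabled until the list of column families has been checked.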

For every new table, set the REPLICATION_SCOPE on every column family that needs to be indexed. Do this by issuing a command of the form:

$ hbase shell
hbase shell> create 'record', {NAME => 'data', REPLICATION_SCOPE => 1}

Creating collections and configurations

Complete three tasks related to creating a collection and configurations. The tasks required for the Lily HBase NRT Indexer Service are the same as those described for the Lily HBase Batch Indexer, so follow the steps described in the corresponding Lily HBase Batch Indexer sections.

Registering a Lily HBase Indexer configuration with the Lily HBase Indexer Service

Once the content of the Lily HBase Indexer configuration XML file is satisfactory, register it with the Lily HBase Indexer Service. Registration associates the configuration with a given SolrCloud collection and uploads the Lily HBase Indexer configuration XML file to ZooKeeper. For example:

$ hbase-indexer add-indexer \
--name myIndexer \
--indexer-conf $HOME/morphline-hbase-mapper.xml \
--connection-param solr.zk=solr-cloud-zk1,solr-cloud-zk2/solr \
--connection-param solr.collection=hbase-collection1 \
--zookeeper hbase-cluster-zookeeper:2181
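
The solr.zk value in the command above follows the usual ZooKeeper connection-string layout: a comma-separated list of hosts followed by a chroot path. As a small illustration, the sketch below assembles such a string. The function name solr_zk_string and the host names zk1 and zk2 are placeholders, not names used by the product.

```shell
#!/bin/sh
# Hypothetical helper: join ZooKeeper hosts with commas and append the
# chroot path, producing a value suitable for solr.zk.
# Usage: solr_zk_string CHROOT HOST [HOST...]
solr_zk_string() {
  chroot=$1
  shift
  # printf repeats its format for every argument, leaving a trailing comma.
  hosts=$(printf '%s,' "$@")
  # Strip the trailing comma, then append the chroot path.
  printf '%s%s\n' "${hosts%,}" "$chroot"
}

# Placeholder host names; substitute the hosts of the SolrCloud ensemble.
solr_zk_string /solr zk1 zk2
```

The resulting string could be passed as --connection-param solr.zk="$(solr_zk_string /solr zk1 zk2)".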

Verify that the indexer was successfully created as follows:

$ hbase-indexer list-indexers
Number of indexes: 1

myIndexer
  + Lifecycle state: ACTIVE
  + Incremental indexing state: SUBSCRIBE_AND_CONSUME
  + Batch indexing state: INACTIVE
  + SEP subscription ID: Indexer_myIndexer
  + SEP subscription timestamp: 2013-06-12T11:23:35.635-07:00
  + Connection type: solr
  + Connection params:
    + solr.collection = hbase-collection1
    + solr.zk = localhost/solr
  + Indexer config:
      110 bytes, use -dump to see content
  + Batch index config:
      (none)
  + Default batch index config:
      (none)
  + Processes
    + 1 running processes
    + 0 failed processes
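
In scripts, the listing above can be checked automatically instead of being read by eye. The following sketch assumes the output keeps the layout shown here (an unindented indexer name followed by indented "+" lines); the function name indexer_is_active is hypothetical, and the real output may differ between versions.

```shell
#!/bin/sh
# Hypothetical helper: read `hbase-indexer list-indexers` output on stdin
# and succeed only if the named indexer reports "Lifecycle state: ACTIVE".
# Assumes the layout shown in the sample listing above.
# Usage: indexer_is_active NAME < listing.txt
indexer_is_active() {
  awk -v name="$1" '
    # An unindented, non-header line starts the block for one indexer.
    /^[^ +]/ && $0 !~ /^Number of indexes:/ { current = $0 }
    # Within the block of the requested indexer, look for the ACTIVE state.
    current == name && /Lifecycle state: ACTIVE/ { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Example use on a host where hbase-indexer is installed:
#   hbase-indexer list-indexers | indexer_is_active myIndexer && echo "active"
```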

Existing Lily HBase Indexers can be modified or removed by using the update-indexer and delete-indexer subcommands of the hbase-indexer utility.

For more help use the following help commands:

$ hbase-indexer add-indexer --help
$ hbase-indexer list-indexers --help
$ hbase-indexer update-indexer --help
$ hbase-indexer delete-indexer --help
  Note: The morphlines.conf configuration file must be present on every node that runs an indexer.
  Note: The morphlines.conf configuration file can be updated using the Cloudera Manager Admin Console.

To update morphlines.conf using Cloudera Manager

  1. On the Cloudera Manager Home page, click the Key-Value Store Indexer service, often named KS_INDEXER-1.
  2. Click Configuration > View and Edit.
  3. Expand Service-Wide and click Morphlines.
  4. For the Morphlines File property, paste the new morphlines.conf content into the Value field.
Cloudera Manager automatically copies pasted configuration files to the current working directory of all Lily HBase Indexer cluster processes on start and restart of the Lily HBase Indexer Service. In this case the file location /etc/hbase-solr/conf/morphlines.conf is not applicable.
  Note: Morphline configuration files can be changed without recreating the indexer itself. In such a case, you must restart the Lily HBase Indexer service.

Verifying the indexing is working

Add rows to the indexed HBase table. For example:

$ hbase shell
hbase(main):001:0> put 'record', 'row1', 'data', 'value'
hbase(main):002:0> put 'record', 'row2', 'data', 'value2'

If the put operations succeed, wait a few seconds, then navigate to the SolrCloud UI query page and query the data. The newly added rows should appear in the Solr results.
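
The same check can be made from the command line through Solr's select handler instead of the UI. The sketch below only builds and prints the query URL. The base URL http://localhost:8983/solr is an assumption (8983 is Solr's default port), the function name solr_query_url is hypothetical, and the collection name is taken from the add-indexer example.

```shell
#!/bin/sh
# Hypothetical helper: build a Solr select URL for a collection.
# QUERY must already be URL-encoded; this function does not encode it.
# Usage: solr_query_url BASE_URL COLLECTION QUERY
solr_query_url() {
  printf '%s/%s/select?q=%s&wt=json&rows=10\n' "$1" "$2" "$3"
}

# Assumed base URL; adjust the host and port to match the Solr server.
solr_query_url http://localhost:8983/solr hbase-collection1 '*:*'

# To run the query from a host that can reach Solr:
#   curl "$(solr_query_url http://localhost:8983/solr hbase-collection1 '*:*')"
```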

To print diagnostic information, such as the content of records as they pass through the morphline commands, consider enabling TRACE log level. For example, you might add two lines to your log4j.properties file. The lines vary according to which version of morphlines you are using.

  • For CDK 0.9.1 and earlier, which is used with Search 1.2 and earlier or Cloudera Search for CDH 5 beta 1, add:
    log4j.logger.com.cloudera.cdk.morphline=TRACE
    log4j.logger.com.ngdata=TRACE
  • For Kite 0.11.0 and later, which is used with Cloudera Search for CDH 5 beta 2 and later, add:
    log4j.logger.org.kitesdk.morphline=TRACE
    log4j.logger.com.ngdata=TRACE
In Cloudera Manager 4, add these lines by navigating to Services > KS_INDEXER > Configuration > View and Edit > Lily HBase Indexer > Advanced > Lily HBase Indexer Logging Safety Valve, then restart the Lily HBase Indexer Service.
  Note: Prior to Cloudera Manager 4.8, the service was referred to as the Keystore Indexer service.

In Cloudera Manager 5, add these lines by navigating to Clusters > KS_INDEXER-1 > Configuration > View and Edit > Lily HBase Indexer > Advanced > Lily HBase Indexer Logging Safety Valve, then restart the Lily HBase Indexer Service.

  Note: The name of the particular key-value store indexer may vary. The most common variation is a different number at the end of the name.
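
Where log4j.properties is edited by hand rather than through Cloudera Manager, the two TRACE lines can be appended with a short script. The following is a sketch under that assumption: the function name enable_morphline_trace and the file path in the usage comments are hypothetical, and the script assumes the existing file ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper: append the TRACE logger lines for the given
# morphline package to a log4j.properties file, skipping any line that
# is already present so repeated runs do not create duplicates.
# Usage: enable_morphline_trace FILE PACKAGE
enable_morphline_trace() {
  file=$1
  pkg=$2
  for line in "log4j.logger.$pkg=TRACE" "log4j.logger.com.ngdata=TRACE"; do
    # -F: fixed string, -x: match the whole line, -q: no output.
    grep -Fxq "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
  done
}

# Kite 0.11.0 and later:
#   enable_morphline_trace /path/to/log4j.properties org.kitesdk.morphline
# CDK 0.9.1 and earlier:
#   enable_morphline_trace /path/to/log4j.properties com.cloudera.cdk.morphline
```

As with the Cloudera Manager route, the Lily HBase Indexer Service must be restarted before the new log level takes effect.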

Examine the log files in /var/log/hbase-solr/lily-hbase-indexer-* for details.