Using the Lily HBase NRT Indexer Service

To index column families of tables in an HBase cluster, complete the following tasks:

  • Enable replication on HBase column families
  • Create collections and configurations
  • Register a Lily HBase Indexer configuration with the Lily HBase Indexer Service
  • Verify that indexing is working

Enabling Replication on HBase Column Families

Ensure that cluster-wide HBase replication is enabled. Use the HBase shell to define column-family replication settings.

For every existing table, set the REPLICATION_SCOPE on every column family that needs to be indexed by issuing a command of the form:

$ hbase shell
hbase shell> disable 'record'
hbase shell> alter 'record', {NAME => 'data', REPLICATION_SCOPE => 1}
hbase shell> enable 'record'

For every new table, set the REPLICATION_SCOPE on every column family that needs to be indexed by issuing a command of the form:

$ hbase shell
hbase shell> create 'record', {NAME => 'data', REPLICATION_SCOPE => 1}
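
When an existing table has several column families to index, the disable, alter, and enable commands above can be generated rather than typed by hand. The following is a minimal sketch, not part of the product: replication_commands is a hypothetical helper name, and the column family 'meta' is only an illustration. The function prints HBase shell commands and executes nothing.

```shell
#!/usr/bin/env bash
# Print the HBase shell commands that set REPLICATION_SCOPE => 1 on each
# listed column family of an existing table. Nothing is executed here.
replication_commands() {
  local table="$1"
  shift
  local family
  echo "disable '${table}'"
  for family in "$@"; do
    echo "alter '${table}', {NAME => '${family}', REPLICATION_SCOPE => 1}"
  done
  echo "enable '${table}'"
}

# Example: table 'record' with column families 'data' and 'meta'
# ('meta' is a made-up family used only for illustration).
replication_commands record data meta
```

Review the printed commands first. They can then be pasted into the HBase shell or piped into it, for example with replication_commands record data | hbase shell.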

Creating Collections and Configurations

The tasks required for the Lily HBase NRT Indexer Service are the same as those described for the Lily HBase Batch Indexer. Follow the steps described in the corresponding Lily HBase Batch Indexer sections.

Registering a Lily HBase Indexer Configuration with the Lily HBase Indexer Service

When the content of the Lily HBase Indexer configuration XML file is satisfactory, register it with the Lily HBase Indexer Service by uploading the file to ZooKeeper. For example:

$ hbase-indexer add-indexer \
--name myIndexer \
--indexer-conf $HOME/morphline-hbase-mapper.xml \
--connection-param solr.zk=solr-cloude-zk1,solr-cloude-zk2/solr \
--connection-param solr.collection=hbase-collection1 \
--zookeeper hbase-cluster-zookeeper:2181

Verify that the indexer was successfully created as follows:

$ hbase-indexer list-indexers
Number of indexes: 1

myIndexer
  + Lifecycle state: ACTIVE
  + Incremental indexing state: SUBSCRIBE_AND_CONSUME
  + Batch indexing state: INACTIVE
  + SEP subscription ID: Indexer_myIndexer
  + SEP subscription timestamp: 2013-06-12T11:23:35.635-07:00
  + Connection type: solr
  + Connection params:
    + solr.collection = hbase-collection1
    + solr.zk = localhost/solr
  + Indexer config:
      110 bytes, use -dump to see content
  + Batch index config:
      (none)
  + Default batch index config:
      (none)
  + Processes
    + 1 running processes
    + 0 failed processes
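
The check above can also be scripted. The following sketch assumes the list-indexers output has exactly the layout shown above (an unindented indexer name followed by indented "+" lines). The function name indexer_is_healthy is hypothetical. It reads the output on standard input and succeeds only if the named indexer has lifecycle state ACTIVE and reports 0 failed processes.

```shell
#!/usr/bin/env bash
# Read `hbase-indexer list-indexers` output on stdin. Exit status 0 means the
# named indexer is ACTIVE and reports 0 failed processes; non-zero otherwise.
indexer_is_healthy() {
  awk -v name="$1" '
    # An unindented, non-empty line starts a new entry (indexer name or header).
    /^[^ +]/ { current = $1 }
    current == name && /Lifecycle state:/ { state = $NF }
    current == name && /failed processes/ { failed = $2; seen = 1 }
    END { exit !(state == "ACTIVE" && seen && failed == 0) }
  '
}
```

A possible use is hbase-indexer list-indexers | indexer_is_healthy myIndexer && echo "indexer OK". If the output format of your version differs from the sample above, the patterns need to be adjusted.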

Use the update-indexer and delete-indexer commands of the hbase-indexer utility to modify or remove existing Lily HBase Indexers.

For more help, use the following commands:

$ hbase-indexer add-indexer --help
$ hbase-indexer list-indexers --help
$ hbase-indexer update-indexer --help
$ hbase-indexer delete-indexer --help

The morphlines.conf configuration file must be present on every host that runs an indexer.

You can use the Cloudera Manager Admin Console to update morphlines.conf:

  1. Go to the Key-Value Store Indexer service.
  2. Click the Configuration tab.
  3. Select Scope > KS_INDEXER (Service Wide).
  4. Select Category > Morphlines.
  5. For the Morphlines File property, paste the new morphlines.conf content into the Value field.
  6. Click Save Changes to commit the changes.

Cloudera Manager automatically copies pasted configuration files to the current working directory of all Lily HBase Indexer cluster processes on start and restart of the Lily HBase Indexer Service. In this case, the file location /etc/hbase-solr/conf/morphlines.conf is not applicable.

You can change morphline configuration files without re-creating the indexer itself. After such a change, you must restart the Lily HBase Indexer service.

Verifying that Indexing Works

Add rows to the indexed HBase table. For example:

$ hbase shell
hbase(main):001:0> put 'record', 'row1', 'data', 'value'
hbase(main):002:0> put 'record', 'row2', 'data', 'value2'

If the put operation succeeds, wait a few seconds, go to the SolrCloud UI query page, and query the data. The updated rows should appear in the Solr query results.
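
The same check can be made from the command line through Solr's standard select handler. The sketch below only builds the query URL. The helper name solr_query_url, the host name solr-host, and port 8983 are assumptions. The collection name hbase-collection1 is taken from the add-indexer example earlier in this document.

```shell
#!/usr/bin/env bash
# Build a Solr select URL that matches all documents in a collection.
# Arguments: Solr base URL, collection name, optional row limit (default 10).
solr_query_url() {
  local base="$1" collection="$2" rows="${3:-10}"
  # ${base%/} drops one trailing slash so the path is not doubled.
  printf '%s/%s/select?q=*:*&wt=json&rows=%s\n' "${base%/}" "$collection" "$rows"
}

# Host name and port are placeholders; substitute your own Solr server.
solr_query_url http://solr-host:8983/solr hbase-collection1
```

The resulting URL can be passed to an HTTP client, for example curl "$(solr_query_url http://solr-host:8983/solr hbase-collection1)". Secured clusters may require additional authentication options that are not shown here.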

To print diagnostic information, such as the content of records as they pass through the morphline commands, enable the TRACE log level. For example, you might add two lines to your log4j.properties file:

log4j.logger.org.kitesdk.morphline=TRACE
log4j.logger.com.ngdata=TRACE
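
For deployments that are not managed by Cloudera Manager, the two lines can be appended with a small script. This is a sketch only: enable_morphline_trace is a hypothetical helper name, and the location of log4j.properties depends on your installation, so the file is passed as an argument. Each line is added only if it is not already present, so running the function twice is harmless.

```shell
#!/usr/bin/env bash
# Append the two TRACE logger lines to the given log4j.properties file,
# skipping any line that is already there.
enable_morphline_trace() {
  local file="$1"
  local line
  for line in \
    "log4j.logger.org.kitesdk.morphline=TRACE" \
    "log4j.logger.com.ngdata=TRACE"
  do
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
  done
}
```

For example, enable_morphline_trace /path/to/log4j.properties, where the path is a placeholder. TRACE output can be large, so consider removing the lines again after debugging.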

In Cloudera Manager, do the following:

  1. Go to the Key-Value Store Indexer service.
  2. Click the Configuration tab.
  3. Select Scope > Lily HBase Indexer.
  4. Select Category > Advanced.
  5. Locate the Lily HBase Indexer Logging Advanced Configuration Snippet (Safety Valve) property, or search for it by typing its name in the Search box, and add the log4j lines to it.

    To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

  6. Click Save Changes to commit the changes.
  7. Restart the Key-Value Store Indexer service.

Examine the log files in /var/log/hbase-solr/lily-hbase-indexer-* for details.

Configuring Clients to Use the HTTP Interface

By default, the client does not use the new HTTP interface. Use the HTTP interface only if you want to take advantage of one of the features it provides, such as Kerberos authentication and Sentry integration. The client now supports passing two additional parameters to the list-indexers, add-indexer, delete-indexer, and update-indexer commands:

  • --http: An HTTP URI to the hbase-indexer HTTP API. By default, this URI is of the form http://host:11060/indexer/. If this URI is passed, the Lily HBase Indexer uses the HTTP API. If this URI is not passed, the indexer uses the old behavior of communicating directly with ZooKeeper.
  • --jaas: The JAAS configuration file to use. This is necessary only for Kerberos-enabled deployments.

For example:

$ hbase-indexer list-indexers --http http://host:port/indexer/ \
--jaas jaas.conf --zookeeper host:port
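
Because --http and --jaas are optional, a wrapper script can add them only when they are needed. The following sketch uses a hypothetical helper, indexer_cli_args, that prints the argument list. The flags themselves (--zookeeper, --http, --jaas) are the ones described above, and the values in the example call are placeholders.

```shell
#!/usr/bin/env bash
# Print hbase-indexer connection arguments.
# Arguments: ZooKeeper address, optional HTTP API URI, optional JAAS file.
# An empty second or third argument leaves the corresponding flag out.
indexer_cli_args() {
  local zookeeper="$1" http_uri="$2" jaas_file="$3"
  printf '%s' "--zookeeper $zookeeper"
  if [ -n "$http_uri" ]; then
    printf ' %s' "--http $http_uri"
  fi
  if [ -n "$jaas_file" ]; then
    printf ' %s' "--jaas $jaas_file"
  fi
  printf '\n'
}

# Placeholder values; only the argument string is printed.
indexer_cli_args zk-host:2181 http://host:11060/indexer/ jaas.conf
```

One way to use it is hbase-indexer list-indexers $(indexer_cli_args zk-host:2181 http://host:11060/indexer/ jaas.conf). The substitution is left unquoted on purpose so that the shell splits it into separate arguments, which assumes that none of the values contain spaces.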