Managing Key-Value Store Indexer

The Key-Value Store Indexer service uses the Lily HBase Indexer Service to index the stream of records being added to HBase tables. Indexing allows you to query data stored in HBase with the Solr service.

The Key-Value Store Indexer service is installed in the same parcel or package along with the CDH 5 or Solr service.

Adding the Key-Value Store Indexer Service

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

  1. On the Home page, click to the right of the cluster name and select Add a Service. A list of service types display. You can add one type of service at a time.
  2. Select the Key-Value Store Indexer service and click Continue.
  3. Select the radio button next to the services on which the new service should depend. All services must depend on the same ZooKeeper service. Click Continue.
  4. Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same set of hosts to which the HDFS DataNode role is assigned. You can reassign role instances if necessary.

    Click a field below a role to display a dialog containing a list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts, or Custom to display the pageable hosts dialog.

    The following shortcuts for specifying hostname patterns are supported:
    • Range of hostnames (without the domain portion)
      Range Definition Matching Hosts
      10.1.1.[1-4] 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
      host[1-3].company.com host1.company.com, host2.company.com, host3.company.com
      host[07-10].company.com host07.company.com, host08.company.com, host09.company.com, host10.company.com
    • IP addresses
    • Rack name

    Click the View By Host button for an overview of the role assignment by hostname ranges.

  5. Click Continue.
  6. Review the configuration changes to be applied. Confirm the settings entered for file system paths. The file paths required vary based on the services to be installed. If you chose to add the Sqoop service, indicate whether to use the default Derby database or the embedded PostgreSQL database. If the latter, type the database name, host, and user credentials that you specified when you created the database. Click Continue. The wizard starts the services.
  7. Click Continue.
  8. Click Finish.

Enabling Morphlines with Search and HBase Indexing

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

Cloudera Morphlines is an open source framework that reduces the time and skills necessary to build or change Search indexing applications. A morphline is a rich configuration file that simplifies defining an ETL transformation chain.

  1. Go to the Indexer service.
  2. Click the Configuration tab.
  3. Select Scope > All.
  4. Select Category > Morphlines.
  5. Create the necessary configuration files, and modify the content in the following properties:
    • Morphlines File — Text that goes into the morphlines.conf used by HBase indexers. You should use $ZK_HOST in this file instead of specifying a ZooKeeper quorum. Cloudera Manager automatically replaces the $ZK_HOST variable with the correct value during the Solr configuration deployment.
    • Custom MIME-types File — Text that goes verbatim into the custom-mimetypes.xml file used by HBase Indexers with the detectMimeTypes command. See the Cloudera Morphlines Reference Guide for details on this command.
    • Grok Dictionary File — Text that goes verbatim into the grok-dictionary.conf file used by HBase Indexers with the grok command. See the Cloudera Morphlines Reference Guide for details of this command.
    See Extracting, Transforming, and Loading Data With Cloudera Morphlines for information about using morphlines with Search and HBase.