Setting Up Cloudera Search Using the Command Line

This documentation describes how to install Cloudera Search powered by Solr. It also explains how to install and start supporting tools and services such as the ZooKeeper Server, MapReduce tools for use with Cloudera Search, and Flume Solr Sink.

After installing Cloudera Search as described in this document, you can configure and use Cloudera Search as described in the Cloudera Search Guide. The user guide includes the Cloudera Search Tutorial, as well as topics that describe extracting, transforming, and loading data, establishing high availability, and troubleshooting.

Initializing Solr

Configure ZooKeeper Quorum Addresses

After the ZooKeeper service is running, configure each Solr host with the ZooKeeper quorum addresses. This can be a single address if you have only one ZooKeeper server, or multiple addresses if you are using multiple servers.

Configure the ZooKeeper Quorum addresses in /etc/solr/conf/solr-env.sh on each Solr server host. For example:

$ cat /etc/solr/conf/solr-env.sh
export SOLR_ZK_ENSEMBLE=zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181/solr
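Once the file is in place, a quick way to confirm that a host picks up the setting is to source the file and inspect the variable. The sketch below simulates the file with a temporary copy (the path and hostnames are the examples from above), so adapt it to your environment:

```shell
# Sanity-check sketch: confirm solr-env.sh exports a non-empty ensemble
# string ending in the /solr chroot. A temporary copy stands in for
# /etc/solr/conf/solr-env.sh so the check can run anywhere.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
export SOLR_ZK_ENSEMBLE=zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181/solr
EOF
. "$tmp"
case "$SOLR_ZK_ENSEMBLE" in
  */solr) echo "ensemble OK: $SOLR_ZK_ENSEMBLE" ;;
  *)      echo "missing /solr chroot" >&2 ;;
esac
rm -f "$tmp"
```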

Configure Solr for Use with HDFS

To use Solr with your established HDFS service, complete the following configuration steps:

  1. Configure the HDFS URI for Solr to use as a backing store in /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr. On every Solr Server host, edit the following property to configure the location of Solr index data in HDFS:
    SOLR_HDFS_HOME=hdfs://nn01.example.com:8020/solr

    Replace nn01.example.com with the hostname of your HDFS NameNode (as specified by fs.default.name or fs.defaultFS in your /etc/hadoop/conf/core-site.xml file), and change the port number from the default (8020) if your NameNode runs on a non-default port. On an HA-enabled cluster, the HDFS URI must instead use the name service defined for your cluster, as reflected in fs.defaultFS (for example, hdfs://nameservice1).

  2. In some cases, such as configuring Solr to work with HDFS High Availability (HA), you might want to configure the Solr HDFS client by setting the HDFS configuration directory in /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr. On every Solr Server host, locate the appropriate HDFS configuration directory and set the following property to the absolute path of that directory:
    SOLR_HDFS_CONFIG=/etc/hadoop/conf

    Replace the path with the correct directory containing the proper HDFS configuration files, core-site.xml and hdfs-site.xml.
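Putting the two steps together, the relevant portion of /etc/default/solr might look like the following sketch (hostname, port, and configuration path are placeholders; substitute the values for your cluster):

```shell
# Example /etc/default/solr excerpt (values are placeholders):
SOLR_HDFS_HOME=hdfs://nn01.example.com:8020/solr  # HDFS URI backing the Solr index data
SOLR_HDFS_CONFIG=/etc/hadoop/conf                 # directory containing core-site.xml and hdfs-site.xml
```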

Configuring Solr to Use Secure HDFS

If security is enabled, perform the following steps:

  1. Create the Kerberos principals and keytab files for every host in your cluster:
    1. Create the Solr principal using either kadmin or kadmin.local.
      kadmin:  addprinc -randkey solr/fully.qualified.domain.name@YOUR-REALM.COM
      kadmin:  xst -norandkey -k solr.keytab solr/fully.qualified.domain.name

      For more information, see Step 4: Create and Deploy the Kerberos Principals and Keytab Files.

  2. Deploy the Kerberos keytab files on every host in your cluster:
    1. Copy or move the keytab files to a directory that Solr can access, such as /etc/solr/conf.
      $ sudo mv solr.keytab /etc/solr/conf/
      $ sudo chown solr:hadoop /etc/solr/conf/solr.keytab
      $ sudo chmod 400 /etc/solr/conf/solr.keytab
  3. Add Kerberos-related settings to /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr on every host in your cluster, substituting appropriate values. For a package based installation, use something similar to the following:
    SOLR_KERBEROS_ENABLED=true
    SOLR_KERBEROS_KEYTAB=/etc/solr/conf/solr.keytab
    SOLR_KERBEROS_PRINCIPAL=solr/fully.qualified.domain.name@YOUR-REALM.COM
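The keytab deployed in step 2 must remain readable only by the solr user, or Kerberos authentication can fail. The permission check can be sketched as follows; a temporary file stands in for the keytab here so the check is illustrative, and on a real host you would point it at /etc/solr/conf/solr.keytab instead:

```shell
# Sketch: verify the keytab has mode 400 (owner read-only).
# A temp file stands in for /etc/solr/conf/solr.keytab.
kt=$(mktemp)
chmod 400 "$kt"
# GNU stat uses -c; BSD stat uses -f, hence the fallback.
perms=$(stat -c '%a' "$kt" 2>/dev/null || stat -f '%Lp' "$kt")
if [ "$perms" = "400" ]; then
  echo "keytab permissions OK"
else
  echo "unexpected mode: $perms" >&2
fi
rm -f "$kt"
```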

Create the /solr Directory in HDFS

Before starting the Cloudera Search server, you must create the /solr directory in HDFS. The Cloudera Search service runs as the solr user by default, so it does not have the required permissions to create a top-level directory.

To create the /solr directory in HDFS:
$ sudo -u hdfs hdfs dfs -mkdir /solr
$ sudo -u hdfs hdfs dfs -chown solr /solr

If you are using a Kerberos-enabled cluster, you must authenticate with the hdfs account or another superuser before creating the directory:

$ kinit hdfs@EXAMPLE.COM
$ hdfs dfs -mkdir /solr
$ hdfs dfs -chown solr /solr

Initialize the ZooKeeper Namespace

Before starting the Cloudera Search server, you must create the solr namespace in ZooKeeper:
$ solrctl init

Start Solr

Start the Solr service on each host:
$ sudo service solr-server restart

After you start the Cloudera Search server, the Solr server should be running. To verify that all daemons are running, use the jps tool from the Oracle JDK, which you can obtain from the Java SE Downloads page. If you are running a pseudo-distributed HDFS installation and a Solr installation on one machine, jps shows output similar to the following:
$ sudo jps -lm
31407 sun.tools.jps.Jps -lm
31236 org.apache.catalina.startup.Bootstrap start
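For a scriptable version of that check, you can grep the jps output for the Solr web container process (org.apache.catalina.startup.Bootstrap in the listing above). The sketch below runs against a captured sample of the output so it works anywhere; on a live host, pipe `sudo jps -lm` in instead:

```shell
# Sketch: detect the Solr server process in jps output.
# A captured sample stands in for live `sudo jps -lm` output.
sample='31407 sun.tools.jps.Jps -lm
31236 org.apache.catalina.startup.Bootstrap start'
if echo "$sample" | grep -q 'org.apache.catalina.startup.Bootstrap'; then
  echo "Solr server process found"
else
  echo "Solr server process not running" >&2
fi
```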

Install Hue Search

You must install and configure Hue before you can use Search with Hue.

  1. Follow the instructions for Installing Hue.
  2. Use one of the following commands to install Search applications on the Hue machine:
    • RHEL compatible:
      sudo yum install hue-search
    • Ubuntu/Debian:
      sudo apt-get install hue-search
    • SLES:
      sudo zypper install hue-search
  3. Update the configuration information for the Solr Server:
    In a Cloudera Manager environment:
    1. Connect to Cloudera Manager.
    2. Select the Hue service.
    3. Click the Configuration tab.
    4. Search for the word "safety".
    5. Add information about your Solr host to Hue Server Advanced Configuration Snippet (Safety Valve) for hue_safety_valve_server.ini. For example, if your hostname is SOLR_HOST, you might add the following:
      [search]
      # URL of the Solr Server
      solr_url=http://SOLR_HOST:8983/solr
    6. (Optional) To enable Hue in environments where Kerberos authentication is required, update the security_enabled property as follows:
      # Requires FQDN in solr_url if enabled
      security_enabled=true
    In an environment without Cloudera Manager, update the configuration information in /etc/hue/hue.ini:
    1. Specify the Solr URL. For example, to use localhost as your Solr host, you would add the following:
      [search]
      # URL of the Solr Server, replace 'localhost' if Solr is running on another host
      solr_url=http://localhost:8983/solr/
    2. (Optional) To enable Hue in environments where Kerberos authentication is required, update the security_enabled property as follows:
      # Requires FQDN in solr_url if enabled
      security_enabled=true
  4. Configure secure impersonation for Hue.
    • If you are using Search in an environment that uses Cloudera Manager 4.8 and higher, secure impersonation for Hue is automatically configured. To review secure impersonation settings in the Cloudera Manager home page:
      1. Go to the HDFS service.
      2. Click the Configuration tab.
      3. Select Scope > All.
      4. Select Category > All.
      5. Type hue proxy in the Search box.
      6. Note the Service-Wide wild card setting for Hue Proxy Hosts and Hue Proxy User Groups.
    • If you are not using Cloudera Manager or are using a version earlier than Cloudera Manager 4.8, configure Hue to impersonate any user that makes requests by modifying /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr. The changes you make may vary according to the users for which you want to configure secure impersonation. For example, you might make the following changes:
      SOLR_SECURITY_ALLOWED_PROXYUSERS=hue
      SOLR_SECURITY_PROXYUSER_hue_HOSTS=*
      SOLR_SECURITY_PROXYUSER_hue_GROUPS=*

      For more information about Secure Impersonation or to set up additional users for Secure Impersonation, see Enabling Secure Impersonation.

  5. (Optional) To view files in HDFS, ensure that the correct webhdfs_url is included in hue.ini and WebHDFS is properly configured as described in Configuring CDH Components for Hue.
  6. Restart Hue:
    $ sudo /etc/init.d/hue restart
  7. Open http://hue-host.com:8888/search/ in your browser.