Validating the Cloudera Search Deployment

After installing and deploying Cloudera Search, you can validate the deployment by indexing and querying sample documents. You can think of this as a "Hello, World!" for Cloudera Search that confirms everything is installed and working properly.

Before beginning this process, make sure you have access to the Apache Solr admin web console. If your cluster is Kerberos-enabled, you also need access to the solr@EXAMPLE.COM Kerberos principal (where EXAMPLE.COM is your Kerberos realm name).
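
If you want to confirm access to the principal up front, you can obtain a ticket for it and inspect the credential cache with the standard kinit and klist commands (shown here as a quick check; kinit prompts for the principal's password unless you use a keytab):
    $ kinit solr@EXAMPLE.COM
    $ klist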

Configuring Sentry for Test Collection

If you have enabled Apache Sentry for authorization, you must have update permission for the admin collection as well as the collection you are creating (test_collection in this example). You can also use the wildcard (*) to grant the update permission for all collections at once.
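
For example, instead of granting the update action collection by collection, a wildcard grant along the following lines should work (a sketch using the cloudera_tutorial_role role created in the steps below; confirm that your Sentry version accepts the wildcard in the collection field):
    $ solrctl sentry --grant-privilege cloudera_tutorial_role 'collection=*->action=update'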

For more information on configuring Sentry and granting permissions, see Configuring Sentry Authorization for Cloudera Search.

To grant your user account (jdoe in this example) the necessary permissions:

  1. Switch to the Sentry admin user (solr in this example) using kinit:
    $ kinit solr@EXAMPLE.COM
  2. Create a Sentry role for your user account:
    $ solrctl sentry --create-role cloudera_tutorial_role
  3. Map a group to this role. In this example, user jdoe is a member of the eng group:
    $ solrctl sentry --add-role-group cloudera_tutorial_role eng
  4. Grant update privileges to the cloudera_tutorial_role role for the admin and test_collection collections:
    $ solrctl sentry --grant-privilege cloudera_tutorial_role 'collection=admin->action=update'
    $ solrctl sentry --grant-privilege cloudera_tutorial_role 'collection=test_collection->action=update'
    For more information on the Sentry privilege model for Cloudera Search, see Authorization Privilege Model for Solr.
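
To double-check the role mapping and privileges before continuing, you can list them back with solrctl (assuming your version of solrctl supports the --list-roles and --list-privileges options):
    $ solrctl sentry --list-roles
    $ solrctl sentry --list-privileges cloudera_tutorial_role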

Creating a Test Collection

  1. If you are using Kerberos, kinit as the user that has privileges to create the collection:
    $ kinit jdoe@EXAMPLE.COM

    Replace EXAMPLE.COM with your Kerberos realm name.

  2. Make sure that the SOLR_ZK_ENSEMBLE environment variable is set in /etc/solr/conf/solr-env.sh. For example:
    $ cat /etc/solr/conf/solr-env.sh
    export SOLR_ZK_ENSEMBLE=zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181/solr

    If you are using Cloudera Manager, this is automatically set on hosts with a Solr Server or Gateway role.

  3. Generate configuration files for the collection:
    $ solrctl instancedir --generate $HOME/test_collection_config
  4. If you are using Sentry for authorization, overwrite solrconfig.xml with solrconfig.xml.secure. If you omit this step, Sentry authorization is not enabled for the collection:
    $ cp $HOME/test_collection_config/conf/solrconfig.xml.secure $HOME/test_collection_config/conf/solrconfig.xml
  5. Edit solrconfig.xml as follows:

    Change this line:

    <updateRequestProcessorChain name="updateIndexAuthorization">

    to this:

    <updateRequestProcessorChain name="updateIndexAuthorization" default="true">
  6. Upload the configuration to ZooKeeper:
    $ solrctl instancedir --create test_collection_config $HOME/test_collection_config
  7. Create a new collection with two shards (specified by the -s parameter) using the named configuration (specified by the -c parameter):
    $ solrctl collection --create test_collection -s 2 -c test_collection_config
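
As a quick sanity check, you can confirm that the configuration was uploaded and the collection was created by listing both:
    $ solrctl instancedir --list
    $ solrctl collection --list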

Indexing Sample Data

Cloudera Search includes sample data for testing and validation. Run the following commands to index this data for searching. Replace search01.example.com in the examples below with the name of any host running the Solr Server process.
  • Parcel-based Installation (Security Enabled):
    $ cd /opt/cloudera/parcels/CDH/share/doc/solr-doc*/example/exampledocs
    $ find *.xml -exec curl -i -k --negotiate -u: https://search01.example.com:8985/solr/test_collection/update -H "Content-Type: text/xml" --data-binary @{} \;
  • Parcel-based Installation (Security Disabled):
    $ cd /opt/cloudera/parcels/CDH/share/doc/solr-doc*/example/exampledocs
    $ java -Durl=http://search01.example.com:8983/solr/test_collection/update -jar post.jar *.xml
  • Package-based Installation (Security Enabled):
    $ cd /usr/share/doc/solr-doc*/example/exampledocs
    $ find *.xml -exec curl -i -k --negotiate -u: https://search01.example.com:8985/solr/test_collection/update -H "Content-Type: text/xml" --data-binary @{} \;
  • Package-based Installation (Security Disabled):
    $ cd /usr/share/doc/solr-doc*/example/exampledocs
    $ java -Durl=http://search01.example.com:8983/solr/test_collection/update -jar post.jar *.xml
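
Before opening the admin console, you can optionally confirm the indexing from the command line with a standard Solr select query; rows=0 returns only the document count. The sketch below assumes the Kerberos-enabled URL; on an unsecured cluster, drop -k and --negotiate -u: and use http on port 8983:
    $ curl -k --negotiate -u: "https://search01.example.com:8985/solr/test_collection/select?q=*:*&rows=0&wt=json"
    A response containing "numFound":32 indicates that all of the sample documents were indexed.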

Querying Sample Data

Run a query to verify that the sample data is successfully indexed and that you are able to search it:

  1. Open the Solr admin web interface in a browser by accessing the appropriate URL for your cluster:
    • Security Enabled: https://search01.example.com:8985/solr
    • Security Disabled: http://search01.example.com:8983/solr
    Replace search01.example.com with the name of any host running the Solr Server process. If you have security enabled on your cluster, enter the credentials for the solr@EXAMPLE.COM principal when prompted.
  2. Select Cloud from the left panel.
  3. Select one of the hosts listed for the test_collection collection.
  4. From the Core Selector drop-down menu in the left panel, select the test_collection shard.
  5. Select Query from the left panel and click Execute Query. If you see results such as the following, indexing was successful:
      "response": {
        "numFound": 32,
        "start": 0,
        "maxScore": 1,
        "docs": [
          {
            "id": "SP2514N",
            "name": "Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133",
            "manu": "Samsung Electronics Co. Ltd.",
            "manu_id_s": "samsung",
            "cat": [
              "electronics",
              "hard drive"
            ],

Next Steps

After you have verified that Cloudera Search is installed and running properly, you can experiment with other methods of ingesting and indexing data. This tutorial uses tweets to demonstrate batch indexing and near real time (NRT) indexing. Continue on to the next portion of the tutorial.

To learn more about Solr, see the Apache Solr Tutorial.