Configuring the Flume Solr Sink

This topic describes how to modify the configuration files using either:
  • Cloudera Manager, in a parcel-based installation, following a process similar to the one described in Configuring the Flume Agents.
  • Command-line tools, in a package-based installation.
  1. Modify the Flume configuration to specify the Flume source details and set up the flow. You must set the relative or absolute path to the morphline configuration file.
    • Parcel-based Installation: In the Cloudera Manager Admin Console, select Flume > Configuration and modify Configuration File to include:
      agent.sinks.solrSink.morphlineFile = /opt/cloudera/parcels/CDH/etc/flume-ng/conf/morphline.conf
    • Package-based Installation: Edit /etc/flume-ng/conf/flume.conf to include:
      agent.sinks.solrSink.morphlineFile = /etc/flume-ng/conf/morphline.conf
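    In a package-based installation, one way to confirm that the path is set as intended is to read it back from flume.conf and test that the file is readable. The following is a minimal sketch, not part of the Flume distribution. morphline_path is a hypothetical helper name, and the sketch assumes the agent and sink are named agent and solrSink, as in the lines above.

    ```shell
    # Hypothetical helper: print the morphlineFile value configured for the
    # sink "solrSink" of the agent "agent" in a Flume properties file.
    # If the property appears more than once, the last value is printed.
    morphline_path() {
      conf=$1   # path to flume.conf
      awk -F= '
        $1 ~ /^[ \t]*agent\.sinks\.solrSink\.morphlineFile[ \t]*$/ {
          v = $2
          gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim surrounding whitespace
          path = v
        }
        END { if (path != "") print path }
      ' "$conf"
    }

    # Example, using the package-based location from this step:
    #   p=$(morphline_path /etc/flume-ng/conf/flume.conf)
    #   [ -r "$p" ] && echo "morphline file found: $p"
    ```

    The readability test resolves a relative path against the current directory of the shell, which may differ from the directory the Flume agent uses, so the check is most reliable for absolute paths.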
  2. Modify the Morphline configuration to specify the Solr location details using a SOLR_LOCATOR.
    • Parcel-based Installation: In the Cloudera Manager Admin Console, select Flume > Configuration and modify Morphline File.
    • Package-based Installation: Edit /etc/flume-ng/conf/morphline.conf.
    The snippet that includes the SOLR_LOCATOR might appear as follows:
    SOLR_LOCATOR : {
      # Name of solr collection
      collection : collection
    
      # ZooKeeper ensemble
      zkHost : "$ZK_HOST"
    }
    
    morphlines : [
      {
        id : morphline1
        importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
        commands : [
          { generateUUID { field : id } }
    
          { # Remove record fields that are unknown to Solr schema.xml.
            # Recall that Solr throws an exception on any attempt to load a document that
            # contains a field that isn't specified in schema.xml.
            sanitizeUnknownSolrFields {
              solrLocator : ${SOLR_LOCATOR} # Location from which to fetch Solr schema
            }
          }
    
          { logDebug { format : "output record: {}", args : ["@{}"] } }
    
          {
            loadSolr {
              solrLocator : ${SOLR_LOCATOR}
            }
          }
        ]
      }
    ]
  3. Copy flume-env.sh.template to flume-env.sh:
    • Parcel-based Installation:
      $ sudo cp /opt/cloudera/parcels/CDH/etc/flume-ng/conf/flume-env.sh.template \
      /opt/cloudera/parcels/CDH/etc/flume-ng/conf/flume-env.sh
    • Package-based Installation:
      $ sudo cp /etc/flume-ng/conf/flume-env.sh.template \
      /etc/flume-ng/conf/flume-env.sh
  4. Update the Java heap size.
    • Parcel-based Installation: In the Cloudera Manager Admin Console, select Flume > Configuration. In the Search box, enter Java Heap Size. Set Java Heap Size of Agent in Bytes to 500 and select MiB as the unit.
    • Package-based Installation: Edit /etc/flume-ng/conf/flume-env.sh or /opt/cloudera/parcels/CDH/etc/flume-ng/conf/flume-env.sh, inserting or replacing JAVA_OPTS as follows:
      JAVA_OPTS="-Xmx500m"
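    For the package-based case, the "inserting or replacing" edit can also be scripted. The following is a minimal sketch under stated assumptions: set_java_opts is a hypothetical helper name, and it treats only uncommented JAVA_OPTS assignments (with or without a leading export) as existing settings, leaving commented lines in place.

    ```shell
    # Hypothetical helper: set JAVA_OPTS in a flume-env.sh style file.
    # Any active (uncommented) JAVA_OPTS assignment is dropped and a single
    # new assignment is appended; all other lines are kept unchanged.
    set_java_opts() {
      value=$1   # for example: -Xmx500m
      file=$2    # path to flume-env.sh
      tmp=$(mktemp) || return 1
      # Copy every line except active JAVA_OPTS assignments.
      grep -v -E '^[[:space:]]*(export[[:space:]]+)?JAVA_OPTS=' "$file" > "$tmp" 2>/dev/null || true
      printf 'JAVA_OPTS="%s"\n' "$value" >> "$tmp"
      # Write back through the original path so its ownership and mode are kept.
      cat "$tmp" > "$file" && rm -f "$tmp"
    }

    # Example, run with write access to the file:
    #   set_java_opts "-Xmx500m" /etc/flume-ng/conf/flume-env.sh
    ```

    Running the helper a second time with the same value leaves one JAVA_OPTS line in the file rather than two.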
  5. (Optional) Modify Flume logging settings to facilitate monitoring and debugging:
    • Parcel-based Installation: In the Cloudera Manager Admin Console, select Flume > Configuration and modify Agent Logging Advanced Configuration Snippet (Safety Valve) to include:
      log4j.logger.org.apache.flume.sink.solr=DEBUG
      log4j.logger.org.kitesdk.morphline=TRACE
    • Package-based Installation: Use the following commands:
      $ sudo bash -c 'echo "log4j.logger.org.apache.flume.sink.solr=DEBUG" >> \
      /etc/flume-ng/conf/log4j.properties'
      $ sudo bash -c 'echo "log4j.logger.org.kitesdk.morphline=TRACE" >> \
      /etc/flume-ng/conf/log4j.properties'
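    The echo commands above append a line every time they run, so repeating this step leaves duplicate entries in log4j.properties. A minimal sketch of a repeatable alternative follows. append_once is a hypothetical helper name, not a Flume or Cloudera tool, and the sketch assumes the shell has write access to the target file.

    ```shell
    # Hypothetical helper: append a line to a file only if that exact line
    # is not already present (whole-line, fixed-string match).
    append_once() {
      line=$1
      file=$2
      grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
    }

    # Example, run from a shell with write access to the file:
    #   append_once "log4j.logger.org.apache.flume.sink.solr=DEBUG" /etc/flume-ng/conf/log4j.properties
    #   append_once "log4j.logger.org.kitesdk.morphline=TRACE" /etc/flume-ng/conf/log4j.properties
    ```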
  6. (Optional) In a package-based installation, you can use SEARCH_HOME to configure where Flume finds the Cloudera Search dependencies for the Flume Solr Sink. For example, if you installed Flume from a tarball, setting SEARCH_HOME lets it find the required files. To set SEARCH_HOME, use a command of the form:
    $ export SEARCH_HOME=/usr/lib/search

    Alternatively, you can add the same setting to flume-env.sh.

    In a Cloudera Manager managed environment, Cloudera Manager automatically updates the SOLR_HOME location with any additional required dependencies.