Starting the Flume Agent

  1. Delete all existing documents in the collection1 Solr collection:
    $ solrctl collection --deletedocs collection1
  2. Check the status of the Flume Agent to determine whether it is running:
    $ sudo /etc/init.d/flume-ng-agent status
  3. Start the Flume Agent if it is stopped, or restart it if it is already running. For example, to restart a running Flume Agent:
    $ sudo /etc/init.d/flume-ng-agent restart
  4. Monitor progress in the Flume log file and watch for errors:
    $ tail -f /var/log/flume-ng/flume.log
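Steps 2 and 3 can be combined into a small script. The following is a minimal sketch, assuming the init script follows the LSB convention in which status exits with 0 when the agent is running and with a non-zero code otherwise. The FLUME_INIT variable and the start_or_restart_flume function are hypothetical names introduced here; FLUME_INIT defaults to the init script path used in the steps above. The init script needs root privileges, which is why the steps above use sudo.

```shell
#!/usr/bin/env bash
# Sketch: start the Flume agent if it is stopped, restart it if it is running.
# Assumption: the init script follows the LSB convention, where "status"
# exits with 0 when the service is running and non-zero otherwise.
# FLUME_INIT is a hypothetical override; it defaults to the packaged script.
FLUME_INIT="${FLUME_INIT:-/etc/init.d/flume-ng-agent}"

start_or_restart_flume() {
  # Discard the status output; only the exit code matters here.
  if "$FLUME_INIT" status >/dev/null 2>&1; then
    echo "Flume agent is running; restarting it"
    "$FLUME_INIT" restart
  else
    echo "Flume agent is not running; starting it"
    "$FLUME_INIT" start
  fi
}
```

To use it, source the file in a root shell (for example, one opened with sudo -i), call start_or_restart_flume, and then follow the log as in step 4. Branching on the exit code of status, rather than parsing its text, keeps the sketch independent of the exact message the init script prints.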

After restarting the Flume agent, use the Cloudera Search GUI to verify that new tweets have been ingested into Solr. For example, on localhost, open the following URL:

    http://localhost:8983/solr/collection1/select?q=*%3A*&sort=created_at+desc&wt=json&indent=true

The query matches all documents (q=*:*) and sorts them by the created_at timestamp in descending order, so the most recently ingested tweets appear at the top. If you rerun the query, newly ingested tweets show up at the top of the result set.
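The same query can also be issued from the command line. The following is a minimal sketch using curl, assuming curl is installed and Solr listens on the localhost URL shown above; the SOLR_URL variable and the latest_tweets function are hypothetical names. With -G, curl appends each --data-urlencode pair to the URL as a GET query string and URL-encodes the values, so *:* and the space in "created_at desc" do not need to be escaped by hand.

```shell
#!/usr/bin/env bash
# Sketch: run the "newest tweets first" query from the command line.
# Assumptions: curl is installed and Solr listens on the localhost URL
# from the example above. SOLR_URL and latest_tweets are hypothetical names.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/collection1}"

latest_tweets() {
  # -G turns the --data-urlencode pairs into a GET query string;
  # curl URL-encodes each value, including "*:*" and the space in "created_at desc".
  curl -s -G "$SOLR_URL/select" \
    --data-urlencode 'q=*:*' \
    --data-urlencode 'sort=created_at desc' \
    --data-urlencode 'wt=json' \
    --data-urlencode 'indent=true'
}
```

While Flume is ingesting, calling latest_tweets again after a short wait should show newer tweets at the top of the JSON response.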

To print diagnostic information, such as the content of records as they pass through the morphline commands, enable TRACE-level logging for morphlines by adding the following line to your log4j.properties file:

    log4j.logger.org.kitesdk.morphline=TRACE
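If you edit the file by hand, a small helper can add the line without duplicating it on repeated runs. This is a minimal sketch: the enable_morphline_trace function is a hypothetical name, and the default path /etc/flume-ng/conf/log4j.properties is an assumption about where a package-based Flume installation keeps its logging configuration, so pass the path of your agent's log4j.properties explicitly if it differs.

```shell
#!/usr/bin/env bash
# Sketch: add the morphline TRACE logger to a log4j.properties file,
# skipping the append if an identical line is already present.
# enable_morphline_trace is a hypothetical helper name.
enable_morphline_trace() {
  # Assumed default location for a package-based install; override with $1.
  local props_file="${1:-/etc/flume-ng/conf/log4j.properties}"
  local line='log4j.logger.org.kitesdk.morphline=TRACE'

  # -x: match the whole line, -F: treat the pattern as a fixed string.
  if ! grep -qxF "$line" "$props_file" 2>/dev/null; then
    # Start on a fresh line if the file does not end with a newline.
    if [ -s "$props_file" ] && [ -n "$(tail -c 1 "$props_file")" ]; then
      printf '\n' >> "$props_file"
    fi
    printf '%s\n' "$line" >> "$props_file"
  fi
}
```

Restart the Flume agent afterwards so that it reloads its logging configuration.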

In Cloudera Manager, you can enable TRACE logging through a safety valve instead of editing log4j.properties directly.

From the Services menu, go to Flume > Configuration > View and Edit > Agent > Advanced > Agent Logging Safety Valve and add the log4j.logger.org.kitesdk.morphline=TRACE line. After saving this value, restart the Flume service.