Configuring Flume Solr Sink to Sip from the Twitter Firehose

The examples in this tutorial assume an environment set up with a package-based installation. If you installed Cloudera Search using parcels, adjust file paths accordingly.

Edit /etc/flume-ng/conf/flume.conf and replace the values of the following properties with credentials from a valid Twitter account. The Flume TwitterSource uses the Twitter 1.1 API, which requires authentication of both the consumer (application) and the user (you).

agent.sources.twitterSrc.consumerKey = YOUR_TWITTER_CONSUMER_KEY
agent.sources.twitterSrc.consumerSecret = YOUR_TWITTER_CONSUMER_SECRET
agent.sources.twitterSrc.accessToken = YOUR_TWITTER_ACCESS_TOKEN
agent.sources.twitterSrc.accessTokenSecret = YOUR_TWITTER_ACCESS_TOKEN_SECRET

Use the Twitter developer site to generate these four values by completing the following steps:

  1. Sign in to https://dev.twitter.com with a Twitter account.
  2. Select My applications from the drop-down menu in the top-right corner, and then click Create a new application.
  3. Fill in the form to represent the Search installation. A single application can represent multiple clusters, and the callback URL is not required. Because this is not a publicly distributed application, the values you enter for the required name, description, and website fields are not important.
  4. Click Create my access token at the bottom of the page. You may have to refresh the page to see the access token.

Substitute the consumer key, consumer secret, access token, and access token secret into flume.conf. Consider this information confidential, just like your regular Twitter credentials.
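Because flume.conf uses Java properties syntax, a small program can confirm that all four credentials were actually substituted before you start the agent. The following is a minimal sketch, not a Flume tool: the class name TwitterCredentialCheck and the rule that a value beginning with YOUR_TWITTER_ counts as an unreplaced placeholder are assumptions made for illustration. The default path matches the package-based layout described above; pass a different path as the first argument if you installed with parcels.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not part of Flume): checks that the four TwitterSource
// credentials in flume.conf no longer hold the tutorial's placeholder values.
class TwitterCredentialCheck {
    // The four property keys shown in this tutorial.
    static final String[] KEYS = {
        "agent.sources.twitterSrc.consumerKey",
        "agent.sources.twitterSrc.consumerSecret",
        "agent.sources.twitterSrc.accessToken",
        "agent.sources.twitterSrc.accessTokenSecret"
    };

    // Returns the keys that are missing, empty, or still start with the
    // assumed placeholder prefix YOUR_TWITTER_.
    static List<String> unsetKeys(Reader config) {
        Properties props = new Properties();
        try {
            props.load(config); // java.util.Properties accepts "key = value" lines
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> unset = new ArrayList<>();
        for (String key : KEYS) {
            // Properties keeps trailing whitespace in values, so trim it here.
            String value = props.getProperty(key, "").trim();
            if (value.isEmpty() || value.startsWith("YOUR_TWITTER_")) {
                unset.add(key);
            }
        }
        return unset;
    }

    public static void main(String[] args) throws IOException {
        // Default assumes a package-based installation; pass another path for parcels.
        Path conf = Paths.get(args.length > 0 ? args[0] : "/etc/flume-ng/conf/flume.conf");
        try (Reader reader = Files.newBufferedReader(conf)) {
            List<String> unset = unsetKeys(reader);
            if (unset.isEmpty()) {
                System.out.println("All four Twitter credentials are set.");
            } else {
                System.out.println("Still unset: " + unset);
            }
        }
    }
}
```

Because the check only inspects values locally, it cannot tell you whether Twitter will accept the credentials; it only catches placeholders that were never replaced.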

Twitter authentication depends on an accurate system clock, so ensure the clock is set correctly on every host where Flume connects to Twitter. You can install NTP and keep the host synchronized by running the ntpd service. To synchronize the clock once, run sudo ntpdate pool.ntp.org as a one-time command. To confirm that the time is set correctly, check that the output of date --utc matches the time shown at http://time.gov/HTML5/. You can also set the time manually using the date command.
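The date --utc comparison can also be expressed as a quick skew calculation. The sketch below prints the local clock in UTC (Instant.toString() uses ISO-8601 in UTC) and, if you pass a reference time in RFC 1123 form (for example, the value of an HTTP Date header from a server you trust), reports how far the local clock differs from it. The class name ClockSkewCheck and the 60-second MAX_SKEW_SECONDS threshold are illustrative assumptions, not a documented Twitter limit, and the result is only as precise as the one-second reference and the delay before you run the command.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: compares the local clock with a reference time.
class ClockSkewCheck {
    // Assumed tolerance for illustration only; not taken from Twitter documentation.
    static final long MAX_SKEW_SECONDS = 60;

    // Returns local minus reference in whole seconds; positive means the local
    // clock is ahead. The reference uses RFC 1123 format, for example
    // "Tue, 3 Jun 2008 11:05:30 GMT".
    static long skewSeconds(Instant local, String rfc1123Reference) {
        Instant reference = ZonedDateTime
            .parse(rfc1123Reference, DateTimeFormatter.RFC_1123_DATE_TIME)
            .toInstant();
        return Duration.between(reference, local).getSeconds();
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // Comparable to the output of `date --utc`.
        System.out.println("Local clock (UTC): " + now);
        if (args.length > 0) {
            long skew = skewSeconds(now, args[0]);
            System.out.println("Skew versus reference: " + skew + " s");
            if (Math.abs(skew) > MAX_SKEW_SECONDS) {
                System.out.println("Clock looks off; resynchronize with ntpdate or ntpd.");
            }
        }
    }
}
```

For example, java ClockSkewCheck "Tue, 3 Jun 2008 11:05:30 GMT" prints the local time and its offset from that reference. Keeping ntpd running remains the more reliable fix; this check only tells you whether a correction is needed.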