This is the documentation for Cloudera Manager 4.8.6. Documentation for other versions is available at Cloudera Documentation.

Cloudera Manager 4 and CDH 4 have reached End of Maintenance (EOM) on August 9, 2015. Cloudera will not support or provide patches for any of the Cloudera Manager 4 or CDH 4 releases after that date.

The Flume Service

The Flume NG service must be added separately from the wizard; the packages are installed by the installation wizard, but the agents are not configured or started as part of First Run. When you add Flume as a service, configure your Flume agents before you start those role instances.

For details of how to modify configurations and use configuration overrides in Cloudera Manager, see Modifying Configuration Settings.

For detailed information about Flume agent configuration, see the Flume User Guide.

For general discussion of adding a service, see Adding a Service.

Installing Flume Agents

  1. Click the Services tab, then choose All Services.
  2. From the Actions menu, select Add a Service. A list of possible services is displayed. You can add one type of service at a time.
  3. Select the Flume service.
  4. Select the set of dependencies for the Flume service.
  5. Select the hosts on which you want Flume agents to be installed.
  6. Click Continue and the Flume agents are installed on the nodes you've selected.

Configuring your Flume Agents

The Flume agents are not started automatically. You must first configure your agents appropriately before you start them, following the instructions below.

A default Flume flow configuration is provided as an example in the Configuration properties for the Flume agents; you should replace this with your own configuration. The example configuration, initially in the Agent (Default) role group, provides configuration for a single agent in a single tier.

When you add new agent roles, they are placed (initially) in the Agent (Default) role group. All agents that share the same configuration should be members of the same Agent role group. However, if you are using multiple tiers, each tier should be configured in its own Agent role group. This is discussed further below.

A single Flume configuration file can contain the configuration for multiple agents, since each configuration property is prefixed by the agent name. You can then set the agents' names using configuration overrides to change the name of a specific agent without changing its role group membership. This is described in more detail below (see To override the agent name for one or more specific agents:).

Also note that different agent role instances can have the same name — agent names do not have to be unique. You can use this to further simplify the configuration file. This is the recommended method to configure Flume.
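As a minimal sketch of this approach, the following flume.conf defines two agents in one file. The agent names agent_a and agent_b, the component names, and the ports are hypothetical and chosen only for illustration; the source, channel, and sink types are standard Flume components. Each role instance uses only the properties prefixed with its own Agent Name.

```properties
# Agent "agent_a": netcat source -> memory channel -> logger sink
agent_a.sources = src1
agent_a.channels = ch1
agent_a.sinks = sink1

agent_a.sources.src1.type = netcat
agent_a.sources.src1.bind = 127.0.0.1
agent_a.sources.src1.port = 9999
agent_a.sources.src1.channels = ch1

agent_a.channels.ch1.type = memory
agent_a.channels.ch1.capacity = 10000

agent_a.sinks.sink1.type = logger
agent_a.sinks.sink1.channel = ch1

# Agent "agent_b": syslog TCP source -> memory channel -> logger sink
agent_b.sources = src1
agent_b.channels = ch1
agent_b.sinks = sink1

agent_b.sources.src1.type = syslogtcp
agent_b.sources.src1.host = 0.0.0.0
agent_b.sources.src1.port = 5140
agent_b.sources.src1.channels = ch1

agent_b.channels.ch1.type = memory
agent_b.channels.ch1.capacity = 10000

agent_b.sinks.sink1.type = logger
agent_b.sinks.sink1.channel = ch1
```

With a file like this, every role instance whose Agent Name is agent_a runs the first flow, and every instance named agent_b runs the second, regardless of how many instances share each name.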

Flume NG can be installed on a cluster running either CDH3 or CDH4. However, monitoring of Flume is only supported if your cluster is running CDH4.1 or later, or CDH3u5 (refresh 2) or later.

To configure your Flume agents:

  1. Go to the Flume Service page (by selecting your Flume service from the Services menu or from the All Services page).
  2. Pull down the Configuration tab, and select View and Edit.
  3. Select the Agent (Default) role group in the left hand column. The settings you make here apply to the default role group, and thus will apply to all agent instances unless you associate those instances with a different role group, or override them for specific agents.
  4. Set the Agent Name property to the name of the agent (or one of the agents) whose configuration is defined in your flume.conf. You can specify only one agent name here — the name you specify will be used as the default for all Flume agent instances, unless you override the name for specific agents. You can have multiple agents with the same name — they will share the same configuration based on your configuration file.
      Note: The agent name can consist of letters, numbers, and the underscore character.
  5. Copy the contents of your flume.conf file, in its entirety, into the Configuration File field. Unless overridden for specific agent instances, this flume.conf file will apply to all your agents. You can provide multiple agent configurations in this file and use Agent Name overrides to determine which configurations to use for each agent. This is the recommended procedure.
      Note: The name/value pairs within the configuration file property must include an equal sign (=). For example, tier1.channels.channel1.capacity = 10000. (Outside of Cloudera Manager, tier1.channels.channel1.capacity 10000 would be accepted as valid, but within Cloudera Manager, this is not true.)
  Important: If your Flume configuration uses multiple tiers, you must create a separate Agent role group for each tier, and move each agent into the appropriate role group for its tier. By default, all agent roles are initially made members of the Agent (Default) role group; however, you can create new role groups and move agents between them.

From the Configuration menu, select Role Groups to manage your role groups and their membership.

See Managing Roles, and specifically the section on Managing Role Groups for information on creating new role groups and moving existing roles into those groups.
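For illustration, a two-tier flow split across two role groups might look like the sketch below. The agent names tier1 and tier2, the host name collector.example.com, port 4545, and the tailed log path are all assumptions made for this example; substitute the values from your own environment. Each half would be pasted into the Configuration File property of its own role group, with that group's Agent Name set to match.

```properties
# --- Role group for the first tier (Agent Name: tier1) ---
# Tails a local log file and forwards events to the second tier over Avro.
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1

tier1.sources.source1.type = exec
tier1.sources.source1.command = tail -F /var/log/messages
tier1.sources.source1.channels = channel1

tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000

tier1.sinks.sink1.type = avro
tier1.sinks.sink1.hostname = collector.example.com
tier1.sinks.sink1.port = 4545
tier1.sinks.sink1.channel = channel1

# --- Role group for the second tier (Agent Name: tier2) ---
# Receives Avro events from the first tier and logs them.
tier2.sources = source1
tier2.channels = channel1
tier2.sinks = sink1

tier2.sources.source1.type = avro
tier2.sources.source1.bind = 0.0.0.0
tier2.sources.source1.port = 4545
tier2.sources.source1.channels = channel1

tier2.channels.channel1.type = memory
tier2.channels.channel1.capacity = 10000

tier2.sinks.sink1.type = logger
tier2.sinks.sink1.channel = channel1
```

The Avro sink's hostname and port in the first tier must point at the host and port where the second tier's Avro source is listening.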

Overriding the Agent Name

If you have specified multiple agent configurations in your flume.conf file, you can override the agent name for the specific agent instances that should use a different configuration. Overriding the agent name will point that agent instance to the appropriate configuration statements in the flume.conf file.

  1. Pull down the Flume service Configuration tab, select Edit, and then select the appropriate Agent role group in the left hand column.
  2. To override the agent name for one or more instances, move your cursor over the value area of the Agent Name property, and click Override Instances.
  3. Select the agent (role) instances you want to override.
  4. In the field labeled Change value of selected instances to: select "Other". (You can use the "Inherited Value" setting to return to the service-level value.)
  5. In the field that appears, type the agent name you want to use for the selected agents.
  6. Click Apply to have your change take effect.

After you have completed your configuration changes, you can start the Flume service, which will start all your Flume agents.

  Note: If you need to modify your Flume configuration file after you have started the Flume service, you can use the Update Config command from the Actions menu on the Flume Service Status page to update the configuration across Flume agents without having to shut down the Flume service.

Using Flume with HDFS or HBase Sinks

If you want to use Flume with HDFS or HBase sinks, you can add a dependency to that service from the Flume configuration page. This will automatically add the correct client configurations to the Flume agent's classpath.
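As a rough sketch, an agent that writes to HDFS could be configured as follows once the HDFS dependency has been added. The agent name tier1, the component names, the netcat port, and the target directory /flume/events are hypothetical; because the client configuration is on the agent's classpath, the path here is assumed to resolve against the cluster's default file system.

```properties
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1

tier1.sources.source1.type = netcat
tier1.sources.source1.bind = 127.0.0.1
tier1.sources.source1.port = 9999
tier1.sources.source1.channels = channel1

tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000

# HDFS sink: write events as plain data files, rolling every 5 minutes
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = /flume/events
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.rollInterval = 300
tier1.sinks.sink1.channel = channel1
```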

  Note: In CDH 4, the ZooKeeper configuration file causes the HBase sink to fail. To address this issue, do one of the following:
  • If you are using Flume with HBase, remove the file /etc/zookeeper/conf/zoo.cfg from the Flume client machines and specify ZooKeeper configuration details in hbase-site.xml or flume.conf.
  • For CDH 4.5 and later with Cloudera Manager 4.7 and later, Cloudera Manager passes a flag that causes Flume not to include any files named zoo.cfg in its classpath at startup time. The flag is enabled by default. For backward compatibility purposes, you can disable this flag as follows:
    1. Go to Flume service.
    2. Select Configuration > View and Edit.
    3. Expand the Agent default role group and select Advanced.
    4. Disable the HBase sink prefer hbase-site.xml over Zookeeper config property.
    5. Restart the Flume service.

      This will allow files named zoo.cfg to be included in Flume's classpath.
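If you choose to specify the ZooKeeper details in flume.conf, the HBase sink definition might look like the sketch below. It assumes that the HBase sink in your Flume version accepts the zookeeperQuorum and znodeParent properties; confirm this in the Flume User Guide for your release. The agent and component names, the table name, the column family, and the ZooKeeper host names are hypothetical.

```properties
# HBase sink for an agent named tier1 that already defines channel1
tier1.sinks = sink1
tier1.sinks.sink1.type = hbase
tier1.sinks.sink1.table = flume_events
tier1.sinks.sink1.columnFamily = cf1
tier1.sinks.sink1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
tier1.sinks.sink1.channel = channel1

# Assumption: these properties are supported by your Flume version
tier1.sinks.sink1.zookeeperQuorum = zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
tier1.sinks.sink1.znodeParent = /hbase
```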

Using Flume with Solr

The Flume Solr Sink provides a flexible, scalable, fault-tolerant, transactional, Near Real Time (NRT) oriented system for processing a continuous stream of records into live search indexes. Latency from the time of data arrival to the time of data showing up in search query results is on the order of seconds, and tunable. Completing Near Real-Time (NRT) indexing requires the Flume Solr Sink. Cloudera Manager provides a set of configuration settings under the Flume Service to help configure the Flume Morphline Solr Sink. See Configuring Flume Morphline Solr Sink for use with the Solr Service for detailed instructions.
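For orientation only, a Morphline Solr Sink entry in flume.conf typically resembles the sketch below. The agent name, sink name, channel name, and morphline ID are hypothetical, and the morphlineFile value is a placeholder; follow the linked instructions for the values that Cloudera Manager expects in your deployment.

```properties
# Morphline Solr sink for an agent named tier1 that already defines channel1
tier1.sinks = solrSink
tier1.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
tier1.sinks.solrSink.channel = channel1
tier1.sinks.solrSink.morphlineFile = morphlines.conf
tier1.sinks.solrSink.morphlineId = morphline1
```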

This page last updated September 8, 2015