Kafka Administration Basics

Broker Log Management

Kafka brokers store topic data on disk as log segments, in one directory per partition. Each partition's log is rolled into new segments according to size and time settings.

The log settings most commonly adjusted for a cluster are listed below. They are available in Cloudera Manager on the Kafka > Configuration tab.

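  • log.dirs: A comma-separated list of directories where Kafka stores its data (that is, topic-partition directories and their log segments).
  • log.retention.{ms|minutes|hours}: The retention period for a partition's log. Segments whose records are all older than this period are removed. If more than one unit is set, ms takes precedence over minutes, and minutes over hours.
  • log.retention.bytes: The maximum size of each partition's log. When a partition grows beyond this size, its oldest segments are removed.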
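To illustrate how the time and size limits interact, the following Python sketch decides which segments of a single partition would be deleted. It is a simplified model, not Kafka's implementation: the Segment class and the segments_to_delete function are hypothetical, the active (newest) segment is never deleted here, and a limit of -1 disables that check, as it does for the real settings.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int       # offset of the first record in the segment
    size_bytes: int
    max_timestamp_ms: int  # timestamp of the newest record in the segment


def segments_to_delete(segments, now_ms, retention_ms=-1, retention_bytes=-1):
    """Return the base offsets of segments eligible for deletion.

    Hypothetical, simplified model: `segments` is one partition's log,
    oldest first, and the last entry is the active segment, which this
    sketch never deletes. A limit of -1 is treated as disabled.
    """
    closed = segments[:-1]
    doomed = []

    # Time-based retention: walk from the oldest segment and stop at the
    # first one whose newest record is still within the retention period.
    if retention_ms >= 0:
        for seg in closed:
            if now_ms - seg.max_timestamp_ms <= retention_ms:
                break
            doomed.append(seg)

    # Size-based retention: remove further old segments only while the
    # partition would still be at or above retention_bytes without them.
    if retention_bytes >= 0:
        remaining = sum(s.size_bytes for s in segments) - sum(s.size_bytes for s in doomed)
        excess = remaining - retention_bytes
        for seg in closed[len(doomed):]:
            if excess - seg.size_bytes < 0:
                break
            excess -= seg.size_bytes
            doomed.append(seg)

    return [seg.base_offset for seg in doomed]
```

Deletion always proceeds from the oldest segment forward, which is why both loops stop at the first segment that must be kept.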

Many more properties are available for fine-tuning broker log management. For details, see the following property groups in the Broker Configs section of the Apache Kafka documentation:

  • log.dirs
  • log.flush.*
  • log.retention.*
  • log.roll.*
  • log.segment.*

Record Management

Record management has two parts: log segments and the log cleaner.

As part of general data storage, Kafka periodically rolls each partition's log based on size or time limits. Once either limit is reached, the active segment is closed and a new log segment is created to receive all new data, while older log segments should generally no longer change. This helps limit the risk of data loss or corruption to a single segment instead of the entire log.

  • log.roll.{ms|hours}: The maximum age of a log segment. Once the active segment is older than this value, it is closed and a new segment is rolled.
  • log.segment.bytes: The maximum size for a single log segment.
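The rolling rule can be sketched in Python as follows. This is a hypothetical, simplified model of the behavior described above, not Kafka's code: roll_segments is an invented name, records are given as (timestamp_ms, size_bytes) pairs, a record's offset is its position in the list, and a segment's age is measured from its first record.

```python
def roll_segments(records, segment_bytes, roll_ms):
    """Group records into segments, returning each segment's record offsets.

    Hypothetical model: `records` is a list of (timestamp_ms, size_bytes)
    in append order; a record's offset is its index in that list.
    """
    segments = []
    current = []       # offsets in the active segment
    current_size = 0
    first_ts = None    # timestamp of the first record in the active segment

    for offset, (ts, size) in enumerate(records):
        too_big = current_size + size > segment_bytes          # size limit
        too_old = first_ts is not None and ts - first_ts > roll_ms  # time limit
        if current and (too_big or too_old):
            # Close the active segment; new data goes to a fresh segment.
            segments.append(current)
            current, current_size, first_ts = [], 0, None
        if first_ts is None:
            first_ts = ts
        current.append(offset)
        current_size += size

    if current:
        segments.append(current)
    return segments
```

For example, with segment_bytes=100 and roll_ms=1000, the third 40-byte record starts a new segment because of the size limit, and a record arriving about 5 seconds later starts another because of the time limit.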

As an alternative to simply removing whole log segments from a partition, Kafka provides the log cleaner. When the log cleaner is enabled, individual records in older log segments can be managed differently:

  • log.cleaner.enable: A broker-wide setting that enables the log cleaner. It must be enabled for topics that use the compact policy to be cleaned.
  • cleanup.policy: A per-topic property that is usually set at topic creation time. The valid policies are delete and compact, and the two can be combined as compact,delete.
  • log.cleaner.min.compaction.lag.ms: The minimum time a record stays uncompacted in the log. Records newer than this value form the "head" of the log and are never compacted; only older records are eligible for compaction by the log cleaner.

The compact policy, also called log compaction, assumes that only the most recent record for each key matters. Examples include tracking a user's current email address or current mailing address. With log compaction, an older record is removed from its log segment when a newer record with the same key exists, so the latest record for each key is kept. Because the remaining records keep their original offsets, compaction leaves gaps in the partition's offsets.
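The following Python sketch models one compaction pass, including the min.compaction.lag "head" described above. It is a simplified illustration, not the log cleaner's actual algorithm: the Record type and the compact function are hypothetical, segment boundaries are ignored, and the log is assumed to be ordered by offset.

```python
from typing import NamedTuple


class Record(NamedTuple):
    offset: int
    key: str
    value: str
    timestamp_ms: int


def compact(records, now_ms, min_lag_ms=0):
    """Return the records that survive one simplified compaction pass.

    Hypothetical model: `records` is one partition's log ordered by offset.
    Records newer than min_lag_ms form the uncompacted "head" and are kept
    unchanged; in the older "tail", only the latest record per key is kept.
    """
    # The tail ends at the first record that is still inside the lag window.
    split = len(records)
    for i, rec in enumerate(records):
        if now_ms - rec.timestamp_ms < min_lag_ms:
            split = i
            break
    tail, head = records[:split], records[split:]

    # Later offsets overwrite earlier ones, so this maps each key to its
    # most recent offset within the tail.
    latest = {rec.key: rec.offset for rec in tail}

    # Survivors keep their original offsets, which leaves gaps.
    kept = [rec for rec in tail if latest[rec.key] == rec.offset]
    return kept + head
```

In a log holding two addresses each for "alice" and "bob", compacting with no lag keeps only the newest record per key, so offsets 0 and 1 disappear. With a 5-second lag, bob's newest record is still in the head, so his older record at offset 1 survives until the newer one becomes eligible.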

Broker Garbage Collection

To set the JVM Garbage Collection log path on the brokers:

  1. In Cloudera Manager, go to Kafka > Configuration.
  2. Find the property Kafka Broker Environment Advanced Configuration Snippet (Safety Valve) and add the line:
    KAFKA_GC_LOG_OPTS="-Xloggc:/var/logs/kafka/gc-broker.log"
  3. Restart the Kafka service to apply the new configuration.

To set JVM garbage collection log rotation on the brokers:

  1. In Cloudera Manager, go to Kafka > Configuration.
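  2. Find the property Kafka Broker Environment Advanced Configuration Snippet (Safety Valve). The snippet takes environment variable assignments, so add the rotation flags inside the KAFKA_GC_LOG_OPTS value, next to the -Xloggc path, rather than on lines of their own (modified as appropriate):
    KAFKA_GC_LOG_OPTS="-Xloggc:/var/logs/kafka/gc-broker.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M"
     These rotation flags apply to Java 8 JVMs; Java 9 and later replace them with unified logging (-Xlog).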
  3. Restart the Kafka service to apply the new configuration.
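As a small aid for adjusting the values, the following Python sketch assembles a single KAFKA_GC_LOG_OPTS assignment from a log path and rotation settings, and computes the approximate disk space the rotated files can use. It assumes a Java 8 broker JVM and a safety valve that expects KEY=VALUE environment lines. The helper names gc_log_opts and max_gc_log_mb are hypothetical.

```python
def gc_log_opts(log_path, num_files=10, file_size_mb=100):
    """Build one KAFKA_GC_LOG_OPTS line (Java 8 GC logging flags)."""
    flags = [
        f"-Xloggc:{log_path}",                # where the GC log is written
        "-XX:+UseGCLogFileRotation",          # rotate instead of growing forever
        f"-XX:NumberOfGCLogFiles={num_files}",
        f"-XX:GCLogFileSize={file_size_mb}M",
    ]
    return 'KAFKA_GC_LOG_OPTS="' + " ".join(flags) + '"'


def max_gc_log_mb(num_files=10, file_size_mb=100):
    """Approximate upper bound on disk used by the rotated GC logs."""
    return num_files * file_size_mb
```

With the values from the steps above (10 files of 100 MB each), the rotated GC logs take at most roughly 1,000 MB per broker.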

Adding Users as Kafka Administrators

In some cases, users other than the kafka account need administrator access. To grant it, go to Kafka > Configuration in Cloudera Manager and add the users to the Super users property.
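Apache Kafka documents its super.users broker setting as a semicolon-separated list of principals, for example User:alice;User:bob. This sketch assumes the Super users property in Cloudera Manager maps to that setting, and how Cloudera Manager itself formats the value is not confirmed here. The following Python sketch normalizes plain user names into that format. The function name super_users_value is hypothetical.

```python
def super_users_value(users):
    """Join user names into Kafka's semicolon-separated super.users format.

    Hypothetical helper: plain names such as "alice" become "User:alice";
    names that already carry the "User:" prefix are kept as-is, and blank
    entries are skipped.
    """
    principals = []
    for user in users:
        user = user.strip()
        if not user:
            continue
        principals.append(user if user.startswith("User:") else f"User:{user}")
    return ";".join(principals)
```

For example, ["kafka", "alice", "User:bob"] becomes User:kafka;User:alice;User:bob.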