Kafka Performance Tuning

Performance tuning involves two important metrics:

  • Latency measures how long it takes to process one event.
  • Throughput measures how many events arrive within a specific amount of time.

Most systems are optimized for either latency or throughput. Kafka is balanced for both. A well-tuned Kafka system has just enough brokers to handle topic throughput, given the latency required to process information as it is received.

Tuning your producers, brokers, and consumers to send, process, and receive the largest possible batches within a manageable amount of time results in the best balance of latency and throughput for your Kafka cluster.

The following sections introduce the concepts you'll need to balance your Kafka workload and then provide practical tuning configurations to address specific circumstances.

For a quick video introduction to tuning Kafka, see Tuning Your Apache Kafka Cluster.

The concepts described here will help you focus your tuning efforts, and the topics that follow provide practical tuning guidelines.

Tuning Brokers

Topics are divided into partitions. Each partition has a leader replica, and topics that are properly configured for reliability also have two or more follower replicas per partition. When partition leadership is not evenly distributed across brokers, one broker can end up overworked compared to the others.

Depending on your system and how critical your data is, you want to be sure that you have a sufficient number of replicas to preserve your data. For each topic, Cloudera recommends starting with one partition per physical storage disk and one consumer per partition.
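
As a minimal sketch, a topic sized along these lines can be created with the Kafka Java AdminClient; the broker address, topic name, and the partition and replica counts below are illustrative placeholders, not recommendations:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopicExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // For example: 12 partitions to match 12 storage disks,
                // with 3 replicas of each partition for reliability.
                NewTopic topic = new NewTopic("events", 12, (short) 3);
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }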

Tuning Producers

Kafka uses an asynchronous publish/subscribe model. When your producer calls send(), the result returned is a future. The future provides methods that let you check the status of the request. When the batch is ready, the producer sends it to the broker. The broker receives the batch, writes it to the partition, and then acknowledges that the write is complete.

If you instead block on each future, you send just one record, wait for the result, and then send the next. Latency is very low, but so is throughput: if each round trip takes 5 ms, throughput is 200 events per second, far below the expected 100,000 events per second.
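
To make the contrast concrete, here is a minimal sketch of both styles; the broker address, topic, and record contents are placeholders:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class SendExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("events", "key", "value");

                // Synchronous: block on each future. Latency per record is low,
                // but throughput is capped at one round trip at a time.
                RecordMetadata meta = producer.send(record).get();
                System.out.println("written at offset " + meta.offset());

                // Asynchronous: let records accumulate into batches and check
                // the result in a callback instead of blocking.
                producer.send(record, (metadata, exception) -> {
                    if (exception != null) {
                        exception.printStackTrace();
                    }
                });
            }
        }
    }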

When you use Producer.send(), you fill up buffers on the producer. When a buffer is full, the producer sends the buffer to the Kafka broker and begins to refill the buffer.

Two parameters are particularly important for latency and throughput: batch size and linger time.

Batch Size

batch.size measures batch size in total bytes instead of the number of messages. It controls how many bytes of data to collect before sending messages to the Kafka broker. Set this as high as possible, without exceeding available memory. The default value is 16384.

If you increase the size of your buffer, it might never get full. The producer eventually sends the data anyway, based on other triggers, such as linger time in milliseconds. Setting batch.size too high can waste memory, but it does not increase latency.

If your producer is sending all the time, you are probably getting the best throughput possible. If the producer is often idle, you might not be writing enough data to warrant the current allocation of resources.
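
The following sketch raises batch.size on a producer. The 64 KB value and the broker address are illustrative assumptions, not recommendations:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class BatchSizeExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");
            // Collect up to 64 KB per partition batch before sending
            // (the default is 16384 bytes).
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, "65536");
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            producer.close();
        }
    }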

Linger Time

linger.ms sets the maximum time to buffer data in asynchronous mode. For example, a setting of 100 means that the producer batches up to 100 ms of messages to send at once. This improves throughput, but the buffering adds message delivery latency.

By default, the producer does not wait. It sends the buffer any time data is available.

Instead of sending immediately, you can set linger.ms to 5 and send more messages in one batch. This would reduce the number of requests sent, but would add up to 5 milliseconds of latency to records sent, even if the load on the system does not warrant the delay.

The farther away the broker is from the producer, the more overhead required to send messages. Increase linger.ms for higher latency and higher throughput in your producer.
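
Continuing the producer configuration sketched in the previous section, this is a sketch of adding linger time; the 5 ms value is an illustrative assumption:

    // Wait up to 5 ms for batches to fill before sending.
    props.put(ProducerConfig.LINGER_MS_CONFIG, "5");
    // Each record can now see up to 5 ms of added delivery latency, but the
    // producer issues fewer, larger requests, which improves throughput.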

Tuning Consumers

Consumers can create throughput issues on the other side of the pipeline. The maximum number of consumers in a consumer group for a topic is equal to the number of partitions. You need enough partitions to handle all the consumers needed to keep up with the producers.

Consumers in the same consumer group split the partitions among them. Adding more consumers to a group can enhance performance (up to the number of partitions). Adding more consumer groups does not affect performance.
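
A minimal sketch of a consumer joining a group; the group ID, topic name, and broker address are placeholders. Starting a second copy of this process with the same group.id splits the topic's partitions between the two, and copies beyond the partition count sit idle:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class GroupConsumerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            // All consumers sharing this group ID divide the partitions among them.
            props.put("group.id", "events-processors");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singleton("events"));
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(500));
                    records.forEach(r -> System.out.printf(
                            "partition %d, offset %d: %s%n",
                            r.partition(), r.offset(), r.value()));
                }
            }
        }
    }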

Mirror Maker Performance

Kafka Mirror Maker is a tool to replicate topics between data centers. It is best to run Mirror Maker at the destination data center. Consuming messages from a distant cluster and writing them into a local cluster tends to be safer than producing over a long-distance network. Deploying Mirror Maker in the source data center and producing remotely carries a higher risk of losing data. However, if you need this setup, make sure that you configure acks=all with an appropriate number of retries and a suitable min.insync.replicas value.
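
If you must produce remotely, those durability settings belong in the Mirror Maker producer configuration. A sketch of such a producer.properties follows; the values are illustrative assumptions, and note that min.insync.replicas is set on the destination brokers or topics, not on the producer:

    # Mirror Maker producer.properties sketch; values are assumptions.
    bootstrap.servers=remote-broker1:9092
    # Wait for all in-sync replicas to acknowledge each write.
    acks=all
    # Retry transient failures rather than dropping records.
    retries=2147483647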

  • Encrypting data in transit with SSL has an impact on the performance of Kafka brokers.
  • To reduce lag between clusters, you can improve performance by deploying multiple Mirror Maker instances that use the same consumer group ID.
  • Measure CPU utilization.
  • Consider using compression for consumers and producers when mirroring topics between data centers, as bandwidth can be a bottleneck.
  • Monitor the lag and metrics of Mirror Maker.

To properly size Mirror Maker, take expected throughput and maximum allowed lag between data centers into account.

  • The num.streams parameter controls the number of consumer threads in Mirror Maker.
  • kafka-producer-perf-test can be used to generate load on the source cluster. You can test and measure the performance of Mirror Maker with different num.streams values, starting from 1 and increasing gradually, as in the sketch after this list.
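
A minimal sketch of such a test run, assuming the Apache Kafka command line tools; script names vary by distribution, and the topic name, record counts, and broker addresses are placeholders:

    # Generate load on the source cluster.
    kafka-producer-perf-test --topic events --num-records 1000000 \
        --record-size 1000 --throughput -1 \
        --producer-props bootstrap.servers=source-broker1:9092

    # Mirror the topic, starting with a single consumer thread.
    kafka-mirror-maker --consumer.config consumer.properties \
        --producer.config producer.properties \
        --whitelist "events" --num.streams 1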

Good performance can be achieved with proper consumer and producer settings and properly tuned OS properties, such as networking- and I/O-related kernel settings.