Tuning Handbook of Apache Kafka!
We all know the power and advantages of Apache Kafka. It is a publish-subscribe messaging system with three major components:
- Apache Kafka Consumer
- Apache Kafka Producer
- Apache Kafka Broker
This document is about how to achieve maximum throughput when planning to run Kafka in production or in POCs.
Many organizations run Kafka in production and have published the configurations they use to maximize Kafka performance. I’ll use some of those existing configuration details, along with some extra information, to describe the settings we need to take care of when configuring a cluster.
Before diving into the Kafka tuning pool, let’s get some idea of what exactly Kafka is and what features it provides.
Introduction: Apache Kafka is an open-source message broker project, originally developed at LinkedIn and now maintained by the Apache Software Foundation. Kafka provides a distributed, partitioned and replicated commit log service with the functionality of a messaging system. The features that make Kafka reliable and fast are listed below:
High Throughput: A single Kafka broker is capable of handling hundreds of megabytes of reads and writes per second from thousands of clients.
Scalable: Kafka is designed so that it can be expanded elastically and transparently without any downtime. This means you can add a new broker node at any time without shutting down your Kafka cluster.
Durability: All messages are persisted on disk and replicated within the cluster to avoid data loss. Brokers can handle terabytes of data without affecting performance.
Distributed: Kafka also provides distributed processing of messages, and its cluster-centric design offers strong durability and fault tolerance. Kafka lets us partition any topic, which helps increase the throughput of the system.
So, at a very high level, Producers produce messages and send them to the Kafka cluster over the network, where they are consumed by the Consumers that have subscribed to them.
Figure: Kafka use case diagram
When we talk about tuning Kafka, there are a few configuration parameters to consider. The most important configurations for improving performance are the ones that control the disk flush rate.
We can also divide these configurations by component.
The most important configurations to take care of on the Producer side are:
- Batch size
- Sync or Async
The important configuration on the Consumer side is:
- Fetch size
We can also run multiple Consumers to fetch as much data as possible from a partitioned topic on the Kafka Brokers.
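To see why running multiple Consumers helps only up to a point, here is a minimal Python sketch that spreads a topic's partitions across the Consumers of one group. It assumes a simple round-robin scheme, and the function name `assign_partitions` is hypothetical; this is not Kafka's own assignor code. The one Kafka behaviour it relies on is that, within a consumer group, each partition is read by at most one Consumer.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partition ids over consumers round-robin (illustrative only)."""
    if not consumers:
        raise ValueError("need at least one consumer")
    assignment = {name: [] for name in consumers}
    for partition in range(num_partitions):
        # Partition 0 goes to the first consumer, 1 to the second, and so on.
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    # 4 partitions, 2 consumers: each consumer reads 2 partitions in parallel.
    print(assign_partitions(4, ["c1", "c2"]))
    # 2 partitions, 3 consumers: the third consumer is assigned nothing.
    print(assign_partitions(2, ["c1", "c2", "c3"]))
```

In the second call the extra Consumer sits idle, so adding Consumers beyond the number of partitions does not fetch data any faster.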
The Apache Kafka project maintains very good documentation describing all the configuration parameters and their definitions. For details, please follow the link.
Production server configurations
Here are some of the configuration parameters and their values:
All the above configurations are broker-side configurations. In most cases these configurations work well, but there are a few that you can modify to suit the cluster environment and machine configuration available to you. They are listed below:
num.replica.fetchers: This parameter defines the number of threads that replicate data from the leader to the follower. Its value can be adjusted according to the threads available. If spare threads are available, we should run more replica fetchers so that replication completes in parallel.
replica.fetch.max.bytes: This parameter controls how much data to fetch from each partition in each fetch request. Increasing its value helps followers build their replicas faster.
replica.socket.receive.buffer.bytes: If only a few threads are available for replication, we can increase the size of this buffer. It holds more data when the replication thread is slow compared to the incoming message rate.
num.partitions: This is a very important configuration to take care of when running Kafka live. The number of partitions sets the level of parallelism: the more partitions we have, the more data we can write in parallel, which increases throughput.
However, increasing the number of partitions can also reduce performance and throughput if the system is not capable of handling them.
The question that arises from this is: how?
If the system does not have enough threads, or has only a single disk, then creating lots of partitions will not improve throughput. The number of partitions worth creating for a topic depends directly on the available threads and disks.
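As a starting point for choosing a partition count, one commonly cited rule of thumb (it is not from this article) is to measure what a single partition sustains on the produce side and on the consume side, divide the target throughput by each, and take the larger result. The sketch below shows only that arithmetic; the function name `estimate_partitions` and the throughput figures are assumptions for illustration, and the real numbers have to come from measurements on your own hardware.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition,
                        consumer_mb_s_per_partition):
    """Rule-of-thumb partition count for a target throughput (MB/s)."""
    needed_for_produce = math.ceil(target_mb_s / producer_mb_s_per_partition)
    needed_for_consume = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    # The slower side dictates how many partitions are needed; never below 1.
    return max(needed_for_produce, needed_for_consume, 1)


if __name__ == "__main__":
    # Assumed measurements: 20 MB/s produce and 50 MB/s consume per partition.
    print(estimate_partitions(200, 20, 50))
```

The result is only an estimate of what throughput requires. It still has to be checked against the threads and disks actually available, for the reasons given above.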
num.io.threads: The right value for I/O threads depends directly on how many disks you have in your cluster. The server uses these threads to execute requests. We should have at least as many threads as we have disks.
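The broker settings discussed above can be collected into a properties fragment. The following Python sketch is one way to derive such a fragment from the machine's disk and thread counts. The helper names `broker_overrides` and `to_properties` are hypothetical, and the chosen values are assumptions rather than recommendations. The defaults mentioned in the comments are those documented for recent Kafka releases and should be checked against your version.

```python
def broker_overrides(num_disks, spare_threads):
    """Build illustrative broker settings; every value here is an assumption."""
    return {
        # At least one I/O thread per disk; 8 is the documented default.
        "num.io.threads": max(8, num_disks),
        # More fetchers only when spare threads exist (documented default: 1).
        "num.replica.fetchers": max(1, spare_threads),
        # Documented default is 1 MiB; doubled here purely as an example.
        "replica.fetch.max.bytes": 2 * 1024 * 1024,
        # Documented default is 64 KiB; doubled here purely as an example.
        "replica.socket.receive.buffer.bytes": 128 * 1024,
    }


def to_properties(settings):
    """Render the settings as key=value lines, as used in server.properties."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    # Assumed machine: 12 data disks and 3 threads to spare for replication.
    print(to_properties(broker_overrides(num_disks=12, spare_threads=3)))
```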
Apart from these configurations, there are a few other factors that I mentioned earlier, such as batch size and the sync/async mode of message transfer.
When we think about batch size, it is always confusing to decide which batch size is optimal. A large batch size may be great for high throughput, but it also adds latency, because messages wait longer before they are sent. So there is a trade-off between low latency and high throughput.
But there are ways to get low latency together with high throughput if we choose a proper batch size. We can also use queue-time or refresh-interval to find the right balance.
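The interplay between batch size and queue time can be illustrated with a small simulation. The Python sketch below is not the Kafka producer's implementation. It is a toy batcher that sends a batch either when it is full or when its oldest message has waited too long. The class name `Batcher` and the parameters `batch_size` and `max_wait_ms` are hypothetical stand-ins for the batch size and queue time mentioned above, and timestamps are passed in explicitly so the behaviour is deterministic.

```python
class Batcher:
    """Collect messages; flush when the batch is full or has waited too long."""

    def __init__(self, batch_size, max_wait_ms):
        self.batch_size = batch_size
        self.max_wait_ms = max_wait_ms
        self.pending = []          # messages waiting to be sent
        self.first_arrival = None  # arrival time of the oldest pending message
        self.sent = []             # (send_time_ms, batch) pairs, for inspection

    def add(self, message, now_ms):
        # An overdue batch goes out before the new message joins.
        self.flush_if_due(now_ms)
        if not self.pending:
            self.first_arrival = now_ms
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            self._flush(now_ms)

    def flush_if_due(self, now_ms):
        # Time-based flush: bounds how long a message can sit in a batch.
        if self.pending and now_ms - self.first_arrival >= self.max_wait_ms:
            self._flush(now_ms)

    def _flush(self, now_ms):
        self.sent.append((now_ms, self.pending))
        self.pending = []
        self.first_arrival = None


if __name__ == "__main__":
    batcher = Batcher(batch_size=3, max_wait_ms=10)
    for t, msg in enumerate(["a", "b", "c"]):
        batcher.add(msg, now_ms=t)     # third message fills the batch
    batcher.add("d", now_ms=5)         # a lone message under low traffic
    batcher.flush_if_due(now_ms=15)    # sent after waiting max_wait_ms
    print(batcher.sent)
```

A larger `batch_size` means fewer, bigger sends, which favours throughput. A smaller `max_wait_ms` caps how long a message can wait when traffic is light, which favours latency.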
Written By – Subhod Lagde
Interesting? Please subscribe to our blogs at www.dataottam.com for future reads on Big Data, Analytics, and IoT.
As always, please feel free to send suggestions or comments to email@example.com.