Apache Kafka Consumer Poll Operation

May 2, 2023 | by Arround The Web

Apache Kafka is a free, open-source distributed streaming platform that lets applications publish and subscribe to streams of records in near real time.

A Kafka consumer is a client application that subscribes to Kafka topics and reads the records stored in them.

At the heart of every Kafka consumer application is the poll() operation. It retrieves records from the Kafka brokers and hands them to the consumer application for processing.

Kafka Consumer Poll Actions

The following steps can be used to describe how a Kafka consumer poll operation behaves internally:

A Kafka consumer does not subscribe to anything on its own. After it starts, the application calls subscribe() with one or more topics, or assigns specific partitions manually. Within a consumer group, each partition is assigned to only one consumer at a time.
Next, the consumer uses the poll() method to fetch the messages from the assigned partitions.
Once poll() has a batch of messages (if any are available), it returns them to the application for processing. The maximum size of the batch is set by the max.poll.records consumer configuration property, which defaults to 500 records.
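Behind the scenes, the consumer keeps the records that it has already fetched from the brokers in an internal buffer. Each poll() call hands the application at most max.poll.records records from that buffer, so several polls can be served from a single fetch.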
The application then processes the messages that poll() returned and may commit the offsets of the processed messages using the commitSync() or commitAsync() methods.
If the internal buffer is empty when poll() is called, the call blocks until new messages arrive or the timeout expires. The timeout is the argument passed to poll(), so it has no default value. A separate setting, fetch.max.wait.ms (default 500 milliseconds), limits how long a broker holds a fetch request when it does not yet have fetch.min.bytes of data to return.
However, if the application takes longer than max.poll.interval.ms between two poll() calls, the consumer is considered failed and leaves the group, which triggers a rebalance.
If a consumer leaves the group, its partitions are reassigned to the remaining consumers. The group coordinator drives this rebalance and waits up to max.poll.interval.ms (default 5 minutes) for the remaining members to rejoin before completing it.
When a consumer rejoins the group, it resumes fetching from the last committed offset of each partition that it is assigned. These partitions may differ from the ones it held before it left.
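
The steps above can be sketched as a poll loop. This is a minimal Python sketch and not the client's actual implementation: InMemoryConsumer is a hypothetical stand-in whose subscribe(), poll(timeout_ms=...) and commit() methods are named after those of kafka-python's KafkaConsumer, with the return value simplified to a plain list so that the example runs without a broker. The topic name "orders" and the records are invented.

```python
from collections import deque


class InMemoryConsumer:
    """Hypothetical stand-in for a Kafka consumer; no broker is involved."""

    def __init__(self, records, max_poll_records=500):
        self._buffer = deque(records)   # plays the role of the internal fetch buffer
        self._max_poll_records = max_poll_records
        self.topics = []
        self.position = 0               # number of records handed out by poll()
        self.committed = 0              # last committed position

    def subscribe(self, topics):
        # A real consumer would join its group and be assigned partitions.
        self.topics = list(topics)

    def poll(self, timeout_ms=0):
        # A real client blocks up to timeout_ms when nothing is buffered;
        # this stand-in simply returns an empty batch right away.
        batch = []
        while self._buffer and len(batch) < self._max_poll_records:
            batch.append(self._buffer.popleft())
        self.position += len(batch)
        return batch

    def commit(self):
        # Synchronous commit of everything poll() has returned so far.
        self.committed = self.position


def run_poll_loop(consumer, handle, poll_timeout_ms=1000, max_empty_polls=1):
    """Poll, process, commit; stop after max_empty_polls consecutive empty polls."""
    empty_polls = 0
    while empty_polls < max_empty_polls:
        batch = consumer.poll(timeout_ms=poll_timeout_ms)
        if not batch:
            empty_polls += 1
            continue
        empty_polls = 0
        for record in batch:
            handle(record)
        consumer.commit()  # commit only after the whole batch was processed


consumer = InMemoryConsumer(records=range(1200), max_poll_records=500)
consumer.subscribe(["orders"])
processed = []
run_poll_loop(consumer, processed.append)
print(len(processed), consumer.committed)
```

Committing after each processed batch gives at-least-once behavior: if the application crashes in the middle of a batch, the uncommitted records are delivered again after the restart. A long-running service would normally loop until shutdown instead of stopping on an empty poll.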

[Figure: the basics of a consumer poll operation within Kafka. Source: Conduktor.io]

Kafka Consumer Poll Parameters

Various parameters can be used to govern the behavior and operation of a Kafka consumer poll operation. The most important ones include the following:

max.poll.records – This parameter sets the maximum number of records that a single poll() call returns. The default value is 500.

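poll timeout – This is not a configuration property (the Kafka consumer has no poll.timeout.ms setting) but the argument passed to poll(), for example poll(Duration.ofMillis(100)) in the Java client. It bounds how long the call blocks. The poll() operation returns an empty batch if no messages become available before the timeout.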

session.timeout.ms – This parameter sets the maximum time that the group coordinator waits without receiving a heartbeat from a consumer before it considers the consumer dead and removes it from the consumer group. The default is 45 seconds in Kafka 3.0 and later (10 seconds in earlier releases). A lower value detects failed consumers sooner but can cause more frequent rebalancing of partitions among the consumers in the group, while a higher value reduces the rebalancing frequency but lets consumer failures go unnoticed for longer.

heartbeat.interval.ms – This parameter specifies how often the consumer sends heartbeats to the group coordinator. The heartbeats tell the coordinator that the consumer is still alive. The default value is 3000 milliseconds (3 seconds). The value must be lower than session.timeout.ms and is typically set to no more than one third of it. A lower value makes the consumer react faster to rebalancing events, while a higher value reduces the overhead of sending heartbeats.
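
The relationship between the two timing parameters can be checked before a consumer is started. The following is a small sketch in Python. The function name check_heartbeat_settings is hypothetical, and the one-third rule is the usual guidance from the Kafka configuration documentation, not a limit that the client enforces.

```python
def check_heartbeat_settings(session_timeout_ms, heartbeat_interval_ms):
    """Return a list of warnings; an empty list means the pair looks reasonable."""
    warnings = []
    if heartbeat_interval_ms >= session_timeout_ms:
        # The session would expire before the next heartbeat is even due.
        warnings.append("heartbeat.interval.ms must be lower than session.timeout.ms")
    elif heartbeat_interval_ms * 3 > session_timeout_ms:
        # Fewer than three heartbeats fit into one session timeout.
        warnings.append(
            "heartbeat.interval.ms is usually at most 1/3 of session.timeout.ms"
        )
    return warnings


print(check_heartbeat_settings(45_000, 3_000))
print(check_heartbeat_settings(10_000, 5_000))
```

The first call uses the defaults and produces no warnings. The second call leaves room for only two heartbeats per session timeout, so it is flagged.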

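max.poll.interval.ms – This parameter defines the maximum time allowed between two consecutive poll() calls before the consumer is considered failed and its partitions are rebalanced. The default value is 300000 milliseconds (5 minutes).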
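
Because a whole batch must be processed before the next poll() call, max.poll.records and max.poll.interval.ms have to be sized together. The following Python sketch shows the arithmetic. The function name, the per-record processing time and the headroom factor of 0.5 are illustrative assumptions, not Kafka settings.

```python
def max_safe_poll_records(max_poll_interval_ms, per_record_ms, headroom=0.5):
    """Largest batch whose processing fits in headroom * max.poll.interval.ms."""
    budget_ms = max_poll_interval_ms * headroom
    # Always allow at least one record, otherwise the consumer cannot progress.
    return max(1, int(budget_ms // per_record_ms))


# Default interval of 5 minutes, assumed 400 ms of processing per record.
print(max_safe_poll_records(300_000, 400))
```

If the estimated batch size is lower than the configured max.poll.records, either lower max.poll.records or raise max.poll.interval.ms.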

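max.partition.fetch.bytes – This parameter sets the maximum amount of data per partition that the broker returns in a fetch. The default value is 1 MB (1048576 bytes). It is not an absolute limit: in current Kafka versions, if the first record batch in the first non-empty partition of a fetch is larger than this value, the batch is still returned so that the consumer can make progress. Older clients (before Kafka 0.10.1) could instead get stuck on such a message with a RecordTooLargeException.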

These are some of the main parameters that govern the behavior and performance of a Kafka consumer poll operation.
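
As a compact summary, the configuration properties above can be collected with their defaults and rendered in the key=value form that a consumer properties file uses. This is a Python sketch. CONSUMER_DEFAULTS and to_properties are hypothetical names, and the session.timeout.ms value assumes Kafka 3.0 or later.

```python
# Defaults as documented for the Java consumer; session.timeout.ms assumes Kafka 3.0+.
CONSUMER_DEFAULTS = {
    "max.poll.records": 500,
    "max.poll.interval.ms": 300_000,
    "session.timeout.ms": 45_000,
    "heartbeat.interval.ms": 3_000,
    "max.partition.fetch.bytes": 1_048_576,
}


def to_properties(overrides=None):
    """Merge overrides into the defaults and render sorted key=value lines."""
    settings = {**CONSUMER_DEFAULTS, **(overrides or {})}
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


# Example: a consumer with slow per-record processing asks for smaller batches.
print(to_properties({"max.poll.records": 100}))
```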

Conclusion

We explored what a Kafka consumer poll operation is, what it does, and how it works. Next, you can check the other articles to discover how to tune your Kafka consumer client’s poll() operation.

Source: linuxhint.com