apache-kafka Consumer Offset Management and Fault-Tolerance


KafkaConsumers request messages from a Kafka broker via a call to poll() and their progress is tracked via offsets. Each message within each partition of each topic, has a so-called offset assigned—its logical sequence number within the partition. A KafkaConsumer tracks its current offset for each partition that is assigned to it. Pay attention, that the Kafka brokers are not aware of the current offsets of the consumers. Thus, on poll() the consumer needs to send its current offsets to the broker, such that the broker can return the corresponding messages, i.e,. messages with a larger consecutive offset. For example, let us assume we have a single partition topic and a single consumer with current offset 5. On poll() the consumer sends if offset to the broker and the broker return messages for offsets 6,7,8,...

Because consumers track their offsets themselves, this information could get lost if a consumer fails. Thus, offsets must be stored reliably, such that on restart, a consumer can pick up its old offset and resumer where it left of. In Kafka, there is built-in support for this via offset commits. The new KafkaConsumer can commit its current offset to Kafka and Kafka stores those offsets in a special topic called __consumer_offsets. Storing the offsets within a Kafka topic is not just fault-tolerant, but allows to reassign partitions to other consumers during a rebalance, too. Because all consumers of a Consumer Group can access all committed offsets of all partitions, on rebalance, a consumer that gets a new partition assigned just reads the committed offset of this partition from the __consumer_offsets topic and resumes where the old consumer left of.