September 30
Exploring Apache Kafka
today. I got a need to check how commits work in kafka.
By default they use auto commit which commits the offsets after particular number of seconds regularly. We can change this behaviour and take control of this behaviour.
But we need to make sure how we commit. There are 2 ways, sync
and async
. One is blocking and another is non-blocking. We should combine both to fit our needs. We also need to keep in mind about rebalances and consumer restarts.
This went into a rabbit hole of various articles and code bases. I’ve learned a lot anyways. I’ll probably dump all of the articles and code base links here. So that I won’t miss anything or at least it’ll take the load off my mind.
- https://github.com/confluentinc/confluent-kafka-python/blob/master/src/confluent_kafka/src/Consumer.c#L976
- https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#
- https://kafka.apache.org/intro
- https://docs.confluent.io/kafka-clients/python/current/overview.html
- https://medium.com/@rramiz.rraza/kafka-programming-different-ways-to-commit-offsets-7bcd179b225a
- https://stackoverflow.com/questions/67910320/what-is-the-impact-if-delay-kafka-manual-commit-offset/67917359#67917359
- https://strimzi.io/blog/2021/01/07/consumer-tuning/
- https://kafka.apache.org/documentation/#gettingStarted
- https://strimzi.io/blog/2023/11/09/kafka-consumer-client-essentials/
- https://riptutorial.com/apache-kafka/example/19389/how-to-commit-offsets
- https://docs.confluent.io/platform/current/clients/consumer.html#message-handling
- https://www.confluent.io/resources/?language=english&technology=apache-kafka
- https://docs.confluent.io/
- https://developer.confluent.io/tutorials/kafka-console-consumer-producer-basics/kafka.html
- https://developer.confluent.io/tutorials/#learn-the-basics
- https://www.confluent.io/blog/apache-kafka-data-access-semantics-consumers-and-membership/
- https://www.confluent.io/blog/kafka-producer-internals-handling-producer-request/
- https://github.com/apache/kafka/blob/2.2/clients/src/main/java/org/apache/kafka/common/requests/FetchRequest.java
- https://www.confluent.io/blog/how-to-send-messages-with-librdkafka/#processing-replies
- https://github.com/confluentinc/librdkafka/tree/master/tests/tools/stats
- https://github.com/confluentinc/librdkafka
- https://developer.confluent.io/get-started/python/#where-next
- https://developer.confluent.io/courses/kafka-python/beyond-simple-python-apps/
- https://github.com/confluentinc/confluent-kafka-python/blob/master/examples/json_consumer.py
- https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#pythonclient-configuration
- https://github.com/confluentinc/librdkafka/blob/master/CONFIGURATION.md
I experimented with session.timeout.ms
and max.poll.interval.ms
but only able to trigger rebalance with max.poll.interval.ms
not with session.timeout.ms
, I need to check how those heartbeats work. How the main thread functioning? that part I need to explore.
I also found out about the debug
and logger
configs that we can pass in in the conf
object.
I’ll probably compile a TIL
and a blog if possible about my learnings.
In future I’ll probably train a model to extract these links for me from my blog posts? Let’s see.
I’ll explore about partition revocation callback, stats printing callback, etc. tomorrow.