Summary
This PR introduces an FNV-1a 32-bit hasher as a new partitioner. Tests confirm that this implementation yields the same hash and partition selection as Goka/Sarama.
Why
Kafka leaves partition selection up to the producer, which can cause discrepancies when different client libraries are used. Libraries across programming languages may use different hashers, or configure the same hasher differently, so a message with a given key may arrive at a partition other than the one consumers expect.
This was the case for us at Beat. We found that messages produced by Python services using kafka-python arrived at partitions different from those the consumer services expected. The cause is that kafka-python uses murmur2 in its default partitioner, while the Goka library uses Sarama, which uses FNV-1a 32-bit.
What
This PR introduces a new partitioner based on FNV-1a 32-bit. It is a separate, isolated implementation, so users can easily choose between the default partitioner and this one.
The partitioner is designed to match the partitioner of Goka/Sarama. It uses the "a" variant of 32-bit FNV, converts the hash to a signed two's-complement integer, and uses the same parameters as those libraries. As a result, with this partitioner, kafka-python calculates the same hash and selects the same partition as Goka/Sarama. Our tests verified this empirically: for the keys we tried, the implementation produced the same hash and the same partition as Goka/Sarama.
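The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `fnv1a_32` and `fnv1a_partition` are hypothetical, the call signature mirrors the `(key, all_partitions, available)` callable that kafka-python partitioners use, and the signed-remainder-then-absolute-value step is an assumption about Sarama's default hash partitioner behavior.

```python
import random

FNV_32_OFFSET_BASIS = 0x811C9DC5  # standard FNV 32-bit offset basis
FNV_32_PRIME = 0x01000193         # standard FNV 32-bit prime


def fnv1a_32(data: bytes) -> int:
    """Return the unsigned 32-bit FNV-1a hash of data."""
    h = FNV_32_OFFSET_BASIS
    for byte in data:
        h ^= byte                            # "a" variant: XOR first...
        h = (h * FNV_32_PRIME) & 0xFFFFFFFF  # ...then multiply, wrapping at 32 bits
    return h


def fnv1a_partition(key, all_partitions, available):
    """Hypothetical partitioner callable; picks a partition from the key hash."""
    if key is None:
        # No key: fall back to a random available partition.
        return random.choice(available or all_partitions)
    h = fnv1a_32(key)
    # Reinterpret the unsigned hash as a signed 32-bit two's-complement value.
    signed = h - (1 << 32) if h >= (1 << 31) else h
    # Assumption: a truncating remainder followed by an absolute value,
    # which is equivalent to abs(signed) % n in Python.
    idx = abs(signed) % len(all_partitions)
    return all_partitions[idx]
```

For example, `fnv1a_partition(b"user-42", list(range(10)), list(range(10)))` always returns the same partition for that key, which is the property that has to hold across producers written in different languages.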
Who
Evgenia Martynova and I worked on this change together at Beat. Beat is a ride-hailing company with engineering hubs in Athens and Amsterdam. We are always on the lookout for great engineers. We are hiring!