
Feat: FNV1a 32-Bit Partitioner #141

Open · wants to merge 8 commits into master


@wbarnha (Owner) commented Mar 8, 2024

Summary

This PR introduces an FNV-1a 32-bit hasher as a new partitioner. Tests confirm that this implementation yields the same hashes and partition selections as Goka/Sarama.
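For reference, the hash itself is small. Below is a minimal Python sketch of 32-bit FNV-1a using the standard FNV offset basis and prime. The name `fnv1a_32` is our own and not necessarily what this PR calls it. Its output should agree with Go's `hash/fnv` `New32a()`, which, as far as we know, Sarama's hash partitioner uses by default.

```python
# Standard 32-bit FNV parameters.
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Return the unsigned 32-bit FNV-1a hash of ``data`` (hypothetical helper)."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte                           # "a" variant: XOR the byte in first...
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # ...then multiply, keeping 32 bits
    return h


print(hex(fnv1a_32(b"a")))  # 0xe40c292c
```

The only difference from plain FNV-1 is the order of the XOR and the multiply inside the loop.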

Why

Kafka leaves partition selection up to the producer. This can cause discrepancies when different client libraries are used: libraries in different programming languages may use different hash functions, or configure the same one differently, so a message with a given key may arrive at a different partition than expected.

This was the case for us at Beat. Messages produced by Python services using kafka-python arrived at different partitions than our consumer services expected. The cause was that kafka-python uses murmur2 in its default partitioner, while the Goka library relies on Sarama, which uses FNV-1a 32-bit.
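To make the mismatch concrete, here is a sketch of kafka-python's default key-to-partition mapping: a Python port of the Java client's `murmur2` (seed `0x9747B28C`), with the sign bit cleared before the modulo. The function names are ours. For key `b"a"` and 1000 partitions it selects partition 524. By our own calculation, the FNV-1a scheme used by Sarama's default hash partitioner selects partition 76 for the same key.

```python
def murmur2(data: bytes) -> int:
    """Kafka's murmur2 variant, returned as an unsigned 32-bit int (hypothetical port)."""
    length = len(data)
    m = 0x5BD1E995
    r = 24
    h = (0x9747B28C ^ length) & 0xFFFFFFFF

    # Mix the input four bytes at a time as little-endian words.
    for i in range(length // 4):
        i4 = i * 4
        k = data[i4] | (data[i4 + 1] << 8) | (data[i4 + 2] << 16) | (data[i4 + 3] << 24)
        k = (k * m) & 0xFFFFFFFF
        k ^= k >> r
        k = (k * m) & 0xFFFFFFFF
        h = (h * m) & 0xFFFFFFFF
        h ^= k

    # Handle the remaining 1-3 bytes (mirrors the Java switch fall-through).
    tail = length & ~3
    extra = length % 4
    if extra >= 3:
        h ^= data[tail + 2] << 16
    if extra >= 2:
        h ^= data[tail + 1] << 8
    if extra >= 1:
        h ^= data[tail]
        h = (h * m) & 0xFFFFFFFF

    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & 0xFFFFFFFF
    h ^= h >> 15
    return h


def murmur2_partition(key: bytes, num_partitions: int) -> int:
    # Clear the sign bit (rather than taking abs()), as the Java client does.
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


print(murmur2_partition(b"a", 1000))  # 524
```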

What

This PR introduces a new partitioner based on FNV-1a 32-bit. It is a separate, isolated implementation, so users can easily choose between the default partitioner and this one.

The partitioner is designed to match the partitioner used by Goka/Sarama. It uses the "a" variant of 32-bit FNV, applies the same two's-complement (signed 32-bit) conversion, and uses the same parameters as those libraries. As a result, with this partitioner, kafka-python calculates the same hash and selects the same partition as Goka/Sarama. Our tests verified experimentally that this implementation outputs the same hash and partition as Goka/Sarama.
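The two's-complement step is the subtle part. The sketch below selects a partition from an already-computed unsigned 32-bit hash. It assumes Sarama's default hash partitioner (to our understanding) casts the FNV-1a sum to a signed int32, takes Go's truncating remainder, and negates a negative result. Python's `%` floors instead of truncating, so the sketch emulates Go explicitly. All names here are hypothetical, not the PR's API.

```python
def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of ``value`` as a two's-complement int32."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def go_remainder(a: int, n: int) -> int:
    """Emulate Go's ``%``, which truncates toward zero (Python's ``%`` floors)."""
    r = abs(a) % n
    return -r if a < 0 else r


def sarama_style_partition(hash32: int, num_partitions: int) -> int:
    """int32(hash) % n, then negate a negative result (assumed Sarama behaviour)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    partition = go_remainder(to_int32(hash32), num_partitions)
    return -partition if partition < 0 else partition


# 0xE40C292C is the FNV-1a 32-bit hash of b"a"; as an int32 it is negative.
print(to_int32(0xE40C292C))                      # -468965076
print(sarama_style_partition(0xE40C292C, 1000))  # 76
```

Combined with an FNV-1a hash of the key, this could be wrapped in a callable with kafka-python's `partitioner(key_bytes, all_partitions, available_partitions)` signature and passed to `KafkaProducer` through its `partitioner` setting. Masking with `0x7FFFFFFF` instead, as the Java client does for murmur2, would map the same hash to partition 572 rather than 76 out of 1000, which is why the signed conversion matters.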

Who

I worked on this change together with Evgenia Martynova at Beat. Beat is a ride-hailing company with engineering hubs in Athens and Amsterdam. We are always on the lookout for great engineers. We are hiring!


