KAFKA-17747: [2/N] Add compute topic and group hash (wip) #2

arvi18 · 2025-04-21T06:43:04Z

Add guava to dependencies.
Add computeTopicHash and computeGroupHash functions to Group class.
Add related unit test.

Summary by CodeRabbit

New Features
- Introduced deterministic hashing methods for topics and groups, enabling consistent hash computation based on topic metadata and cluster configuration.
Tests
- Added comprehensive unit tests to verify the correctness and sensitivity of the new hashing logic for topics and groups.
Chores
- Added Guava as a new dependency to support hashing functionality.

Signed-off-by: PoAn Yang <[email protected]>

coderabbitai · 2025-04-21T06:43:11Z

Walkthrough

The changes introduce deterministic hashing methods for topics and groups in the Group interface, utilizing Guava's hashing utilities. Guava is added as a dependency to the project, and the import control configuration is updated to allow its use. New unit tests are implemented to verify the correctness and sensitivity of the hashing logic to topic and group metadata. Dependency management files are updated to include the Guava library and its version.

Changes

| File(s) | Change Summary |
|---|---|
| build.gradle, gradle/dependencies.gradle | Added Guava (`com.google.guava:guava:33.4.0-jre`) as a dependency and registered its version in the dependency map. |
| checkstyle/import-control-group-coordinator.xml | Allowed imports from `com.google.common.hash` for the group coordinator package. |
| group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java | Added two static methods: `computeGroupHash` and `computeTopicHash` for deterministic hashing of topics and groups. |
| group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java | Introduced a new test class with unit tests for the new hashing methods, verifying correctness and order sensitivity. |

Sequence Diagram(s)

sequenceDiagram
    participant Test as GroupTest
    participant Group as Group (static methods)
    participant Guava as Guava Hashing
    participant Metadata as MetadataImage/ClusterImage

    Test->>Group: computeTopicHash(topicImage, clusterImage)
    Group->>Metadata: Retrieve topic and partition metadata
    Group->>Guava: Hash topic metadata, partition info, rack assignments
    Guava-->>Group: Return computed hash
    Group-->>Test: Return topic hash

    Test->>Group: computeGroupHash(topicHashes)
    Group->>Guava: Combine topic hashes in order
    Guava-->>Group: Return combined group hash
    Group-->>Test: Return group hash

Poem

In the warren of code, a new hash is born,
Guava hops in, dependencies adorn.
Topics and groups now hashed with care,
With racks and partitions all laid bare.
Tests nibble at logic, ensuring it's right—
Deterministic hashes, a rabbit’s delight!
🐇✨

visz11 · 2025-04-21T11:17:07Z

@coderabbitai full review

coderabbitai · 2025-04-21T11:17:11Z

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (6)

group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (6)
41-41: Consider using a fixed UUID instead of random for deterministic tests

Using Uuid.randomUuid() as a constant might introduce non-deterministic behavior in tests across different runs. For unit tests, it's generally better to use a fixed, predefined UUID to ensure consistent and reproducible test results.
-    private static final Uuid FOO_TOPIC_ID = Uuid.randomUuid();
+    private static final Uuid FOO_TOPIC_ID = new Uuid(0x123e4567e89b12d3L, 0xa456426614174000L);
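
The difference between a fixed and a random identifier can be sketched with the JDK alone. This example uses `java.util.UUID` as a stand-in for Kafka's `Uuid` (an assumption, so that it runs without Kafka on the classpath); the class name `FixedIdSketch` is hypothetical.

```java
import java.util.UUID;

final class FixedIdSketch {
    // Fixed, predefined identifier: identical on every test run.
    static final UUID FIXED_TOPIC_ID =
        UUID.fromString("123e4567-e89b-12d3-a456-426614174000");

    // Random identifier: differs on every JVM start, so a value derived
    // from it cannot be compared against a hard-coded expectation.
    static final UUID RANDOM_TOPIC_ID = UUID.randomUUID();

    public static void main(String[] args) {
        System.out.println("fixed:  " + FIXED_TOPIC_ID);
        System.out.println("random: " + RANDOM_TOPIC_ID);
    }
}
```

A fixed identifier matters mostly when a test compares a hash against a literal expected value; tests that recompute the expectation from the same constant are unaffected.
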
45-48: Missing documentation for MetadataImageBuilder

The MetadataImageBuilder appears to be a custom test helper class, but there's no documentation explaining its purpose or how it works. Consider adding a class-level comment or a reference to where this builder is defined to improve code maintainability.

56-56: Add documentation for the magic byte

The purpose of the magic byte used in the hash function is not explained. Consider adding a comment explaining what this value represents and why it's used in the hash computation.
-            .putByte((byte) 0) // magic byte
+            .putByte((byte) 0) // magic byte - version identifier for the hash algorithm
50-65: Document hash function sensitivity in test method

The test verifies the topic hash function but doesn't clearly document what properties are being validated. Consider adding a descriptive comment explaining that the test is verifying the hash function's correctness and its sensitivity to topic metadata.
    @Test
    void testComputeTopicHash() {
+        // Verify that topic hash computation is correct and sensitive to topic ID, name, 
+        // partition count, partition order, and rack assignments
        long result = Group.computeTopicHash(FOO_METADATA_IMAGE.topics().getTopic(FOO_TOPIC_ID), FOO_METADATA_IMAGE.cluster());
162-174: Document group hash function purpose

The test for computeGroupHash could benefit from a comment explaining the purpose of this function and why the topic hashes must be combined in a deterministic, name-sorted order.
    @Test
    void testComputeGroupHash() {
+        // Verify that group hash computation correctly combines topic hashes in a deterministic order
        long result = Group.computeGroupHash(Map.of(
            BAR_TOPIC_NAME, 123L,
            FOO_TOPIC_NAME, 456L
        ));
176-188: Verify order sensitivity explicitly in test name and comments

The test confirms that computeGroupHash is order-sensitive, which is an important property. Consider making this more explicit in the test name and comments for better clarity.
-    void testComputeGroupHashWithDifferentOrder() {
+    void testComputeGroupHashIsOrderSensitive() {
+        // Verify that the group hash result changes when the order of hash inputs changes,
+        // confirming that the implementation is correctly order-sensitive
        long result = Group.computeGroupHash(Map.of(
            BAR_TOPIC_NAME, 123L,
            FOO_TOPIC_NAME, 456L
        ));

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed79212 and 36cb999.

📒 Files selected for processing (5)

build.gradle (1 hunks)
checkstyle/import-control-group-coordinator.xml (1 hunks)
gradle/dependencies.gradle (2 hunks)
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (2 hunks)
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (2)

metadata/src/main/java/org/apache/kafka/image/ClusterImage.java (1)

ClusterImage (36-108)

metadata/src/main/java/org/apache/kafka/metadata/BrokerRegistration.java (1)

BrokerRegistration (44-398)

🔇 Additional comments (9)

gradle/dependencies.gradle (2)

64-64: Guava dependency version looks appropriate.

Adding Guava 33.4.0-jre for the hashing functionality is a good choice. This is a recent stable version with good performance and reliability.

151-151: Maven coordinate for Guava properly defined.

The dependency definition follows the project's standard pattern for library dependencies.

build.gradle (1)

1423-1423: Guava dependency correctly added to group-coordinator module.

The implementation dependency is properly added to the group-coordinator module to support the new hashing functionality.

checkstyle/import-control-group-coordinator.xml (1)

80-80: Appropriate import allowance for Guava hash package.

The import control rule for allowing com.google.common.hash is properly added, which is needed for the new hashing functionality in the Group interface.
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (3)

22-36: Required imports properly added.

The imports for Kafka metadata classes and Guava hash utilities are correctly added to support the new hashing functionality.

223-238: Well-implemented group hash computation method.

The computeGroupHash method correctly creates a deterministic hash by:

- Sorting topic entries by name
- Converting long hash values to HashCode objects
- Using Guava's Hashing.combineOrdered to ensure order-sensitivity

This approach ensures consistent results regardless of input map iteration order.
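
The sort-then-combine idea can be sketched with the JDK alone. This is not the PR's code: the `combineOrdered` helper below is a simple hypothetical stand-in for Guava's `Hashing.combineOrdered`, and the class name `GroupHashSketch` is invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class GroupHashSketch {
    // Stand-in for an ordered combiner: the result depends on the order
    // of the inputs. This is NOT Guava's Hashing.combineOrdered.
    static long combineOrdered(List<Long> hashes) {
        long result = 17L;
        for (long h : hashes) {
            result = 31L * result + h;
        }
        return result;
    }

    // Sort entries by topic name first, so the map's own iteration order
    // never influences the combined value.
    static long computeGroupHash(Map<String, Long> topicHashes) {
        List<Long> ordered = topicHashes.entrySet().stream()
            .sorted(Map.Entry.comparingByKey())
            .map(Map.Entry::getValue)
            .collect(Collectors.toList());
        return combineOrdered(ordered);
    }

    public static void main(String[] args) {
        Map<String, Long> fooFirst = new LinkedHashMap<>();
        fooFirst.put("foo", 456L);
        fooFirst.put("bar", 123L);
        Map<String, Long> barFirst = new LinkedHashMap<>();
        barFirst.put("bar", 123L);
        barFirst.put("foo", 456L);
        // Both maps yield the same group hash despite different insertion order.
        System.out.println(computeGroupHash(fooFirst) == computeGroupHash(barFirst)); // prints "true"
    }
}
```

The combiner itself stays order-sensitive; determinism comes from sorting the entries before they reach it.
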

239-267:
✅ Verification successful

Comprehensive topic hash computation with appropriate handling of metadata.

The computeTopicHash method creates a deterministic Murmur3 hash that includes:

- Topic ID and name
- Number of partitions
- Partition IDs (sorted)
- Rack information (sorted)

The implementation correctly handles null values, empty optionals, and uses proper sorting to ensure deterministic results.

One recommendation: consider adding a unit test that verifies hash stability across JVM restarts, as this will be important for consistent behavior in a distributed environment.

🏁 Script executed:
#!/bin/bash
# Check if there are comprehensive tests for the hash functions
rg -A 5 "test.*Hash" group-coordinator/src/test/java/
Length of output: 14144
Implementation and tests are comprehensive

The computeTopicHash method correctly hashes topic ID, name, partition count, sorted partition IDs, and sorted rack lists. Stability and determinism are already verified in group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (varying magic byte, partition order, rack order, and field differences). No additional tests are needed.
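
As a rough illustration of the field ordering described above, here is a JDK-only sketch. It is not the PR's implementation: CRC32 is swapped in for Murmur3 so that no Guava dependency is needed, and the simplified model (string topic ID, a map from partition ID to replica racks, racks joined with `;`) plus the class name `TopicHashSketch` are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.zip.CRC32;

final class TopicHashSketch {
    // Hypothetical simplified topic model: partition id -> racks of its replicas.
    static long computeTopicHash(String topicId, String topicName,
                                 Map<Integer, List<String>> racksByPartition) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(0);                       // magic byte
            writeString(out, topicId);
            writeString(out, topicName);
            out.writeInt(racksByPartition.size());  // number of partitions
            // TreeMap iterates partitions in ascending id order.
            for (Map.Entry<Integer, List<String>> e : new TreeMap<>(racksByPartition).entrySet()) {
                out.writeInt(e.getKey());
                // Sort racks so replica order does not affect the hash.
                writeString(out, e.getValue().stream().sorted().collect(Collectors.joining(";")));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);     // not expected for in-memory streams
        }
        CRC32 crc = new CRC32();                    // stand-in for Murmur3
        crc.update(bytes.toByteArray());
        return crc.getValue();
    }

    private static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf8.length);                  // length prefix avoids ambiguous concatenation
        out.write(utf8);
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> partitions = Map.of(
            0, List.of("rack1", "rack0"),
            1, List.of("rack2", "rack1"));
        System.out.println(computeTopicHash("topic-id", "foo", partitions));
    }
}
```

Because every field is written as explicit bytes rather than through `Object.hashCode()`, the result does not depend on JVM identity hashes or on map iteration order.
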
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (2)

119-131: Well-structured parameterized test

The parameterized test design effectively tests different scenarios that should result in different hash values. This is a good practice as it systematically verifies the hash function's sensitivity to changes in various fields.

1-189:
✅ Verification successful

Request verification of hash algorithm implementation

The tests verify the behavior of computeTopicHash and computeGroupHash methods, but without seeing the implementation in the Group class, it's hard to confirm if these tests align with the actual implementation. Consider checking the implementation details to ensure complete coverage.

🏁 Script executed:
#!/bin/bash
# Check the implementation of the hash methods in the Group class
fd -t f "Group.java" --exec cat {}
Length of output: 91794

🏁 Script executed:
#!/bin/bash
# Find the implementations of computeTopicHash and computeGroupHash
rg -n "computeTopicHash" -t java
rg -n "computeGroupHash" -t java
Length of output: 1847

🏁 Script executed:
#!/bin/bash
# Display computeGroupHash implementation
sed -n '220,260p' group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java

# Display computeTopicHash implementation
sed -n '260,320p' group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java
Length of output: 2197
computeTopicHash and computeGroupHash tests aligned with implementation
I’ve reviewed both methods in Group.java and confirmed the tests cover:

- The magic byte and all topic fields (ID, name, partition count)
- Partition ordering via sorted entries
- Rack ordering via sorted rack list
- Group hash ordering via sorted topic names

No further changes are needed.

visz11 · 2025-04-22T11:53:48Z

@coderabbitai full review

coderabbitai · 2025-04-22T11:53:54Z

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (4)
41-41: Consider using a fixed UUID for deterministic tests.

Using Uuid.randomUuid() could potentially make tests non-deterministic across test runs. While it doesn't affect the current tests since you're comparing computations with the same UUID within each test, it's generally better practice to use fixed values in unit tests.
-    private static final Uuid FOO_TOPIC_ID = Uuid.randomUuid();
+    private static final Uuid FOO_TOPIC_ID = new Uuid(0x123e4567e89b12d3L, 0xa456426614174000L);
56-56: Extract magic byte as a constant.

The magic byte (byte) 0 appears to be a significant value in the hash computation. Consider extracting it as a constant for better maintainability and to avoid magic numbers.
+    private static final byte TOPIC_HASH_MAGIC_BYTE = 0;
     
     // Then in the test method:
-            .putByte((byte) 0) // magic byte
+            .putByte(TOPIC_HASH_MAGIC_BYTE) // magic byte
45-48: Add edge case tests for the hash functions.

Consider adding test cases for edge scenarios:

- Empty topics (0 partitions)
- Topics with many partitions
- Empty rack assignments
- Null or empty group names

162-188: Enhance group hash tests with more variations.

The tests for computeGroupHash are more limited compared to those for computeTopicHash. Consider adding tests for:

- Empty maps
- Maps with a single entry
- Maps with many entries
- Maps with duplicate hash values

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite

🔇 Additional comments (10)

gradle/dependencies.gradle (2)

64-64: Adding Guava dependency version is appropriate.

The addition of Guava version 33.4.0-jre is correctly defined in the versions map, following the project's dependency management pattern. This is a recent stable release of Guava.

151-151: Guava dependency entry correctly defined.

The Guava library is properly added with the Maven coordinate referencing the version variable. This follows the project's standard pattern for library declarations.

checkstyle/import-control-group-coordinator.xml (1)

80-80: Well-scoped import allowance for Guava hashing.

The import control is correctly updated to allow only the specific com.google.common.hash package rather than the entire Guava library. This follows good security practices by limiting the scope of allowed external dependencies.

build.gradle (1)

1423-1423: Properly scoped Guava dependency.

Guava is correctly added as an implementation dependency to the :group-coordinator project, which is appropriate since the hashing utilities are used internally and not exposed through the API.

group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (4)

22-24: Appropriate imports for new functionality.

The imports for ClusterImage, TopicImage, BrokerRegistration, Guava hashing utilities, and standard Java utilities are correctly added to support the new hashing methods. The imports are organized following project conventions.

Also applies to: 26-32, 36-36

223-237: Well-implemented deterministic group hash computation.

The computeGroupHash method provides a deterministic way to hash multiple topics within a group. Good practices are used:

- Deterministic ordering of topic entries by sorting by name
- Using Guava's combineOrdered to ensure hash consistency
- Clean functional style with streams

This implementation will produce consistent hash values as long as the input topic hashes are consistent.

239-267: Thorough topic hash implementation with comprehensive metadata coverage.

The computeTopicHash method effectively captures all critical aspects of a topic:

- Uses Murmur3 (a fast, non-cryptographic hash with good distribution)
- Includes magic byte, topic ID, name, and partition count
- Processes partitions in a deterministic order
- Incorporates rack awareness by including sorted rack IDs

The deterministic ordering of partitions and racks ensures consistent hashing regardless of order in which metadata is received.

223-267: Verify the hash computation consistency across clusters.

While the implementation is sound, it's important to verify that these hash computations remain stable across different Kafka cluster deployments and versions. The reliance on rack information could potentially cause hash changes when brokers are replaced or racks are reconfigured.

Consider how the hash values might change during operations like broker replacements, rack reconfigurations, or rolling updates. If hash stability is critical for your use case, ensure you have tests covering these scenarios.
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (2)

1-189: Well-structured comprehensive test suite for the new hashing functionality.

The test cases effectively verify the behavior of the new computeTopicHash and computeGroupHash functions with good coverage of variations. You've included tests for different magic bytes, partition orders, rack orders, and field variations, which will ensure the hashing functions are deterministic and sensitive to all relevant inputs.

The use of parameterized tests with MethodSource is an excellent approach for testing multiple variations efficiently.

162-174:
✅ Verification successful

Ensure order consistency in topic name map.

The test passes Map.of(BAR_TOPIC_NAME, 123L, FOO_TOPIC_NAME, 456L), whose iteration order is unspecified and can differ between JVM runs. If computeGroupHash relied on that order, the result would be unstable, so it's worth ensuring order consistency by using an ordered map implementation or sorting the keys before hashing.

🏁 Script executed:
#!/bin/bash
# Check if Group.computeGroupHash sorts the map entries or uses a specific ordered map implementation
rg -A 10 "computeGroupHash" --type java | grep -E "(TreeMap|LinkedHashMap|sort)"
Length of output: 210
ComputeGroupHash already enforces key ordering

The implementation of Group.computeGroupHash uses
.entrySet()
  .stream()
  .sorted(Map.Entry.comparingByKey())
to ensure a consistent iteration order before hashing. The existing test using Map.of(...) is therefore deterministic and does not require switching to an ordered map or additional sorting.
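
A small JDK-only sketch of why the sort makes the `Map.of(...)` input safe. The class and method names here are hypothetical and only illustrate the ordering step, not the hashing.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SortedEntriesSketch {
    // Returns topic names in the order a name-sorted stream visits them.
    static List<String> visitOrder(Map<String, Long> topicHashes) {
        return topicHashes.entrySet().stream()
            .sorted(Map.Entry.comparingByKey())
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Map.of leaves iteration order unspecified, but the sorted stream
        // always visits "bar" before "foo".
        System.out.println(visitOrder(Map.of("foo", 456L, "bar", 123L))); // prints "[bar, foo]"
    }
}
```
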

visz11 · 2025-04-24T09:13:55Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces deterministic hashing methods for topics and groups, along with related unit tests and a new Guava dependency. The changes seem well-structured and include comprehensive unit tests. However, there are a few areas that could be improved for clarity and efficiency.

Summary of Findings

Magic Byte in computeTopicHash: The magic byte in computeTopicHash is not explained. It would be beneficial to add a comment explaining its purpose and potential values.
Potential NullPointerException in computeTopicHash: The computeTopicHash method could throw a NullPointerException if clusterImage::broker returns null for a replica, or a NoSuchElementException if Optional::get is called on an empty Optional. Consider adding checks to prevent both.
Inconsistent Hashing: The computeTopicHash method uses topicImage.id().hashCode() which may not be consistent across different JVMs. Consider using a consistent hashing method for the topic ID.
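
One way to guard against both the null-broker and the empty-Optional cases is to filter before unwrapping. The sketch below is JDK-only and hypothetical: `brokerRacks` stands in for the cluster metadata lookup, and `RackLookupSketch` is not a class from the PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Collectors;

final class RackLookupSketch {
    // brokerRacks is a hypothetical stand-in for the cluster metadata:
    // broker id -> optional rack. Unknown brokers are simply absent.
    static List<String> sortedRacks(int[] replicas, Map<Integer, Optional<String>> brokerRacks) {
        return Arrays.stream(replicas)
            .mapToObj(id -> brokerRacks.get(id)) // null when the broker is unknown
            .filter(Objects::nonNull)            // skip unknown brokers
            .filter(Optional::isPresent)         // skip brokers without a rack
            .map(Optional::get)                  // safe: presence checked above
            .sorted()
            .collect(Collectors.toList());
    }
}
```

With these filters in place, an unknown broker or a broker without a rack contributes nothing to the rack list instead of raising an exception.
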

Merge Readiness

The pull request introduces important hashing functionality and includes thorough unit tests. However, the potential for a NullPointerException in computeTopicHash and the inconsistent hashing of topic IDs should be addressed before merging. I am unable to approve this pull request, and recommend that others review and approve this code before merging. Given the high severity issue, I recommend that the pull request not be merged until it is addressed.

arvi18 · 2025-04-28T12:05:56Z

/codehelper review

devd-pr-reviewer · 2025-04-28T12:06:51Z