KAFKA-17871: avoid blocking the herder thread when producer flushing hangs #17

Open · wants to merge 37 commits into base: trunk

Conversation


@arvi18 arvi18 commented Apr 26, 2025

The call to backingStore.get() (called by connector task threads through OffsetStorageReaderImpl.offsets()) can block for a long time waiting for a data flush to complete (KafkaProducer.flush()).

This change moves that call outside the synchronized block that locks offsetReadFutures, so that if backingStore.get() hangs it does not keep offsetReadFutures locked. The access to the closed flag (closed.get()) is kept inside the synchronized block to avoid a race condition with close().

This is important because OffsetStorageReaderImpl.close() also needs to lock offsetReadFutures in order to cancel the futures.
Since the herder thread calls OffsetStorageReaderImpl.close() when attempting to stop a task, before this change the herder thread could hang indefinitely waiting for backingStore.get() to complete.
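
A minimal, self-contained sketch of the locking pattern described above; the store interface and value types are simplified, so this shows the shape of the synchronization rather than the actual patch:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: names mirror the PR description, types are simplified.
class OffsetReaderSketch {
    interface BackingStore {
        // May block for a long time (e.g. on KafkaProducer.flush()).
        Future<Map<String, String>> get(Set<String> keys);
    }

    private final BackingStore backingStore;
    private final Set<Future<Map<String, String>>> offsetReadFutures = new HashSet<>();
    private final AtomicBoolean closed = new AtomicBoolean(false);

    OffsetReaderSketch(BackingStore backingStore) {
        this.backingStore = backingStore;
    }

    Map<String, String> offsets(Set<String> keys) throws Exception {
        // The closed check stays inside the lock to avoid racing close() ...
        synchronized (offsetReadFutures) {
            if (closed.get())
                return null;
        }
        // ... but the potentially hanging call runs with the lock released, so
        // close() can still acquire offsetReadFutures and cancel pending futures.
        Future<Map<String, String>> future = backingStore.get(keys);
        synchronized (offsetReadFutures) {
            if (closed.get()) {
                future.cancel(true);
                return null;
            }
            offsetReadFutures.add(future);
        }
        try {
            return future.get();
        } finally {
            synchronized (offsetReadFutures) {
                offsetReadFutures.remove(future);
            }
        }
    }

    // Called by the herder thread when stopping a task; no longer blocked by a
    // hung backingStore.get(), because that call happens outside the monitor.
    void close() {
        synchronized (offsetReadFutures) {
            closed.set(true);
            for (Future<?> f : offsetReadFutures)
                f.cancel(true);
            offsetReadFutures.clear();
        }
    }
}
```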

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

apalan60 and others added 30 commits April 21, 2025 15:35
…9521)

This patch addresses issue apache#19516 and corrects a typo in
`ApiKeyVersionsProvider`: when `toVersion` exceeds `latestVersion`, the
`IllegalArgumentException` message was erroneously formatted with
`fromVersion`. The format argument has been updated to use `toVersion`
so that the error message reports the correct value.

Reviewers: Ken Huang <[email protected]>, PoAn Yang
 <[email protected]>, Jhen-Yung Hsu <[email protected]>, Chia-Ping
 Tsai <[email protected]>
The check for `scheduler.pendingTaskSize()` may fail if the thread pool
is too slow to consume the runnable objects, so the test needs to wait for
the condition rather than assert it once (see the sketch below).

Reviewers: Ken Huang <[email protected]>, PoAn Yang
 <[email protected]>, Chia-Ping Tsai <[email protected]>
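
A generic poll-until-true helper shows the shape of such a fix; this sketch uses plain JDK primitives rather than Kafka's own test utilities, whose exact helper names are not quoted here:

```java
import java.util.function.BooleanSupplier;

// Generic sketch: retry a condition until a deadline instead of asserting once.
final class WaitUtil {
    static void waitForCondition(BooleanSupplier condition, long timeoutMs, String message)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline)
                throw new AssertionError(message);
            Thread.sleep(50); // give the thread pool time to drain runnables
        }
    }
}

// Hypothetical usage in such a test:
// WaitUtil.waitForCondition(() -> scheduler.pendingTaskSize() == 0, 15_000,
//         "pending tasks were never drained");
```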
…#17099)

Two sets of tests are added:
1. KafkaProducerTest
- when the send succeeds, both record.headers() and the onAcknowledgement
headers are read-only
- when the send fails, record.headers() remains writable as before and the
onAcknowledgement headers are read-only
2. ProducerInterceptorsTest
- verify that both the old and the new onAcknowledgement methods are called
successfully (see the interceptor sketch below)

Reviewers: Lianet Magrans <[email protected]>, Omnia Ibrahim
<[email protected]>, Matthias J. Sax <[email protected]>,
Andrew Schofield <[email protected]>, Chia-Ping Tsai
<[email protected]>
…pache#19437)

This PR adds the support for remote storage fetch for share groups.

There is a limitation in remote storage fetch for consumer groups: we can
only perform a remote fetch for a single topic partition in a fetch
request. Since the logic of share fetch requests is largely based on
how consumer groups work, we follow similar logic in implementing
remote storage fetch. However, this limitation should be addressed as
part of KAFKA-19133, which should let us fetch multiple remote topic
partitions in a single share fetch request.

Reviewers: Jun Rao <[email protected]>
The release script was pushing the RC tag off of a temporary branch that
was never merged back into the release branch. This meant that our RC
and release tags were detached from the rest of the repository.

This patch changes the release script to merge the RC tag back into the
release branch and pushes both the tag and the branch.

Reviewers: Luke Chen <[email protected]>
This PR removes the unstable API flag for the KIP-932 RPCs.

The 4 RPCs which were exposed for the early access release in AK 4.0 are
stabilised at v1. This is because the RPCs have evolved over time and AK
4.0 clients are not compatible with AK 4.1 brokers. By stabilising at
v1, the API version checks prevent incompatible communication and
server-side exceptions when trying to parse the requests from the older
clients.

Reviewers: Apoorv Mittal <[email protected]>
…19500)

Currently the share session cache is designed like the fetch session
cache. If the cache is full and a new share session is trying to get
initialized, the sessions which haven't been touched for more than
2 minutes are evicted. This wouldn't be right for share sessions, as the
members also hold locks on the acquired records, and session eviction
would mean those locks need to be dropped and the corresponding
records re-delivered. This PR removes the time-based eviction logic for
share sessions.

Refer: [KAFKA-19159](https://issues.apache.org/jira/browse/KAFKA-19159)

Reviewers: Apoorv Mittal <[email protected]>, Chia-Ping Tsai <[email protected]>
Small improvements to share consumer javadoc.

Reviewers: Apoorv Mittal <[email protected]>
Updated the Kafka Streams documentation to include metrics for tasks,
process nodes, and threads that were missing. I was unable to find
metrics such as stream-state-metrics, client-metrics,
state-store-metrics, and record-cache-metrics in the codebase, so they
are not included in this update.

Reviewers: Bill Bejeck <[email protected]>
…ache#19416)

This change implements upgrading the kraft version from 0 to 1 in existing clusters.
Previously, clusters were formatted with either version 0 or version 1, and could not
be moved between them.

The kraft version for the cluster metadata partition is recorded using the
KRaftVersion control record. If there is no KRaftVersion control record
the default kraft version is 0.

The kraft version is upgraded using the UpdateFeatures RPC. These RPCs
are handled by the QuorumController and FeatureControlManager. This
change adds special handling in the FeatureControlManager so that
upgrades to the kraft.version are directed to
RaftClient#upgradeKRaftVersion.

To allow the FeatureControlManager to call
RaftClient#upgradeKRaftVersion in a non-blocking fashion, the kraft
version upgrade uses optimistic locking. The call to
RaftClient#upgradeKRaftVersion validates the version change.
If the validations succeed, it generates the necessary control records
and adds them to the BatchAccumulator. (An Admin-client sketch of the
upgrade call follows this entry.)

Before the kraft version can be upgraded to version 1, all of the
brokers and controllers in the cluster need to support kraft version 1.
The check that all brokers support kraft version 1 is done by the
FeatureControlManager. The check that all of the controllers support
kraft version 1 is done by KafkaRaftClient and LeaderState.

When the kraft version is 0, the kraft leader starts by assuming that
all voters do not support kraft version 1. The leader discovers which
voters support kraft version 1 through the UpdateRaftVoter RPC. The
KRaft leader handles UpdateRaftVoter RPCs by storing the updated
information in-memory until the kraft version is upgraded to version 1.
This state is stored in LeaderState and contains the latest directory
id, endpoints and supported kraft version for each voter.

Only when the KRaft leader has received an UpdateRaftVoter RPC from all
of the voters will it allow the upgrade from kraft.version 0 to 1.

Reviewers: Alyssa Huang <[email protected]>, Colin P. McCabe <[email protected]>
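
The upgrade described above flows through the UpdateFeatures RPC, which is reachable from the Admin client. A hedged sketch follows; the bootstrap address and target level are assumptions, and the call fails if any voter or broker does not yet support kraft version 1:

```java
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.FeatureUpdate;
import org.apache.kafka.clients.admin.UpdateFeaturesOptions;

// Sketch: drive the kraft.version 0 -> 1 upgrade via the UpdateFeatures RPC.
public class UpgradeKraftVersion {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            FeatureUpdate toVersion1 =
                new FeatureUpdate((short) 1, FeatureUpdate.UpgradeType.UPGRADE);
            // Handled by the QuorumController/FeatureControlManager, which routes
            // kraft.version changes to RaftClient#upgradeKRaftVersion.
            admin.updateFeatures(Map.of("kraft.version", toVersion1), new UpdateFeaturesOptions())
                 .all()
                 .get();
        }
    }
}
```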
This patch extends the OffsetCommit API to support topic ids. From
version 10 of the API, topic ids must be used. Originally, we wanted to
support both using topic ids and topic names from version 10 but it
turns out that it makes everything more complicated. Hence we propose to
only support topic ids from version 10. Clients which only support
topic names can either look up the topic ids using the Metadata API
(sketched below) or stay on an earlier version.

The patch only contains the server side changes and it keeps the version
10 as unstable for now. We will mark the version as stable when the
client side changes are merged in.

Reviewers: Lianet Magrans <[email protected]>, PoAn Yang <[email protected]>
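
A sketch of that lookup path: a client which only knows topic names can resolve topic ids through the Admin client's metadata APIs before using OffsetCommit v10. The topic names and bootstrap address are assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.TopicDescription;

// Sketch: resolve topic names to topic ids via DescribeTopics.
public class ResolveTopicIds {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            Map<String, TopicDescription> byName =
                admin.describeTopics(List.of("orders", "payments")).allTopicNames().get();
            byName.forEach((name, description) ->
                System.out.printf("%s -> %s%n", name, description.topicId()));
        }
    }
}
```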
…a result of change in assignor algorithm (apache#19541)

The system test `ShareConsumerTest.test_share_multiple_partitions`
started failing because of the recent change in the SimpleAssignor
algorithm. The tests assumed that if a share group is subscribed to a
topic, then every share consumer that is part of the group will be assigned
all partitions of the topic. That no longer holds: partitions are now
split between the share consumers in certain cases, so some partitions
are assigned to only a subset of share consumers. This change removes
that assumption.

Reviewers: PoAn Yang <[email protected]>, Andrew Schofield <[email protected]>
…ionCache (apache#19505)

This PR removes the group.share.max.groups config. This config was used
to calculate the maximum size of share session cache. But with the new
config group.share.max.share.sessions in place with exactly this
purpose, the ShareSessionCache initialization has also been passed the
new config.

Refer: [KAFKA-19156](https://issues.apache.org/jira/browse/KAFKA-19156)

Reviewers: Apoorv Mittal <[email protected]>, Andrew Schofield <[email protected]>, Chia-Ping Tsai <[email protected]>
…ache#19443)

* There could be scenarios where share partition records in
`__share_group_state` internal topic are not updated for a while,
implying these partitions are basically cold.
* In this situation, the presence of these cold partitions holds back the
pruner from keeping the topic clean and of manageable size.
* To remedy the situation, we have added a periodic
`setupSnapshotColdPartitions` job in `ShareCoordinatorService` which does a
writeAll operation on the associated shards in the coordinator and
forces snapshot creation for any cold partitions. In this way the pruner
can continue. This job has been added as a timer task (see the sketch
below).
* A new internal config
`share.coordinator.cold.partition.snapshot.interval.ms` has been
introduced to set the period of the job.
* Any failures are logged and ignored.
* New tests have been added to verify the feature.

Reviewers: PoAn Yang <[email protected]>, Andrew Schofield <[email protected]>
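
A generic sketch of the timer-task shape described above: run the snapshot job on a fixed period, and log-and-ignore failures so one bad run never kills the timer. Names are illustrative, not the actual ShareCoordinatorService code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative periodic job; the period comes from the new internal config
// share.coordinator.cold.partition.snapshot.interval.ms.
final class ColdPartitionSnapshotJob {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();

    void start(long periodMs, Runnable setupSnapshotColdPartitions) {
        timer.scheduleAtFixedRate(() -> {
            try {
                setupSnapshotColdPartitions.run(); // writeAll across coordinator shards
            } catch (Exception e) {
                // Failures are logged and ignored, per the description above.
                System.err.println("Cold-partition snapshot pass failed: " + e);
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    void stop() {
        timer.shutdownNow();
    }
}
```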
Improves a variable name and handling of an Optional.

Reviewers: Bill Bejeck <[email protected]>, Chia-Ping Tsai <[email protected]>, PoAn Yang <[email protected]>
…pache#19440)

Introduces a concrete subclass of `KafkaThread` named `SenderThread`.
The poisoning of the TransactionManager on invalid state changes is
determined by looking at the type of the current thread.

Reviewers: Chia-Ping Tsai <[email protected]>
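
A sketch of the thread-type check described above. The real classes are Sender.SenderThread (a KafkaThread subclass) and TransactionManager; these stand-ins only illustrate the mechanism:

```java
// Stand-in for Sender.SenderThread; illustrative only.
final class SenderThreadSketch extends Thread {
    SenderThreadSketch(Runnable target, String name) {
        super(target, name);
    }
}

final class TransactionManagerSketch {
    // Invalid state transitions poison the manager only when they occur on the
    // producer's own I/O thread; application threads just get the exception.
    boolean shouldPoisonStateOnInvalidTransition() {
        return Thread.currentThread() instanceof SenderThreadSketch;
    }
}
```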
…pache#19457)

- Construct an `AsyncKafkaConsumer` and verify that the
`RequestManagers.supplier()` contains Streams-specific data structures.
- Verify that `RequestManagers` constructs the Streams request managers
correctly
- Test `StreamsGroupHeartbeatManager#resetPollTimer()`
- Test `StreamsOnTasksRevokedCallbackCompletedEvent`,
`StreamsOnTasksAssignedCallbackCompletedEvent`, and
`StreamsOnAllTasksLostCallbackCompletedEvent` in
`ApplicationEventProcessor`
- Test `DefaultStreamsRebalanceListener`
- Test `StreamThread`.
  - Test `handleStreamsRebalanceData`.
  - Test `StreamsRebalanceData`.

Reviewers: Lucas Brutschy <[email protected]>, Bill Bejeck <[email protected]>
Signed-off-by: PoAn Yang <[email protected]>
…he#19547)

Change the log messages which used to warn that KIP-932 was an Early
Access feature to say that it is now a Preview feature. This will make
the broker logs far less noisy when share groups are enabled.

Reviewers: Apoorv Mittal <[email protected]>
The generated response data classes take Readable as input to parse the
Response. However, the associated response objects take ByteBuffer as
input and thus convert them to Readable using `new ByteBufferAccessor`
call.

This PR changes the parse method of all the response classes to take the
Readable interface instead so that no such conversion is needed.

To support parsing the ApiVersionsResponse twice for different versions,
this change adds a "slice" method to the Readable interface (sketched below).

Reviewers: José Armando García Sancio <[email protected]>, Truc Nguyen
<[email protected]>, Aadithya Chandra <[email protected]>
…#19549)

The heartbeat logic for share groups is tricky when the set of
topic-partitions eligible for assignment changes. We have observed epoch
mismatches when brokers are restarted, which should not be possible.
Improving the logging so we can see the previous member epoch and tally
this with the logged state.

Reviewers: Apoorv Mittal <[email protected]>, Sushant Mahajan <[email protected]>
…19536)

This PR marks the records as non-nullable for ShareFetch.

This PR is as per the changes for Fetch:
apache#18726 and some work for ShareFetch
was done here: apache#19167. I tested marking `records` as non-nullable
in ShareFetch, which required additional handling; that handling is
included in the current PR.

Reviewers: Andrew Schofield <[email protected]>, Chia-Ping Tsai
 <[email protected]>, TengYao Chi <[email protected]>, PoAn Yang
 <[email protected]>
…tProducerId (KIP-939) (apache#19429)

This is part of the client side changes required to enable 2PC for
KIP-939

**Producer Config:**
`transaction.two.phase.commit.enable`: the default is 'false'. If set to
'true', the broker is informed that the client is participating in the
two-phase commit protocol, and transactions that this client starts never
expire.

**Overloaded InitProducerId method:**
If the value is 'true', the corresponding field is set in the
InitProducerIdRequest (see the sketch below).

Reviewers: Justine Olshan <[email protected]>, Artem Livshits
 <[email protected]>
This patch does a few code changes:
* It cleans up the GroupCoordinatorService;
* It moves the helper methods that validate requests to Utils;
* It moves the helper methods to create the assignment for the
ConsumerGroupHeartbeatResponse and the ShareGroupHeartbeatResponse from
the GroupMetadataManager to the respective classes.

Reviewers: Chia-Ping Tsai <[email protected]>, Jeff Kim <[email protected]>
…rvers (apache#19545)

Old bootstrap.metadata files cause problems with servers that include
KAFKA-18601. When the server tries to read the bootstrap.checkpoint
file, it will fail if the metadata.version is older than 3.3-IV3
(feature level 7). This causes problems when these clusters are
upgraded.

This PR makes it possible to represent older MVs in BootstrapMetadata
objects without causing an exception. An exception is thrown only if we
attempt to access the BootstrapMetadata. This ensures that only the code
path in which we start with an empty metadata log checks that the
metadata version is 7 or newer.

Reviewers: José Armando García Sancio <[email protected]>, Ismael Juma
 <[email protected]>, PoAn Yang <[email protected]>, Liu Zeyu
 <[email protected]>, Alyssa Huang <[email protected]>
Replace names like a, b, c, ... with meaningful names in
AsyncKafkaConsumerTest.

Follow-up:
apache#19457 (comment)

Signed-off-by: PoAn Yang <[email protected]>

Reviewers: Bill Bejeck <[email protected]>, Ken Huang <[email protected]>
aliehsaeedii and others added 7 commits April 24, 2025 21:23
…pache#19450)

Kafka Streams calls `prepareCommit()` in `TaskManager#closeTaskDirty()`.
However, a dirty task must not get committed, and therefore
prepare-commit steps such as getting offsets should not be needed
either. The only thing needed before closing a task dirty is flushing.
Therefore, separating `flush` from `prepareCommit` is a good fix.

Reviewers: Bill Bejeck <[email protected]>, Matthias J. Sax <[email protected]>
…ache#19548)

If a streams, share, or consumer group is described, all group IDs are sent
to all shards of the group coordinator. This change fixes that. It is
tested in the unit tests, since it's somewhat inconvenient to test the
passed read-operation lambda.

Reviewers: David Jacot <[email protected]>, Andrew Schofield
<[email protected]>
apache#19552)

This PR just resolves an NPE when a topic assigned in a share group is
deleted. The NPE is caused by code which uses the current metadata image
to convert from a topic ID to the topic name. For a deleted topic, there
is no longer any entry in the image. A future PR will properly handle
the topic deletion.

Reviewers: Apoorv Mittal <[email protected]>, PoAn Yang <[email protected]>
If the streams rebalance protocol is enabled in
StreamsUncaughtExceptionHandlerIntegrationTest, the streams application
does not shut down correctly upon error.

There are two causes for this. First, sometimes the SHUTDOWN_APPLICATION
code was only sent with the leave heartbeat, which is not handled broker-side.
Second, the SHUTDOWN_APPLICATION code wasn't properly handled
client-side at all.

Reviewers: Bruno Cadonna <[email protected]>, Bill Bejeck
 <[email protected]>, PoAn Yang <[email protected]>
…upMetadataValue (apache#19504)

* Add MetadataHash field to ConsumerGroupMetadataValue,
ShareGroupMetadataValue, and StreamGroupMetadataValue.
* Add metadataHash field to
GroupCoordinatorRecordHelpers#newConsumerGroupEpochRecord,
GroupCoordinatorRecordHelpers#newShareGroupEpochRecord, and
StreamsCoordinatorRecordHelpers#newStreamsGroupEpochRecord.
* Add deprecated message to ConsumerGroupPartitionMetadataKey and
ConsumerGroupPartitionMetadataValue.
* ShareGroupPartitionMetadataKey / ShareGroupPartitionMetadataValue /
StreamGroupPartitionMetadataKey / StreamGroupPartitionMetadataValue will
be removed in next PR.

Reviewers: Lucas Brutschy <[email protected]>, David Jacot <[email protected]>

---------

Signed-off-by: PoAn Yang <[email protected]>
…hangs

The call to `backingStore.get()` (called by connector task threads through
`OffsetStorageReaderImpl.offsets()`) can block for a long time waiting for a data flush to complete
(`KafkaProducer.flush()`).

This change moves that call outside the synchronized block that locks `offsetReadFutures`,
so that if `backingStore.get()` hangs it does not keep `offsetReadFutures` locked.
The access to the `closed` flag (`closed.get()`) is kept inside the synchronized block to avoid a race
condition with `close()`.

This is important because `OffsetStorageReaderImpl.close()` also needs to lock `offsetReadFutures`
in order to cancel the futures.
Since the herder thread calls `OffsetStorageReaderImpl.close()` when attempting to stop a task,
before this change the herder thread could hang indefinitely waiting for
`backingStore.get()` to complete.

arvi18 commented Apr 26, 2025

Hi @davide-armand,
CI is complaining about checkstyle violations. Could you take a look at it?
Thanks


arvi18 commented Apr 26, 2025

The CI is now green.
The additional commit that fixed the CI should be squashed before merging.


arvi18 commented Apr 26, 2025

A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.


arvi18 commented Apr 26, 2025

This PR is being marked as stale since it has not had any activity in 90 days. If you
would like to keep this PR alive, please leave a comment asking for a review. If the PR has
merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.


arvi18 commented Apr 26, 2025

Hi @davide-armand, thanks for the fix, and thanks for the test!

Are you able to remove the Thread.sleep calls from the test? These could introduce some flakiness later if run on slow machines. Additionally I think this test leaks threads through the executors. Could you ensure that the executors are closed, and the threads terminate?

@gharris1727 Pushed a fix in a separate commit, will squash before merging.

I cannot get a green build; if I'm not mistaken, it fails on unrelated tests.
Should I keep restarting the CI hoping for a green build?


arvi18 commented Apr 26, 2025

Rebased on latest apache:trunk.
Compilation is now failing; it seems unrelated to the changes in this PR.


coderabbitai bot commented Apr 26, 2025

Important

Review skipped

More than 25% of the files were skipped due to the max files limit. The review is being skipped to prevent a low-quality review.

193 files out of 300 files are above the max files limit of 100. Please upgrade to Pro plan to get higher limits.


@gemini-code-assist gemini-code-assist bot left a comment

Hello @arvi18, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request addresses a potential deadlock issue in Kafka Connect when a producer flush hangs, preventing the herder thread from stopping tasks. It achieves this by moving the backingStore.get() call outside the synchronized block that protects offsetReadFutures, thus preventing the herder thread from being blocked indefinitely. Additionally, the PR includes several updates and fixes related to Share Consumer and Kafka Producer, including dependency updates, code style suppressions, and documentation improvements.

Highlights

  • Deadlock Fix: Prevents the herder thread from hanging indefinitely when backingStore.get() blocks due to a hung producer flush.
  • Share Consumer Updates: Includes various updates and documentation improvements for the Share Consumer feature, such as removing unstable.api.versions.enable and updating documentation language.
  • Kafka Producer Enhancements: Introduces a new initTransactions(boolean keepPreparedTxn) method in KafkaProducer and Producer interface to support two-phase commit scenarios, along with a new configuration property transaction.two.phase.commit.enable.
  • Code Style and Dependency Management: Addresses checkstyle suppressions and adds a missing dependency on the :metadata project for the :storage project.
  • Readable Interface: Implements a slice() method in ByteBufferAccessor and adds a Readable interface to improve buffer handling during response parsing.

Changelog

Click here to see the changelog
  • .github/scripts/junit.py
    • Updates the thread dump message to indicate the script will now fail.
  • build.gradle
    • Removes an unnecessary afterEvaluate block related to shadow dependencies.
    • Adds a dependency on the :metadata project for the :storage project.
  • checkstyle/import-control-storage.xml
    • Adds allowed imports for org.apache.kafka.image and org.apache.kafka.metadata.
  • checkstyle/suppressions.xml
    • Adds KafkaProducerTest.java to the list of files suppressed for JavaNCSS checks.
  • clients/clients-integration-tests/src/test/java/org/apache/kafka/clients/consumer/ShareConsumerTest.java
    • Removes the unstable.api.versions.enable property from several integration tests.
  • clients/src/main/java/org/apache/kafka/clients/admin/internals/AlterConsumerGroupOffsetsHandler.java
    • Uses OffsetCommitRequest.Builder.forTopicNames instead of new OffsetCommitRequest.Builder.
  • clients/src/main/java/org/apache/kafka/clients/admin/internals/DeleteShareGroupOffsetsHandler.java
    • Removes an unnecessary boolean parameter in the DeleteShareGroupOffsetsRequest.Builder constructor.
  • clients/src/main/java/org/apache/kafka/clients/admin/internals/DescribeShareGroupsHandler.java
    • Removes an unnecessary boolean parameter in the ShareGroupDescribeRequest.Builder constructor.
  • clients/src/main/java/org/apache/kafka/clients/admin/internals/ListShareGroupOffsetsHandler.java
    • Removes an unnecessary boolean parameter in the DescribeShareGroupOffsetsRequest.Builder constructor.
  • clients/src/main/java/org/apache/kafka/clients/consumer/KafkaShareConsumer.java
    • Updates documentation to reflect the preview status of Share Consumer.
    • Corrects terminology from 'parameter' to 'property' in configuration descriptions.
    • Clarifies the description of implicit and explicit acknowledgement modes.
    • Adds a missing blank line in the example code.
  • clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java
    • Renames streamsGroupRebalanceCallbacks to streamsRebalanceListener for clarity.
    • Simplifies the logic for handling exceptions in Streams rebalance callbacks.
  • clients/src/main/java/org/apache/kafka/clients/consumer/internals/CommitRequestManager.java
    • Uses OffsetCommitRequest.Builder.forTopicNames instead of new OffsetCommitRequest.Builder.
  • clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java
    • Uses OffsetCommitRequest.Builder.forTopicNames instead of new OffsetCommitRequest.Builder.
  • clients/src/main/java/org/apache/kafka/clients/consumer/internals/ShareConsumerDelegateCreator.java
    • Updates the warning message to reflect the preview status of Share Consumer.
  • clients/src/main/java/org/apache/kafka/clients/consumer/internals/StreamsGroupHeartbeatRequestManager.java
    • Stores the statuses in streamsRebalanceData.
  • clients/src/main/java/org/apache/kafka/clients/consumer/internals/StreamsMembershipManager.java
    • Adds a method to expose state listeners for testing.
  • clients/src/main/java/org/apache/kafka/clients/consumer/internals/StreamsRebalanceData.java
    • Adds a field to store the statuses of the streams group.
  • clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java
    • Changes the type of ioThread from Thread to Sender.SenderThread.
    • Adds a new configuration property transaction.two.phase.commit.enable and updates the TransactionManager constructor to accept it.
    • Adds a new initTransactions(boolean keepPreparedTxn) method to allow retaining in-flight prepared transactions.
    • Adds headers to the onAcknowledgement method of ProducerInterceptor.
  • clients/src/main/java/org/apache/kafka/clients/producer/MockProducer.java
    • Updates the initTransactions method to accept a keepPreparedTxn parameter.
  • clients/src/main/java/org/apache/kafka/clients/producer/Producer.java
    • Adds a default implementation for initTransactions() and a new initTransactions(boolean keepPreparedTxn) method.
  • clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java
    • Adds a new configuration property transaction.two.phase.commit.enable and validation logic.
    • Adds validation to ensure transaction.timeout.ms is not set when transaction.two.phase.commit.enable is true.
  • clients/src/main/java/org/apache/kafka/clients/producer/ProducerInterceptor.java
    • Adds a new onAcknowledgement method that includes headers.
  • clients/src/main/java/org/apache/kafka/clients/producer/internals/ProducerInterceptors.java
    • Updates the onAcknowledgement method to pass headers to the interceptors.
    • Adds logic to handle headers when sending the record fails in onSend.
  • clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java
    • Changes the type of ioThread to Sender.SenderThread.
  • clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java
    • Removes the ThreadLocal for poisoning the state on invalid transitions.
    • Adds a new constructor parameter enable2PC.
    • Updates the initializeTransactions method to accept a keepPreparedTxn parameter and adds logic for 2PC transactions.
    • Updates the transitionTo method to use a protected method shouldPoisonStateOnInvalidTransition().
  • clients/src/main/java/org/apache/kafka/common/header/internals/RecordHeaders.java
    • Adds a isReadOnly() method.
  • clients/src/main/java/org/apache/kafka/common/protocol/ByteBufferAccessor.java
    • Adds a slice() method.
  • clients/src/main/java/org/apache/kafka/common/protocol/Readable.java
    • Adds a slice() method.
  • clients/src/main/java/org/apache/kafka/common/requests/...Request.java
    • Removes enableUnstableLastVersion parameters from several request builder constructors.
  • clients/src/main/java/org/apache/kafka/common/requests/...Response.java
    • Updates all response classes to use Readable instead of ByteBuffer in the parse method.
    • Removes ByteBufferAccessor imports.
  • clients/src/main/java/org/apache/kafka/common/requests/AbstractResponse.java
    • Changes the return type of serialize() to ByteBufferAccessor.
    • Updates the parseResponse method to accept a Readable instead of a ByteBuffer.
  • clients/src/main/java/org/apache/kafka/common/requests/internals/ConsumerProtocol.java
    • No changes
  • clients/src/main/java/org/apache/kafka/common/requests/internals/JoinGroupResponseDataJsonConverter.java
    • No changes
  • clients/src/main/java/org/apache/kafka/common/requests/internals/JoinGroupResponseDataReaders.java
    • No changes
  • clients/src/main/java/org/apache/kafka/common/requests/internals/JoinGroupResponseDataWriters.java
    • No changes
  • clients/src/main/java/org/apache/kafka/common/requests/internals/RequestThrottleHelper.java
    • No changes
  • clients/src/main/java/org/apache/kafka/common/requests/internals/TopicMetadataReaders.java
    • No changes
  • clients/src/main/java/org/apache/kafka/common/requests/internals/TopicMetadataWriters.java
    • No changes
  • clients/src/main/resources/common/message/...Request.json
    • Removes latestVersionUnstable fields from several request message definitions.
  • clients/src/main/resources/common/message/OffsetCommitRequest.json
    • Adds support for topic IDs and removes support for topic names in version 10.
    • Adds latestVersionUnstable to the message definition.
  • clients/src/main/resources/common/message/OffsetCommitResponse.json
    • Adds support for topic IDs and removes support for topic names in version 10.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a potential deadlock issue in the Kafka Connect runtime by moving a potentially blocking call outside of a synchronized block. The changes also include several updates related to Share Groups and KRaft, including dependency updates, code cleanup, and the addition of a new configuration option for two-phase commit. Overall, the changes seem well-structured and address the intended problem effectively.

Summary of Findings

  • Severity settings: The review settings instructed me to only identify issues at the medium, high, critical levels of severity, so I did not add review comments for any issues I found at the low level of severity. However, I have still included them in this findings_summary section.
  • Potential deadlock resolution: The core change in OffsetStorageReaderImpl.java appears to correctly address the potential deadlock by moving the backingStore.get() call outside the synchronized block. This prevents the herder thread from being blocked indefinitely.
  • Share Group and KRaft updates: Several files have been modified to enhance Share Group functionality and align with KRaft's architecture. These include dependency additions, code cleanups, and the introduction of a new configuration option for two-phase commit.
  • AbstractResponse parsing: The changes in AbstractResponse.java and related files improve the parsing of responses by using Readable instead of ByteBuffer, enhancing flexibility and potentially performance.
  • ProducerInterceptor changes: The changes in ProducerInterceptor.java and ProducerInterceptors.java add support for headers in the onAcknowledgement method, providing more context to interceptors.
  • Transactional Producer changes: The changes in KafkaProducer.java and ProducerConfig.java introduce a new configuration option (transaction.two_phase_commit.enable) and modify the initTransactions method to support two-phase commit scenarios.

Merge Readiness

The pull request appears to be well-structured and addresses the intended problem effectively. However, there are some high severity issues that need to be addressed before merging. Once those are resolved, the pull request should be ready for merging. I am unable to directly approve the pull request; others should review and approve this code before merging.
