Consumer subscribers are not being called. #1832
Comments
@edenhill this has me at a loss, as the logs don't give much to go on. From the logs, it seems like either group assignment may not be happening, so the subscriber doesn't get notified, or there is an issue with topic metadata propagation. I'm still a huge Kafka noob, but any and all advice would be greatly appreciated.
One of the highly concurrent tests just failed with:
metadata doesn't propagate synchronously. my first thought is what you're seeing may in some way be due to this (given the error in the previous comment). if you try setting some arbitrary delay after topic creation (maybe 2s or something), does this help?
@mhowlett I added a second delay and it helped but caused some test failures. I added a two-second delay via the referenced commit, and it took a lot of runs before it failed locally. With two seconds on the build server, it failed immediately. If this is the issue, can we get a wait option when we call the create-topic APIs? If there is any chance you could debug this locally, that would be a massive help. I'd be willing to meet up and work with you as well.
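The "wait option" requested above can be approximated on the client side by polling until the topic is visible instead of sleeping for a fixed time. The following is a minimal sketch in Python, used here only for illustration since the project itself is .NET. The names wait_until, FakeMetadata and topic_is_visible are hypothetical and are not part of Confluent.Kafka or librdkafka; a real check would query broker metadata instead of the fake shown here.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_s: float = 10.0,
               interval_s: float = 0.1) -> bool:
    """Poll predicate until it returns True or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


class FakeMetadata:
    """Hypothetical stand-in for a metadata lookup (assumption, not a real API).

    The topic only becomes visible after a number of checks, which mimics
    asynchronous metadata propagation after topic creation.
    """

    def __init__(self, visible_after_checks: int) -> None:
        self.visible_after_checks = visible_after_checks
        self.checks = 0

    def topic_is_visible(self, topic: str) -> bool:
        self.checks += 1
        return self.checks > self.visible_after_checks


if __name__ == "__main__":
    metadata = FakeMetadata(visible_after_checks=3)
    ready = wait_until(lambda: metadata.topic_is_visible("test_topic"),
                       timeout_s=2.0, interval_s=0.01)
    print("topic visible:", ready, "after", metadata.checks, "checks")
```

Compared with a fixed delay, polling with a deadline returns as soon as the condition holds and fails explicitly when it never does, which may make slow build servers easier to diagnose. It would not help if the broker itself is misbehaving.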
@mhowlett I still haven't been able to narrow this down even on 1.9.0
is this a single broker cluster?
Yes, a full reproduction is available in the linked project above via a simple clone and run. I used a single-broker cluster because of resource constraints when running on GitHub Actions and locally.
I made some changes to try this out on Mac ARM (in the latest commit), but it's way less stable than Windows, and it seems like my broker becomes unresponsive locally, even via kcat, while running tests (might be related?).
getting Broker: Unknown topic or partition after you've waited for topic creation on a single broker cluster seems very odd
consistent with broker issue..
I agree, it's all very odd. I would think that a single broker would be the most reliable setup for these tests, since there is no communication or syncing between multiple brokers.
@mhowlett Do you think 1.9.1 will help or what are the next steps?
@mhowlett do you think any of my issues could be related to the issues you are running into with .NET 6?
no
i'm triaging this as investigate further/low. we appreciate the testing and there is some chance this may be reflective of an actual issue.
Thanks, I just tried the latest RC and still have failures, and tests seem slower to run locally. https://github.com/FoundatioFx/Foundatio.Kafka/actions/runs/3348053132
@mhowlett were you able to figure out if this is a part of a larger issue?
Latest 2.0.2 release seems better, but I'm still getting failures:
43:46.62225 E:KafkaMessageBus - Error consuming test_ef3b4c5799cd43a5b174f2ed3aee1d43 GroupId=5e9f8df1a6604840bc4ed9642045cb43 message: Failed to query logical offset END: Broker: Unknown topic or partition
Just bumping to see if there is any idea what might be causing this. The following PR is tracking every new release and still has this issue: FoundatioFx/Foundatio.Kafka#8
Description
We've been writing a Foundatio message bus implementation around Kafka and noticed that our tests are extremely flaky in some cases (see https://github.com/FoundatioFx/Foundatio.Kafka/actions for all test failures). The commonality so far is that when we have multiple consumers listening to the same topic, the consumers are never notified of a topic message. 1.9.0 helped a lot with reliability locally, but we still get failures at random. I have a 5900x with a lot of resources locally compared to the build server.
How to reproduce
Run docker compose up in the cloned folder.
Run dotnet test.
The test KafkaMessageBusTests.CanSendMessageToMultipleSubscribersAsync seems to be the one that most easily reproduces this error (after a few runs) and is the simplest.
Under the hood, each call to subscribe will ensure the topic exists, then create a consumer subscriber listening in a loop, but only if an existing listener isn't already running (at most one listener per bus instance). I've included logs of varying detail.
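The subscribe flow described above (ensure the topic exists, then start at most one listener per bus instance) can be sketched as follows. This is an illustrative Python sketch and not the Foundatio.Kafka implementation; MessageBusSketch, ensure_topic, listener_starts and dispatch are hypothetical names, and the consume loop itself is left out.

```python
import threading
from typing import Callable, List


class MessageBusSketch:
    """Hypothetical bus: many subscribers share a single listener."""

    def __init__(self, ensure_topic: Callable[[], None]) -> None:
        self._ensure_topic = ensure_topic
        self._subscribers: List[Callable[[str], None]] = []
        self._lock = threading.Lock()
        self._listener_started = False
        self.listener_starts = 0  # exposed only so the sketch can be checked

    def subscribe(self, handler: Callable[[str], None]) -> None:
        # Every subscribe call first makes sure the topic exists.
        self._ensure_topic()
        with self._lock:
            self._subscribers.append(handler)
            # Start the listener only once per bus instance.
            if not self._listener_started:
                self._listener_started = True
                self.listener_starts += 1
                # A real bus would start its consume loop here (omitted).

    def dispatch(self, message: str) -> None:
        # Called by the listener for each consumed message.
        with self._lock:
            handlers = list(self._subscribers)
        for handler in handlers:
            handler(message)


if __name__ == "__main__":
    topic_checks: List[str] = []
    bus = MessageBusSketch(ensure_topic=lambda: topic_checks.append("checked"))
    received: List[str] = []
    bus.subscribe(lambda m: received.append("a:" + m))
    bus.subscribe(lambda m: received.append("b:" + m))
    bus.dispatch("hello")
    print(bus.listener_starts, len(topic_checks), received)
```

In this shape, all subscribers on one bus instance depend on the single listener having joined the group and received a partition assignment before messages are published.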
Checklist
Please provide the following information:
Test logs with handlers not commented out (https://github.com/FoundatioFx/Foundatio.Kafka/blob/main/src/Foundatio.Kafka/Messaging/KafkaMessageBus.cs#L167-L173)
I saw a similar issue and wondered if I should not be using these handlers (https://github.com/ah-/rdkafka-dotnet/issues/61). While googling, I also came across this one, though I don't know whether it is related (confluentinc/confluent-kafka-python#970).
See the following gist for all the unit logs (both passing and failing, with varying levels of debug logs). GitHub wouldn't let me post them here as it said the comment was too long: https://gist.github.com/niemyjski/bac539002aa046738d6e029d0d1ba688
Broker logs from recent failure