Group Coordination over distributed nodes #379
Comments
Hi. I noticed in your logs that the generation number jumped from 39437 to 39439
|
Hello, thanks for the overview. I removed the option entirely, so that we use the default values. We're still looping, however; here is the log of the two nodes racing each other:
Node 2:
Should I instead raise the default session_timeout? (currently set to the default of 10s) |
yes, try with a larger session timeout. |
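For illustration, a minimal sketch of what a larger session timeout could look like in the group config passed to brod:start_link_group_subscriber (the 30-second value below is only an example, not a recommendation from this thread):

GroupConfig = [
  %% how long the Kafka group coordinator waits without a heartbeat before
  %% it considers this member dead and triggers a rebalance (default: 10s)
  {session_timeout_seconds, 30},
  {offset_commit_policy, commit_to_kafka_v2},
  {offset_commit_interval_seconds, 5}
].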
I tried with
in the group_config; however, we're still racing. By the way, why is re-joining performed on a 5-second basis?
|
Ah, the sync group request has a hard-coded 5-second limit on waiting for a response. It should be fixed to use the session timeout as the sync-request timeout. Before we can fix it, you can try async processing in the group subscriber. |
This was done only for the join group request.
|
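For reference, a rough sketch of the kind of async processing mentioned above, assuming the brod_group_subscriber behaviour with deferred acks (process_message/1 is a hypothetical worker function):

-include_lib("brod/include/brod.hrl").

%% Returning {ok, State} (instead of {ok, ack, State}) defers the ack, so
%% long-running message handling does not block the subscriber process;
%% the spawned worker acks once processing is done.
handle_message(Topic, Partition, #kafka_message{offset = Offset} = Msg, State) ->
  Subscriber = self(), %% handle_message runs in the subscriber process
  spawn_link(fun() ->
               process_message(Msg), %% hypothetical worker function
               brod_group_subscriber:ack(Subscriber, Topic, Partition, Offset)
             end),
  {ok, State}.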
I'm trying the following configuration (I use the div function on the default values); however, there is no improvement: [
{max_rejoin_attempts, 30},
{session_timeout_seconds, 40},
{offset_commit_policy, commit_to_kafka_v2},
{offset_commit_interval_seconds, 5}
],
[
{max_bytes, 1048576 div 2},
{prefetch_bytes, 102400 div 2},
{prefetch_count, 10 div 2},
{begin_offset, latest},
{offset_reset_policy, reset_to_earliest},
{max_wait_time, 5000},
{sleep_timeout, 3000}
], |
Try my branch in the PR? |
it very much depends on how big your messages and message sets are. |
I am trying to deploy your patch, I will let you know in some minutes. |
Unfortunately, even with your last patch, we're still looping. |
anything different in logs ? |
Unfortunately not at all; we're still looping every 5 seconds with the very same log as before. Also, as a side note, we tried flushing the whole Kafka queue before starting the second node this time, and we're not producing any more messages on the topic. |
For completeness' sake: #{
id => socrates_brod_client,
modules => [?MODULE],
restart => permanent,
shutdown => 1000,
start => {brod_client, start_link, [[
{"broker.kafka.l4lb.thisdcos.directory", 9092}
], ClientID, []]}
},
#{
id => socrates_brod_group,
modules => [socrates_multiplex],
restart => temporary,
shutdown => 1000,
type => worker,
start => {brod, start_link_group_subscriber, [
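%% the arguments below appear to follow brod:start_link_group_subscriber/8:
%% Client, GroupId, Topics, GroupConfig, ConsumerConfig,
%% MessageType, CallbackModule, CallbackInitArg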
ClientID,
GroupId,
[TOPIC],
[
{max_rejoin_attempts, 30},
{session_timeout_seconds, 40},
{offset_commit_policy, commit_to_kafka_v2},
{offset_commit_interval_seconds, 5}
],
[
{max_bytes, 1048576 div 2},
{prefetch_bytes, 102400 div 2},
{prefetch_count, 10 div 2},
{begin_offset, latest},
{offset_reset_policy, reset_to_earliest},
{max_wait_time, 5000},
{sleep_timeout, 3000}
],
message,
socrates_multiplex,
#{}
]}
}
where TOPIC = <<"socrates">>,
GroupId = <<TOPIC/binary, "-groupid-shared-by-all-members">>,
ClientID = list_to_atom(io_lib:format("socrates_brod_client~s", [node()])).
So each node shares the same topic and GroupId, but not the ClientID. |
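As an aside on that ClientID line: io_lib:format/2 returns an iolist rather than a flat string, and list_to_atom/1 expects a flat character list, so a flattened variant like the sketch below may be safer (this is unrelated to the rebalancing problem itself):

%% flatten the iolist before converting it to an atom; list_to_atom/1
%% can raise badarg on the nested list returned by io_lib:format/2
ClientID = list_to_atom(lists:flatten(io_lib:format("socrates_brod_client~s", [node()]))).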
Could you try a different |
What's your Kafka version, btw? |
We are using Kafka version 2.12-2.3.0 (i.e. Kafka 2.3.0 built for Scala 2.12). We also tried the following:
|
Hello again, I would also ask another question:
is always the same, even though after 5 times it should terminate (and terminate the parent group_subscriber), if I understood correctly. Could this be the cause of the stabilization issues? |
There is actually a (re)join failure log: https://github.com/klarna/brod/blob/3.10.0/src/brod_group_coordinator.erl#L513
Given that you have a 10-second offset commit interval and a 5-second heartbeat rate (which is the default) and you are seeing
I can try to reproduce the issue with Kafka 2.3.0, but can't give an ETA now. In the meantime, you can perhaps try 3 things:
rr(brod_group_coordinator).
sys:get_state(pid(0,1633,0)).
|
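For context, the two shell expressions above are meant to be run in an Erlang shell attached to the looping node; roughly (the pid triple is just an example from that node):

rr(brod_group_coordinator).  %% load record definitions so the state prints with named fields
Pid = pid(0, 1633, 0).       %% the brod_group_coordinator process on that node
sys:get_state(Pid).          %% dump its gen_server state (member id, generation id, etc.)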
Thanks, we will try your suggestions. Is there any particular Kafka version you recommend we use? |
@k32 knows better which versions are proven to work fine with |
We've been running fine with all major versions up to 2.2.* |
As you pointed out, it's a version compatibility issue with Kafka versions >2.2.*. As soon as we downgraded to a compatible version, the grouping started behaving as expected. Thanks for all the help. Should we now rename the issue to "Support Kafka versions > 2.2.*"? |
seems to be working fine with 2.4.1 too. |
2.3.1 is also good. |
I had a quick look at the 2.3.1 release notes: https://downloads.apache.org/kafka/2.3.1/RELEASE_NOTES.html. @rocveralfre, could you check whether my findings were correct? |
Here is the list of what we have tried so far (we cannot test all versions because we're bound to a package manager that does not offer them all):
|
I did not find such an issue with Kafka 2.3.1 and 2.4.0. |
Sorry for not answering. I stepped down from the "acting maintainer" role that I had volunteered for. It's not yet decided who will take over. |
No worries @k32, I just pinged all the members I have interacted with in the last couple of weeks to make sure this issue gets some attention. |
We are trying to set up an environment where we can reproduce the issue, so that we can verify the behaviour against different versions of Kafka. Will report back! |
There is one interesting thing in the timing of the logged events sent by @rocveralfre:
This means the Kafka group coordinator elected node 1 as the generation 40412 group leader somewhere between 12:49:37.207 and 12:49:37.210, but by 12:49:38.683 it had already decided node 1 was inactive and elected node 2 as the generation 40413 group leader instead. That is less than 1.5 seconds, so it's very unlikely that the root cause would be a too-low timeout value or that the |
@dszoboszlay spotted the issue very well! For such a short leader re-election interval, I can only think of two possible causes:
With regard to reproducing the issue: |
I managed to reproduce the bug too with
This looks like a Kafka regression in 2.3.0, which is fixed in 2.3.1; see KAFKA-8653. |
Hello,
We have been using this project for a long time. The configuration is the following:
We have one Erlang node running a brod_client and a brod_group_subscriber (using the start_link_group_subscriber) with the following parameters:
brod version 3.10.0
group_config:
consumer_config:
We have just one topic with 4 partitions.
When we have just one node running, everything works correctly and the group_coordinator assigns all the partitions to the single consumer.
However, when we start another node with the very same configuration, the two coordinators start looping, each trying to acquire all partitions exclusively:
It's worth noting that each node's clientId is different, while the groupId is the same for all the mnesia-distributed nodes.
We also noticed, however, that if we stop and restart ALL our custom applications with an rpc call, then the coordinators are able to reach a shared stable state where the partitions are correctly distributed.
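As a hypothetical sketch (my_app is a made-up application name), such a cluster-wide restart over rpc could look like:

%% stop and restart the consuming application on every connected node;
%% my_app stands in for the custom applications mentioned above
Nodes = [node() | nodes()],
rpc:multicall(Nodes, application, stop, [my_app]),
rpc:multicall(Nodes, application, start, [my_app]).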
Are we misconfiguring the group_subscriber? Is our initial assumption (that if a new member joins the group, all the members try to stabilize once more) incorrect?
This is really important for our project and we would be happy to hear your feedback.
Thanks in advance