-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bgpd: Fix over-aggressive in de-dup BGP messages #15027
Conversation
When a BGP Update->Withdraw->Update happens in very quick succession, FRR would think the second Update is a duplicate, and not advertise the Update out after the Withdraw. Thus fixing the logic to not execute dedup when 1, there is no match on adj; 2, when matched adj is marked as withdrawl which would still in adj table and have attr info until the Withdraw being sent out; 3, attr_hash matching to confirm dedup is not enough for hash conflict case. Signed-off-by: Yuan Yuan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least for me, reversing the logic (is_withdraw -> is_update) is more readable. Also, seems that lots of tests are failing, we need to figure out what's not right here yet.
/* Is not withdraw */ | ||
&& !adj->is_withdraw | ||
/* Still have to compare the original attr to make sure nothing has changed */ | ||
&& adj->attr && attr && attrhash_cmp(adj->attr, attr)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adj->attr && attr
seems redundant: attr
can't be NULL here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are so many corner cases, I added the check anyways just to make sure no core dump could happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, this should never happen, otherwise we have more problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, waiting on @ton31337 's comment
I used "is_withdraw" is because the withdrawal msg is really the key in the issue which caused update msg not being sent out. Checked the failing tests, not convinced they are related. Diving deep. |
it looks like the ci failures are valid and related to the changes in this code ... @yyuanam can you take a look and see if there's some way to see how this change is related? |
I tried to replicate this issue locally while sending very fast various update/withdraw combinations, and still not sure fully how to trigger this case. Do you have maybe some ideas on how to replicate this easily? |
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
@yyuanam still working on this? |
@frrbot autoclose in 1 month |
When a BGP Update->Withdraw->Update happens in very quick succession, FRR would think the second Update is a duplicate, and not advertise the Update out after the Withdraw.
Thus fixing the logic to not execute dedup when 1, there is no match on adj; 2, when matched adj is marked as withdrawl which would still in adj table and have attr info until the Withdraw being sent out; 3, attr_hash matching to confirm dedup is not enough for hash conflict case.