Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bgpd: Fix over-aggressive in de-dup BGP messages #15027

Closed
wants to merge 1 commit into from

Conversation

yyuanam
Copy link
Contributor

@yyuanam yyuanam commented Dec 14, 2023

When a BGP Update->Withdraw->Update happens in very quick succession, FRR would think the second Update is a duplicate, and not advertise the Update out after the Withdraw.

Thus fixing the logic to not execute dedup when 1, there is no match on adj; 2, when matched adj is marked as withdrawl which would still in adj table and have attr info until the Withdraw being sent out; 3, attr_hash matching to confirm dedup is not enough for hash conflict case.

When a BGP Update->Withdraw->Update happens in very quick succession,
FRR would think the second Update is a duplicate, and not advertise
the Update out after the Withdraw.

Thus fixing the logic to not execute dedup when 1, there is no match
on adj; 2, when matched adj is marked as withdrawl which would still
in adj table and have attr info until the Withdraw being sent out;
3, attr_hash matching to confirm dedup is not enough for hash
conflict case.

Signed-off-by: Yuan Yuan <[email protected]>
Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least for me, reversing the logic (is_withdraw -> is_update) is more readable. Also, seems that lots of tests are failing, we need to figure out what's not right here yet.

/* Is not withdraw */
&& !adj->is_withdraw
/* Still have to compare the original attr to make sure nothing has changed */
&& adj->attr && attr && attrhash_cmp(adj->attr, attr)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adj->attr && attr seems redundant: attr can't be NULL here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are so many corner cases, I added the check anyways just to make sure no core dump could happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this should never happen, otherwise we have more problems.

@riw777 riw777 self-requested a review December 19, 2023 16:33
Copy link
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, waiting on @ton31337 's comment

@yyuanam
Copy link
Contributor Author

yyuanam commented Dec 19, 2023

At least for me, reversing the logic (is_withdraw -> is_update) is more readable. Also, seems that lots of tests are failing, we need to figure out what's not right here yet.

I used "is_withdraw" is because the withdrawal msg is really the key in the issue which caused update msg not being sent out. Checked the failing tests, not convinced they are related. Diving deep.

@riw777
Copy link
Member

riw777 commented Jan 16, 2024

it looks like the ci failures are valid and related to the changes in this code ... @yyuanam can you take a look and see if there's some way to see how this change is related?

@ton31337
Copy link
Member

ton31337 commented Feb 8, 2024

I tried to replicate this issue locally while sending very fast various update/withdraw combinations, and still not sure fully how to trigger this case. Do you have maybe some ideas on how to replicate this easily?

Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@riw777
Copy link
Member

riw777 commented Mar 19, 2024

@yyuanam still working on this?

@riw777
Copy link
Member

riw777 commented Jul 16, 2024

@frrbot autoclose in 1 month

@frrbot frrbot bot added the autoclose label Jul 16, 2024
@github-actions github-actions bot added rebase PR needs rebase and removed autoclose labels Jul 16, 2024
@frrbot frrbot bot closed this Aug 16, 2024
@frrbot frrbot bot closed this Aug 16, 2024
@frrbot frrbot bot closed this Aug 16, 2024
@frrbot frrbot bot closed this Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants