Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bgpd: bgp_sync_label_manager failure case #15104

Merged
merged 1 commit into from
Jan 7, 2024

Conversation

donaldsharp
Copy link
Member

There are several problems with the bgp_sync_label_manager function:

a) It is possible that a request in the lp->requests fifo will be unable to be filled at this point in time and the lf will be leaked and not ever fullfilled.

b) The bgp_sync_label_manager runs one time a second irrelevant if there is work to do or not.

To fix (a) just add the request back to the requests fifo and set the timer to pop in the future.

To fix (b) just every time something is put into
the request pool start a timer to run in 1 second
and do not restart it if all the work is done.

There are several problems with the bgp_sync_label_manager
function:

a) It is possible that a request in the lp->requests
fifo will be unable to be filled at this point in time
and the lf will be leaked and not ever fullfilled.

b) The bgp_sync_label_manager runs one time a second
irrelevant if there is work to do or not.

To fix (a) just add the request back to the requests
fifo and set the timer to pop in the future.

To fix (b) just every time something is put into
the request pool start a timer to run in 1 second
and do not restart it if all the work is done.

Signed-off-by: Donald Sharp <[email protected]>
@donaldsharp
Copy link
Member Author

donaldsharp commented Jan 7, 2024

As a note I chased this down because I was looking at a show thread cpu and saw this line:

    1     104315.384   2682787       38       608       42      4795         0         0         11    T    bgp_sync_label_manager

bgp_sync_label_manager has been run 2.6 million times on my box, which is effectively 31 days of FRR uptime. This event alone has consumed 104 seconds of cpu! I do not even have any label configuration on this box. So this is extremely wasteful.

@ton31337 ton31337 merged commit fa62132 into FRRouting:master Jan 7, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants