Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zebra: fix route deletion during zebra shutdown #15424

Merged
merged 1 commit into from
Mar 18, 2024

Conversation

askorichenko
Copy link
Contributor

@askorichenko askorichenko commented Feb 24, 2024

Restore the order of execution in zebra's main thread:
vrf_terminate();
zebra_dplane_finish();
zebra_dplane_shutdown();
changed in commit 3b0e170.

vrf_terminate() enqueues routes for deletion in dplane thread.
zebra_dplane_shutdown() effectively forces dplane thread
to immediately cease and join (leaving unprocessed routes), while
zebra_dplane_finish() has a condition that prevents
zebra_dplane_shutdown() from running until all routes queued for dplane thread have been deleted.

Closes #14062
Closes #14178

@askorichenko
Copy link
Contributor Author

askorichenko commented Feb 24, 2024

This patch fixes issues
#14178
#14062
The bug affects routes installed by different daemons, not only staticd

Bug reproduction (on staticd example) with current master build (9.2):

$ cat >static_254_routes.config<<EOF
ip route 10.0.0.1/32 10.255.0.1
ip route 10.0.0.2/32 10.255.0.1
...
ip route 10.0.0.254/32 10.255.0.1
EOF

# vtysh -f static_254_routes.config
# systemctl stop frr
# ip r 
10.0.0.76 nhid 14 via 10.255.0.1 dev enp1s0 proto 196 metric 20
10.0.0.77 nhid 14 via 10.255.0.1 dev enp1s0 proto 196 metric 20
...
10.0.0.254 nhid 14 via 10.255.0.1 dev enp1s0 proto 196 metric 20

i.e. the tail of the queued deletion events remained unprocessed.

Please review.
Thank you!

@ton31337 ton31337 added this to the 10.0 milestone Feb 26, 2024
@ton31337
Copy link
Member

@Mergifyio backport dev/10.0 stable/9.1

Copy link

mergify bot commented Feb 26, 2024

backport dev/10.0 stable/9.1

🟠 Waiting for conditions to match

  • merged [📌 backport requirement]

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjstapp
Copy link
Contributor

mjstapp commented Feb 27, 2024

So the commit you refer to believed it was fixing an ordering problem with vrfs - is that problem still fixed, or will it reappear if this change is merged?

@riw777 riw777 self-requested a review February 27, 2024 16:16
@askorichenko askorichenko marked this pull request as draft February 28, 2024 14:59
@github-actions github-actions bot added size/M and removed size/XS labels Feb 28, 2024
@askorichenko
Copy link
Contributor Author

askorichenko commented Feb 29, 2024

  1. The currently implemented order aborts route deletion by running zebra_dplane_shutdown() right after vrf_terminate()
zebra_dplane_finish();
vrf_terminate();
zebra_dplane_shutdown();
  1. My first version of the patch could delete routes properly, but freed memory too early in vrf_terminate() (thus reintroduced the possible crash fixed by 3b0e170)
vrf_terminate();
zebra_dplane_finish();
zebra_dplane_shutdown();
  1. Newer edit suggests to split vrf_terminate() into parts - one that initiates route removal (protected by zebra_dplane_finish() from cancellation as in 2.), and the other releases memory at the final stage (like in 1.). In the main zebra thread the order is:
vrf_disable_all();
zebra_dplane_finish();
vrf_terminate();
zebra_dplane_shutdown();

vrf_terminate() is a library function, it calls disable() (if not already disabled) and delete() (releases resources).
vrf_disable_all() is copied from there but implements only disable() method.

@askorichenko askorichenko marked this pull request as ready for review February 29, 2024 20:54
@askorichenko askorichenko requested a review from ton31337 March 1, 2024 10:41
Copy link

mergify bot commented Mar 4, 2024

backport dev/10.0 stable/9.1

✅ Backports have been created

@ton31337 ton31337 dismissed their stale review March 5, 2024 07:06

ASAN fixed.

lib/vrf.c Outdated Show resolved Hide resolved
Copy link
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@github-actions github-actions bot added the rebase PR needs rebase label Mar 12, 2024
@askorichenko
Copy link
Contributor Author

Added code cleanup requested by @mjstapp.
Agreed this way it looks better. Thank you!

@askorichenko askorichenko force-pushed the master branch 2 times, most recently from f5626ec to 31065d5 Compare March 12, 2024 11:39
Copy link
Contributor

@mjstapp mjstapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version looks good - but I see that there are two commits here, when really there should just be the one that makes this set of changes. Can you remove or squash the other commit?

Split zebra's vrf_terminate() into disable() and delete() stages.
The former enqueues all events for the dplane thread.
Memory freeing is performed in the second stage.

Signed-off-by: Alexander Skorichenko <[email protected]>
@askorichenko
Copy link
Contributor Author

Pushed in a single commit.
Though there is a failing test in TopoTests Debian 10 amd64 Part 1, but I noticed this particular fail pops out regularly across other PRs, as well, seemed unrelated to the change.

@mjstapp
Copy link
Contributor

mjstapp commented Mar 15, 2024

so the tests need to pass - but if you look at the results summary, you'll see that there's a button that reruns just the failed tests, which is very useful. I've asked for that rerun - let's see if it's clean.

@mjstapp mjstapp merged commit 8cc52ef into FRRouting:master Mar 18, 2024
9 checks passed
mjstapp added a commit that referenced this pull request Mar 18, 2024
zebra: fix route deletion during zebra shutdown (backport #15424)
mjstapp added a commit that referenced this pull request Mar 18, 2024
zebra: fix route deletion during zebra shutdown (backport #15424)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants