-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zebra: fix nhg out of sync between zebra and kernel #14080
Conversation
PR#13413 introduces reinstall mechanism, but there is problem with the route leak scenario. With route leak configuration: ( `x1` and `x2` are binded to `vrf1` ) ``` vrf vrf2 ip route 75.75.75.75/32 77.75.1.75 nexthop-vrf vrf1 ip route 75.75.75.75/32 77.75.2.75 nexthop-vrf vrf1 exit-vrf ``` Firstly, all are ok. But after `x1` is set down and up ( The interval between the down and up operations should be less than 180 seconds. ) , `x1` is lost from the nexthop group: ``` anlan# ip nexthop id 121 group 122/123 proto zebra id 122 via 77.75.1.75 dev x1 scope link proto zebra id 123 via 77.75.2.75 dev x2 scope link proto zebra anlan# ip route show table 2 75.75.75.75 nhid 121 proto 196 metric 20 nexthop via 77.75.1.75 dev x1 weight 1 nexthop via 77.75.2.75 dev x2 weight 1 anlan# ip link set dev x1 down anlan# ip link set dev x1 up anlan# ip route show table 2 <- Wrong, one nexthop lost from group 75.75.75.75 nhid 121 via 77.75.2.75 dev x2 proto 196 metric 20 anlan# ip nexthop id 121 group 123 proto zebra id 122 via 77.75.1.75 dev x1 scope link proto zebra id 123 via 77.75.2.75 dev x2 scope link proto zebra anlan# show ip route vrf vrf2 <- Still ok VRF vrf2: S>* 75.75.75.75/32 [1/0] via 77.75.1.75, x1 (vrf vrf1), weight 1, 00:00:05 * via 77.75.2.75, x2 (vrf vrf1), weight 1, 00:00:05 ``` From the impact on kernel: The `nh->type` of `id 122` is *always* `NEXTHOP_TYPE_IPV4` in the route leak case. Then, `nexthop_is_ifindex_type()` introduced by commit `5bb877` always returns `false`, so its dependents can't be reinstalled. After `x1` is down, there is only `id 123` in the group of `id 121`. So, Finally `id 121` remains unchanged after `x1` is up, i.e., `id 122` is not added to the group even it is reinstalled itself. From the impact on zebra: The `show ip route vrf vrf2` is still ok because the `id`s are reused/reinstalled successfully within 180 seconds after `x1` is down and up. The group of `id 121` is with old `NEXTHOP_GROUP_INSTALLED` flag, and it is still the group of `id 122` and `id 123` as before. In this way, kernel and zebra have become out of sync. The `nh->type` of `id 122` should be adjusted to `NEXTHOP_TYPE_IPV4_IFINDEX` after nexthop resolved. This commit is for doing this to make that reinstall mechanism work. Signed-off-by: anlan_cs <[email protected]>
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 i386 part 8: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO8U18I386-13092/test Topology Tests failed for Topotests Ubuntu 18.04 i386 part 8 Successful on other platforms/tests
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good finding.
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-13092/ This is a comment from an automated CI system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
@Mergifyio backport stable/9.0 stable/8.5 |
✅ Backports have been created
|
zebra: fix nhg out of sync between zebra and kernel (backport #14080)
zebra: fix nhg out of sync between zebra and kernel (backport #14080)
PR#13413 introduces reinstall mechanism, but there is problem with the route leak scenario.
With route leak configuration: (
x1
andx2
are binded tovrf1
)Firstly, all are ok. But after
x1
is set down and up ( The interval between the down and up operations should be less than 180 seconds. ) ,x1
is lost from the nexthop group:From the impact on kernel:
The
nh->type
ofid 122
is alwaysNEXTHOP_TYPE_IPV4
in the route leak case. Then,nexthop_is_ifindex_type()
introduced by commit5bb877
always returnsfalse
, so its dependents can't be reinstalled. Afterx1
is down, there is onlyid 123
in the group ofid 121
. So, Finallyid 121
remains unchanged afterx1
is up, i.e.,id 122
is not added to the group even it is reinstalled itself.From the impact on zebra:
The
show ip route vrf vrf2
is still ok because theid
s are reused/reinstalled successfully within 180 seconds afterx1
is down and up. The group ofid 121
is with oldNEXTHOP_GROUP_INSTALLED
flag, and it is still the group ofid 122
andid 123
as before.In this way, kernel and zebra have become out of sync.
The
nh->type
ofid 122
should be adjusted toNEXTHOP_TYPE_IPV4_IFINDEX
after nexthop resolved. This commit is for doing this to make that reinstall mechanism work.