Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zebra: fix nhg out of sync between zebra and kernel #14080

Merged
merged 1 commit into from
Jul 25, 2023

Conversation

anlancs
Copy link
Contributor

@anlancs anlancs commented Jul 24, 2023

PR#13413 introduces reinstall mechanism, but there is problem with the route leak scenario.

With route leak configuration: ( x1 and x2 are binded to vrf1 )

vrf vrf2
 ip route 75.75.75.75/32 77.75.1.75 nexthop-vrf vrf1
 ip route 75.75.75.75/32 77.75.2.75 nexthop-vrf vrf1
exit-vrf

Firstly, all are ok. But after x1 is set down and up ( The interval between the down and up operations should be less than 180 seconds. ) , x1 is lost from the nexthop group:

anlan# ip nexthop
id 121 group 122/123 proto zebra
id 122 via 77.75.1.75 dev x1 scope link proto zebra
id 123 via 77.75.2.75 dev x2 scope link proto zebra
anlan# ip route show table 2
75.75.75.75 nhid 121 proto 196 metric 20
        nexthop via 77.75.1.75 dev x1 weight 1
        nexthop via 77.75.2.75 dev x2 weight 1
anlan# ip link set dev x1 down
anlan# ip link set dev x1 up
anlan# ip route show table 2 <- Wrong, one nexthop lost from group
75.75.75.75 nhid 121 via 77.75.2.75 dev x2 proto 196 metric 20
anlan# ip nexthop
id 121 group 123 proto zebra
id 122 via 77.75.1.75 dev x1 scope link proto zebra
id 123 via 77.75.2.75 dev x2 scope link proto zebra
anlan# show ip route vrf vrf2 <- Still ok
VRF vrf2:
S>* 75.75.75.75/32 [1/0] via 77.75.1.75, x1 (vrf vrf1), weight 1, 00:00:05
  *                      via 77.75.2.75, x2 (vrf vrf1), weight 1, 00:00:05

From the impact on kernel:
The nh->type of id 122 is always NEXTHOP_TYPE_IPV4 in the route leak case. Then, nexthop_is_ifindex_type() introduced by commit 5bb877 always returns false, so its dependents can't be reinstalled. After x1 is down, there is only id 123 in the group of id 121. So, Finally id 121 remains unchanged after x1 is up, i.e., id 122 is not added to the group even it is reinstalled itself.

From the impact on zebra:
The show ip route vrf vrf2 is still ok because the ids are reused/reinstalled successfully within 180 seconds after x1 is down and up. The group of id 121 is with old NEXTHOP_GROUP_INSTALLED flag, and it is still the group of id 122 and id 123 as before.

In this way, kernel and zebra have become out of sync.

The nh->type of id 122 should be adjusted to NEXTHOP_TYPE_IPV4_IFINDEX after nexthop resolved. This commit is for doing this to make that reinstall mechanism work.

PR#13413 introduces reinstall mechanism, but there is problem with the route
leak scenario.

With route leak configuration: ( `x1` and `x2` are binded to `vrf1` )
```
vrf vrf2
 ip route 75.75.75.75/32 77.75.1.75 nexthop-vrf vrf1
 ip route 75.75.75.75/32 77.75.2.75 nexthop-vrf vrf1
exit-vrf
```

Firstly, all are ok.  But after `x1` is set down and up ( The interval
between the down and up operations should be less than 180 seconds. ) ,
`x1` is lost from the nexthop group:
```
anlan# ip nexthop
id 121 group 122/123 proto zebra
id 122 via 77.75.1.75 dev x1 scope link proto zebra
id 123 via 77.75.2.75 dev x2 scope link proto zebra
anlan# ip route show table 2
75.75.75.75 nhid 121 proto 196 metric 20
        nexthop via 77.75.1.75 dev x1 weight 1
        nexthop via 77.75.2.75 dev x2 weight 1
anlan# ip link set dev x1 down
anlan# ip link set dev x1 up
anlan# ip route show table 2 <- Wrong, one nexthop lost from group
75.75.75.75 nhid 121 via 77.75.2.75 dev x2 proto 196 metric 20
anlan# ip nexthop
id 121 group 123 proto zebra
id 122 via 77.75.1.75 dev x1 scope link proto zebra
id 123 via 77.75.2.75 dev x2 scope link proto zebra
anlan# show ip route vrf vrf2 <- Still ok
VRF vrf2:
S>* 75.75.75.75/32 [1/0] via 77.75.1.75, x1 (vrf vrf1), weight 1, 00:00:05
  *                      via 77.75.2.75, x2 (vrf vrf1), weight 1, 00:00:05
```

From the impact on kernel:
The `nh->type` of `id 122` is *always* `NEXTHOP_TYPE_IPV4` in the route leak
case.  Then, `nexthop_is_ifindex_type()` introduced by commit `5bb877` always
returns `false`, so its dependents can't be reinstalled.  After `x1` is down,
there is only `id 123` in the group of `id 121`.  So, Finally `id 121` remains
unchanged after `x1` is up, i.e., `id 122` is not added to the group even it is
reinstalled itself.

From the impact on zebra:
The `show ip route vrf vrf2` is still ok because the `id`s are reused/reinstalled
successfully within 180 seconds after `x1` is down and up.  The group of `id 121`
is with old `NEXTHOP_GROUP_INSTALLED` flag, and it is still the group of `id 122`
and `id 123` as before.

In this way, kernel and zebra have become out of sync.

The `nh->type` of `id 122` should be adjusted to `NEXTHOP_TYPE_IPV4_IFINDEX`
after nexthop resolved.  This commit is for doing this to make that reinstall
mechanism work.

Signed-off-by: anlan_cs <[email protected]>
@NetDEF-CI
Copy link
Collaborator

NetDEF-CI commented Jul 24, 2023

Continuous Integration Result: FAILED

Continuous Integration Result: FAILED

See below for issues.
CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-13092/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Get source / Pull Request: Successful

Building Stage: Successful

Basic Tests: Failed

Topotests Ubuntu 18.04 i386 part 8: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO8U18I386-13092/test

Topology Tests failed for Topotests Ubuntu 18.04 i386 part 8
see full log at https://ci1.netdef.org/browse/FRR-PULLREQ2-13092/artifact/TOPO8U18I386/TopotestLogs/log_topotests.txt
Topotests Ubuntu 18.04 i386 part 8: Unknown Log
URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-13092/artifact/TOPO8U18I386/TopotestDetails/

Successful on other platforms/tests
  • Topotests Ubuntu 18.04 arm8 part 5
  • CentOS 7 rpm pkg check
  • Addresssanitizer topotests part 8
  • Topotests Ubuntu 18.04 amd64 part 9
  • Topotests Ubuntu 18.04 i386 part 9
  • Addresssanitizer topotests part 6
  • Static analyzer (clang)
  • Topotests debian 10 amd64 part 1
  • Topotests Ubuntu 18.04 amd64 part 1
  • Topotests Ubuntu 18.04 i386 part 5
  • Topotests Ubuntu 18.04 i386 part 0
  • Topotests debian 10 amd64 part 2
  • Topotests debian 10 amd64 part 7
  • Topotests Ubuntu 18.04 arm8 part 0
  • Topotests Ubuntu 18.04 amd64 part 8
  • Topotests debian 10 amd64 part 3
  • Addresssanitizer topotests part 0
  • Topotests Ubuntu 18.04 amd64 part 6
  • Ubuntu 18.04 deb pkg check
  • Topotests debian 10 amd64 part 6
  • Topotests Ubuntu 18.04 arm8 part 1
  • Topotests Ubuntu 18.04 arm8 part 6
  • Addresssanitizer topotests part 1
  • Topotests Ubuntu 18.04 i386 part 4
  • Topotests debian 10 amd64 part 5
  • Topotests Ubuntu 18.04 arm8 part 3
  • Debian 9 deb pkg check
  • Addresssanitizer topotests part 4
  • Topotests Ubuntu 18.04 i386 part 2
  • Topotests Ubuntu 18.04 arm8 part 8
  • Addresssanitizer topotests part 7
  • Topotests debian 10 amd64 part 4
  • Topotests debian 10 amd64 part 9
  • Topotests Ubuntu 18.04 i386 part 7
  • Topotests Ubuntu 18.04 amd64 part 3
  • Addresssanitizer topotests part 5
  • Topotests Ubuntu 18.04 i386 part 6
  • Topotests Ubuntu 18.04 amd64 part 2
  • Topotests debian 10 amd64 part 8
  • Topotests Ubuntu 18.04 amd64 part 4
  • Topotests Ubuntu 18.04 arm8 part 4
  • Topotests Ubuntu 18.04 i386 part 1
  • Ubuntu 20.04 deb pkg check
  • Addresssanitizer topotests part 2
  • Topotests Ubuntu 18.04 arm8 part 9
  • Debian 10 deb pkg check
  • Topotests Ubuntu 18.04 amd64 part 5
  • Topotests Ubuntu 18.04 amd64 part 0
  • Topotests Ubuntu 18.04 arm8 part 2
  • Addresssanitizer topotests part 3
  • Topotests Ubuntu 18.04 amd64 part 7
  • Topotests debian 10 amd64 part 0
  • Topotests Ubuntu 18.04 i386 part 3
  • Addresssanitizer topotests part 9
  • Topotests Ubuntu 18.04 arm8 part 7

Copy link
Member

@chiragshah6 chiragshah6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good finding.

@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-13092/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Copy link
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@ton31337
Copy link
Member

@Mergifyio backport stable/9.0 stable/8.5

Copy link

mergify bot commented Feb 12, 2024

backport stable/9.0 stable/8.5

✅ Backports have been created

ton31337 added a commit that referenced this pull request Feb 13, 2024
zebra: fix nhg out of sync between zebra and kernel (backport #14080)
ton31337 added a commit that referenced this pull request Feb 13, 2024
zebra: fix nhg out of sync between zebra and kernel (backport #14080)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants