BGPd can't handle nexthop cache correctly with BGP unnumbered in VRF #14545
Comments
This might be somewhat related to #8452?
@ton31337 Thanks for your comment. I think this is a different issue: #8452 is about using an IPv6 link-local address as the neighbor address in a VRF BGP instance, whereas this issue occurs when an interface name is used as the neighbor address. For example, in my setup, I configured a VRF BGP instance in which BGPd establishes BGP peers with RT02 and RT03. However, once I delete one of these interfaces (e.g. eth1 or eth2), BGPd invalidates the BGP routes advertised by the other BGP peer, even though that peer does not go down. eth2 is unrelated to eth1, so I think BGPd should not consider the BGP routes from RT03 invalid.
I suggested that PR because it seems that a wrong interface (scope_id, technically) is picked here as well. I can check this next week.
Sorry, my previous setup was not correct. I retested with the master branch (4a60045) and the issue did not happen, so let me close this issue.
Describe the bug
When we delete one of the interfaces that is bound to a VRF and used for BGP unnumbered, the routes advertised via the other interfaces are invalidated.
BGPd can report its BGP nexthop cache with the following command:
show bgp vrf [VRF] nexthop
However, when BGP unnumbered is used in a VRF BGP instance, BGPd reports an incorrect nexthop cache (described in "To Reproduce / Expected behavior").
To Reproduce / Expected behavior
I prepared a Containerlab manifest to reproduce this behavior, but the issue can also be reproduced with any similar setup.
https://github.com/proelbtn/containerlab-frr-bgp-unnumbered-in-vrf
To reproduce this issue, we need at least 3 nodes (rt01, rt02, rt03). On rt01, eth1 and eth2 need to be bound to a VRF device (in the above Containerlab manifest, I created the vrf1 VRF in the startup script).
Once all BGP neighbors are established, we can observe the following issues.
The first issue is that BGPd reports a corrupted nexthop cache. RT01's BGPd reports that both fe80::a8c1:abff:fe95:6ba and fe80::a8c1:abff:fedc:8c37 are bound to interface eth2. However, fe80::a8c1:abff:fedc:8c37 is a link-local address of eth1 on RT02, so BGPd should report that it is bound to interface eth1.
Zebra can report the relationship between interfaces and neighbors, so this seems to be a problem in BGPd.
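The mismatch described above can be expressed as a small consistency check. This is only a toy model in Python, not FRR code: the function name and the dictionary layout are hypothetical, the addresses come from this report, and the zebra-side binding of fe80::a8c1:abff:fe95:6ba to eth2 is an assumption inferred from the description rather than copied from command output.

```python
# Toy model (not FRR code): compare the interface BGPd reports for each
# link-local nexthop with the interface zebra's neighbor table reports.

def find_mismatched_nexthops(bgp_nexthops, zebra_neighbors):
    """Return {address: (ifname_from_bgpd, ifname_from_zebra)} for each disagreement."""
    mismatches = {}
    for address, bgp_ifname in bgp_nexthops.items():
        zebra_ifname = zebra_neighbors.get(address)
        # Only flag addresses that zebra knows about and binds elsewhere.
        if zebra_ifname is not None and zebra_ifname != bgp_ifname:
            mismatches[address] = (bgp_ifname, zebra_ifname)
    return mismatches


# BGPd's view as described in this report: both nexthops on eth2.
bgp_nexthops = {
    "fe80::a8c1:abff:fe95:6ba": "eth2",
    "fe80::a8c1:abff:fedc:8c37": "eth2",
}
# Zebra's view (assumed): each neighbor on the interface it is reachable through.
zebra_neighbors = {
    "fe80::a8c1:abff:fe95:6ba": "eth2",
    "fe80::a8c1:abff:fedc:8c37": "eth1",
}

print(find_mismatched_nexthops(bgp_nexthops, zebra_neighbors))
```

With these inputs, only fe80::a8c1:abff:fedc:8c37 is flagged, which is the nexthop that BGPd attributes to eth2 instead of eth1.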
This issue may cause another one. Initially, BGPd on RT01 establishes BGP sessions with RT02 and RT03. In the above example, RT02 and RT03 each advertise 0.0.0.0/0 to RT01, so both routes are considered valid.
Next, I deleted one of the interfaces bound to the VRF device (in this case, eth2 on RT01). BGPd then invalidates the routes learned via eth1 (not eth2).
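A rough way to picture why a wrong interface binding could lead to this symptom is sketched below. It is a simplified Python model under stated assumptions, not BGPd's actual logic: it assumes a nexthop is invalidated exactly when the interface it is bound to is deleted, and the function and variable names are hypothetical.

```python
# Toy model (not FRR code): a nexthop stays valid only while the interface
# it is bound to still exists.

def valid_nexthops_after_delete(nexthop_to_ifname, deleted_ifname):
    """Return the set of nexthops that remain valid after an interface is deleted."""
    return {
        address
        for address, ifname in nexthop_to_ifname.items()
        if ifname != deleted_ifname
    }


# Expected bindings: each link-local nexthop on its own interface.
expected_cache = {
    "fe80::a8c1:abff:fedc:8c37": "eth1",
    "fe80::a8c1:abff:fe95:6ba": "eth2",
}
# Bindings as reported in this issue: both nexthops attributed to eth2.
reported_cache = {
    "fe80::a8c1:abff:fedc:8c37": "eth2",
    "fe80::a8c1:abff:fe95:6ba": "eth2",
}

print(sorted(valid_nexthops_after_delete(expected_cache, "eth2")))
print(sorted(valid_nexthops_after_delete(reported_cache, "eth2")))
```

In this model, deleting eth2 leaves the eth1 nexthop valid under the expected bindings, while under the reported bindings no nexthop survives, which matches the observed invalidation of routes learned via eth1.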
At this point, BGPd reports that nexthop fe80::a8c1:abff:fedc:8c37 is invalid. However, because eth1 still exists, this nexthop should be valid.
Versions
I reproduced this in the following environment.
master: 4a60045 (built from the master branch with ./docker/alpine/build.sh)