BGPd can't handle nexthop cache correctly with BGP unnumbered in VRF #14545

proelbtn · 2023-10-06T19:05:18Z

Describe the bug

Did you check if this is a duplicate issue?
Did you test it on the latest FRRouting/frr master branch?

When one of the interfaces that is bound to a VRF and used for BGP unnumbered is deleted, the routes received over the other interfaces are invalidated.

+ sudo docker exec -it rt01 vtysh -c 'show bgp vrf vrf1 ipv4 unicast'
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
BGP table version is 2, local router ID is 1.1.1.1, vrf id 2
Default local pref 100, local AS 1
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
 *= 0.0.0.0/0        eth1                     0             0 2 i
 *>                  eth2                     0             0 3 i

Displayed  1 routes and 2 total paths

+ sudo docker exec rt01 ip link delete eth2

+ sudo docker exec -it rt01 vtysh -c 'show bgp vrf vrf1 ipv4 unicast'
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
BGP table version is 3, local router ID is 1.1.1.1, vrf id 2
Default local pref 100, local AS 1
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
    0.0.0.0/0        eth1                     0             0 2 i

Displayed  1 routes and 1 total paths

BGPd reports its nexthop cache via show bgp vrf [VRF] nexthop. However, when BGP unnumbered is used in a VRF BGP instance, BGPd reports an incorrect nexthop cache (described under "To Reproduce / Expected behavior").

To Reproduce / Expected behavior

I prepared a Containerlab manifest to reproduce this behavior, but the issue can also be reproduced with any similar setup.

https://github.com/proelbtn/containerlab-frr-bgp-unnumbered-in-vrf

To reproduce this issue, we need at least three nodes (rt01, rt02, rt03). On rt01, eth1 and eth2 must be bound to a VRF device (in the Containerlab manifest above, I created the vrf1 VRF in the startup script).

Once all BGP sessions are established, the following issues can be observed.

The first issue is that BGPd reports a corrupted nexthop cache. BGPd on RT01 reports that both fe80::a8c1:abff:fe95:6ba and fe80::a8c1:abff:fedc:8c37 are bound to interface eth2. However, fe80::a8c1:abff:fedc:8c37 is the link-local address of eth1 on RT02, the peer reached via RT01's eth1. So BGPd should report that fe80::a8c1:abff:fedc:8c37 is bound to interface eth1.

+ sudo docker exec -it rt01 vtysh -c 'show bgp vrf vrf1 nexthop'
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
Current BGP nexthop cache:
 fe80::a8c1:abff:fe95:6ba valid [IGP metric 0], #paths 1
  if eth2
  Last update: Fri Oct  6 18:47:52 2023
 fe80::a8c1:abff:fedc:8c37 valid [IGP metric 0], #paths 1
  if eth2
  Last update: Fri Oct  6 18:47:52 2023
 fe80::a8c1:abff:fedc:8c37 valid [IGP metric 0], #paths 0, peer eth1
  Last update: Fri Oct  6 18:47:51 2023
 fe80::a8c1:abff:fe95:6ba valid [IGP metric 0], #paths 0, peer eth2
  Last update: Fri Oct  6 18:47:50 2023

+ sudo docker exec -it rt02 ip addr show eth1
589: eth1@if590: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP group default
    link/ether aa:c1:ab:dc:8c:37 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::a8c1:abff:fedc:8c37/64 scope link
       valid_lft forever preferred_lft forever

+ sudo docker exec -it rt03 ip addr show eth1
591: eth1@if592: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9500 qdisc noqueue state UP group default
    link/ether aa:c1:ab:95:06:ba brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::a8c1:abff:fe95:6ba/64 scope link
       valid_lft forever preferred_lft forever
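The inconsistency above can also be checked mechanically. The following is a minimal Python sketch, not part of FRR, that parses the show bgp vrf vrf1 nexthop text and reports any address the cache associates with more than one interface. The regular expressions are inferred from the output pasted in this issue and are an assumption about its format; parse_nexthop_cache, NexthopEntry, and conflicting_bindings are hypothetical names.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass, field

# Line shapes inferred from the pasted output (an assumption, not a documented format).
HEADER = re.compile(r"^\s*([0-9A-Fa-f:.]+)\s+(valid|invalid)\b(.*)$")
PEER = re.compile(r",\s*peer\s+(\S+)")
IF_LINE = re.compile(r"^\s*if\s+(\S+)\s*$")


@dataclass
class NexthopEntry:
    address: str
    valid: bool
    interfaces: set[str] = field(default_factory=set)


def parse_nexthop_cache(text: str) -> list[NexthopEntry]:
    """Split 'show bgp vrf <VRF> nexthop' output into one entry per header line."""
    entries: list[NexthopEntry] = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            address, state, rest = header.groups()
            entry = NexthopEntry(address, state == "valid")
            peer = PEER.search(rest)
            if peer:  # peer entries carry the interface on the header line itself
                entry.interfaces.add(peer.group(1))
            entries.append(entry)
            continue
        if_line = IF_LINE.match(line)
        if if_line and entries:  # an "if <name>" line belongs to the preceding header
            entries[-1].interfaces.add(if_line.group(1))
    return entries


def conflicting_bindings(entries: list[NexthopEntry]) -> dict[str, list[str]]:
    """Addresses that the cache associates with more than one interface."""
    bound: dict[str, set[str]] = defaultdict(set)
    for entry in entries:
        bound[entry.address] |= entry.interfaces
    return {addr: sorted(ifs) for addr, ifs in bound.items() if len(ifs) > 1}


SAMPLE = """\
Current BGP nexthop cache:
 fe80::a8c1:abff:fe95:6ba valid [IGP metric 0], #paths 1
  if eth2
  Last update: Fri Oct  6 18:47:52 2023
 fe80::a8c1:abff:fedc:8c37 valid [IGP metric 0], #paths 1
  if eth2
  Last update: Fri Oct  6 18:47:52 2023
 fe80::a8c1:abff:fedc:8c37 valid [IGP metric 0], #paths 0, peer eth1
  Last update: Fri Oct  6 18:47:51 2023
 fe80::a8c1:abff:fe95:6ba valid [IGP metric 0], #paths 0, peer eth2
  Last update: Fri Oct  6 18:47:50 2023
"""

print(conflicting_bindings(parse_nexthop_cache(SAMPLE)))
# {'fe80::a8c1:abff:fedc:8c37': ['eth1', 'eth2']}
```

Interfaces from "if" lines and from the "peer" suffix are merged per address, so fe80::a8c1:abff:fedc:8c37 (if eth2, peer eth1) is flagged while fe80::a8c1:abff:fe95:6ba (eth2 in both entries) is not.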

Zebra reports the correct relationship between interfaces and neighbors, so the problem appears to be in BGPd.

+ sudo docker exec -it rt01 vtysh -c 'show interface vrf vrf1'
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
Interface eth1 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  vrf: vrf1
  index 590 metric 0 mtu 9500 speed 10000
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  Type: Ethernet
  HWaddr: aa:c1:ab:f9:7f:cf
  inet6 fe80::a8c1:abff:fef9:7fcf/64
  Interface Type VETH
  Interface Slave Type Vrf
  protodown: off
  Parent ifindex: 589
  ND advertised reachable time is 0 milliseconds
  ND advertised retransmit interval is 0 milliseconds
  ND advertised hop-count limit is 64 hops
  ND router advertisements sent: 3 rcvd: 5
  ND router advertisements are sent every 3 seconds
  ND router advertisements lifetime tracks ra-interval
  ND router advertisement default router preference is medium
  Hosts use stateless autoconfig for addresses.
  Neighbor address(s):
  inet6 fe80::a8c1:abff:fedc:8c37/128
Interface eth2 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  vrf: vrf1
  index 592 metric 0 mtu 9500 speed 10000
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  Type: Ethernet
  HWaddr: aa:c1:ab:b0:4b:b7
  inet6 fe80::a8c1:abff:feb0:4bb7/64
  Interface Type VETH
  Interface Slave Type Vrf
  protodown: off
  Parent ifindex: 591
  ND advertised reachable time is 0 milliseconds
  ND advertised retransmit interval is 0 milliseconds
  ND advertised hop-count limit is 64 hops
  ND router advertisements sent: 5 rcvd: 4
  ND router advertisements are sent every 3 seconds
  ND router advertisements lifetime tracks ra-interval
  ND router advertisement default router preference is medium
  Hosts use stateless autoconfig for addresses.
  Neighbor address(s):
  inet6 fe80::a8c1:abff:fe95:6ba/128
Interface vrf1 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  vrf: vrf1
  index 2 metric 0 mtu 65575 speed 0
  flags: <UP,RUNNING,NOARP>
  Type: Ethernet
  HWaddr: 32:56:c0:8e:cb:0d
  Interface Type VRF
  Interface Slave Type None
  protodown: off
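Zebra's view can be compared with the BGP cache in the same way. This is a minimal sketch, again assuming the output format shown above: neighbor_map (hypothetical) collects only the inet6 addresses listed after "Neighbor address(s):" for each interface, and binding_mismatches (hypothetical) compares that map with (address, interface) pairs copied by hand from the "if" lines of the nexthop cache.

```python
import re

# Line shapes taken from the 'show interface vrf vrf1' output (assumed stable).
IFACE = re.compile(r"^Interface\s+(\S+)\s+is\s")
INET6 = re.compile(r"^\s*inet6\s+([0-9A-Fa-f:]+)/\d+")


def neighbor_map(show_interface: str) -> dict:
    """Map each neighbor address zebra lists to the interface it appears under."""
    result = {}
    current, in_neighbors = None, False
    for line in show_interface.splitlines():
        iface = IFACE.match(line)
        if iface:
            current, in_neighbors = iface.group(1), False
            continue
        if line.strip() == "Neighbor address(s):":
            in_neighbors = True  # only addresses after this marker are neighbors
            continue
        addr = INET6.match(line)
        if addr and in_neighbors and current:
            result[addr.group(1)] = current
    return result


def binding_mismatches(zebra: dict, cache_bindings: list) -> dict:
    """Return {address: (zebra_interface, bgp_interface)} where the two disagree."""
    mismatches = {}
    for address, bgp_iface in cache_bindings:
        zebra_iface = zebra.get(address)
        if zebra_iface is not None and zebra_iface != bgp_iface:
            mismatches[address] = (zebra_iface, bgp_iface)
    return mismatches


# Abbreviated excerpt of the zebra output above (unrelated lines dropped).
ZEBRA_SAMPLE = """\
Interface eth1 is up, line protocol is up
  vrf: vrf1
  inet6 fe80::a8c1:abff:fef9:7fcf/64
  Neighbor address(s):
  inet6 fe80::a8c1:abff:fedc:8c37/128
Interface eth2 is up, line protocol is up
  vrf: vrf1
  inet6 fe80::a8c1:abff:feb0:4bb7/64
  Neighbor address(s):
  inet6 fe80::a8c1:abff:fe95:6ba/128
Interface vrf1 is up, line protocol is up
  vrf: vrf1
"""

# "if <name>" bindings copied from the BGP nexthop cache output earlier in this issue.
BGP_BINDINGS = [
    ("fe80::a8c1:abff:fe95:6ba", "eth2"),
    ("fe80::a8c1:abff:fedc:8c37", "eth2"),
]

print(binding_mismatches(neighbor_map(ZEBRA_SAMPLE), BGP_BINDINGS))
# {'fe80::a8c1:abff:fedc:8c37': ('eth1', 'eth2')}
```

Each interface's own link-local address appears before the "Neighbor address(s):" marker, which is why the parser only starts collecting after that line.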

This issue may also cause a second problem. Initially, BGPd on RT01 establishes BGP sessions with RT02 and RT03. In this example, RT02 and RT03 both advertise 0.0.0.0/0 to RT01, so both routes are considered valid.

+ sudo docker exec -it rt01 vtysh -c 'show bgp vrf vrf1 ipv4 unicast'
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
BGP table version is 2, local router ID is 1.1.1.1, vrf id 2
Default local pref 100, local AS 1
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
 *= 0.0.0.0/0        eth1                     0             0 2 i
 *>                  eth2                     0             0 3 i

Displayed  1 routes and 2 total paths

Next, I deleted one of the interfaces bound to the VRF device (eth2 on RT01). BGPd then invalidates the route received via eth1, even though it was eth2 that was deleted.

+ sudo docker exec rt01 ip link delete eth2

+ sudo docker exec -it rt01 vtysh -c 'show bgp vrf vrf1 ipv4 unicast'
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
BGP table version is 3, local router ID is 1.1.1.1, vrf id 2
Default local pref 100, local AS 1
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
    0.0.0.0/0        eth1                     0             0 2 i

Displayed  1 routes and 1 total paths

At this point, BGPd reports that nexthop fe80::a8c1:abff:fedc:8c37 is invalid. However, because eth1 still exists, this nexthop should be valid.

+ sudo docker exec -it rt01 vtysh -c 'show bgp vrf vrf1 nexthop'
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
Current BGP nexthop cache:
 fe80::a8c1:abff:fedc:8c37 invalid, #paths 1
  Must be Connected
  Last update: Fri Oct  6 18:47:54 2023
 fe80::a8c1:abff:fedc:8c37 valid [IGP metric 0], #paths 0, peer eth1
  Last update: Fri Oct  6 18:47:51 2023
 fe80::a8c1:abff:fe95:6ba invalid, #paths 0, peer eth2
  Must be Connected
  Last update: Fri Oct  6 18:47:54 2023
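The second symptom, the same address being valid in its peer entry but invalid in its path entry, can be detected with an even smaller sketch. It only looks at entry header lines, whose shape is inferred from the output in this issue (an assumption); contradictory_states is a hypothetical helper. The second sample is the master-branch nexthop output from the last comment in this thread.

```python
import re
from collections import defaultdict

# Entry header shape inferred from the nexthop cache output in this thread (assumption).
HEADER = re.compile(r"^\s*([0-9A-Fa-f:.]+)\s+(valid|invalid)\b")


def contradictory_states(nexthop_output: str) -> list:
    """Addresses that have both a valid and an invalid entry at the same time."""
    states = defaultdict(set)
    for line in nexthop_output.splitlines():
        header = HEADER.match(line)
        if header:
            states[header.group(1)].add(header.group(2))
    return sorted(addr for addr, seen in states.items() if seen == {"valid", "invalid"})


AFTER_DELETE = """\
Current BGP nexthop cache:
 fe80::a8c1:abff:fedc:8c37 invalid, #paths 1
  Must be Connected
  Last update: Fri Oct  6 18:47:54 2023
 fe80::a8c1:abff:fedc:8c37 valid [IGP metric 0], #paths 0, peer eth1
  Last update: Fri Oct  6 18:47:51 2023
 fe80::a8c1:abff:fe95:6ba invalid, #paths 0, peer eth2
  Must be Connected
  Last update: Fri Oct  6 18:47:54 2023
"""

MASTER_RETEST = """\
Current BGP nexthop cache:
 fe80::a8c1:abff:fe76:1ff9 invalid, #paths 0, peer eth2
  Must be Connected
  Last update: Mon Oct  9 18:21:39 2023
 fe80::a8c1:abff:fecb:e948 valid [IGP metric 0], #paths 1, peer eth1
  Last update: Mon Oct  9 18:21:34 2023
"""

print(contradictory_states(AFTER_DELETE))   # ['fe80::a8c1:abff:fedc:8c37']
print(contradictory_states(MASTER_RETEST))  # []
```

On the master retest output no address has contradictory states, which is consistent with the reporter's later finding that the behavior did not reproduce there.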

Versions

I reproduced this issue in the following environments.

OS Version: Rocky Linux 8.7 (Green Obsidian)
Kernel: 4.18.0-425.13.1.el8_7.x86_64
FRR Version:
- 8.5.0 (quay.io/frrouting/frr:8.5.0)
- 9.0.1 (quay.io/frrouting/frr:9.0.1)
- ~~master: 4a60045 (I built from master branch with ./docker/alpine/build.sh)~~


ton31337 · 2023-10-08T05:41:36Z

Maybe this is somewhat related to #8452?

proelbtn · 2023-10-08T10:26:52Z

@ton31337 Thanks for your comment. I think this is a different issue: #8452 uses an IPv6 link-local address as the neighbor address in a VRF BGP instance, whereas this issue occurs when an interface name is used as the neighbor address.

For example, I configured the VRF BGP instance in my setup as follows. BGPd establishes BGP sessions with RT02 and RT03. However, once I delete one of these interfaces (e.g. eth2), BGPd invalidates the BGP routes received from the other BGP peer, even though that BGP session does not go down. eth2 is unrelated to eth1, so I think BGPd should not consider the routes from RT02 (received via eth1) invalid.

router bgp 1 vrf vrf1
 bgp router-id 1.1.1.1
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 bgp bestpath as-path multipath-relax
 neighbor eth1 interface
 neighbor eth1 remote-as external
 neighbor eth1 timers 3 9
 neighbor eth1 capability extended-nexthop
 neighbor eth2 interface
 neighbor eth2 remote-as external
 neighbor eth2 timers 3 9
 neighbor eth2 capability extended-nexthop
 !
 address-family ipv4 unicast
  neighbor eth1 activate
  neighbor eth1 soft-reconfiguration inbound
  neighbor eth2 activate
  neighbor eth2 soft-reconfiguration inbound
 exit-address-family
exit

ton31337 · 2023-10-08T16:39:39Z

I suggested that PR because it seems that the wrong interface (technically, the wrong scope_id) is picked here as well. I can check this next week.

proelbtn · 2023-10-09T18:22:14Z

Sorry, my previous setup was not correct. I retested with the master branch (4a60045) and this issue did not happen, so I am closing this issue.

+ sudo docker exec -it rt01 vtysh -c 'show bgp vrf vrf1 ipv4 unicast'
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
Configuration file[/etc/frr/frr.conf] processing failure: 11
BGP table version is 3, local router ID is 1.1.1.1, vrf id 2
Default local pref 100, local AS 1
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
 *> 0.0.0.0/0        eth1                     0             0 2 i

Displayed  1 routes and 1 total paths

+ sudo docker exec -it rt01 vtysh -c 'show bgp vrf vrf1 nexthop'
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
Configuration file[/etc/frr/frr.conf] processing failure: 11
Current BGP nexthop cache:
 fe80::a8c1:abff:fe76:1ff9 invalid, #paths 0, peer eth2
  Must be Connected
  Last update: Mon Oct  9 18:21:39 2023
 fe80::a8c1:abff:fecb:e948 valid [IGP metric 0], #paths 1, peer eth1
  Last update: Mon Oct  9 18:21:34 2023

proelbtn added the triage Needs further investigation label Oct 6, 2023

ton31337 added the bgp label Oct 8, 2023

ton31337 self-assigned this Oct 8, 2023

proelbtn closed this as completed Oct 9, 2023
