Wrong delete/add of connected routes when interface flaps #14488

Closed
mwinter-osr opened this issue Sep 25, 2023 · 6 comments
Labels
triage Needs further investigation

Comments

@mwinter-osr
Member

This is seen in Release 9.0, 9.0.1 and latest master as of Sept 25, 2023 (sha 1c829fa)

I have a topology as follows:

 R1 ========= R2
 |            |
 |            |
 R4 ========= R3

(== are dual links)
and run OSPFv3

I see the following routes on R2:

r2# sh ipv6 routes
% Unknown command: sh ipv6 routes
ipv6-r2# sh ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

O>* fc00::1/128 [110/11] via fe80::5054:ff:fead:5bb2, enp8s0, weight 1, 00:00:01
O   fc00::2/128 [110/10] is directly connected, lo, weight 1, 00:00:06
C>* fc00::2/128 is directly connected, lo, 00:00:15
O>* fc00::3/128 [110/11] via fe80::5054:ff:fedb:3f4f, enp9s0, weight 1, 00:00:01
O>* fc00::4/128 [110/12] via fe80::5054:ff:fead:5bb2, enp8s0, weight 1, 00:00:01
  *                      via fe80::5054:ff:fedb:3f4f, enp9s0, weight 1, 00:00:01
O   fc00:1::/64 [110/1] is directly connected, enp7s0, weight 1, 00:00:01
C>* fc00:1::/64 is directly connected, enp7s0, 00:00:13
O   fc00:2::/64 [110/1] is directly connected, enp8s0, weight 1, 00:00:01
C>* fc00:2::/64 is directly connected, enp8s0, 00:00:13
O   fc00:3::/64 [110/1] is directly connected, enp9s0, weight 1, 00:00:01
C>* fc00:3::/64 is directly connected, enp9s0, 00:00:14
O>* fc00:4::/64 [110/2] via fe80::5054:ff:fedb:3f4f, enp9s0, weight 1, 00:00:01
O>* fc00:5::/64 [110/2] via fe80::5054:ff:fedb:3f4f, enp9s0, weight 1, 00:00:01
O>* fc00:6::/64 [110/2] via fe80::5054:ff:fead:5bb2, enp8s0, weight 1, 00:00:01
C * fe80::/64 is directly connected, enp8s0, 00:00:14
C * fe80::/64 is directly connected, enp7s0, 00:00:14
C * fe80::/64 is directly connected, enp1s0, 00:00:14
C>* fe80::/64 is directly connected, enp9s0, 00:00:14

(Pay attention to the last four connected routes: one fe80::/64 for each interface.)

Now disabling enp8s0 and enp9s0:

r2:~# ip link set dev enp8s0 down; ip link set dev enp9s0 down

Routing table now looks like this:

ipv6-r2# sh ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

O>* fc00::1/128 [110/11] via fe80::5054:ff:fec0:1213, enp7s0, weight 1, 00:00:03
O   fc00::2/128 [110/10] is directly connected, lo, weight 1, 00:00:17
C>* fc00::2/128 is directly connected, lo, 00:00:26
O>* fc00::3/128 [110/13] via fe80::5054:ff:fec0:1213, enp7s0, weight 1, 00:00:03
O>* fc00::4/128 [110/12] via fe80::5054:ff:fec0:1213, enp7s0, weight 1, 00:00:03
O   fc00:1::/64 [110/1] is directly connected, enp7s0, weight 1, 00:00:12
C>* fc00:1::/64 is directly connected, enp7s0, 00:00:24
O>* fc00:2::/64 [110/2] via fe80::5054:ff:fec0:1213, enp7s0, weight 1, 00:00:03
O>* fc00:3::/64 [110/4] via fe80::5054:ff:fec0:1213, enp7s0, weight 1, 00:00:03
O>* fc00:4::/64 [110/3] via fe80::5054:ff:fec0:1213, enp7s0, weight 1, 00:00:03
O>* fc00:5::/64 [110/3] via fe80::5054:ff:fec0:1213, enp7s0, weight 1, 00:00:03
O>* fc00:6::/64 [110/2] via fe80::5054:ff:fec0:1213, enp7s0, weight 1, 00:00:03
C * fe80::/64 is directly connected, enp1s0, 00:00:25
C>* fe80::/64 is directly connected, enp9s0, 00:00:25

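--> Notice the last two entries: the fe80::/64 routes for enp7s0 and enp8s0 were removed, but I disabled enp8s0 and enp9s0.
The wrong routes get removed: enp7s0 is still up but lost its route, while enp9s0 is down and kept it.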
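For anyone reproducing this, the comparison above can be automated. The following is a minimal Python sketch, not part of FRR: it parses `show ipv6 route` text and reports connected routes that still sit on interfaces known to be down, plus up interfaces that lack a connected route for a given prefix. The function names and the regular expression are assumptions, derived only from the output format pasted in this report.

```python
import re

# Connected-route lines look like:
#   C>* fe80::/64 is directly connected, enp9s0, 00:00:25
# (pattern inferred from the output pasted in this issue, not from FRR sources)
CONNECTED_RE = re.compile(
    r"^C[ >*]*\s+(?P<prefix>\S+) is directly connected, (?P<ifname>[^,\s]+),"
)


def connected_routes(show_route_output):
    """Return (prefix, interface) pairs for every connected route line."""
    routes = []
    for line in show_route_output.splitlines():
        match = CONNECTED_RE.match(line.strip())
        if match:
            routes.append((match.group("prefix"), match.group("ifname")))
    return routes


def stale_routes(show_route_output, down_interfaces):
    """Connected routes that still point at an interface that is down."""
    return [
        route
        for route in connected_routes(show_route_output)
        if route[1] in down_interfaces
    ]


def missing_routes(show_route_output, prefix, up_interfaces):
    """Up interfaces that have no connected route for the given prefix."""
    present = {
        ifname
        for route_prefix, ifname in connected_routes(show_route_output)
        if route_prefix == prefix
    }
    return sorted(set(up_interfaces) - present)


# The two link-local entries left on R2 after enp8s0 and enp9s0 went down.
AFTER_FLAP = """\
C * fe80::/64 is directly connected, enp1s0, 00:00:25
C>* fe80::/64 is directly connected, enp9s0, 00:00:25
"""

if __name__ == "__main__":
    print(stale_routes(AFTER_FLAP, {"enp8s0", "enp9s0"}))
    print(missing_routes(AFTER_FLAP, "fe80::/64", {"enp1s0", "enp7s0"}))
```

With the R2 output above, this flags the route on enp9s0 as stale and enp7s0 as missing its fe80::/64 route, which matches the observation made by hand.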

The fix is most likely PR #13340. A quick test shows that the problem no longer occurs with this PR applied.

@mwinter-osr mwinter-osr added the triage Needs further investigation label Sep 25, 2023
@subsecond
Contributor

subsecond commented Sep 26, 2023

Network Diagram

+-------------+      +-------------+
|             |      |             |
|   TEST-R1   |      |   TEST-R2   |
|             |      |             |
+------+------+      +------+------+
|      | eth1 +------+ eth1 |      |
+------+------+      +------+------+
| eth3 | eth2 +------+ eth2 | eth3 |
+---+--+------+      +------+--+---+
    |                          |
    |                          |
    |                          |
+---+--+------+      +------+--+---+
| eth1 | eth2 +------+ eth2 | eth1 |
+------+------+      +------+------+
|      | eth3 +------+ eth3 |      |
+------+------+      +------+------+
|             |      |             |
|   TEST-R4   |      |   TEST-R3   |
|             |      |             |
+-------------+      +-------------+

Configuration

test-r1
# Drop network configuration
cat > /etc/network/interfaces <<'EOF'
auto lo
iface lo inet loopback
    alias TEST-R1
    address 192.0.2.1/32
    address 2001:db8::1/128
 
# Management network interface
auto eth0
iface eth0 inet dhcp
    vrf mgmt
 
auto mgmt
iface mgmt
    address 127.0.0.1/8
    vrf-table auto
 
auto eth1
iface eth1
    alias TEST-R2
    address 192.0.2.1/32
    address 2001:db8::1/128
 
auto eth2
iface eth2
    alias TEST-R2
    address 192.0.2.1/32
    address 2001:db8::1/128
 
auto eth3
iface eth3
    alias TEST-R4
    address 192.0.2.1/32
    address 2001:db8::1/128
EOF

# Drop FRR configuration
cat > /etc/frr/frr.conf <<'EOF'
frr version 7.5.1
frr defaults traditional
hostname test-r1
log syslog informational
service integrated-vtysh-config
!
ip router-id 192.0.2.1
!
interface eth1
 ip ospf area 0.0.0.0
 ip ospf dead-interval minimal hello-multiplier 4
 ip ospf network point-to-point
 ipv6 ospf6 dead-interval 4
 ipv6 ospf6 hello-interval 1
 ipv6 ospf6 network point-to-point
!
interface eth2
 ip ospf area 0.0.0.0
 ip ospf dead-interval minimal hello-multiplier 4
 ip ospf network point-to-point
 ipv6 ospf6 dead-interval 4
 ipv6 ospf6 hello-interval 1
 ipv6 ospf6 network point-to-point
!
interface eth3
 ip ospf area 0.0.0.0
 ip ospf dead-interval minimal hello-multiplier 4
 ip ospf network point-to-point
 ipv6 ospf6 dead-interval 4
 ipv6 ospf6 hello-interval 1
 ipv6 ospf6 network point-to-point
!
interface lo
 ip ospf area 0.0.0.0
 ipv6 ospf6 passive
!
router ospf
 log-adjacency-changes
 passive-interface default
 no passive-interface eth1
 no passive-interface eth2
 no passive-interface eth3
!
router ospf6
 log-adjacency-changes
 interface lo area 0.0.0.0
 interface eth1 area 0.0.0.0
 interface eth2 area 0.0.0.0
 interface eth3 area 0.0.0.0
!
line vty
!
EOF

test-r2
# Drop network configuration
cat > /etc/network/interfaces <<'EOF'
auto lo
iface lo inet loopback
    alias TEST-R2
    address 192.0.2.2/32
    address 2001:db8::2/128
 
# Management network interface
auto eth0
iface eth0 inet dhcp
    vrf mgmt
 
auto mgmt
iface mgmt
    address 127.0.0.1/8
    vrf-table auto
 
auto eth1
iface eth1
    alias TEST-R1
    address 192.0.2.2/32
    address 2001:db8::2/128
 
auto eth2
iface eth2
    alias TEST-R1
    address 192.0.2.2/32
    address 2001:db8::2/128
 
auto eth3
iface eth3
    alias TEST-R3
    address 192.0.2.2/32
    address 2001:db8::2/128
EOF

# Drop FRR configuration
cat > /etc/frr/frr.conf <<'EOF'
frr version 7.5.1
frr defaults traditional
hostname test-r2
log syslog informational
service integrated-vtysh-config
!
ip router-id 192.0.2.2
!
interface eth1
 ip ospf area 0.0.0.0
 ip ospf dead-interval minimal hello-multiplier 4
 ip ospf network point-to-point
 ipv6 ospf6 dead-interval 4
 ipv6 ospf6 hello-interval 1
 ipv6 ospf6 network point-to-point
!
interface eth2
 ip ospf area 0.0.0.0
 ip ospf dead-interval minimal hello-multiplier 4
 ip ospf network point-to-point
 ipv6 ospf6 dead-interval 4
 ipv6 ospf6 hello-interval 1
 ipv6 ospf6 network point-to-point
!
interface eth3
 ip ospf area 0.0.0.0
 ip ospf dead-interval minimal hello-multiplier 4
 ip ospf network point-to-point
 ipv6 ospf6 dead-interval 4
 ipv6 ospf6 hello-interval 1
 ipv6 ospf6 network point-to-point
!
interface lo
 ip ospf area 0.0.0.0
 ipv6 ospf6 passive
!
router ospf
 log-adjacency-changes
 passive-interface default
 no passive-interface eth1
 no passive-interface eth2
 no passive-interface eth3
!
router ospf6
 log-adjacency-changes
 interface lo area 0.0.0.0
 interface eth1 area 0.0.0.0
 interface eth2 area 0.0.0.0
 interface eth3 area 0.0.0.0
!
line vty
!
EOF

test-r3
# Drop network configuration
cat > /etc/network/interfaces <<'EOF'
auto lo
iface lo inet loopback
    alias TEST-R3
    address 192.0.2.3/32
    address 2001:db8::3/128
 
# Management network interface
auto eth0
iface eth0 inet dhcp
    vrf mgmt
 
auto mgmt
iface mgmt
    address 127.0.0.1/8
    vrf-table auto
 
auto eth1
iface eth1
    alias TEST-R2
    address 192.0.2.3/32
    address 2001:db8::3/128
 
auto eth2
iface eth2
    alias TEST-R4
    address 192.0.2.3/32
    address 2001:db8::3/128
 
auto eth3
iface eth3
    alias TEST-R4
    address 192.0.2.3/32
    address 2001:db8::3/128
EOF

# Drop FRR configuration
cat > /etc/frr/frr.conf <<'EOF'
frr version 7.5.1
frr defaults traditional
hostname test-r3
log syslog informational
service integrated-vtysh-config
!
ip router-id 192.0.2.3
!
interface eth1
 ip ospf area 0.0.0.0
 ip ospf dead-interval minimal hello-multiplier 4
 ip ospf network point-to-point
 ipv6 ospf6 dead-interval 4
 ipv6 ospf6 hello-interval 1
 ipv6 ospf6 network point-to-point
!
interface eth2
 ip ospf area 0.0.0.0
 ip ospf dead-interval minimal hello-multiplier 4
 ip ospf network point-to-point
 ipv6 ospf6 dead-interval 4
 ipv6 ospf6 hello-interval 1
 ipv6 ospf6 network point-to-point
!
interface eth3
 ip ospf area 0.0.0.0
 ip ospf dead-interval minimal hello-multiplier 4
 ip ospf network point-to-point
 ipv6 ospf6 dead-interval 4
 ipv6 ospf6 hello-interval 1
 ipv6 ospf6 network point-to-point
!
interface lo
 ip ospf area 0.0.0.0
 ipv6 ospf6 passive
!
router ospf
 log-adjacency-changes
 passive-interface default
 no passive-interface eth1
 no passive-interface eth2
 no passive-interface eth3
!
router ospf6
 log-adjacency-changes
 interface lo area 0.0.0.0
 interface eth1 area 0.0.0.0
 interface eth2 area 0.0.0.0
 interface eth3 area 0.0.0.0
!
line vty
!
EOF

test-r4
# Drop network configuration
cat > /etc/network/interfaces <<'EOF'
auto lo
iface lo inet loopback
    alias TEST-R4
    address 192.0.2.4/32
    address 2001:db8::4/128
 
# Management network interface
auto eth0
iface eth0 inet dhcp
    vrf mgmt
 
auto mgmt
iface mgmt
    address 127.0.0.1/8
    vrf-table auto
 
auto eth1
iface eth1
    alias TEST-R1
    address 192.0.2.4/32
    address 2001:db8::4/128
 
auto eth2
iface eth2
    alias TEST-R3
    address 192.0.2.4/32
    address 2001:db8::4/128
 
auto eth3
iface eth3
    alias TEST-R3
    address 192.0.2.4/32
    address 2001:db8::4/128
EOF

# Drop FRR configuration
cat > /etc/frr/frr.conf <<'EOF'
frr version 7.5.1
frr defaults traditional
hostname test-r4
log syslog informational
service integrated-vtysh-config
!
ip router-id 192.0.2.4
!
interface eth1
 ip ospf area 0.0.0.0
 ip ospf dead-interval minimal hello-multiplier 4
 ip ospf network point-to-point
 ipv6 ospf6 dead-interval 4
 ipv6 ospf6 hello-interval 1
 ipv6 ospf6 network point-to-point
!
interface eth2
 ip ospf area 0.0.0.0
 ip ospf dead-interval minimal hello-multiplier 4
 ip ospf network point-to-point
 ipv6 ospf6 dead-interval 4
 ipv6 ospf6 hello-interval 1
 ipv6 ospf6 network point-to-point
!
interface eth3
 ip ospf area 0.0.0.0
 ip ospf dead-interval minimal hello-multiplier 4
 ip ospf network point-to-point
 ipv6 ospf6 dead-interval 4
 ipv6 ospf6 hello-interval 1
 ipv6 ospf6 network point-to-point
!
interface lo
 ip ospf area 0.0.0.0
 ipv6 ospf6 passive
!
router ospf
 log-adjacency-changes
 passive-interface default
 no passive-interface eth1
 no passive-interface eth2
 no passive-interface eth3
!
router ospf6
 log-adjacency-changes
 interface lo area 0.0.0.0
 interface eth1 area 0.0.0.0
 interface eth2 area 0.0.0.0
 interface eth3 area 0.0.0.0
!
line vty
!
EOF
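
Since the four frr.conf files above differ only in the router ID, they could be generated instead of pasted four times. Below is a minimal Python sketch of that idea. `render_frr_conf` is a hypothetical helper, not an FRR tool, and it assumes that the hostname should follow the router name (test-r1 to test-r4).

```python
def render_frr_conf(router_index, interfaces=("eth1", "eth2", "eth3")):
    """Render the frr.conf used in this report for router test-r<router_index>."""
    lines = [
        "frr version 7.5.1",
        "frr defaults traditional",
        f"hostname test-r{router_index}",  # assumption: hostname follows router name
        "log syslog informational",
        "service integrated-vtysh-config",
        "!",
        f"ip router-id 192.0.2.{router_index}",
        "!",
    ]
    # Every point-to-point link gets the same OSPFv2/OSPFv3 settings.
    for ifname in interfaces:
        lines += [
            f"interface {ifname}",
            " ip ospf area 0.0.0.0",
            " ip ospf dead-interval minimal hello-multiplier 4",
            " ip ospf network point-to-point",
            " ipv6 ospf6 dead-interval 4",
            " ipv6 ospf6 hello-interval 1",
            " ipv6 ospf6 network point-to-point",
            "!",
        ]
    lines += ["interface lo", " ip ospf area 0.0.0.0", " ipv6 ospf6 passive", "!"]
    lines += ["router ospf", " log-adjacency-changes", " passive-interface default"]
    lines += [f" no passive-interface {ifname}" for ifname in interfaces]
    lines += ["!", "router ospf6", " log-adjacency-changes", " interface lo area 0.0.0.0"]
    lines += [f" interface {ifname} area 0.0.0.0" for ifname in interfaces]
    lines += ["!", "line vty", "!"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for index in range(1, 5):
        print(f"# --- test-r{index} ---")
        print(render_frr_conf(index))
```

The output for each router can then be written to /etc/frr/frr.conf in place of the heredocs.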

Expected Behavior

When all interfaces are up:

root@test-r1:~# vtysh

Hello, this is FRRouting (version 7.5.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

test-r1# show ipv6 ospf nei
Neighbor ID     Pri    DeadTime    State/IfState         Duration I/F[State]
192.0.2.2         1    00:00:03     Full/PointToPoint    00:00:07 eth1[PointToPoint]
192.0.2.2         1    00:00:03     Full/PointToPoint    00:00:07 eth2[PointToPoint]
192.0.2.4         1    00:00:03     Full/PointToPoint    00:00:12 eth3[PointToPoint]

test-r1# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, D - SHARP, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup

O   2001:db8::1/128 [110/1] is directly connected, eth1, weight 1, 00:00:17
C * 2001:db8::1/128 is directly connected, eth2, 00:00:18
C * 2001:db8::1/128 is directly connected, eth3, 00:00:18
C * 2001:db8::1/128 is directly connected, eth1, 00:00:18
C>* 2001:db8::1/128 is directly connected, lo, 00:00:18
O>* 2001:db8::2/128 [110/2] via fe80::f816:3eff:fe11:93b2, eth1, weight 1, 00:00:07
  *                         via fe80::f816:3eff:fe6e:f330, eth2, weight 1, 00:00:07
O>* 2001:db8::3/128 [110/3] via fe80::f816:3eff:fe6e:f330, eth2, weight 1, 00:00:07
  *                         via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:07
O>* 2001:db8::4/128 [110/2] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:12
C * fe80::/64 is directly connected, eth3, 00:00:17
C * fe80::/64 is directly connected, eth2, 00:00:18
C>* fe80::/64 is directly connected, eth1, 00:00:18

(note the last three entries as well as the ones learned via OSPFv3)

Now take the links between test-r1 and test-r2 down:

root@test-r1:~# ip link set down dev eth1
root@test-r1:~# ip link set down dev eth2

Expected output:

root@test-r1:~# vtysh

Hello, this is FRRouting (version 7.5.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

test-r1# show ipv6 ospf nei
Neighbor ID     Pri    DeadTime    State/IfState         Duration I/F[State]
192.0.2.4         1    00:00:03     Full/PointToPoint    00:00:34 eth3[PointToPoint]

test-r1# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, D - SHARP, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup

O   2001:db8::1/128 [110/1] is directly connected, eth3, weight 1, 00:00:04
C * 2001:db8::1/128 is directly connected, eth3, 00:00:38
C>* 2001:db8::1/128 is directly connected, lo, 00:00:38
O>* 2001:db8::2/128 [110/4] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:06
O>* 2001:db8::3/128 [110/3] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:06
O>* 2001:db8::4/128 [110/2] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:32
C>* fe80::/64 is directly connected, eth3, 00:00:37

Bottom line: Traffic to all three routers is correctly rerouted via eth3.
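
That statement can also be checked mechanically. A small Python sketch, assuming only the `via <gateway>, <interface>,` layout visible in the output above (the helper name is made up for illustration):

```python
import re

# Next-hop fragments look like:
#   via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:06
VIA_RE = re.compile(r"via (?P<gateway>\S+), (?P<ifname>[^,\s]+),")


def nexthop_interfaces(show_route_output):
    """Return the set of outgoing interfaces used by all 'via' next-hops."""
    return {match.group("ifname") for match in VIA_RE.finditer(show_route_output)}


# OSPFv3 routes on test-r1 after eth1 and eth2 were taken down.
AFTER_DOWN = """\
O>* 2001:db8::2/128 [110/4] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:06
O>* 2001:db8::3/128 [110/3] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:06
O>* 2001:db8::4/128 [110/2] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:32
"""

if __name__ == "__main__":
    print(nexthop_interfaces(AFTER_DOWN))
```

For the output above the result contains eth3 only, so no next-hop still points at one of the disabled links.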

Now take the links back up:

root@test-r1:~# ip link set up dev eth1
root@test-r1:~# ip link set up dev eth2

Expected output:

root@test-r1:~# vtysh

Hello, this is FRRouting (version 7.5.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

test-r1# show ipv6 ospf nei
Neighbor ID     Pri    DeadTime    State/IfState         Duration I/F[State]
192.0.2.2         1    00:00:03     Full/PointToPoint    00:00:08 eth1[PointToPoint]
192.0.2.2         1    00:00:03     Full/PointToPoint    00:00:06 eth2[PointToPoint]
192.0.2.4         1    00:00:03     Full/PointToPoint    00:05:03 eth3[PointToPoint]

test-r1# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, D - SHARP, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup

O   2001:db8::1/128 [110/1] is directly connected, eth1, weight 1, 00:00:19
C * 2001:db8::1/128 is directly connected, eth2, 00:00:20
C * 2001:db8::1/128 is directly connected, eth1, 00:00:21
C * 2001:db8::1/128 is directly connected, eth3, 00:05:15
C>* 2001:db8::1/128 is directly connected, lo, 00:05:15
O>* 2001:db8::2/128 [110/2] via fe80::f816:3eff:fe11:93b2, eth1, weight 1, 00:00:17
  *                         via fe80::f816:3eff:fe6e:f330, eth2, weight 1, 00:00:17
O>* 2001:db8::3/128 [110/3] via fe80::f816:3eff:fe6e:f330, eth2, weight 1, 00:00:17
  *                         via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:17
O>* 2001:db8::4/128 [110/2] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:05:09
C * fe80::/64 is directly connected, eth2, 00:00:18
C * fe80::/64 is directly connected, eth1, 00:00:19
C>* fe80::/64 is directly connected, eth3, 00:05:14

Bottom line: Routes are correctly re-added.

Actual Behavior (tested using FRR 9.0.1 from deb.frrouting.org)

When all interfaces are up:

root@test-r1:~# vtysh

Hello, this is FRRouting (version 9.0.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

test-r1# show ipv6 ospf nei
Neighbor ID     Pri    DeadTime    State/IfState         Duration I/F[State]
192.0.2.2         1    00:00:03     Full/PointToPoint    00:00:09 eth1[PointToPoint]
192.0.2.2         1    00:00:03     Full/PointToPoint    00:00:09 eth2[PointToPoint]
192.0.2.4         1    00:00:03     Full/PointToPoint    00:00:09 eth3[PointToPoint]

test-r1# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

C * 2001:db8::1/128 is directly connected, eth2, 00:00:13
O   2001:db8::1/128 [110/1] is directly connected, eth1, weight 1, 00:00:13
C * 2001:db8::1/128 is directly connected, eth1, 00:00:13
C * 2001:db8::1/128 is directly connected, eth3, 00:00:13
C>* 2001:db8::1/128 is directly connected, lo, 00:00:13
O>* 2001:db8::2/128 [110/2] via fe80::f816:3eff:fe11:93b2, eth1, weight 1, 00:00:07
  *                         via fe80::f816:3eff:fe6e:f330, eth2, weight 1, 00:00:07
O>* 2001:db8::3/128 [110/3] via fe80::f816:3eff:fe11:93b2, eth1, weight 1, 00:00:07
  *                         via fe80::f816:3eff:fe6e:f330, eth2, weight 1, 00:00:07
  *                         via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:07
O>* 2001:db8::4/128 [110/2] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:07
C * fe80::/64 is directly connected, eth3, 00:00:13
C * fe80::/64 is directly connected, eth2, 00:00:13
C>* fe80::/64 is directly connected, eth1, 00:00:13

(note the last three entries as well as the ones learned via OSPFv3)

Now take the links between test-r1 and test-r2 down:

root@test-r1:~# ip link set down dev eth1
root@test-r1:~# ip link set down dev eth2

You will then see the following errors in the log:

Sep 26 13:53:43 test-r1 zebra[572]: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (41[fe80::f816:3eff:fe6e:f330 if 4]) into the kernel
Sep 26 13:53:43 test-r1 zebra[572]: [VYKYC-709DP] default(0:254):2001:db8::2/128: Route install failed
Sep 26 13:53:43 test-r1 zebra[572]: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (20[if 3]) into the kernel

Actual output:

root@test-r1:~# vtysh

Hello, this is FRRouting (version 9.0.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

test-r1# show ipv6 ospf nei
Neighbor ID     Pri    DeadTime    State/IfState         Duration I/F[State]
192.0.2.4         1    00:00:03     Full/PointToPoint    00:00:39 eth3[PointToPoint]

test-r1# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

O   2001:db8::1/128 [110/1] is directly connected, eth3, weight 1, 00:00:08
C * 2001:db8::1/128 is directly connected, eth3, 00:00:44
C>* 2001:db8::1/128 is directly connected, lo, 00:00:44
O>* 2001:db8::2/128 [110/4] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:10
O>* 2001:db8::3/128 [110/3] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:10
O>* 2001:db8::4/128 [110/2] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:38
C>* fe80::/64 is directly connected, eth1, 00:00:44

(we expect a link-local directly connected route on eth3, but not on eth1)

Bring the links back up:

root@test-r1:~# ip link set up dev eth1
root@test-r1:~# ip link set up dev eth2

Actual output:

root@test-r1:~# vtysh

Hello, this is FRRouting (version 9.0.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

test-r1# show ipv6 ospf nei
Neighbor ID     Pri    DeadTime    State/IfState         Duration I/F[State]
192.0.2.2         1    00:00:03     Full/PointToPoint    00:00:02 eth1[PointToPoint]
192.0.2.2         1    00:00:03     Full/PointToPoint    00:00:00 eth2[PointToPoint]
192.0.2.4         1    00:00:03     Full/PointToPoint    00:05:11 eth3[PointToPoint]
test-r1# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

O   2001:db8::1/128 [110/1] is directly connected, eth1, weight 1, 00:00:11
C * 2001:db8::1/128 is directly connected, eth2, 00:00:11
C * 2001:db8::1/128 is directly connected, eth1, 00:00:12
C * 2001:db8::1/128 is directly connected, eth3, 00:05:15
C>* 2001:db8::1/128 is directly connected, lo, 00:05:15
O>* 2001:db8::2/128 [110/2] via fe80::f816:3eff:fe11:93b2, eth1, weight 1, 00:00:03
  *                         via fe80::f816:3eff:fe6e:f330, eth2, weight 1, 00:00:03
O>* 2001:db8::3/128 [110/3] via fe80::f816:3eff:fe11:93b2, eth1, weight 1, 00:00:03
  *                         via fe80::f816:3eff:fe6e:f330, eth2, weight 1, 00:00:03
  *                         via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:03
O>* 2001:db8::4/128 [110/2] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:05:09
C * fe80::/64 is directly connected, eth2, 00:00:09
C * fe80::/64 is directly connected, eth1, 00:00:11
C>* fe80::/64 is directly connected, eth1, 00:05:15

(now we have two link-local connected routes for eth1... but the route for eth3 is missing.)
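
A down/up cycle should leave the set of connected routes unchanged, so the two snapshots can be compared as multisets. The sketch below does that in Python with `collections.Counter`, so that duplicates such as the doubled eth1 entry are not hidden. The helper names and the regular expression are assumptions based on the output format shown here.

```python
import re
from collections import Counter

# Connected-route lines look like:
#   C>* fe80::/64 is directly connected, eth1, 00:05:15
CONNECTED_RE = re.compile(
    r"^C[ >*]*\s+(?P<prefix>\S+) is directly connected, (?P<ifname>[^,\s]+),"
)


def connected_counter(show_route_output):
    """Count how often each (prefix, interface) connected route appears."""
    counter = Counter()
    for line in show_route_output.splitlines():
        match = CONNECTED_RE.match(line.strip())
        if match:
            counter[(match.group("prefix"), match.group("ifname"))] += 1
    return counter


def roundtrip_diff(before_output, after_output):
    """Return (lost, gained) connected routes between two snapshots."""
    before = connected_counter(before_output)
    after = connected_counter(after_output)
    # Counter subtraction keeps only positive counts.
    return dict(before - after), dict(after - before)


BEFORE = """\
C * fe80::/64 is directly connected, eth3, 00:00:13
C * fe80::/64 is directly connected, eth2, 00:00:13
C>* fe80::/64 is directly connected, eth1, 00:00:13
"""

AFTER = """\
C * fe80::/64 is directly connected, eth2, 00:00:09
C * fe80::/64 is directly connected, eth1, 00:00:11
C>* fe80::/64 is directly connected, eth1, 00:05:15
"""

if __name__ == "__main__":
    lost, gained = roundtrip_diff(BEFORE, AFTER)
    print("lost:", lost)
    print("gained:", gained)
```

For the link-local routes above, the diff reports the eth3 route as lost and one extra eth1 route as gained. Both results would be empty if the flap were handled correctly.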

Next, I will test #13340 on this setup.

@subsecond
Contributor

Actual Behavior (tested using FRR 9.1-Dev-PR13340)

All interfaces up:

test-r1# show ipv6 ospf nei
Neighbor ID     Pri    DeadTime    State/IfState         Duration I/F[State]
192.0.2.2         1    00:00:03     Full/PointToPoint    00:01:14 eth1[PointToPoint]
192.0.2.2         1    00:00:03     Full/PointToPoint    00:01:14 eth2[PointToPoint]
192.0.2.4         1    00:00:03     Full/PointToPoint    00:01:14 eth3[PointToPoint]

test-r1# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

C * 2001:db8::1/128 is directly connected, eth2, 00:01:19
O   2001:db8::1/128 [110/1] is directly connected, eth1, weight 1, 00:01:19
C * 2001:db8::1/128 is directly connected, eth3, 00:01:19
C * 2001:db8::1/128 is directly connected, eth1, 00:01:19
C>* 2001:db8::1/128 is directly connected, lo, 00:01:19
O>* 2001:db8::2/128 [110/2] via fe80::f816:3eff:fe11:93b2, eth1, weight 1, 00:01:14
  *                         via fe80::f816:3eff:fe6e:f330, eth2, weight 1, 00:01:14
O>* 2001:db8::3/128 [110/3] via fe80::f816:3eff:fe11:93b2, eth1, weight 1, 00:01:14
  *                         via fe80::f816:3eff:fe6e:f330, eth2, weight 1, 00:01:14
  *                         via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:01:14
O>* 2001:db8::4/128 [110/2] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:01:14
C * fe80::/64 is directly connected, eth2, 00:01:19
C * fe80::/64 is directly connected, eth1, 00:01:19
C>* fe80::/64 is directly connected, eth3, 00:01:19

Now take the links between test-r1 and test-r2 down:

root@test-r1:~# ip link set down dev eth1
root@test-r1:~# ip link set down dev eth2

Actual output:

root@test-r1:~# vtysh

Hello, this is FRRouting (version 9.1-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

test-r1# show ipv6 ospf nei
Neighbor ID     Pri    DeadTime    State/IfState         Duration I/F[State]
192.0.2.4         1    00:00:03     Full/PointToPoint    00:01:31 eth3[PointToPoint]

test-r1# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

O   2001:db8::1/128 [110/1] is directly connected, eth3, weight 1, 00:00:15
C * 2001:db8::1/128 is directly connected, eth1, 00:01:45
C>* 2001:db8::1/128 is directly connected, lo, 00:01:45
O>* 2001:db8::2/128 [110/4] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:17
O>* 2001:db8::3/128 [110/3] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:17
O>* 2001:db8::4/128 [110/2] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:01:40
C>* fe80::/64 is directly connected, eth3, 00:01:45

While we no longer see wrong interfaces for the directly connected link-local routes (eth3 is correct), the connected route for the global unicast address (GUA) is still attached to eth1, which is down, and the one for eth3 is gone:
2001:db8::1/128 is directly connected, eth1, 00:01:45

This gets worse when bringing the links back up:

root@test-r1:~# ip link set up dev eth1
root@test-r1:~# ip link set up dev eth2

Actual output:

test-r1# show ipv6 ospf nei
Neighbor ID     Pri    DeadTime    State/IfState         Duration I/F[State]
192.0.2.2         1    00:00:03  Loading/PointToPoint    00:00:04 eth1[PointToPoint]
192.0.2.2         1    00:00:03  Loading/PointToPoint    00:00:02 eth2[PointToPoint]
192.0.2.4         1    00:00:03     Full/PointToPoint    00:07:24 eth3[PointToPoint]

test-r1# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

O   2001:db8::1/128 [110/1] is directly connected, eth1, weight 1, 00:00:08
C * 2001:db8::1/128 is directly connected, eth2, 00:00:08
C * 2001:db8::1/128 is directly connected, eth1, 00:00:09
C * 2001:db8::1/128 is directly connected, eth1, 00:07:28
C>* 2001:db8::1/128 is directly connected, lo, 00:07:28
O>* 2001:db8::2/128 [110/2] via fe80::f816:3eff:fe11:93b2, eth1, weight 1, 00:00:01
  *                         via fe80::f816:3eff:fe6e:f330, eth2, weight 1, 00:00:01
O>* 2001:db8::3/128 [110/3] via fe80::f816:3eff:fe11:93b2, eth1, weight 1, 00:00:01
  *                         via fe80::f816:3eff:fe6e:f330, eth2, weight 1, 00:00:01
  *                         via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:00:01
O>* 2001:db8::4/128 [110/2] via fe80::f816:3eff:fef6:1e36, eth3, weight 1, 00:07:23
C * fe80::/64 is directly connected, eth2, 00:00:07
C * fe80::/64 is directly connected, eth1, 00:00:08
C>* fe80::/64 is directly connected, eth3, 00:07:28

We now see two connected routes for 2001:db8::1/128 on eth1 and none on eth3:

C * 2001:db8::1/128 is directly connected, eth2, 00:00:08
C * 2001:db8::1/128 is directly connected, eth1, 00:00:09
C * 2001:db8::1/128 is directly connected, eth1, 00:07:28

IMO #13340 does not (completely) fix the issue.

href added a commit to cloudscale-ch/frr that referenced this issue Oct 10, 2023
This is a preview that needs further work.

See FRRouting#14488
href added a commit to cloudscale-ch/frr that referenced this issue Oct 12, 2023
OSPF on IPv4/IPv6 removes the wrong routes in certain cases, causing
issues when removing and re-enabling interfaces. This test proves that.

Note that the underlying issue presents itself as a race condition at
times, causing some of the tests to pass (rarely).

While no instances of all tests passing in a single run have been
observed, all tests should probably be run multiple times, once the
underlying issue has been fixed, to be sure.

See FRRouting#14488
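
Because the failure shows up as a race, a single passing run says little. A minimal Python sketch of the "run it multiple times" advice follows. `count_failures` is a hypothetical helper and not part of the FRR topotest framework, and the flaky check in the demo is a deterministic stand-in for a real test.

```python
def count_failures(check, runs=10):
    """Run `check` repeatedly and count how many runs fail.

    `check` is any zero-argument callable that returns a truthy value on
    success. A falsy return value or an AssertionError counts as a failure.
    """
    failures = 0
    for _ in range(runs):
        try:
            passed = bool(check())
        except AssertionError:
            passed = False
        if not passed:
            failures += 1
    return failures


if __name__ == "__main__":
    # Stand-in for a racy check: fails on every third call.
    calls = {"n": 0}

    def flaky_check():
        calls["n"] += 1
        return calls["n"] % 3 != 0

    print(count_failures(flaky_check, runs=9))
```

A fix would only be considered verified here when the count stays at zero over many runs.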
href added a commit to cloudscale-ch/frr that referenced this issue Oct 12, 2023
OSPF on IPv4/IPv6 removes the wrong routes in certain cases, causing
issues when removing and re-enabling interfaces. This test proves that.

These tests all pass with FRRouting#13340
and the latest master (d2324b7).

See FRRouting#14488

Signed-off-by: Denis Krienbühl <[email protected]>
@subsecond
Contributor

Update:
We don't see the issue described above when cherry-picking #13340 on current master.

href added a commit to cloudscale-ch/frr that referenced this issue Oct 13, 2023
@subsecond
Contributor

@mwinter-osr With the merge of #13340 and an added topotest to catch regressions in #14582 I think we can go ahead and close this issue.

mergify bot pushed a commit that referenced this issue Oct 15, 2023
OSPF on IPv4/IPv6 removes the wrong routes in certain cases, causing
issues when removing and re-enabling interfaces. This test proves that.

These tests all pass with #13340
and the latest master (d2324b7).

See #14488

Signed-off-by: Denis Krienbühl <[email protected]>
(cherry picked from commit 616e1fa)
donaldsharp pushed a commit to donaldsharp/frr that referenced this issue Mar 20, 2024
OSPF on IPv4/IPv6 removes the wrong routes in certain cases, causing
issues when removing and re-enabling interfaces. This test proves that.

These tests all pass with FRRouting#13340
and the latest master (d2324b7).

See FRRouting#14488
Ticket: #3684268
Signed-off-by: Denis Krienbühl <[email protected]>
(cherry picked from commit 616e1fa)

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

@frrbot

frrbot bot commented Apr 13, 2024

This issue will be automatically closed in the specified period unless there is further activity.

@frrbot frrbot bot closed this as completed Apr 20, 2024
@frrbot frrbot bot removed the autoclose label Apr 20, 2024