Problems with BGP routes on FRR 10.0-01 #16101
Comments
Could you show the configuration also? |
Sure.
ip prefix-list pl-ipv4-wrt-in seq 10 permit 100.64.1.64/27 ge 32 le 32
ipv6 prefix-list pl-ipv6-wrt-out seq 10 permit 2001:db8:fffe::2/128 |
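For readers unfamiliar with the `ge`/`le` syntax in the prefix-lists above, here is a minimal Python sketch of the usual matching rule: the candidate must fall inside the base prefix and its length must lie within the `ge`..`le` window. The function name `prefix_list_permits` is made up for illustration and this is not FRR code.

```python
import ipaddress

def prefix_list_permits(candidate, base, ge=None, le=None):
    """Illustrative model of one 'permit BASE [ge N] [le M]' entry (not FRR code)."""
    cand = ipaddress.ip_network(candidate)
    net = ipaddress.ip_network(base)
    if cand.version != net.version:
        return False
    # Without ge/le the entry only matches the exact prefix.
    if ge is None and le is None:
        return cand == net
    # With ge and/or le, the candidate must be inside the base prefix and
    # its length must fall in the [low, high] window.
    low = ge if ge is not None else net.prefixlen
    high = le if le is not None else cand.max_prefixlen
    return cand.subnet_of(net) and low <= cand.prefixlen <= high

# Entry from the thread: permit 100.64.1.64/27 ge 32 le 32
print(prefix_list_permits("100.64.1.70/32", "100.64.1.64/27", ge=32, le=32))
print(prefix_list_permits("100.64.1.64/27", "100.64.1.64/27", ge=32, le=32))
```

So the IPv4 entry accepts only host routes (/32) inside 100.64.1.64/27, while the IPv6 entry, having no `ge`/`le`, matches only the exact /128.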
Could you also show interface configuration? "ip add show" (want to see exact configuration, including eth0). |
Sure, I have also added some additional commands.
|
Could you provide these outputs?
|
Sure, I have attached the commands.
|
The remote IPv6 peer addresses should be 2001:db8:4::3 and 2001:db8:5::3. |
I'm having this issue as well with around 15 lab servers moving from 9.1 to 10.0. I can provide full configurations and troubleshooting if necessary, but I may move most of them back to 9.1 given the number of issues 10.0 has. It looks like the nexthops themselves are incorrect for some reason. Here's a server on 10.0:
Here's a server on 9.1, on the same exact subnet (it has a different IP, but a fairly similar configuration otherwise):
@ton31337 let me know if you want my configuration or output of anything else. Additionally I wanted to point out that I believe |
Suppress duplicates should not influence this behavior at all, because it applies to outgoing updates only. |
@cliff-ha could you describe what the topology looks like in your case? I'm trying to replicate the same locally, but still struggling. How the peers are connected, and how the route originator is connected also. |
But what are 10.0.9.193, 195.191.143.10, and 195.191.143.22 in your case? |
@cliff-ha just in case to eliminate one question (can you try if that changes something or not if you disable |
195.191.143.10 and 195.191.143.22 are the router-ids of the peering devices. I did try to add
|
That's because you don't have anything in the RIB (no connected routes)... Can you add |
I have just tried to do it again but I am still having the same problem. |
Unfortunately I'm not able to reproduce this locally... Would it be possible to get the debug logs? At least |
Hello, We are trying to deploy FRR v10 with ookla on a RHEL9 server that has two peers to two Nokia 7750s via eBGP. After a failed deployment we shut down a peer, filtered on incoming default routes, and allowed FRR to only advertise the anycast Lo we installed on the server. We plan to set this up in our company LAB to try to reproduce this issue and would like to work with the FRR community to track down the issue and fix it. |
Again, my offer still stands. I have a dozen or more boxes that can be migrated between 9.1 and 10 which all exhibit this issue, and I can get you access or debugging output. |
@ton31337 Sorry for the delay, as these are production servers I have to keep downgrading and upgrading to run the debug commands. This is the debug of the two commands when I have added
This is how it looks when I have not added the
In the last one the routes are being installed on eth0 and not the correct interface.
|
Something is really strange. Your NHs are always resolved via default route (while you have it disabled):
Is it possible to somehow access the box and debug/troubleshoot what's going on? Or if this is the same case, then @Trae32566 could I get the access to your box? |
Yes it is quite strange why it keeps doing that. Unfortunately it is production devices, so it is not possible to give access to them. |
I deployed RockyLinux 9.4 on two virtual machines using VirtualBox and encountered the same issue. The BGP connection was successful, and in cases where the server had a default route, it would generate a route through that default route. However, if the server did not have a default route, no route would be generated. The problem was resolved by downgrading to FRR 9.1. 192.168.56.106: 192.168.56.105: |
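To make the symptom concrete, here is a rough Python sketch of nexthop resolution as a plain longest-prefix match. It is only a model: it ignores zebra's nexthop-tracking options, and the interface names and RIB contents are assumptions chosen to resemble the VirtualBox setup above, not output from a real box.

```python
import ipaddress

def resolve_nexthop(nexthop, rib):
    """Longest-prefix match of `nexthop` against `rib`, a list of (prefix, ifname)."""
    addr = ipaddress.ip_address(nexthop)
    best = None
    for prefix, ifname in rib:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, ifname)
    return None if best is None else (str(best[0]), best[1])

# Hypothetical RIBs: eth0 carries the default route, eth1 faces the BGP peer.
rib_with_connected = [("0.0.0.0/0", "eth0"), ("192.168.56.0/24", "eth1")]
rib_missing_connected = [("0.0.0.0/0", "eth0")]
rib_no_default = []

for rib in (rib_with_connected, rib_missing_connected, rib_no_default):
    print(resolve_nexthop("192.168.56.105", rib))
```

With the connected /24 present, the peer resolves on eth1. If the connected route never makes it into the RIB, the only match left is the default route on eth0, and with no default route there is nothing to resolve against at all, which lines up with the two behaviours described above.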
I just noticed that both your cases have |
I didn't realize it would break route selection like that; you may want to be a bit more verbose on those release notes given NetworkManager is the default for RHEL, Rocky Linux, CentOS, Fedora, and many others, and it doesn't clearly mention that it will cause severe routing issues. There are entire sections of the Red Hat documentation dedicated to getting FRR working on RHEL, and now those will no longer be valid due to this change because it breaks every RHEL or Fedora-based distribution. It would be one thing if this were a bug, but now you're doubling down and effectively saying it's a feature. I have a fleet of hundreds of servers; we can't just stop using the default networking provider, and even if we could, I wouldn't want to. Can you please implement a flag or something to change this functionality? It's insane to me that it even made it this far. |
When importing routes from the kernel, the zebra daemon ignores any routes marked as 'proto kernel', such as the link-scoped routes that the kernel generates for addresses assigned to interfaces. Instead, zebra implements its own logic to synthesise routes for each address assignment, installing them into the RIB with the ZEBRA_ROUTE_CONNECT proto set.

This behaviour requires zebra to mirror the logic of the kernel, to avoid having the kernel FIB diverge from the FRR RIB, which can cause routing loops or other failures. One example of this was the recent addition of support for the 'noprefixroute' flag to zebra[0].

However, attempting to mirror the kernel behaviour this way causes problems when the mirroring is imperfect. An example of this was seen as a result of the change mentioned above, where zebra honouring the noprefixroute flag leads to routes missing from the RIB in some cases. Specifically, this happens when network management daemons set the noprefixroute flag on the address assignment, but subsequently install a link-scoped route into the kernel identical to the prefix route the kernel would have installed automatically. The use case for this is to enable the network management daemon to atomically change route attributes (such as route metric) on the prefix route, but otherwise keep the behaviour identical to the case where the kernel creates the prefix route itself.

The failure described above was noticed for NetworkManager and reported as a NetworkManager bug[1] as well as an FRR issue[2]. Other network management daemons use the noprefixroute flag for similar purposes (e.g., systemd-networkd[3]).

[0] FRRouting#14957
[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1452
[2] FRRouting#16101
[3] https://github.com/systemd/systemd/blob/main/src/network/networkd-dhcp4.c#L962

To resolve this discrepancy between the kernel FIB and the FRR RIB, this patch changes zebra's behaviour to import 'proto kernel' routes instead of ignoring them, and to treat routes with 'scope link' as ZEBRA_ROUTE_CONNECT routes, just like the ones synthesised by zebra itself. This allows the noprefixroute flag to work correctly, while still playing nice with network management daemons that install a different link-scope route for installed addresses.

The change in behaviour can be seen from the following example:

Kernel config:

    5: veth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        link/ether fe:da:bb:eb:74:17 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 10.11.1.2/24 scope global veth0
           valid_lft forever preferred_lft forever
        inet 10.12.0.0/24 scope global noprefixroute veth0
           valid_lft forever preferred_lft forever

    10.11.0.0/16 via 10.11.1.1 dev veth0
    10.11.1.0/24 dev veth0 proto kernel scope link src 10.11.1.2
    10.12.0.0/24 dev veth0 proto kernel scope link metric 100

The 10.12.0.0/24 route was manually added with:

Running zebra, pre-patch:

    Codes: K - kernel route, C - connected, L - local, S - static,
           R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
           T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
           f - OpenFabric, t - Table-Direct,
           > - selected route, * - FIB route, q - queued, r - rejected, b - backup
           t - trapped, o - offload failure

    K>* 10.11.0.0/16 [0/0] via 10.11.1.1, veth0, 00:00:22
    C>* 10.11.1.0/24 is directly connected, veth0, 00:00:22
    L>* 10.11.1.2/32 is directly connected, veth0, 00:00:22
    L>* 10.12.0.0/32 is directly connected, veth0, 00:00:22

Notice that the 10.12.0.0/24 route is missing from the RIB.

After the patch:

    Codes: K - kernel route, C - connected, L - local, S - static,
           R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
           T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
           f - OpenFabric, t - Table-Direct,
           > - selected route, * - FIB route, q - queued, r - rejected, b - backup
           t - trapped, o - offload failure

    K>* 10.11.0.0/16 [0/0] via 10.11.1.1, veth0, 00:00:05
    C * 10.11.1.0/24 is directly connected, veth0, 00:00:05
    C>* 10.11.1.0/24 is directly connected, veth0, 00:00:05
    L>* 10.11.1.2/32 is directly connected, veth0, 00:00:05
    C>* 10.12.0.0/24 [0/100] is directly connected, veth0, 00:00:05
    L>* 10.12.0.0/32 is directly connected, veth0, 00:00:05

The prefix is now shown as connected (C>) as it should. Note also that the other prefix (10.11.1.0/24, without the noprefixroute flag) now appears twice, because it's both created by zebra from the interface config, and imported from the kernel. This is harmless as the routes are identical, and an arbitrary one just ends up being selected.

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
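As a reading aid for the commit message above, the following is a toy Python model of the described import logic, fed with the veth0 example. It is a sketch under stated assumptions, not zebra's actual code: the `Address`, `KernelRoute`, and `build_rib` names are invented here, route metrics and best-path selection are ignored, and the post-patch rule is assumed to be "a 'proto kernel' route with 'scope link' becomes connected, every other imported route stays a kernel route".

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Address:
    cidr: str                    # e.g. "10.11.1.2/24"
    dev: str
    noprefixroute: bool = False

@dataclass(frozen=True)
class KernelRoute:
    prefix: str
    dev: str
    proto: str = "other"         # placeholder for anything that is not "kernel"
    scope: str = "global"

def build_rib(addresses, kernel_routes, patched):
    """Return (code, prefix, dev) tuples roughly as 'show ip route' would list them."""
    rib = []
    for a in addresses:
        iface = ipaddress.ip_interface(a.cidr)
        # A local (L) host route is created for every address.
        rib.append(("L", f"{iface.ip}/{iface.max_prefixlen}", a.dev))
        # The synthesised connected (C) route is skipped when noprefixroute is set.
        if not a.noprefixroute:
            rib.append(("C", str(iface.network), a.dev))
    for r in kernel_routes:
        if r.proto == "kernel" and not patched:
            continue             # pre-patch: 'proto kernel' routes are ignored
        code = "C" if (r.proto == "kernel" and r.scope == "link") else "K"
        rib.append((code, r.prefix, r.dev))
    return rib

ADDRESSES = [
    Address("10.11.1.2/24", "veth0"),
    Address("10.12.0.0/24", "veth0", noprefixroute=True),
]
KERNEL_ROUTES = [
    KernelRoute("10.11.0.0/16", "veth0"),
    KernelRoute("10.11.1.0/24", "veth0", proto="kernel", scope="link"),
    KernelRoute("10.12.0.0/24", "veth0", proto="kernel", scope="link"),
]

for patched in (False, True):
    print("patched" if patched else "pre-patch")
    for entry in sorted(build_rib(ADDRESSES, KERNEL_ROUTES, patched)):
        print("  ", *entry)
```

In this model the pre-patch run has no connected entry for 10.12.0.0/24, and the patched run has one, plus the duplicated 10.11.1.0/24 entry that the commit message calls harmless.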
Is this issue closed or not? |
This is not FRR's issue, it's a network daemon's (in this case NetworkManager) issue. |
Would it be possible to build FRR without this "new feature" via a configure flag, or to disable it with an frr.conf option? It is not possible to change NetworkManager; this is what the community is highlighting to the developers. |
This is patently false. NetworkManager is using the flag correctly, as is proven above several times by kernel and NetworkManager developers. The FRR team's misunderstanding and misuse of the noprefixroute flag is the true issue here. The sad part is that they fail to see that, and instead double down that NetworkManager is "wrong", which is completely unwarranted, especially given other network daemons like systemd-networkd also use this functionality identically. https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1452#note_2225586
TLDR: FRR, not NetworkManager or the Linux kernel, is doing it wrong here. You basically took an existing routing mechanism and decided to use it in a way it was never designed to be used. We are actually version locking on 9, and long term looking at migrating to BIRD or another project because of this, but we really like the CLI and syntax of FRR. |
@ton31337 It's very obvious this isn't complete, and neither is the discussion. It was pretty untoward of you to close an active issue, it does nothing but throw the efforts of everyone in this issue away and spit in the face of the bug reporters. You might as well stop packaging for RHEL, CentOS, Fedora, and every other RPM-based distro too while you're at it, since they're literally unusable beyond 9.X. Why would you think that it's acceptable to break this? |
This is closed, because it's already fixed (backported to 10.1), and gonna be released next week. |
@Trae32566 see #16300 |
Description
After upgrading FRR from version 9.1 to version 10.0-01, we have problems with routes not being installed correctly on the server.
As soon as we downgrade the FRR version, everything starts working again.
We have multiple interfaces on the server (eth0, eth1, eth2). The BGP peers are on eth1 and eth2, and the server receives the same routes on both interfaces, but the routes are being installed as if they were received on eth0:
With IPv6, the routes are just marked as invalid because the IP is considered inaccessible, even though the peer IP is a direct neighbor.
Version
How to reproduce
Have multiple interfaces on the server and then have BGP peers on eth1 and eth2.
Expected behavior
Routes are installed on the interfaces on which they are received.
Actual behavior
Routes are installed as if they were received on eth0.
Additional context
No response