Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: take into account the Linux IFF_LOWER_UP flag #15082

Merged
merged 4 commits into from
May 28, 2024

Conversation

louis-6wind
Copy link
Contributor

In Linux, a network driver can set the interface flags IFF_UP and IFF_RUNNING although the IFF_LOWER_UP flag is down, which means the interface is ready but the carrier is down:

These values contain interface state:

ifinfomsg::if_flags & IFF_UP:
Interface is admin up
ifinfomsg::if_flags & IFF_RUNNING:
Interface is in RFC2863 operational state UP or UNKNOWN. This is for
backward compatibility, routing daemons, dhcp clients can use this
flag to determine whether they should use the interface.
ifinfomsg::if_flags & IFF_LOWER_UP:
Driver has signaled netif_carrier_on()

However, FRR considers an interface is operational as soon it is up (IFF_UP) and running (IFF_RUNNING), disregarding the IFF_LOWER_UP flag. This can lead to a scenario where FRR starts adding routes through an interface that is technically down at the carrier level, resulting in kernel errors.

Jan 02 18:07:18 dut-vm zebra[283731]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Network is down, type=RTM_NEWNEXTHOP(104), seq=243, pid=3112881162
Jan 02 18:07:18 dut-vm zebra[283731]: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (318[if 164]) into the kernel
Jan 02 18:07:18 dut-vm zebra[283731]: [HSYZM-HV7HF] Extended Error: Carrier for nexthop device is down
Jan 02 18:07:18 dut-vm zebra[283731]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Network is down, type=RTM_NEWNEXTHOP(104), seq=245, pid=3112881162
Jan 02 18:07:18 dut-vm zebra[283731]: [HSYZM-HV7HF] Extended Error: Nexthop id does not exist
Jan 02 18:07:18 dut-vm zebra[283731]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=246, pid=3112881162
Jan 02 18:07:18 dut-vm zebra[283731]: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (320[10.125.0.2 if 164]) into the kernel
Jan 02 18:07:18 dut-vm zebra[283731]: [VYKYC-709DP] default(0:254):0.0.0.0/0: Route install failed

Consider an interface is operational when it has the IFF_UP, IFF_RUNNING and IFF_LOWER_UP flags.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/operstates.rst?h=v6.7-rc8#n29
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/nexthop.c?h=v6.7-rc8#n2886
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/netdevice.h?h=v6.7-rc8#n4198

@frrbot frrbot bot added the libfrr label Jan 3, 2024
@louis-6wind louis-6wind force-pushed the fix-iff-lower-up branch 2 times, most recently from a3aded7 to 03695fc Compare January 3, 2024 16:51
@github-actions github-actions bot added size/XL and removed size/M labels Jan 3, 2024
lib/if.h Outdated
@@ -12,6 +12,10 @@
#include "qobj.h"
#include "hook.h"
#include "admin_group.h"
#ifdef GNU_LINUX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, let's remove the zebra.h inclusion of net/if.h and move it into this file instead. It will require some small reworking of stuff in nhrpd/linux.c and zebra/if_netlink.c, but frankly it's worth it.

I'd also break up the commits into 2 -> The inclusion of the new header files and the small rearrangement and the second commit would be the actual changes in lib/if.c

Why do I want it this way? too much is in zebra.h -> It needs to be broken up. if.h should own interface header definitions not zebra.h. I've started breaking up zebra.h in other commits just haven't gotten to this part yet. Since you are in here I would just do it now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we can remove "net/if.h", it is even set in Linux kernel sources #include.

I'd also break up the commits into 2 -> The inclusion of the new header files and the small rearrangement and the second commit would be the actual changes in lib/if.c

done

In lib/if.c, I am testing the definition IFF_LOWER_UP

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reread what I stated. inclusion of net/if.h does not belong in zebra.h it belongs in lib/if.h.

@louis-6wind
Copy link
Contributor Author

Why do we rely on our own copy of the linux headers instead of compiling with the headers of the local machine ?

@donaldsharp
Copy link
Member

we rely on our own headers because the local machine may not have a new enough of a header. I can write a feature against the linux netlink stack that is available on 6.X but if I only have 5. locally then the compile will fail.

This way FRR can compile against newer versions of the linux abi and when that abi is not available due to it being an older kernel the call will just fail.

This also allows FRR to compile into a package and be deployed against a variety of kernels

@louis-6wind louis-6wind force-pushed the fix-iff-lower-up branch 3 times, most recently from 64c08d4 to 593dabb Compare January 11, 2024 16:27
@github-actions github-actions bot added size/XXL and removed size/XL labels Jan 11, 2024
@louis-6wind
Copy link
Contributor Author

@donaldsharp could you detail what you expect for the zebra/if_netlink.c rework

I cannot get rid of net/if.h there

Removal of if.h include

$ git diff
diff --git a/zebra/if_netlink.c b/zebra/if_netlink.c
index dac2c0a33a..0b5eb6a779 100644
--- a/zebra/if_netlink.c
+++ b/zebra/if_netlink.c
@@ -17,7 +17,6 @@
 #define _LINUX_IF_H
 #define _LINUX_IP_H
 
-#include "if.h"
 #include <linux/netlink.h>
 #include <linux/rtnetlink.h>
 #include <netinet/if_ether.h>
$ make -j12
true
make  all-am
make[1]: Entering directory '/home/lscalber/git/frr'
  CC       zebra/if_netlink.o
In file included from zebra/if_netlink.c:25:
./include/linux/if_tunnel.h:30:14: error: ‘IFNAMSIZ’ undeclared here (not in a function)
   30 |  char   name[IFNAMSIZ];
      |              ^~~~~~~~
cc1: note: unrecognized command-line option ‘-Wno-microsoft-anon-tag’ may have been intended to silence earlier diagnostics
make[1]: *** [Makefile:10616: zebra/if_netlink.o] Error 1
make[1]: Leaving directory '/home/lscalber/git/frr'
make: *** [Makefile:6478: all] Error 2

Re-add linux if.h

$ git diff
diff --git a/zebra/if_netlink.c b/zebra/if_netlink.c
index dac2c0a33a..29b85056cd 100644
--- a/zebra/if_netlink.c
+++ b/zebra/if_netlink.c
@@ -14,10 +14,8 @@
  * Reference - https://sourceware.org/ml/libc-alpha/2013-01/msg00599.html
  */
 #define _LINUX_IN6_H
-#define _LINUX_IF_H
 #define _LINUX_IP_H
 
-#include "if.h"
 #include <linux/netlink.h>
 #include <linux/rtnetlink.h>
 #include <netinet/if_ether.h>
$ make -j12
true
make  all-am
make[1]: Entering directory '/home/lscalber/git/frr'
  CC       zebra/if_netlink.o
In file included from ./include/linux/if_tunnel.h:5,
                 from zebra/if_netlink.c:24:
./include/linux/if.h:112:19: error: redeclaration of enumerator ‘IFF_UP’
  112 | #define IFF_UP    IFF_UP
      |                   ^~~~~~
./include/linux/if.h:85:2: note: previous definition of ‘IFF_UP’ was here
   85 |  IFF_UP    = 1<<0,  /* sysfs */
      |  ^~~~~~
./include/linux/if.h:113:25: error: redeclaration of enumerator ‘IFF_BROADCAST’
  113 | #define IFF_BROADCAST   IFF_BROADCAST
      |                         ^~~~~~~~~~~~~
./include/linux/if.h:86:2: note: previous definition of ‘IFF_BROADCAST’ was here
   86 |  IFF_BROADCAST   = 1<<1,  /* volatile */
      |  ^~~~~~~~~~~~~
./include/linux/if.h:114:21: error: redeclaration of enumerator ‘IFF_DEBUG’
  114 | #define IFF_DEBUG   IFF_DEBUG
      |                     ^~~~~~~~~
./include/linux/if.h:87:2: note: previous definition of ‘IFF_DEBUG’ was here
   87 |  IFF_DEBUG   = 1<<2,  /* sysfs */
      |  ^~~~~~~~~
./include/linux/if.h:115:24: error: redeclaration of enumerator ‘IFF_LOOPBACK’
  115 | #define IFF_LOOPBACK   IFF_LOOPBACK
      |                        ^~~~~~~~~~~~
./include/linux/if.h:88:2: note: previous definition of ‘IFF_LOOPBACK’ was here
   88 |  IFF_LOOPBACK   = 1<<3,  /* volatile */
      |  ^~~~~~~~~~~~
./include/linux/if.h:116:27: error: redeclaration of enumerator ‘IFF_POINTOPOINT’
  116 | #define IFF_POINTOPOINT   IFF_POINTOPOINT
      |                           ^~~~~~~~~~~~~~~
./include/linux/if.h:89:2: note: previous definition of ‘IFF_POINTOPOINT’ was here
   89 |  IFF_POINTOPOINT   = 1<<4,  /* volatile */
      |  ^~~~~~~~~~~~~~~
./include/linux/if.h:117:26: error: redeclaration of enumerator ‘IFF_NOTRAILERS’
  117 | #define IFF_NOTRAILERS   IFF_NOTRAILERS
      |                          ^~~~~~~~~~~~~~
./include/linux/if.h:90:2: note: previous definition of ‘IFF_NOTRAILERS’ was here
   90 |  IFF_NOTRAILERS   = 1<<5,  /* sysfs */
      |  ^~~~~~~~~~~~~~
./include/linux/if.h:118:23: error: redeclaration of enumerator ‘IFF_RUNNING’
  118 | #define IFF_RUNNING   IFF_RUNNING
      |                       ^~~~~~~~~~~
./include/linux/if.h:91:2: note: previous definition of ‘IFF_RUNNING’ was here
   91 |  IFF_RUNNING   = 1<<6,  /* volatile */
      |  ^~~~~~~~~~~
./include/linux/if.h:119:21: error: redeclaration of enumerator ‘IFF_NOARP’
  119 | #define IFF_NOARP   IFF_NOARP
      |                     ^~~~~~~~~
./include/linux/if.h:92:2: note: previous definition of ‘IFF_NOARP’ was here
   92 |  IFF_NOARP   = 1<<7,  /* sysfs */
      |  ^~~~~~~~~
./include/linux/if.h:120:23: error: redeclaration of enumerator ‘IFF_PROMISC’
  120 | #define IFF_PROMISC   IFF_PROMISC
      |                       ^~~~~~~~~~~
./include/linux/if.h:93:2: note: previous definition of ‘IFF_PROMISC’ was here
   93 |  IFF_PROMISC   = 1<<8,  /* sysfs */
      |  ^~~~~~~~~~~
./include/linux/if.h:121:24: error: redeclaration of enumerator ‘IFF_ALLMULTI’
  121 | #define IFF_ALLMULTI   IFF_ALLMULTI
      |                        ^~~~~~~~~~~~
./include/linux/if.h:94:2: note: previous definition of ‘IFF_ALLMULTI’ was here
   94 |  IFF_ALLMULTI   = 1<<9,  /* sysfs */
      |  ^~~~~~~~~~~~
./include/linux/if.h:122:22: error: redeclaration of enumerator ‘IFF_MASTER’
  122 | #define IFF_MASTER   IFF_MASTER
      |                      ^~~~~~~~~~
./include/linux/if.h:95:2: note: previous definition of ‘IFF_MASTER’ was here
   95 |  IFF_MASTER   = 1<<10, /* volatile */
      |  ^~~~~~~~~~
./include/linux/if.h:123:21: error: redeclaration of enumerator ‘IFF_SLAVE’
  123 | #define IFF_SLAVE   IFF_SLAVE
      |                     ^~~~~~~~~
./include/linux/if.h:96:2: note: previous definition of ‘IFF_SLAVE’ was here
   96 |  IFF_SLAVE   = 1<<11, /* volatile */
      |  ^~~~~~~~~
./include/linux/if.h:124:25: error: redeclaration of enumerator ‘IFF_MULTICAST’
  124 | #define IFF_MULTICAST   IFF_MULTICAST
      |                         ^~~~~~~~~~~~~
./include/linux/if.h:97:2: note: previous definition of ‘IFF_MULTICAST’ was here
   97 |  IFF_MULTICAST   = 1<<12, /* sysfs */
      |  ^~~~~~~~~~~~~
./include/linux/if.h:125:23: error: redeclaration of enumerator ‘IFF_PORTSEL’
  125 | #define IFF_PORTSEL   IFF_PORTSEL
      |                       ^~~~~~~~~~~
./include/linux/if.h:98:2: note: previous definition of ‘IFF_PORTSEL’ was here
   98 |  IFF_PORTSEL   = 1<<13, /* sysfs */
      |  ^~~~~~~~~~~
./include/linux/if.h:126:25: error: redeclaration of enumerator ‘IFF_AUTOMEDIA’
  126 | #define IFF_AUTOMEDIA   IFF_AUTOMEDIA
      |                         ^~~~~~~~~~~~~
./include/linux/if.h:99:2: note: previous definition of ‘IFF_AUTOMEDIA’ was here
   99 |  IFF_AUTOMEDIA   = 1<<14, /* sysfs */
      |  ^~~~~~~~~~~~~
./include/linux/if.h:127:23: error: redeclaration of enumerator ‘IFF_DYNAMIC’
  127 | #define IFF_DYNAMIC   IFF_DYNAMIC
      |                       ^~~~~~~~~~~
./include/linux/if.h:100:2: note: previous definition of ‘IFF_DYNAMIC’ was here
  100 |  IFF_DYNAMIC   = 1<<15, /* sysfs */
      |  ^~~~~~~~~~~
In file included from ./lib/if.h:9,
                 from ./lib/sockunion.h:13,
                 from ./lib/prefix.h:10,
                 from zebra/if_netlink.c:31:
/usr/include/net/if.h:111:8: error: redefinition of ‘struct ifmap’
  111 | struct ifmap
      |        ^~~~~
In file included from ./include/linux/if_tunnel.h:5,
                 from zebra/if_netlink.c:24:
./include/linux/if.h:195:8: note: originally defined here
  195 | struct ifmap {
      |        ^~~~~
In file included from ./lib/if.h:9,
                 from ./lib/sockunion.h:13,
                 from ./lib/prefix.h:10,
                 from zebra/if_netlink.c:31:
/usr/include/net/if.h:126:8: error: redefinition of ‘struct ifreq’
  126 | struct ifreq
      |        ^~~~~
In file included from ./include/linux/if_tunnel.h:5,
                 from zebra/if_netlink.c:24:
./include/linux/if.h:232:8: note: originally defined here
  232 | struct ifreq {
      |        ^~~~~
In file included from ./lib/if.h:9,
                 from ./lib/sockunion.h:13,
                 from ./lib/prefix.h:10,
                 from zebra/if_netlink.c:31:
/usr/include/net/if.h:176:8: error: redefinition of ‘struct ifconf’
  176 | struct ifconf
      |        ^~~~~~
In file included from ./include/linux/if_tunnel.h:5,
                 from zebra/if_netlink.c:24:
./include/linux/if.h:284:8: note: originally defined here
  284 | struct ifconf  {
      |        ^~~~~~
cc1: note: unrecognized command-line option ‘-Wno-microsoft-anon-tag’ may have been intended to silence earlier diagnostics
make[1]: *** [Makefile:10616: zebra/if_netlink.o] Error 1
make[1]: Leaving directory '/home/lscalber/git/frr'
make: *** [Makefile:6478: all] Error 2

Force not including net/if.h

$ git diff
diff --git a/zebra/if_netlink.c b/zebra/if_netlink.c
index dac2c0a33a..8e06c0c02a 100644
--- a/zebra/if_netlink.c
+++ b/zebra/if_netlink.c
@@ -14,10 +14,9 @@
  * Reference - https://sourceware.org/ml/libc-alpha/2013-01/msg00599.html
  */
 #define _LINUX_IN6_H
-#define _LINUX_IF_H
 #define _LINUX_IP_H
+#define _NET_IF_H
 
-#include "if.h"
 #include <linux/netlink.h>
 #include <linux/rtnetlink.h>
 #include <netinet/if_ether.h>
$ make -j12
true
make  all-am
make[1]: Entering directory '/home/lscalber/git/frr'
  CC       zebra/if_netlink.o
In file included from ./include/linux/if.h:37,
                 from ./include/linux/if_tunnel.h:5,
                 from zebra/if_netlink.c:25:
/usr/include/linux/hdlc/ioctl.h:74:14: error: ‘IFNAMSIZ’ undeclared here (not in a function); did you mean ‘ALTIFNAMSIZ’?
   74 |  char master[IFNAMSIZ]; /* Name of master FRAD device */
      |              ^~~~~~~~
      |              ALTIFNAMSIZ
zebra/if_netlink.c: In function ‘get_iflink_speed’:
zebra/if_netlink.c:258:15: error: storage size of ‘ifdata’ isn’t known
  258 |  struct ifreq ifdata;
      |               ^~~~~~
zebra/if_netlink.c:258:15: error: unused variable ‘ifdata’ [-Werror=unused-variable]
zebra/if_netlink.c: At top level:
cc1: note: unrecognized command-line option ‘-Wno-microsoft-anon-tag’ may have been intended to silence earlier diagnostics
cc1: all warnings being treated as errors
make[1]: *** [Makefile:10616: zebra/if_netlink.o] Error 1
make[1]: Leaving directory '/home/lscalber/git/frr'
make: *** [Makefile:6478: all] Error 2

Copy link
Contributor

@idryzhov idryzhov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check if the header inclusion was done correctly, not an expert in this. But I approve the actual code change.

@louis-6wind
Copy link
Contributor Author

@idryzhov could you merge this please ?

@mwinter-osr
Copy link
Member

CI:rerun Rerun after fixing git access on CI infra

1 similar comment
@mwinter-osr
Copy link
Member

CI:rerun Rerun after fixing git access on CI infra

@idryzhov
Copy link
Contributor

@louis-6wind it looks like there are actual build errors right now. Please, take a look.

@frrbot frrbot bot added the bugfix label Apr 23, 2024
Copy link
Contributor

@eqvinox eqvinox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

questions

return ((ifp->flags & IFF_UP) &&
(((ifp->flags & IFF_RUNNING)
#ifdef IFF_LOWER_UP
&& (ifp->flags & IFF_LOWER_UP)
Copy link
Contributor

@eqvinox eqvinox Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1. this will break FRR completely on older kernels that do not send this flag (you are making a compile-time check, that does not mean the [different] kernel at run-time will support the flag)

  1. AFAIK IFF_RUNNING should never be set if IFF_LOWER_UP is not set? Can you elaborate when this is not the case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've crossed out the compatibility part, since the flag has been in the kernel since ≤2014. The other question remains.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2. AFAIK IFF_RUNNING should never be set if IFF_LOWER_UP is not set? Can you elaborate when this is not the case?

According to the kernel documentation: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/operstates.rst?h=v6.7-rc8#n29

ifinfomsg::if_flags & IFF_RUNNING:
 Interface is in RFC2863 operational state UP or UNKNOWN. [...]
ifinfomsg::if_flags & IFF_LOWER_UP:
 Driver has signaled netif_carrier_on()

Therefore, it is consistent with the kernel specifications for an interface to have IFF_RUNNING set without having IFF_LOWER_UP set.

@eqvinox
Copy link
Contributor

eqvinox commented Apr 25, 2024

The tentative answer during the call on tuesday is that this was done to work around kernel bugs with specific interface drivers. If this is the reason, I'm opposed to this change. My reasoning is quite simple:

The change is apparently being made because some buggy drivers set IFF_RUNNING incorrectly when the link is down, and IFF_LOWER_UP would work correctly. If we change the code to check IFF_LOWER_UP, these buggy drivers would work. Except now we would have the reverse problem if there are buggy drivers that set IFF_RUNNING correctly but don't set IFF_LOWER_UP. We would be choosing "which bug to support". What happens when someone comes in a few weeks and changes it back because another driver has this reverse bug?

We should code to the API as designed, not to some specific driver bugs. Checking for IFF_RUNNING is the correct thing, no?

@vjardin
Copy link
Contributor

vjardin commented Apr 25, 2024

Can we have a list (interface types) of bug'd kernel drivers ? Could it be limited to those interfaces until the kernel's ones will be fixed ?

@eqvinox
Copy link
Contributor

eqvinox commented Apr 25, 2024

Can we have a list (interface types) of bug'd kernel drivers ? Could it be limited to those interfaces until the kernel's ones will be fixed ?

I only have the info in this PR… @louis-6wind can you provide the context/info here?

@louis-6wind
Copy link
Contributor Author

louis-6wind commented Apr 25, 2024

The tentative answer during the call on tuesday is that this was done to work around kernel bugs with specific interface drivers.

On Tuesday, I mentioned a problem with a specific driver, not a kernel bug. The issue was identified with the tun/tap driver - I couldn't remember which one was having problems.

At initialization, the module reports IFF_RUNNING & !IFF_LOWER_UP then IFF_RUNNING & IFF_LOWER_UP.

If this is the reason, I'm opposed to this change. My reasoning is quite simple:

The change is apparently being made because some buggy drivers set IFF_RUNNING incorrectly when the link is down, and IFF_LOWER_UP would work correctly. If we change the code to check IFF_LOWER_UP, these buggy drivers would work. Except now we would have the reverse problem if there are buggy drivers that set IFF_RUNNING correctly but don't set IFF_LOWER_UP. We would be choosing "which bug to support". What happens when someone comes in a few weeks and changes it back because another driver has this reverse bug?

According to the pull-request description, Zebra is unable to install the nexthop, as indicated by the logs

Jan 02 18:07:18 dut-vm zebra[283731]: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (318[if 164]) into the kernel
Jan 02 18:07:18 dut-vm zebra[283731]: [HSYZM-HV7HF] Extended Error: Carrier for nexthop device is down

This "extended error" originates from the kernel, because !netif_carrier_ok(cfg->dev) returns true, indicating that the network interface carrier is down. See: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/nexthop.c#n3127

Furthermore, the kernel itself sets IFF_LOWER_UP if netif_carrier_ok(cfg->dev) is true. See:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/dev.c?h=v6.8#n8585

It's important to note that this flag setting is handled by the kernel, not by the driver directly.

We should code to the API as designed, not to some specific driver bugs. Checking for IFF_RUNNING is the correct thing, no?

@louis-6wind
Copy link
Contributor Author

Seems to be related #10199

@louis-6wind
Copy link
Contributor Author

ci:rerun

@eqvinox
Copy link
Contributor

eqvinox commented Apr 25, 2024

The issue was identified with the tun/tap driver […] At initialization, the module reports IFF_RUNNING & !IFF_LOWER_UP then IFF_RUNNING & IFF_LOWER_UP.

I don't see this behavior, what exactly do you need to do to get a tun/tap device in IFF_RUNNING & !IFF_LOWER_UP?

Furthermore, the kernel itself sets IFF_LOWER_UP if netif_carrier_ok(cfg->dev) is true. See:
git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/dev.c?h=v6.8#n8585

Yeah, but IFF_RUNNINGnetif_oper_upoperstate == IF_OPER_{UP,UNKNOWN} ⇒ link_watch code
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/link_watch.c#n36
so IF_OPER will not be set if !netif_carrier_ok

I'd really like to see where in the kernel it is possible to get IFF_RUNNING & !IFF_LOWER_UP

@louis-6wind
Copy link
Contributor Author

louis-6wind commented Apr 26, 2024

The issue was identified with the tun/tap driver […] At initialization, the module reports IFF_RUNNING & !IFF_LOWER_UP then IFF_RUNNING & IFF_LOWER_UP.

I don't see this behavior, what exactly do you need to do to get a tun/tap device in IFF_RUNNING & !IFF_LOWER_UP?

I am not able to extract the exact conditions, but here is what I observe with 'ip -d monitor link'

8: ntfp2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default 
    link/ether de:ed:02:44:4c:55 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65521 
    tun type tap pi off vnet_hdr on multi_queue numqueues 2 numdisabled 0 persist off user root group 1000001 numtxqueues 256 numrxqueues 256 gso_max_size 65536 gso_max_segs 65535
8: ntfp2: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state UNKNOWN group default
    link/ether de:ed:02:44:4c:55 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65521
    tun type tap pi off vnet_hdr on multi_queue numqueues 1 numdisabled 1 persist on user root group 1000001 numtxqueues 256 numrxqueues 256 gso_max_size 65536 gso_max_segs 65535
8: ntfp2: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state UNKNOWN group default
    link/ether de:ed:02:44:4c:55 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65521
    tun type tap pi off vnet_hdr on multi_queue numqueues 1 numdisabled 1 persist on user root group 1000001 numtxqueues 256 numrxqueues 256 gso_max_size 65536 gso_max_segs 65535
8: ntfp2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default
    link/ether de:ed:02:44:4c:55 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65521
    tun type tap pi off vnet_hdr on multi_queue numqueues 1 numdisabled 1 persist on user root group 1000001 numtxqueues 256 numrxqueues 256 gso_max_size 65536 gso_max_segs 65535

Furthermore, the kernel itself sets IFF_LOWER_UP if netif_carrier_ok(cfg->dev) is true. See:
git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/dev.c?h=v6.8#n8585

Yeah, but IFF_RUNNINGnetif_oper_upoperstate == IF_OPER_{UP,UNKNOWN} ⇒ link_watch code https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/link_watch.c#n36 so IF_OPER will not be set if !netif_carrier_ok

You are saying that IFF_RUNNING means IF_OPER_UP or IF_OPER_UNKNOWN. That is correct.
If you have IF_OPER_UP, you will also have netif_carrier_ok. That's correct.

That doesn't contradict what I'm saying. If the state is IF_OPER_UNKNOWN, the driver is not usable.

I'd really like to see where in the kernel it is possible to get IFF_RUNNING & !IFF_LOWER_UP

Logs are saying it is possible and it conforms with the specifications.

Import our a copy of Linux if.h and its dependencies
(compiler_types.h and libc-compat.h) from the above links. In "if.h",
"#include <linux/compiler.h>" has been replaced by
"#include <linux/compiler_types.h>". libc-compat.h is needed to avoid
conflicts with glibc.

Include "linux/if.h" in "zebra.h" when compiling on Linux. De-reference
"net/if.h" in C files.

Note that "net/if.h" is still needed. It is even included in many Linux
kernel files. "linux/if.h" provides additional definitions.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/include/uapi/linux/if.h?h=v5.5
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/include/uapi/linux/libc-compat.h?h=v5.5
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/include/linux/compiler_types.h?h=v5.5
Link: https://patchwork.ozlabs.org/project/glibc/patch/[email protected]/
Signed-off-by: Louis Scalbert <[email protected]>
In Linux, a network driver can set the interface flags IFF_UP and
IFF_RUNNING although the IFF_LOWER_UP flag is down, which means the
interface is ready but the carrier is down:

> These values contain interface state:
>
> ifinfomsg::if_flags & IFF_UP:
>  Interface is admin up
> ifinfomsg::if_flags & IFF_RUNNING:
>  Interface is in RFC2863 operational state UP or UNKNOWN. This is for
>  backward compatibility, routing daemons, dhcp clients can use this
>  flag to determine whether they should use the interface.
> ifinfomsg::if_flags & IFF_LOWER_UP:
>  Driver has signaled netif_carrier_on()

However, FRR considers an interface is operational as soon it is up
(IFF_UP) and running (IFF_RUNNING), disregarding the IFF_LOWER_UP flag.
This can lead to a scenario where FRR starts adding routes through an
interface that is technically down at the carrier level, resulting in
kernel errors.

> Jan 02 18:07:18 dut-vm zebra[283731]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Network is down, type=RTM_NEWNEXTHOP(104), seq=243, pid=3112881162
> Jan 02 18:07:18 dut-vm zebra[283731]: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (318[if 164]) into the kernel
> Jan 02 18:07:18 dut-vm zebra[283731]: [HSYZM-HV7HF] Extended Error: Carrier for nexthop device is down
> Jan 02 18:07:18 dut-vm zebra[283731]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Network is down, type=RTM_NEWNEXTHOP(104), seq=245, pid=3112881162
> Jan 02 18:07:18 dut-vm zebra[283731]: [HSYZM-HV7HF] Extended Error: Nexthop id does not exist
> Jan 02 18:07:18 dut-vm zebra[283731]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=246, pid=3112881162
> Jan 02 18:07:18 dut-vm zebra[283731]: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (320[10.125.0.2 if 164]) into the kernel
> Jan 02 18:07:18 dut-vm zebra[283731]: [VYKYC-709DP] default(0:254):0.0.0.0/0: Route install failed

Consider an interface is operational when it has the IFF_UP, IFF_RUNNING
and IFF_LOWER_UP flags.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/operstates.rst?h=v6.7-rc8#n29
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/nexthop.c?h=v6.7-rc8#n2886
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/netdevice.h?h=v6.7-rc8#n4198
Signed-off-by: Louis Scalbert <[email protected]>
Add lower_up to expected interface flags.

Signed-off-by: Louis Scalbert <[email protected]>
I don't know why it appears only now.

> ../sdist/zebra/fpm_listener.c:420:8: error: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare]
>                 if (!RTNH_OK(rtnh, len)) {
>                      ^~~~~~~~~~~~~~~~~~
> ../sdist/include/linux/rtnetlink.h:437:31: note: expanded from macro 'RTNH_OK'
>                            ((int)(rtnh)->rtnh_len) <= (len))

len is set with RTA_PAYLOAD and should be an integer.

> len = RTA_PAYLOAD(mpath_rtattr);
> #define RTA_PAYLOAD(rta) ((int)((rta)->rta_len) - RTA_LENGTH(0))

Signed-off-by: Louis Scalbert <[email protected]>
@louis-6wind
Copy link
Contributor Author

To summarize, nexthop objects can only be installed on interfaces marked as IFF_LOWER_UP. The IFF_LOWER_UP flag indicates that the operational state of the interface is "up." On the other hand, IFF_RUNNING indicates that the operational state is either "up" or "unknown."

Zebra currently treats an interface as operational and allows the creation of nexthop objects as soon as the IFF_RUNNING flag is set, regardless of the IFF_LOWER_UP status. However, this approach can lead to issues if the interface's operational state is "unknown," causing nexthop creation to fail. It would be more appropriate for Zebra to initiate nexthop creation only when IFF_LOWER_UP is set.

It is worth noting that the IFF_LOWER_UP flag is activated by the kernel itself when the driver invokes netif_carrier_on(), reliably indicating that the interface is operational.

@eqvinox
Copy link
Contributor

eqvinox commented Apr 30, 2024

Ok, I think it should work. I hope we won't have the opposite problem with another driver though. If that happens we should just revert this.

@eqvinox eqvinox dismissed their stale review April 30, 2024 12:44

discussed

@louis-6wind
Copy link
Contributor Author

@eqvinox could you approve and merge this pull request?

"verify source" is triggering errors on imported linux headers. They should not be modified.
frrbot error is unrelevant

@louis-6wind
Copy link
Contributor Author

@eqvinox please approve and merge!

@idryzhov idryzhov merged commit f8abf96 into FRRouting:master May 28, 2024
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants