From a7a7fa57fe1f7a018438433d6b55d2c633b1fe82 Mon Sep 17 00:00:00 2001 From: Donald Sharp Date: Thu, 21 Sep 2023 15:30:08 -0400 Subject: [PATCH] bgpd: Ensure send order is 100% consistent When BGP is sending updates to peers on a neighbor up event it was noticed that the bgp updates being sent were in reverse order being sent to the first peer. Imagine r1 -- r2 -- r3. r1 and r2 are ebgp peers and r2 and r3 are ebgp peers. r1's interface to r2 is currently shutdown. Prior to this fix the send order would look like this: r1 -> r2 send of routes to r2 and then they would be installed in order received: 10.0.0.12 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.11 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.10 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.9 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.8 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.7 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.6 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.5 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.4 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.3 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.2 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.1 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 r2 would then send these routes to r3 and then they would be installed in order received: 10.0.0.1 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.2 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.3 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.4 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.5 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.6 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.7 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.8 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.9 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.10 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.11 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.12 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 Not that big of a deal right? Well imagine a situation where r1 is originating several ten's of thousands of routes. It sends routes to r2 r2 is processing routes but in reverse order and at the same time it is sending routes to r3, in the correct order of the bgp table. r3 will have the early 10.0.0.1/32 routes installed and start forwarding while r2 will not have those routes installed yet( since they were at the end and zebra is slightly slower for processing routes than bgp is ). Ensure that the order sent is a true FIFO. What is happening is that there is an update fifo which stores all routes. And off that FIFO is a bgp advertise attribute list which stores the list of prefixes which share the same attribute that allow for more efficient packing this list was being stored in reverse order causing the problem for the initial send. When adding items to this list put them at the end so we keep the fifo order that is traversed when we walk through the bgp table. After the fix: r2 installation order: 10.0.0.0 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.1 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.2 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.3 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.4 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.5 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.6 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.7 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.8 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.9 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.10 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.11 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 10.0.0.12 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20 r3 installation order: 10.0.0.0 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.1 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.2 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.3 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.4 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.5 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.6 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.7 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.8 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.9 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.10 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.11 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 10.0.0.12 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20 Signed-off-by: Donald Sharp --- bgpd/bgp_advertise.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/bgpd/bgp_advertise.c b/bgpd/bgp_advertise.c index 6f4916b3c311..17d6592c0f78 100644 --- a/bgpd/bgp_advertise.c +++ b/bgpd/bgp_advertise.c @@ -83,10 +83,25 @@ void bgp_advertise_free(struct bgp_advertise *adv) void bgp_advertise_add(struct bgp_advertise_attr *baa, struct bgp_advertise *adv) { - adv->next = baa->adv; - if (baa->adv) - baa->adv->prev = adv; - baa->adv = adv; + struct bgp_advertise *spot, *prev = NULL; + + spot = baa->adv; + + while (spot) { + prev = spot; + spot = spot->next; + } + + if (prev) { + prev->next = adv; + adv->prev = prev; + } else + adv->prev = NULL; + + adv->next = NULL; + + if (!baa->adv) + baa->adv = adv; } void bgp_advertise_delete(struct bgp_advertise_attr *baa,