bgpd: Ensure send order is 100% consistent #14468

Merged Sep 24, 2023 (1 commit)

Conversation

donaldsharp
Member

When BGP sends updates to peers on a neighbor-up event, it was noticed that the updates being sent to the first peer were in reverse order.

Imagine r1 -- r2 -- r3. r1 and r2 are eBGP peers, and r2 and r3 are eBGP peers. r1's interface to r2 is currently shut down. Prior to this fix, the send order looked like this:

r1 sends routes to r2, and r2 installs them in the order received:

10.0.0.12 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.11 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.10 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.9 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.8 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.7 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.6 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.5 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.4 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.3 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.2 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.1 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20

r2 would then send these routes to r3, and r3 would install them in the order received:

10.0.0.1 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.2 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.3 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.4 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.5 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.6 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.7 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.8 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.9 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.10 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.11 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.12 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20

Not that big of a deal, right? Well, imagine a situation where r1 is originating several tens of thousands of routes. r1 sends routes to r2; r2 processes them, but in reverse order, while at the same time it is sending routes to r3 in the correct order of the BGP table.

r3 will have the early routes such as 10.0.0.1/32 installed and start forwarding, while r2 will not have those routes installed yet (since they were at the end of its batch, and zebra is slightly slower at processing routes than bgpd is).

Ensure that the send order is a true FIFO. What is happening is that there is an update FIFO which stores all routes. Hanging off that FIFO is a BGP advertise-attribute list, which stores the prefixes sharing the same attribute so they can be packed more efficiently. This list was being built in reverse order, which caused the problem for the initial send. When adding items to this list, put them at the end, so that we keep the FIFO order produced by walking the BGP table.
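
For readers who want the gist without opening the diff, here is a minimal standalone C sketch of the idea; the struct and function names are illustrative only and are not the actual bgpd types. Prepending each new advertisement to the per-attribute list makes traversal emit prefixes in the reverse of table order, while appending preserves the FIFO order of the table walk.

/* Simplified illustration only -- not the actual FRR data structures. */
#include <stdio.h>
#include <stdlib.h>

struct adv {
	int prefix_id;          /* stand-in for the advertised prefix */
	struct adv *next;
	struct adv *prev;
};

struct adv_attr_list {          /* advertisements sharing one attribute set */
	struct adv *head;
	struct adv *tail;
};

/* Old behaviour: insert at the head, so traversal yields LIFO order. */
static void adv_add_head(struct adv_attr_list *l, struct adv *a)
{
	a->next = l->head;
	a->prev = NULL;
	if (l->head)
		l->head->prev = a;
	else
		l->tail = a;
	l->head = a;
}

/* Fixed behaviour: append at the tail, preserving table-walk (FIFO) order. */
static void adv_add_tail(struct adv_attr_list *l, struct adv *a)
{
	a->next = NULL;
	a->prev = l->tail;
	if (l->tail)
		l->tail->next = a;
	else
		l->head = a;
	l->tail = a;
}

int main(void)
{
	struct adv_attr_list lifo = {0}, fifo = {0};

	/* Simulate walking the BGP table in order: prefixes 1, 2, 3, 4. */
	for (int i = 1; i <= 4; i++) {
		struct adv *a = calloc(1, sizeof(*a));
		struct adv *b = calloc(1, sizeof(*b));

		a->prefix_id = b->prefix_id = i;
		adv_add_head(&lifo, a);  /* old: sends 4 3 2 1 */
		adv_add_tail(&fifo, b);  /* new: sends 1 2 3 4 */
	}

	printf("head-insert send order:");
	for (struct adv *a = lifo.head; a; a = a->next)
		printf(" %d", a->prefix_id);
	printf("\ntail-insert send order:");
	for (struct adv *a = fifo.head; a; a = a->next)
		printf(" %d", a->prefix_id);
	printf("\n");
	return 0;
}

Compiled with any C compiler, the sketch prints "4 3 2 1" for head insertion and "1 2 3 4" for tail insertion, which mirrors the before/after installation orders shown in this description.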

After the fix:

r2 installation order:

10.0.0.0 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.1 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.2 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.3 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.4 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.5 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.6 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.7 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.8 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.9 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.10 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.11 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20
10.0.0.12 nhid 39 via 192.168.8.2 dev leaf2-eth5 proto bgp metric 20

r3 installation order:

10.0.0.0 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.1 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.2 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.3 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.4 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.5 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.6 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.7 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.8 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.9 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.10 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.11 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20
10.0.0.12 nhid 12 via 192.168.61.2 dev spine2-eth1 proto bgp metric 20

Signed-off-by: Donald Sharp <[email protected]>
@NetDEF-CI
Collaborator

NetDEF-CI commented Sep 21, 2023

Continuous Integration Result: FAILED

See below for issues.
CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-14292/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Get source / Pull Request: Successful

Building Stage: Successful

Basic Tests: Failed

Topotests Ubuntu 18.04 i386 part 8: Failed

Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO8U18I386-14292/test

Topology Tests failed for Topotests Ubuntu 18.04 i386 part 8
see full log at https://ci1.netdef.org/browse/FRR-PULLREQ2-14292/artifact/TOPO8U18I386/TopotestLogs/log_topotests.txt
Topotests Ubuntu 18.04 i386 part 8: Unknown Log
URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-14292/artifact/TOPO8U18I386/TopotestDetails/

Successful on other platforms/tests
  • Topotests Ubuntu 18.04 arm8 part 2
  • Topotests debian 10 amd64 part 5
  • Topotests debian 10 amd64 part 0
  • Addresssanitizer topotests part 9
  • Topotests Ubuntu 18.04 i386 part 6
  • Topotests Ubuntu 18.04 arm8 part 7
  • Topotests Ubuntu 18.04 i386 part 2
  • Topotests Ubuntu 18.04 i386 part 1
  • Addresssanitizer topotests part 7
  • Topotests Ubuntu 18.04 arm8 part 8
  • Topotests debian 10 amd64 part 6
  • Topotests Ubuntu 18.04 amd64 part 5
  • Topotests Ubuntu 18.04 i386 part 7
  • Topotests Ubuntu 18.04 arm8 part 1
  • Topotests Ubuntu 18.04 arm8 part 6
  • Addresssanitizer topotests part 3
  • Topotests Ubuntu 18.04 amd64 part 7
  • Topotests Ubuntu 18.04 i386 part 5
  • Topotests Ubuntu 18.04 i386 part 3
  • Topotests Ubuntu 18.04 i386 part 0
  • Topotests Ubuntu 18.04 amd64 part 2
  • Topotests debian 10 amd64 part 8
  • CentOS 7 rpm pkg check
  • Topotests Ubuntu 18.04 amd64 part 4
  • Topotests Ubuntu 18.04 arm8 part 4
  • Topotests debian 10 amd64 part 9
  • Addresssanitizer topotests part 2
  • Topotests Ubuntu 18.04 arm8 part 9
  • Topotests Ubuntu 18.04 amd64 part 0
  • Topotests Ubuntu 18.04 amd64 part 3
  • Addresssanitizer topotests part 1
  • Topotests Ubuntu 18.04 i386 part 4
  • Static analyzer (clang)
  • Topotests Ubuntu 18.04 amd64 part 1
  • Topotests debian 10 amd64 part 2
  • Topotests debian 10 amd64 part 7
  • Topotests Ubuntu 18.04 arm8 part 0
  • Topotests Ubuntu 18.04 amd64 part 8
  • Debian 9 deb pkg check
  • Topotests Ubuntu 18.04 arm8 part 5
  • Addresssanitizer topotests part 8
  • Topotests Ubuntu 18.04 amd64 part 6
  • Topotests Ubuntu 18.04 amd64 part 9
  • Ubuntu 20.04 deb pkg check
  • Ubuntu 18.04 deb pkg check
  • Debian 10 deb pkg check
  • Addresssanitizer topotests part 6
  • Addresssanitizer topotests part 5
  • Topotests debian 10 amd64 part 1
  • Topotests Ubuntu 18.04 arm8 part 3
  • Addresssanitizer topotests part 4
  • Topotests debian 10 amd64 part 3
  • Addresssanitizer topotests part 0
  • Topotests debian 10 amd64 part 4
  • Topotests Ubuntu 18.04 i386 part 9

@NetDEF-CI
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-14292/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

ton31337 merged commit a2a9733 into FRRouting:master on Sep 24, 2023