bgpd: CPU Hog seen for bgpd on startup with large number of routes in bgpd.conf #17077

avadhootd · 2024-10-11T20:24:49Z

Description

We are running bgpd from FRR version 8.1.
We have approx. 20K routes in the bgpd.conf file (approx. 40K lines in total).
At startup, bgpd takes too long to read and apply the bgpd.conf file, which results in a CPU hog.
We cannot consolidate the routes and advertise them as one or more larger subnets.
After approx. 40 min bgpd hits the CPU hog and restarts; approx. 40 min later we see the CPU hog again.
Can someone please help with this or suggest a way to speed up bgpd startup?

Version

FRRouting 8.1

How to reproduce

Add the configuration below, with 20K routes, to the bgpd.conf file:

bgpd.conf
!
!
frr version 8.1
frr defaults traditional
!
router bgp vrf vrf2
bgp router-id
no bgp network import-check
neighbor remote-as
neighbor remote-as
!
address-family ipv4 unicast
network <ipv4 subnet #1>
network <ipv4 subnet #2>
network <ipv4 subnet #3>
…
network <ipv4 subnet #20000>
redistribute connected
neighbor prefix-list vrf2_DENY_IN_V4 in
neighbor prefix-list vrf2_ALLOW_OUT_V4 out
exit-address-family
!
address-family ipv6 unicast
redistribute connected
neighbor activate
neighbor prefix-list vrf2_DENY_IN_V6 in
neighbor prefix-list vrf2_ALLOW_OUT_V6 out
exit-address-family
exit
!
ip prefix-list vrf2_DENY_IN_V4 seq 5 deny any
ipv6 prefix-list vrf2_DENY_IN_V6 seq 5 deny any
ip prefix-list vrf2_ALLOW_OUT_V4 seq 5 permit <ipv4 subnet #1>
ip prefix-list vrf2_ALLOW_OUT_V4 seq 10 permit <ipv4 subnet #2>
ip prefix-list vrf2_ALLOW_OUT_V4 seq 15 permit <ipv4 subnet #3>
…
ip prefix-list vrf2_ALLOW_OUT_V4 seq permit <ipv4 subnet #20000>
!
!
!
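
A configuration of this shape can be generated for testing rather than written by hand. The sketch below is a minimal, IPv4-only approximation of the file above, not the reporter's actual configuration: the AS numbers, router-id, and neighbor address are hypothetical placeholders (the report elides the real ones), the subnets are synthetic /24s carved out of 10.0.0.0/8, and the function name `build_repro_config` is made up for illustration.

```python
import ipaddress
from itertools import islice

# Hypothetical placeholder values; the original report elides the real ones.
ASN = 65000
ROUTER_ID = "192.0.2.1"
NEIGHBOR = "192.0.2.2"
NEIGHBOR_ASN = 65001


def build_repro_config(n_routes, vrf="vrf2", base="10.0.0.0/8", prefix_len=24):
    """Return bgpd.conf-style text with n_routes network statements and
    one matching outbound prefix-list entry per network (IPv4 only)."""
    all_subnets = ipaddress.ip_network(base).subnets(new_prefix=prefix_len)
    subnets = list(islice(all_subnets, n_routes))
    if len(subnets) < n_routes:
        raise ValueError("base network is too small for the requested route count")

    lines = [
        "!",
        "frr defaults traditional",
        "!",
        f"router bgp {ASN} vrf {vrf}",
        f" bgp router-id {ROUTER_ID}",
        " no bgp network import-check",
        f" neighbor {NEIGHBOR} remote-as {NEIGHBOR_ASN}",
        " !",
        " address-family ipv4 unicast",
    ]
    # One "network" statement per synthetic subnet.
    lines += [f"  network {net}" for net in subnets]
    lines += [
        f"  neighbor {NEIGHBOR} prefix-list {vrf}_DENY_IN_V4 in",
        f"  neighbor {NEIGHBOR} prefix-list {vrf}_ALLOW_OUT_V4 out",
        " exit-address-family",
        "exit",
        "!",
        f"ip prefix-list {vrf}_DENY_IN_V4 seq 5 deny any",
    ]
    # Sequence numbers advance in steps of 5, as in the report (5, 10, 15, ...).
    lines += [
        f"ip prefix-list {vrf}_ALLOW_OUT_V4 seq {5 * (i + 1)} permit {net}"
        for i, net in enumerate(subnets)
    ]
    lines.append("!")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    config = build_repro_config(20000)
    print(len(config.splitlines()), "lines generated")
```

With `n_routes=20000` this yields roughly 40K lines, in line with the file size given in the description. The IPv6 address family and `redistribute connected` are left out to keep the sketch short.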

Expected behavior

bgpd should not hit a CPU hog and restart multiple times.
It also should not take 40 min to advertise these routes.
Reducing the number of routes to be advertised is not the solution we are looking for.

Actual behavior

bgpd takes too long to process the bgpd.conf file, which results in a CPU hog. It restarts after the CPU hog, fails to apply bgpd.conf again, and hits the CPU hog again. This repeats until the number of routes goes down to approx. 2K.

Additional context

No response

Checklist

I have searched the open issues for this bug.
I have not included sensitive information in this report.


riw777 · 2024-10-15T15:46:27Z

Can you try with the latest version/master? There have been a lot of improvements in BGP performance since this version.

avadhootd · 2024-10-15T19:02:35Z

Thanks riw777.

We tried FRR version 10.1 as well and still see the same CPU hog there.
When we add these routes through a program instead of through bgpd.conf, we do see some improvement with FRR 10.1 compared to 8.1, but not when loading them from the bgpd.conf file.

Here is what we see when there is a CPU hog:

CPU usage is 99% on this bgpd thread.

We ran the perf tool on bgpd and got these results on FRR 10.1.

Is it possible to have multiple threads working in parallel inside bgpd to apply routes at this scale?
Or, if there is a way to split the configuration file into multiple files, would that make bgpd process the routes faster or without the CPU hog? We don't know the underlying implementation of bgpd.
Thank you!
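
To make the file-splitting question concrete, the sketch below breaks a list of prefixes into fixed-size fragments, each wrapped in its own `router bgp` / `address-family` context so that it can stand alone. It only illustrates the idea: the function name `split_network_statements`, the AS number, and the chunk size are hypothetical, and nothing in this thread establishes that loading fragments separately changes how long bgpd takes. How each fragment would be fed to bgpd (for example through vtysh) is left out.

```python
def split_network_statements(prefixes, chunk_size, asn=65000, vrf="vrf2"):
    """Yield config fragments, each carrying at most chunk_size
    'network' statements inside a complete router/address-family block.

    asn and vrf are hypothetical placeholders for illustration.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(prefixes), chunk_size):
        chunk = prefixes[start:start + chunk_size]
        lines = [
            f"router bgp {asn} vrf {vrf}",
            " address-family ipv4 unicast",
        ]
        lines += [f"  network {prefix}" for prefix in chunk]
        lines += [" exit-address-family", "exit"]
        yield "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Five synthetic prefixes split into fragments of two.
    demo_prefixes = [f"10.0.{i}.0/24" for i in range(5)]
    for index, fragment in enumerate(split_network_statements(demo_prefixes, 2)):
        print(f"--- fragment {index} ---")
        print(fragment, end="")
```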

avadhootd added the triage Needs further investigation label Oct 11, 2024

avadhootd changed the title ~~CPU Hog seen for bgpd on startup with large number of routes in bgpd.conf~~ bgpd: CPU Hog seen for bgpd on startup with large number of routes in bgpd.conf Oct 17, 2024
