EVPN L2VNI/L3VNI Optimize inline Global walk for remote route installations #17526
Conversation
For "bgpd: Suppress redundant L3VNI delete processing" Instrumented logs without fix: ifdown br_l3vni
Instrumented logs without fix: ifup br_l3vni
With Fix: ifdown br_l3vni
With Fix: ifup br_l3vni
Force-pushed from b186ca3 to 6b68753.
TBH, I don't understand the goal of 58e8563. Why not bundle it together with the real changes?
Force-pushed from 6b68753 to 2cfe7bd.
Force-pushed from 4accd1d to 2d1e5f4.
On a setup with 2501 L2VNIs, 5 L3VNIs, and a 200K EVPN scale test case:
- Without fix: output FIFO rose from 2.7K to 120K, with stream size growing from 1.5 MB to 639 MB.
- With fix: output FIFO rose from 2K to 11K, with stream size growing from 27 MB to 81 MB.
Review comment on tests/topotests/evpn_type5_test_topo1/test_evpn_type5_chaos_topo1.py (outdated, resolved).
Force-pushed from 2d1e5f4 to 7515018.
Adds a msg list for getting strings mapping to enum bgp_evpn_route_type

Ticket: #3318830
Signed-off-by: Trey Aspelund <[email protected]>
Anytime BGP gets an L2 VNI ADD from zebra:
- Walking the entire global routing table per L2VNI is very expensive.
- The next read (say, of another VNI ADD) from the socket does not proceed until this walk is complete.

So for triggers where a bulk of L2VNIs are flapped, this results in huge output buffer FIFO growth, spiking up memory in zebra, since bgpd is slow/busy processing the first message.

To avoid this, the idea is to hook the VPN off the bgp_master struct and maintain a VPN FIFO list that is processed later, where we walk a chunk of VPNs and do the remote route install (a sketch of the enqueue side follows).

Note: So far, in the L3 backpressure cases (FRRouting#15524), we have considered the case where zebra is slow and the buffer grows in BGP. However, this is the reverse: BGP is very busy processing the first ZAPI message from zebra, due to which the buffer grows huge in zebra and memory spikes up.

Ticket: #3864372
Signed-off-by: Rajasekar Raja <[email protected]>
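For illustration only, here is a minimal standalone C sketch of the enqueue side of this idea. The names (evpn_master, vpn_entry, handle_l2vni_add) are hypothetical and not FRR's actual structures or APIs; the point is that the ZAPI handler only queues the VNI and schedules deferred work, so the next read from the zebra socket can proceed immediately.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

/* Hypothetical stand-ins for FRR structures -- illustration only. */
struct vpn_entry {
	unsigned int vni;
	struct vpn_entry *next;
};

struct evpn_master {
	struct vpn_entry *fifo_head;	/* VPNs waiting for the deferred walk */
	struct vpn_entry *fifo_tail;
	bool walk_scheduled;		/* stand-in for a scheduled event/timer */
};

/* Old approach (sketch): walk the entire global table inline per VNI ADD.
 * New approach: just enqueue and return, deferring the expensive walk. */
static void handle_l2vni_add(struct evpn_master *em, unsigned int vni)
{
	struct vpn_entry *e = calloc(1, sizeof(*e));

	e->vni = vni;
	if (em->fifo_tail)
		em->fifo_tail->next = e;
	else
		em->fifo_head = e;
	em->fifo_tail = e;

	if (!em->walk_scheduled) {
		/* In real code this would arm an event/timer callback. */
		em->walk_scheduled = true;
		printf("deferred walk scheduled\n");
	}
}

int main(void)
{
	struct evpn_master em = {0};

	/* A burst of VNI ADDs is absorbed cheaply instead of triggering
	 * one full global-table walk per ZAPI message. */
	for (unsigned int vni = 1; vni <= 5; vni++)
		handle_l2vni_add(&em, vni);

	while (em.fifo_head) {
		struct vpn_entry *e = em.fifo_head;

		em.fifo_head = e->next;
		printf("queued VNI %u\n", e->vni);
		free(e);
	}
	return 0;
}
```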
Anytime BGP gets an L3 VNI ADD/DEL from zebra:
- Walking the entire global routing table per L3VNI is very expensive.
- The next read (say, of another VNI ADD/DEL) from the socket does not proceed until this walk is complete.

So for triggers where a bulk of L3VNIs are flapped, this results in huge output buffer FIFO growth, spiking up memory in zebra, since bgpd is slow/busy processing the first message.

To avoid this, the idea is to hook the BGP-VRF off the struct bgp_master and maintain a struct bgp FIFO list that is processed later, where we walk a chunk of BGP-VRFs and do the remote route install/uninstall (a sketch of the chunked processing follows).

Ticket: #3864372
Signed-off-by: Rajasekar Raja <[email protected]>
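And a complementary sketch of the chunked processing side, again with hypothetical names (bgp_vrf_entry, process_vrf_chunk, VRF_CHUNK) rather than FRR's real types: each wakeup handles at most a bounded number of queued BGP-VRFs, performs the install or uninstall recorded at enqueue time, and reports whether it needs to be re-armed.

```c
#include <stdio.h>
#include <stdbool.h>

#define VRF_CHUNK 2	/* bounded work per wakeup -- value is illustrative */

enum vrf_action { VRF_INSTALL, VRF_UNINSTALL };

/* Hypothetical queued BGP-VRF entry -- not FRR's actual struct bgp. */
struct bgp_vrf_entry {
	const char *name;
	enum vrf_action action;
	struct bgp_vrf_entry *next;
};

/* Process at most VRF_CHUNK queued VRFs; return true if work remains,
 * i.e. the caller should re-arm the callback instead of looping here. */
static bool process_vrf_chunk(struct bgp_vrf_entry **head)
{
	int done = 0;

	while (*head && done < VRF_CHUNK) {
		struct bgp_vrf_entry *e = *head;

		printf("%s remote routes for %s\n",
		       e->action == VRF_INSTALL ? "install" : "uninstall",
		       e->name);
		*head = e->next;
		done++;
	}
	return *head != NULL;
}

int main(void)
{
	struct bgp_vrf_entry c = {"vrf-blue", VRF_UNINSTALL, NULL};
	struct bgp_vrf_entry b = {"vrf-red", VRF_INSTALL, &c};
	struct bgp_vrf_entry a = {"vrf-green", VRF_INSTALL, &b};
	struct bgp_vrf_entry *fifo = &a;
	int wakeups = 0;

	/* Simulated event loop: one bounded chunk per wakeup. */
	do {
		printf("wakeup %d\n", ++wakeups);
	} while (process_vrf_chunk(&fifo));
	return 0;
}
```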
Consider a master bridge interface (br_l3vni) with a slave vxlan99 mapped to the VLANs used by 3 L3VNIs.

During ifdown of the br_l3vni interface, zebra_vxlan_process_l3vni_oper_down(), where zebra sends the ZAPI L3VNI delete to bgp, is invoked twice:
1) if_down -> zebra_vxlan_svi_down()
2) The VXLAN interface (vxlan99) is unlinked from the bridge: zebra_if_dplane_ifp_handling() --> zebra_vxlan_if_update_vni() (since the ZEBRA_VXLIF_MASTER_CHANGE flag is set)

During ifup of the br_l3vni interface, zebra_vxlan_process_l3vni_oper_down() is invoked because of the access-vlan change: process oper down, associate with the new svi_if, and then process oper up again.

The problem is that the redundant L3VNI delete ZAPI message makes BGP do an inline global table walk for remote route installation when the L3VNI is already removed/deleted. The bigger the scale, the more CPU utilization is wasted.

Given that bridge flaps are not a common trigger, the idea is to simply return from BGP if the L3VNI is already set to 0, i.e. if the L3VNI is already deleted, do nothing and return (a sketch of this guard follows).

NOTE/TBD: The ideal fix is to make zebra not send the second L3VNI delete ZAPI message. However, that is much more involved, touching day-1 code and its corner cases.

Ticket: #3864372
Signed-off-by: Rajasekar Raja <[email protected]>
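A minimal sketch of the guard described above, with a hypothetical handler name (bgp_evpn_handle_l3vni_del) and a simplified vrf structure; FRR's real code differs, but the shape is the same: if the L3VNI is already 0, the delete is redundant, so skip the expensive global walk and return.

```c
#include <stdio.h>

/* Simplified stand-in for a BGP VRF instance -- illustration only. */
struct bgp_vrf {
	const char *name;
	unsigned int l3vni;	/* 0 means no L3VNI is configured/active */
};

static void bgp_evpn_handle_l3vni_del(struct bgp_vrf *vrf)
{
	if (vrf->l3vni == 0) {
		/* Redundant delete from zebra: nothing to uninstall,
		 * so do not kick off the global table walk. */
		printf("%s: L3VNI already deleted, ignoring\n", vrf->name);
		return;
	}

	printf("%s: deleting L3VNI %u and walking remote routes\n",
	       vrf->name, vrf->l3vni);
	vrf->l3vni = 0;
}

int main(void)
{
	struct bgp_vrf vrf = {"vrf-blue", 4001};

	bgp_evpn_handle_l3vni_del(&vrf);	/* real delete */
	bgp_evpn_handle_l3vni_del(&vrf);	/* duplicate from the ifdown path */
	return 0;
}
```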
Force-pushed from 7515018 to bd32706.
ci:rerun
LGTM
looks good
The following are cases where it is the reverse, i.e. BGP is very busy processing the first ZAPI message from zebra, due to which the buffer grows huge in zebra and memory spikes up. A few example triggers:
- Interface up/down events
- Anytime BGP gets an L2 VNI ADD (or) L3 VNI ADD/DEL from zebra

So for triggers where a bulk of L2VNIs/L3VNIs are flapped, this results in huge output buffer FIFO growth, spiking up memory in zebra, since bgpd is slow/busy processing the first message.

To avoid this, the idea is to hook the VPN/BGP-VRF off the bgp_master struct and maintain a FIFO list that is processed later in chunks, doing the remote route install/uninstall there.

Note: So far, in the L3 backpressure cases (#15524), we have considered the case where zebra is slow and the buffer grows in BGP.
However, this is the reverse: BGP is very busy processing the first ZAPI message from zebra, due to which the buffer grows huge in zebra and memory spikes up.