
bgpd: add input per-peer priority pkt queue #16517

Closed
wants to merge 1 commit

Conversation

pguibert6WIND (Member)

Under a heavily loaded system with many connections and many BGP updates received, the system has trouble maintaining BGP connectivity with its peers.

For instance, a Wireshark capture shows a BGP OPEN packet being received but only processed by the device many seconds later. By then, the remote peer has already torn down the connection.

The proposed fix creates a second stream of buffers reserved for priority BGP packets: BGP OPEN, BGP NOTIFICATION, and BGP KEEPALIVE. When the bgp_packet_receive() job runs, this priority FIFO is dequeued before the standard FIFO queue is consumed.

Signed-off-by: Philippe Guibert <[email protected]>
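
As a rough illustration of the approach, a minimal sketch of the intended read path follows. It reuses FRR's existing stream_fifo_pop() helper and the connection->ibuf input queue; the ibuf_prio field is hypothetical, standing in for the second buffer stream this patch introduces:

    /* Sketch only: drain the priority FIFO (OPEN/NOTIFICATION/KEEPALIVE)
     * before falling back to the standard input FIFO. */
    static struct stream *dequeue_next_pkt(struct peer_connection *connection)
    {
        struct stream *pkt;

        pkt = stream_fifo_pop(connection->ibuf_prio); /* hypothetical field */
        if (!pkt)
            pkt = stream_fifo_pop(connection->ibuf);
        return pkt;
    }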
@@ -200,8 +203,18 @@ static int read_ibuf_work(struct peer_connection *connection)
stream_set_endp(pkt, pktsize);

frrtrace(2, frr_bgp, packet_read, connection->peer, pkt);

/* get the type of message */
stream_forward_getp(pkt, BGP_MARKER_SIZE + 2);
Member (inline review comment on the diff above):

Please write some code that ensures that the pktsize is sufficient to actually read this data.
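
A hedged sketch of the kind of guard being requested, using FRR's existing BGP_HEADER_SIZE (19 bytes: 16-byte marker, 2-byte length, 1-byte type) and BGP_MARKER_SIZE constants; the error handling shown is a placeholder only:

    /* A packet shorter than the fixed header cannot contain a type
     * byte, so validate pktsize before advancing getp. */
    if (pktsize < BGP_HEADER_SIZE) {
        /* placeholder: real code would likely treat this as a
         * malformed header and tear the session down with a
         * NOTIFICATION */
        return -1;
    }

    /* get the type of message */
    stream_forward_getp(pkt, BGP_MARKER_SIZE + 2);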

@mjstapp (Contributor) commented Aug 5, 2024

I have reservations about this: if a system is so loaded that it is running seconds behind, as in the description, should it be prioritizing more connections and more work? What good would come from that?

@mjstapp self-requested a review August 5, 2024 17:05
@donaldsharp (Member)

I'm pretty OK with making such a change, although I really don't understand how what @pguibert6WIND has done here will actually solve the problem being stated. I do not see any code where we prioritize the special packet types so that they are handled immediately vs. handling other peers already knee-deep in reading in full tables. Can I be pointed at this decision? Additionally, if a peer is already up, in bgp_io.c we reset the keepalive timer for any packet we receive, so I think the prioritizing of keepalive packets makes no sense there. All peers eventually end up adding a connection->t_process_packet event, which is per peer, and that is why I don't think this will work.

@ton31337 (Member) left a comment

I have a question about the case when Graceful Restart is used. End-of-RIB is sent as BGP_MSG_UPDATE; does that mean we do not prioritize those messages?
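
For reference, the End-of-RIB marker is an ordinary UPDATE with an empty body (for IPv4 unicast: zero withdrawn-routes length and zero path-attribute length, i.e. 23 bytes total, per RFC 4724), so a queue keyed purely on message type cannot distinguish it from a regular UPDATE. A minimal sketch of what such a check could look like, reusing FRR's BGP_MSG_UPDATE and BGP_HEADER_SIZE constants (the function name is invented for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative only: an IPv4-unicast End-of-RIB is a bare UPDATE,
     * 19-byte header + 2 bytes withdrawn length + 2 bytes attr length. */
    static bool is_eor_ipv4_unicast(uint8_t type, uint16_t pktsize)
    {
        return type == BGP_MSG_UPDATE && pktsize == BGP_HEADER_SIZE + 4;
    }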

@pguibert6WIND (Member, Author) commented Aug 5, 2024

> I'm pretty OK with making such a change, although I really don't understand how what @pguibert6WIND has done here will actually solve the problem being stated. [...]

To clarify: the main issue is related to BGP OPEN messages that are processed too late.
That said, the same problem could happen with KEEPALIVE messages if the user configures lower keepalive timer values.

@pguibert6WIND (Member, Author) commented Aug 5, 2024

> I have reservations about this: if a system is so loaded that it is running seconds behind, as in the description, should it be prioritizing more connections and more work? What good would come from that?

The scenario is a route reflector (RR) with 998 listening peers and 2 speaking peers, advertising 4M L3VPN prefixes.

@donaldsharp (Member)

You don't explain how this helps, given the way the system works. If I have 998 peers, that means I possibly have 998 events queued ahead of the new peer's event, and each of those events can already process a whole bunch of packets. Those events are the ones not allowing us to get to the problem you are describing. Remember, we have an event per peer to handle its I/O; adding a priority queue for a peer doesn't help when I have to process the other peers' events first (as their input queues could be filled). To solve the problem, from my perspective, you would need all priority events handled before any normal packet processing could occur. This implies some sort of equivalent to zebra's MetaQ handling. Finally, your answer about keepalives makes no sense to me either, as FRR currently counts any packet received as resetting the keepalive timer. Since we made that change, I have not seen keepalive problems at all.

Is this code you have had internally for a long time and are now just forward-porting?
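
For readers unfamiliar with the reference: zebra's MetaQ drains higher-priority sub-queues completely before touching lower ones, across all sources of work. A minimal, hypothetical sketch of that shape applied to BGP packet work follows (none of these names exist in FRR; they are invented for illustration):

    #include <stddef.h>

    enum pkt_prio { PRIO_SESSION = 0, PRIO_KEEPALIVE, PRIO_UPDATE, PRIO_MAX };

    struct pkt_item {
        struct pkt_item *next;
        void (*handler)(void *arg);
        void *arg;
    };

    /* One global set of sub-queues shared by all peers, so priority
     * work for any peer runs before UPDATE work for every peer. */
    struct pkt_metaq {
        struct pkt_item *head[PRIO_MAX];
        struct pkt_item *tail[PRIO_MAX];
    };

    static void metaq_enqueue(struct pkt_metaq *mq, enum pkt_prio p,
                              struct pkt_item *item)
    {
        item->next = NULL;
        if (mq->tail[p])
            mq->tail[p]->next = item;
        else
            mq->head[p] = item;
        mq->tail[p] = item;
    }

    /* Always dequeue from the highest-priority non-empty sub-queue. */
    static struct pkt_item *metaq_dequeue(struct pkt_metaq *mq)
    {
        for (int p = 0; p < PRIO_MAX; p++) {
            struct pkt_item *item = mq->head[p];

            if (!item)
                continue;
            mq->head[p] = item->next;
            if (!mq->head[p])
                mq->tail[p] = NULL;
            return item;
        }
        return NULL;
    }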

@pguibert6WIND (Member, Author) commented Aug 6, 2024

> You don't explain how this helps, given the way the system works. [...] Is this code you have had internally for a long time and are now just forward-porting?

This code is a trial done a few weeks ago. The impact during testing was not obvious, so we did not use it at the time. The intent here was to share the code for further review.

You are totally right about the per-peer queue case; thanks for pointing that out.

  • The processing of a given peer will not speed up the processing of another peer.
    => I need to think of an alternate change.

About the BGP keepalive, I think this may still happen:

  • If we are delayed in processing BGP UPDATEs for a given peer and we also have BGP KEEPALIVEs queued, the KEEPALIVE packets will only be processed after the UPDATEs, and the delay may take seconds. AFAIK, the keepalive thread handles transmitting BGP keepalives, not receiving them.

@pguibert6WIND marked this pull request as draft August 6, 2024 10:01
@pguibert6WIND (Member, Author)

Moving this to draft to decide on the future of this commit:

  • either propose a better solution to speed up BGP OPEN message handling, since it has become clear this change does not meaningfully help OPEN messages;
  • or focus only on BGP KEEPALIVE (and BGP NOTIFICATION) messages.

@donaldsharp (Member) left a comment

As far as I understand the code, this change will not do anything to make the situation better. I am just adding a Request Changes so that nothing gets added until we resolve the actual problem at hand.

@pushpasis (Contributor) commented Aug 8, 2024

> I have reservations about this: if a system is so loaded that it is running seconds behind, as in the description, should it be prioritizing more connections and more work? What good would come from that?

I agree with @mjstapp here. I feel a better approach would be to prioritize the keepalive messages instead, so that existing sessions do not flap due to a high volume of updates.
