
bgpd: add input per-peer priority pkt queue #16517

Closed
wants to merge 1 commit

Conversation

pguibert6WIND (Member)

Under a heavily loaded system with many connections and many BGP updates received, the system has trouble maintaining BGP connectivity with its peers.

For instance, a Wireshark capture shows a BGP OPEN packet being received but only processed by the device many seconds later. By then, the remote peer has already torn down the connection.

The proposed fix creates a second stream of buffers reserved for priority BGP packets: BGP OPEN, BGP NOTIFICATION, and BGP KEEPALIVE. When the bgp_packet_receive() job runs, this priority FIFO is dequeued before the standard FIFO queue is consumed.

Signed-off-by: Philippe Guibert <[email protected]>
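
As a rough illustration of the approach, a minimal sketch of the intended read path follows. It reuses FRR's existing stream_fifo_pop() helper and the connection->ibuf input queue; the ibuf_prio field is hypothetical, standing in for the second buffer stream this patch introduces:

    /* Sketch only: drain the priority FIFO (OPEN/NOTIFICATION/KEEPALIVE)
     * before falling back to the standard input FIFO. */
    static struct stream *dequeue_next_pkt(struct peer_connection *connection)
    {
        struct stream *pkt;

        pkt = stream_fifo_pop(connection->ibuf_prio); /* hypothetical field */
        if (!pkt)
            pkt = stream_fifo_pop(connection->ibuf);
        return pkt;
    }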
@@ -200,8 +203,18 @@ static int read_ibuf_work(struct peer_connection *connection)
stream_set_endp(pkt, pktsize);

frrtrace(2, frr_bgp, packet_read, connection->peer, pkt);

/* get the type of message */
stream_forward_getp(pkt, BGP_MARKER_SIZE + 2);
Member (inline review comment on the diff above):

Please write some code that ensures that the pktsize is sufficient to actually read this data.
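
A hedged sketch of the kind of guard being requested, using FRR's existing BGP_HEADER_SIZE (19 bytes: 16-byte marker, 2-byte length, 1-byte type) and BGP_MARKER_SIZE constants; the error handling shown is a placeholder only:

    /* A packet shorter than the fixed header cannot contain a type
     * byte, so validate pktsize before advancing getp. */
    if (pktsize < BGP_HEADER_SIZE) {
        /* placeholder: real code would likely treat this as a
         * malformed header and tear the session down with a
         * NOTIFICATION */
        return -1;
    }

    /* get the type of message */
    stream_forward_getp(pkt, BGP_MARKER_SIZE + 2);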

@mjstapp (Contributor) commented Aug 5, 2024

I have reservations about this: if a system is so loaded that it is running seconds behind, as in the description, should it be prioritizing more connections and more work? What good would come from that?

@mjstapp self-requested a review August 5, 2024 17:05
@donaldsharp (Member)

I'm pretty OK with making such a change, although I really don't understand how what @pguibert6WIND has done here will actually solve the problem being stated. I do not see any code where we prioritize the special packet types so that they are handled immediately vs. handling other peers already knee-deep in reading in full tables. Can I be pointed at this decision? Additionally, if a peer is already up, in bgp_io.c we reset the keepalive timer for any packet we receive, so I think the prioritizing of keepalive packets makes no sense there. All peers eventually end up adding a connection->t_process_packet event, which is per peer, and that is why I don't think this will work.

@ton31337 (Member) left a comment

I have a question about the case when Graceful Restart is used. End-of-RIB is sent as BGP_MSG_UPDATE; does that mean we do not prioritize those messages?
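
For reference, the End-of-RIB marker is an ordinary UPDATE with an empty body (for IPv4 unicast: zero withdrawn-routes length and zero path-attribute length, i.e. 23 bytes total, per RFC 4724), so a queue keyed purely on message type cannot distinguish it from a regular UPDATE. A minimal sketch of what such a check could look like, reusing FRR's BGP_MSG_UPDATE and BGP_HEADER_SIZE constants (the function name is invented for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative only: an IPv4-unicast End-of-RIB is a bare UPDATE,
     * 19-byte header + 2 bytes withdrawn length + 2 bytes attr length. */
    static bool is_eor_ipv4_unicast(uint8_t type, uint16_t pktsize)
    {
        return type == BGP_MSG_UPDATE && pktsize == BGP_HEADER_SIZE + 4;
    }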

@pguibert6WIND (Member, Author) commented Aug 5, 2024

> I'm pretty OK with making such a change, although I really don't understand how what @pguibert6WIND has done here will actually solve the problem being stated. [...]

To clarify: the main issue is related to BGP OPEN messages that are processed too late.
That said, the same problem could happen with KEEPALIVE messages if the user configures lower keepalive timer values.

@pguibert6WIND (Member, Author) commented Aug 5, 2024

> I have reservations about this: if a system is so loaded that it is running seconds behind, as in the description, should it be prioritizing more connections and more work? What good would come from that?

The scenario is a route reflector (RR) with 998 listening peers and 2 speaking peers, advertising 4M L3VPN prefixes.

@donaldsharp (Member)

You don't explain how this helps, given the way the system works. If I have 998 peers, that means I possibly have 998 events queued ahead of the new peer's event, and each of those events can already process a whole bunch of packets. Those events are the ones not allowing us to get to the problem you are describing. Remember, we have an event per peer to handle its I/O; adding a priority queue for a peer doesn't help when I have to process the other peers' events first (as their input queues could be filled). To solve the problem, from my perspective, you would need all priority events handled before any normal packet processing could occur. This implies some sort of equivalent to zebra's MetaQ handling. Finally, your answer about keepalives makes no sense to me either, as FRR currently counts any packet received as resetting the keepalive timer. Since we made that change, I have not seen keepalive problems at all.

Is this code you have had internally for a long time and are now just forward-porting?
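
For readers unfamiliar with the reference: zebra's MetaQ drains higher-priority sub-queues completely before touching lower ones, across all sources of work. A minimal, hypothetical sketch of that shape applied to BGP packet work follows (none of these names exist in FRR; they are invented for illustration):

    #include <stddef.h>

    enum pkt_prio { PRIO_SESSION = 0, PRIO_KEEPALIVE, PRIO_UPDATE, PRIO_MAX };

    struct pkt_item {
        struct pkt_item *next;
        void (*handler)(void *arg);
        void *arg;
    };

    /* One global set of sub-queues shared by all peers, so priority
     * work for any peer runs before UPDATE work for every peer. */
    struct pkt_metaq {
        struct pkt_item *head[PRIO_MAX];
        struct pkt_item *tail[PRIO_MAX];
    };

    static void metaq_enqueue(struct pkt_metaq *mq, enum pkt_prio p,
                              struct pkt_item *item)
    {
        item->next = NULL;
        if (mq->tail[p])
            mq->tail[p]->next = item;
        else
            mq->head[p] = item;
        mq->tail[p] = item;
    }

    /* Always dequeue from the highest-priority non-empty sub-queue. */
    static struct pkt_item *metaq_dequeue(struct pkt_metaq *mq)
    {
        for (int p = 0; p < PRIO_MAX; p++) {
            struct pkt_item *item = mq->head[p];

            if (!item)
                continue;
            mq->head[p] = item->next;
            if (!mq->head[p])
                mq->tail[p] = NULL;
            return item;
        }
        return NULL;
    }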

@pguibert6WIND (Member, Author) commented Aug 6, 2024

> You don't explain how this helps, given the way the system works. [...] Is this code you have had internally for a long time and are now just forward-porting?

This code is a trial done a few weeks ago. The impact during testing was not obvious, so we did not use it at the time. The intent here was to share the code for further review.

You are totally right about the per-peer queue case; thanks for pointing that out.

  • The processing of a given peer will not speed up the processing of another peer.
    => I need to think of an alternate change.

About the BGP keepalive, I think this may still happen:

  • If we are delayed in processing BGP UPDATEs for a given peer and we also have BGP KEEPALIVEs queued, the KEEPALIVE packets will only be processed after the UPDATEs, and the delay may take seconds. AFAIK, the keepalive thread handles transmitting BGP keepalives, not receiving them.

@pguibert6WIND marked this pull request as draft August 6, 2024 10:01
@pguibert6WIND (Member, Author)

Moving this to draft to decide on the future of this commit:

  • either propose a better solution to speed up BGP OPEN message handling, since it has become clear this change does not meaningfully help OPEN messages;
  • or focus only on BGP KEEPALIVE (and BGP NOTIFICATION) messages.

@donaldsharp (Member) left a comment

As far as I understand the code, this change will not do anything to make the situation better. I am just adding a Request Changes so that nothing gets added until we resolve the actual problem at hand.

@pushpasis (Contributor) commented Aug 8, 2024

> I have reservations about this: if a system is so loaded that it is running seconds behind, as in the description, should it be prioritizing more connections and more work? What good would come from that?

I agree with @mjstapp here. I feel a better approach would be to prioritize the keepalive messages instead, so that existing sessions do not flap due to a high volume of updates.
