lib: Correctly handle ppoll pfds.events == 0 #17025

donaldsharp · 2024-10-08T02:02:02Z

The frrevent system is spitting out this message in bgpd: 20:40:15 mem1-roc-f2-b1-r5-t2-d4 bgpd[13166]: [XETTR-D5MR0][EC 100663316] Attempting to process an I/O event but for fd: 214(8) no thread to handle this!

This is because as each io event is processed, it is possible that a .events is set to 0. This can leave a situation where we ask ppoll to handle anything that happens on a fd with a .events of 0, in this situation ppoll can return POLLERR, which indicates that something bad has happened on the fd.

Let's cleanup the pfds after io events are handled that can be cleaned up.

mjstapp · 2024-10-15T12:53:56Z

lib/event.c

@@ -1690,6 +1690,35 @@ static void thread_process_io(struct event_loop *m, unsigned int num)
 	for (i = 0; i < last_read && ready < num; ++i)
 		thread_process_io_inner_loop(m, num, pfds, &i, &ready);

+	/*
+	 * At this point any fd's that were processed


I'm a little confused about the issue: if a process closes an fd without cancelling the read or write task that uses the fd, that sounds like an application issue that should be fixed?

it is cancelling the read/writes! It's a different problem. Here's the normal loop of events for read or writes:

program registers with the event system a read/write event. ppoll fds array is modified appropriately
ppoll is run eventually. Read or write event happens. lib/event.c clears the read/write event leaving a .event with 0 and calls the appropriate handler.
Handler wakes up and decides that no more reads or writes should be put on the ppoll fds and does not reregister itself. .
ppoll is entered again and the socket is closed and ppoll sends us a POLLERR which means that something bad happened with the fd, but since nothing got reregistered it puts out the warning.

What I mean is that the array of fds - handler.pfds - isn't removing an fd after each event on the fd. doesn't it look like the fd is left in place, with zero poll bits set? what's clearing the fd from the pfds array, other than the event cancel apis - is it the POLLNVAL path?

that is exactly what it is doing.

In any event I thought of a different way to do this that removes a expensive walk. testing it now.

I think I understand the desire to keep the array stable - in most patterns, an application is going to use a socket ... more than once. but I wonder if it's the copy of the array that's maybe the problem. it's just a brute-force copy right now, but if it were a little smarter, and avoided copying the fds with poll bits of zero, that'd be a sort of compromise? there's code that assumes the 'copy' is an exact copy, so that'd have to be fixed, and maybe that'd be costly.

eqvinox · 2024-10-17T12:54:22Z

lib/event.c

+			 * into that spot, so note it
+			 */
+			if (m->handler.pfds[i].fd == -1 && earlierqueuepos == m->handler.pfdcount)
+				earlierqueuepos = i;


You can just check and set queuepos here and remove earlierqueuepos and queuepos_found… since the loop is gonna continue to run, and if we find the fd we overwrite queuepos anyway and break after…

The frrevent system is spitting out this message in bgpd: 20:40:15 mem1-roc-f2-b1-r5-t2-d4 bgpd[13166]: [XETTR-D5MR0][EC 100663316] Attempting to process an I/O event but for fd: 214(8) no thread to handle this! This is because as each io event is processed, it is possible that a .events is set to 0. This can leave a situation where we ask ppoll to handle anything that happens on a fd with a .events of 0, in this situation ppoll can return POLLERR, which indicates that something bad has happened on the fd. Let's set the ppoll fds.fd value to -1 when there are no more events to be processed. ppoll specifically calls out that it will just skip this particular one. Signed-off-by: Donald Sharp <[email protected]>

frrbot bot added the libfrr label Oct 8, 2024

github-actions bot added master size/S labels Oct 8, 2024

mjstapp reviewed Oct 15, 2024

View reviewed changes

donaldsharp force-pushed the ppoll_fix branch from 89f974a to b05fa44 Compare October 16, 2024 16:43

github-actions bot added size/M and removed size/S labels Oct 16, 2024

donaldsharp force-pushed the ppoll_fix branch from b05fa44 to 4f735b4 Compare October 17, 2024 11:49

eqvinox reviewed Oct 17, 2024

View reviewed changes

donaldsharp force-pushed the ppoll_fix branch from 4f735b4 to d11ad98 Compare October 17, 2024 15:08

github-actions bot added size/S and removed size/M labels Oct 17, 2024

eqvinox approved these changes Oct 18, 2024

View reviewed changes

eqvinox merged commit 841f7a4 into FRRouting:master Oct 18, 2024
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lib: Correctly handle ppoll pfds.events == 0 #17025

lib: Correctly handle ppoll pfds.events == 0 #17025

donaldsharp commented Oct 8, 2024

mjstapp Oct 15, 2024

donaldsharp Oct 16, 2024

mjstapp Oct 16, 2024

donaldsharp Oct 16, 2024

mjstapp Oct 16, 2024

eqvinox Oct 17, 2024

donaldsharp Oct 17, 2024

lib: Correctly handle ppoll pfds.events == 0 #17025

lib: Correctly handle ppoll pfds.events == 0 #17025

Conversation

donaldsharp commented Oct 8, 2024

mjstapp Oct 15, 2024

Choose a reason for hiding this comment

donaldsharp Oct 16, 2024

Choose a reason for hiding this comment

mjstapp Oct 16, 2024

Choose a reason for hiding this comment

donaldsharp Oct 16, 2024

Choose a reason for hiding this comment

mjstapp Oct 16, 2024

Choose a reason for hiding this comment

eqvinox Oct 17, 2024

Choose a reason for hiding this comment

donaldsharp Oct 17, 2024

Choose a reason for hiding this comment