Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: Correctly handle ppoll pfds.events == 0 #17025

Merged
merged 1 commit into from
Oct 18, 2024

Conversation

donaldsharp
Copy link
Member

The frrevent system is spitting out this message in bgpd: 20:40:15 mem1-roc-f2-b1-r5-t2-d4 bgpd[13166]: [XETTR-D5MR0][EC 100663316] Attempting to process an I/O event but for fd: 214(8) no thread to handle this!

This is because as each io event is processed, it is possible that a .events is set to 0. This can leave a situation where we ask ppoll to handle anything that happens on a fd with a .events of 0, in this situation ppoll can return POLLERR, which indicates that something bad has happened on the fd.

Let's cleanup the pfds after io events are handled that can be cleaned up.

lib/event.c Outdated
@@ -1690,6 +1690,35 @@ static void thread_process_io(struct event_loop *m, unsigned int num)
for (i = 0; i < last_read && ready < num; ++i)
thread_process_io_inner_loop(m, num, pfds, &i, &ready);

/*
* At this point any fd's that were processed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about the issue: if a process closes an fd without cancelling the read or write task that uses the fd, that sounds like an application issue that should be fixed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is cancelling the read/writes! It's a different problem. Here's the normal loop of events for read or writes:

program registers with the event system a read/write event. ppoll fds array is modified appropriately
ppoll is run eventually. Read or write event happens. lib/event.c clears the read/write event leaving a .event with 0 and calls the appropriate handler.
Handler wakes up and decides that no more reads or writes should be put on the ppoll fds and does not reregister itself. .
ppoll is entered again and the socket is closed and ppoll sends us a POLLERR which means that something bad happened with the fd, but since nothing got reregistered it puts out the warning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that the array of fds - handler.pfds - isn't removing an fd after each event on the fd. doesn't it look like the fd is left in place, with zero poll bits set? what's clearing the fd from the pfds array, other than the event cancel apis - is it the POLLNVAL path?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is exactly what it is doing.

In any event I thought of a different way to do this that removes a expensive walk. testing it now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand the desire to keep the array stable - in most patterns, an application is going to use a socket ... more than once. but I wonder if it's the copy of the array that's maybe the problem. it's just a brute-force copy right now, but if it were a little smarter, and avoided copying the fds with poll bits of zero, that'd be a sort of compromise? there's code that assumes the 'copy' is an exact copy, so that'd have to be fixed, and maybe that'd be costly.

lib/event.c Outdated
* into that spot, so note it
*/
if (m->handler.pfds[i].fd == -1 && earlierqueuepos == m->handler.pfdcount)
earlierqueuepos = i;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just check and set queuepos here and remove earlierqueuepos and queuepos_found… since the loop is gonna continue to run, and if we find the fd we overwrite queuepos anyway and break after…

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

The frrevent system is spitting out this message in bgpd:
20:40:15 mem1-roc-f2-b1-r5-t2-d4 bgpd[13166]: [XETTR-D5MR0][EC 100663316] Attempting to process an I/O event but for fd: 214(8) no thread to handle this!

This is because as each io event is processed, it is possible that a
.events is set to 0.  This can leave a situation where we ask
ppoll to handle anything that happens on a fd with a .events of 0,
in this situation ppoll can return POLLERR, which indicates that
something bad has happened on the fd.

Let's set the ppoll fds.fd value to -1 when there are no more
events to be processed.  ppoll specifically calls out that
it will just skip this particular one.

Signed-off-by: Donald Sharp <[email protected]>
@eqvinox eqvinox merged commit 841f7a4 into FRRouting:master Oct 18, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants