libbroker: make_subscriber() SIGABRT on too many files in flare::flare() #312
Comments
I'm not so sure about that. If we replace subscribers with WebSocket clients, it'll run into the same issues. Each WebSocket connection still opens up a socket at the end of the day. How much is a "large number of workers"? I suppose well below 1k? IIRC, the limit for open file handles is ~1k by default. I'm just assuming that the number of workers is well below that. How did Zeek end up reaching that limit in the first place? Is it maybe using one consumer per peer/topic? Because they are using flares (i.e., pipes), users aren't really supposed to have a large number of these. Even if we changed
The configuration that triggered the error was 9 worker nodes with 16 workers each. Fewer than that was okay. Doubling both might not be unrealistic in a very large multi-worker cluster, and that's then already ~640 workers (individual Zeek processes).
It's the zeekctl Python process that reached the limit. That just peers with each of the individual processes on demand IIUC: I hope this helps a bit.
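To make the descriptor arithmetic from this exchange concrete, here is a minimal Python sketch (Unix only). It assumes, as the earlier comment suggests, that each flare is backed by one pipe. It is not zeekctl or Broker code, and the helper names `fd_limits` and `descriptors_used_by_pipes` are made up for illustration.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def descriptors_used_by_pipes(count):
    """Open `count` pipes, count the distinct descriptors they occupy, then close them."""
    pipes = [os.pipe() for _ in range(count)]
    try:
        # Each pipe has a read end and a write end, so it occupies two descriptors.
        return len({fd for pair in pipes for fd in pair})
    finally:
        for read_fd, write_fd in pipes:
            os.close(read_fd)
            os.close(write_fd)


soft, hard = fd_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
print(f"descriptors used by 4 pipes: {descriptors_used_by_pipes(4)}")
```

Under that assumption, every pipe-backed consumer costs two descriptors on top of the socket used for the peering itself, so a process can reach the soft limit with fewer peers than the raw number suggests.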
Yeah, crashing would definitely be preferred.
A failing broker::endpoint::make_subscriber() should not raise SIGABRT.

A user on Slack reported zeekctl coredumping when configuring a large number of workers. Decreasing the number of workers avoided the coredumps. The coredumps were truncated by systemd-coredump, making it difficult to figure out where the error occurred.
Running zeekctl with GDB showed the following:

When running zeekctl without gdb, the following message wasn't visible (might be a zeekctl thing). It would have helped otherwise.

Having that error log is good, but libbroker.so API functions should not raise SIGABRT or any other signal in case of errors, particularly those that could be recovered from or better reported by the embedding code. An abort() will take down the whole process, making it hard to understand where the issue was. Rather, the error should be reported as an exception or error code back to the caller.

I presume the broker WebSocket interface makes addressing this a bit moot, as embedding libbroker will become rare in the future, but I'm creating the ticket for reference.
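As a rough illustration of the behavior requested here, the following Python sketch (Unix only) reports a failed pipe creation to the caller as an exception instead of terminating the process. It is not libbroker code: `FlareError`, `make_flare`, and `open_flares_until_failure` are hypothetical names, and the pipe merely stands in for whatever a flare allocates internally.

```python
import os
import resource


class FlareError(RuntimeError):
    """Hypothetical error type raised when a flare (pipe) cannot be created."""


def make_flare():
    """Create a pipe; on failure, raise FlareError instead of aborting the process."""
    try:
        return os.pipe()
    except OSError as exc:
        raise FlareError(f"cannot create flare: {os.strerror(exc.errno)}") from exc


def open_flares_until_failure(limit):
    """Lower the soft descriptor limit to `limit`, open flares until one fails,
    then clean up and return (number_opened, error_message)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never try to raise the soft limit above its current value.
    new_soft = limit if soft == resource.RLIM_INFINITY else min(limit, soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    flares, message = [], None
    try:
        while True:
            flares.append(make_flare())
    except FlareError as exc:
        # The caller decides what to do: log, retry with fewer peers, or exit cleanly.
        message = str(exc)
    finally:
        for read_fd, write_fd in flares:
            os.close(read_fd)
            os.close(write_fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return len(flares), message


opened, error = open_flares_until_failure(64)
print(f"opened {opened} flares before failing: {error}")
```

Because the failure surfaces as an ordinary exception, the embedding code keeps control and can attach its own context (for example, which peer it was connecting to) before deciding how to proceed.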