Gearmand crashes or unreachable #345
Comments
Bug 1: Nope. But I think it's related to your usage of a persistent storage plugin. I've never seen that "Comparing queue %u to limit %u for priority %u" message in my usage, but I don't use any of the persistent storage plugins. Looking at the code, I'm guessing you're hitting the
We try to discourage usage of the persistent storage plugins these days. The consensus among the gearmand developers is that they were a bad idea. We still accept PRs for maintaining the persistent storage layers, but we've stopped active development on them. It's better to use a design pattern where job persistence is implemented by workers. It scales better and is generally more robust. (There are two frameworks for implementing such a system: one is called Gearstore and the other is called Garavini. You might want to look into them. It's also straightforward to implement your own persistent storage tasks once you understand the design pattern.)
Bug 2: This sounds like issue #232. There's a prospective patch in that issue. Please try it and let us know if it resolves the issue. Also, if upgrading your OS broke gearmand, I would recommend running your gearmand in a Docker container on whatever OS version worked for you in the past. See PR #327 for the Dockerfiles I use.
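To make the "persistence implemented by workers" idea concrete, here is a minimal store-and-forward sketch. It is only an illustration under stated assumptions: it is not how Gearstore or Garavini are actually implemented, it uses SQLite as a stand-in for whatever durable store you prefer, and `open_store`, `store_job`, `forward_pending` and the `run` callback are hypothetical names. In a real setup, `store_job` would be called from a worker registered for a "store" function, and `run` would submit the job to the real function through your Gearman client library.

```python
import sqlite3
import uuid
from typing import Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the durable job table. SQLite is an assumption."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id TEXT PRIMARY KEY, function TEXT NOT NULL, payload BLOB NOT NULL)"
    )
    db.commit()
    return db


def store_job(db: sqlite3.Connection, function: str, payload: bytes) -> str:
    """What a hypothetical 'store' worker does: persist first, then return."""
    job_id = uuid.uuid4().hex
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO jobs (id, function, payload) VALUES (?, ?, ?)",
            (job_id, function, payload),
        )
    return job_id


def forward_pending(
    db: sqlite3.Connection, run: Callable[[str, bytes], None]
) -> int:
    """Run every stored job; delete a row only after `run` succeeded."""
    done = 0
    rows = db.execute(
        "SELECT id, function, payload FROM jobs ORDER BY rowid"
    ).fetchall()
    for job_id, function, payload in rows:
        try:
            run(function, payload)  # hypothetical: submit to the real function
        except Exception:
            continue  # keep the row so the job is retried on the next pass
        with db:
            db.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
        done += 1
    return done
```

With this shape, gearmand itself only ever holds in-memory jobs, and a job that fails (or a server that restarts) leaves its row in the store for a later pass.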
Thank you for your suggestions. We have possibly identified two workers that seem to trigger both conditions. We disabled them, and since then gearmand has been running stably. We still need to investigate these workers further. I'm not sure if we hit
We have put multiple debug log lines in job.cc to find out where exactly gearmand went before going down with exit 1. Section of the log with the debug line:
The log indicates that gearmand went to
If we find the issue in those workers, I'll post it in this issue for further reference.
It's not outside the realm of possibility that poorly implemented clients and workers can cause problems with the gearmand server. If that's the case, it's a flaw and any help in tracking down what the problem might be would be appreciated. 😄 Were there any changes to these workers when you migrated to Debian 11? Or are these new workers? Also, what OS are you migrating from (where presumably everything worked correctly)?
We recently upgraded two Gearman job servers to Debian 11 and have been experiencing problems since then.
From our debugging so far, we have identified two possible bugs.
Bug one:
The gearman-job-server crashes with an exit 1. The debug log shows us that it's always after
Comparing queue %u to limit %u for priority %u -> libgearman-server/job.cc:175
Currently we are building a version with some extra debug lines (in job.cc) to investigate this.
Bug two:
The gearman-job-server uses 100% CPU and doesn't accept any connections from workers or gearadmin. As far as we have seen, "kill -9" is the only way to restart the job server. The debug log shows that this CPU usage starts after "Gear connection disconnected:". After that, only messages about "Accepted connection from" are shown.
We have used the packages that Debian provides, and have also built our own packages from master.
We use PHP and Perl workers.
Server version: 1.1.19.1
libgearman: 1.1.19.1
libgearman-client-perl: 2.004.015-1
Maybe these two bugs are connected to each other.
Has anybody seen this behavior before?
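For anyone trying to catch the second condition as it happens, a small probe along these lines may help. It is a sketch, not part of gearmand: it assumes the server listens on the default port 4730 and answers the text admin command "version" with a single line, and the function name `gearmand_version` and the timeout value are made up for illustration. Because the hung server in the log still reports "Accepted connection from", the probe treats a missing reply within the timeout as a failure, not just a refused connection.

```python
import socket
from typing import Optional


def gearmand_version(
    host: str = "127.0.0.1", port: int = 4730, timeout: float = 5.0
) -> Optional[str]:
    """Return the one-line reply to the admin 'version' command.

    Returns None if the server cannot be reached or does not answer
    within `timeout` seconds (assumed to mean 'hung or down').
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)  # also bound the wait for the reply
            sock.sendall(b"version\n")
            data = b""
            while not data.endswith(b"\n"):
                chunk = sock.recv(4096)
                if not chunk:  # peer closed without a full line
                    break
                data += chunk
            return data.decode("ascii", "replace").strip() or None
    except OSError:  # refused, unreachable, or timed out
        return None


if __name__ == "__main__":
    reply = gearmand_version()
    print("gearmand reply:", reply if reply is not None else "no answer")
```

Run from cron or a supervisor, a None result could be used to capture a stack trace or core of the spinning process before it is killed.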