prod deployment hangs on resque-pool hotswap step #1783
Comments
If you control-c out of the hung resque-pool swap process and want to manually complete the deployment steps that were missed by capistrano (after confirming that the worker counts look ok and that there are no stale workers to deal with):
|
Note: |
I did some debugging on this last week, which stalled out since I was out sick part of Friday. What I've found so far:
could probably use some pairing on further troubleshooting. fuller debugging output from my testing last week:
preservation_catalog % ssh [email protected] 'cd preservation_catalog/current/ && bundle exec resque-pool --daemon --hot-swap --environment production && echo $?'
0
^C%
preservation_catalog %
preservation_catalog % ssh [email protected] 'echo "will this ssh invocation hang?"'
will this ssh invocation hang?
preservation_catalog %
preservation_catalog % ssh [email protected] 'cd preservation_catalog/current/ && bundle exec echo "will this ssh invocation hang?"'
will this ssh invocation hang?
preservation_catalog % |
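One possible explanation for the transcript above, not confirmed anywhere in this thread: an ssh session stays open until every process holding the remote command's stdout/stderr has closed it, so a daemonized process that inherits those streams can keep ssh waiting even after the foreground command has printed its exit status. The sketch below reproduces that pattern locally, with cat standing in for the ssh client and a backgrounded sleep standing in for the daemon. Both stand-ins are assumptions for illustration, not resque-pool behavior.

```shell
#!/bin/sh
# Local stand-in for the ssh session: `cat` reads until every holder of the
# pipe's write end has closed it, much like ssh waits on the remote streams.

# Case 1: the background child inherits stdout, so the reader waits ~2s even
# though the foreground command has already finished.
start=$(date +%s)
sh -c 'sleep 2 & echo "foreground command is done"' | cat
inherited_secs=$(( $(date +%s) - start ))

# Case 2: the background child has its streams redirected, so the reader
# returns as soon as the foreground command exits.
start=$(date +%s)
sh -c 'sleep 2 >/dev/null 2>&1 </dev/null & echo "foreground command is done"' | cat
detached_secs=$(( $(date +%s) - start ))

echo "inherited streams: ${inherited_secs}s, redirected streams: ${detached_secs}s"
```

If this is what is happening here, redirecting the daemon's streams in the remote command (for example, appending > /dev/null 2>&1 to the resque-pool invocation) would be one thing to try. That is untested against this deployment.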
note: the pre-assembly DevOpsDocs page has info on how to check whether there are stale workers in a resque-pool instance: https://github.com/sul-dlss/DevOpsDocs/blob/master/projects/pre-assembly/operations-concerns.md#stop-unresponsive-workers-workers-with-jobs-that-are-taking-too-long-and-zombie-workers
though also note: after running into this problem a number of times, we've found that the resque-pool instance generally restarts without a problem (despite the cap command hanging) and doesn't leave stale workers that need to be stopped manually. still, as a quick check that all is well, it is worth confirming that the worker pools running on the worker VMs were brought up by the deployment that was just done, and that no strays are left over from an old deployment. |
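A minimal sketch of that quick check, assuming process age is a good enough proxy for "brought up by the latest deployment". The function name, the sample listing, and the process titles in it are made up for illustration; the linked DevOpsDocs page remains the reference for the real procedure.

```shell
#!/bin/sh
# older_resque_pids MAX_AGE_SECONDS
# Reads lines of "PID ELAPSED_SECONDS COMMAND..." on stdin and prints the PID
# of every resque process that has been running longer than MAX_AGE_SECONDS.
older_resque_pids() {
  awk -v max_age="$1" '$2 > max_age && /resque/ { print $1 }'
}

# Made-up sample listing (PIDs, ages, and process titles are illustrative only).
sample='4101 86400 resque-pool-master[example]: managing [4102]
4102 86400 resque: Waiting for example_queue
5201 120 resque-pool-master[example]: managing [5202]
5202 118 resque: Waiting for example_queue
900 90000 sshd: example'

# If the deploy finished about 10 minutes (600s) ago, anything older is a
# candidate stray from a previous deployment.
strays=$(printf '%s\n' "$sample" | older_resque_pids 600)
echo "possible strays: $strays"
```

On a Linux worker VM the input could come from ps -e -o pid= -o etimes= -o args= (etimes is a procps-ng field, so this assumes the hosts run procps-ng).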
@mjgiarlo says: "I tried tweaking the cap task to print out more diagnostic information and it did so just fine. it's just something about that hot swap operation executed via cap? Like, it can invoke bundle exec resque-pool --help just fine." |
Has anyone tried to see what happens if we invoke the command manually on the server that cap is executing? I would assume the issue is not necessarily capistrano, but rather the hot swap operation command (whatever that is). |
surprisingly, that seems to work fine: i tested that and ran into no problems (noted above, i think)
the extra weird part to me was that running the command over ssh indicated that it returned, and then something else caused the cap command to hang mysteriously after that, see end of this comment: #1783 (comment) |
@edsu ran across this resque-pool issue, which may be relevant: resque/resque-pool#107 |
It's a long shot, but perhaps changing the QUIT to an INT over in dlss-capistrano might help? |
Describe the bug
starting with @aaron-collier running dependency updates last week, we noticed that when deploying the app to production, the deployment process will hang on the resque-pool hotswap step (where the workers are restarted). if the resque pool hotswap cap task is run alone against prod, it will hang too.
other observations:
User Impact
nothing direct, this just makes deployment more of a pain for devs (as the deployment must be cancelled out of using ^C once it's apparent the process is hung, and then the dev should probably go check that things were deployed and restarted correctly). it makes it hard to include pres cat in the bulk deployments for weekly dependency updates (because it'll hang subsequent deployments that happen to fall after it, i.e. any project with a name lexically greater than preservation_catalog).
Object(s) In This Error State
N/A
Steps to Reproduce, if Known:
cap deploy or run the resque:pool:hotswap task for prod
Expected behavior
resque pool hotswap doesn't hang, and returns in a timely manner on success or error
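Until that holds, one possible stopgap for bulk deployments is to bound each deployment with a time limit so a hung step cannot block the projects that come after it. This is only a sketch: it assumes GNU coreutils timeout is available where the deployments are launched, run_bounded is a hypothetical helper name, and the sleep/true commands are stand-ins for the real cap invocation.

```shell
#!/bin/sh
# run_bounded LIMIT_SECONDS CMD...
# Runs CMD under coreutils `timeout`. When the limit is hit, `timeout` exits
# with status 124; that status is reported on stderr and returned unchanged.
run_bounded() {
  limit="$1"
  shift
  timeout "$limit" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${limit}s: $*" >&2
  fi
  return "$status"
}

# Stand-in commands. A real run might look something like
# `run_bounded 600 bundle exec cap prod deploy` (exact invocation is an assumption).
run_bounded 1 sleep 5 || echo "first command did not finish; moving on"
run_bounded 5 true && echo "second command finished"
```

A timed-out deployment would still need the manual follow-up described in the comments: checking that the code was deployed and that the workers were restarted.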
Screenshots
n/a
Additional context
n/a