Cleaning up Batch jobs on unexpected termination #27

Open
spitz-dan-l opened this issue Apr 6, 2022 · 1 comment

@spitz-dan-l
Contributor

If a keyboard interrupt halts an in-progress redun execution, any in-flight AWS Batch jobs will keep on running after redun has exited. Ideally those jobs would be cancelled before redun shuts down.

Perhaps relatedly, on keyboard interrupt redun prints "Shutting down... Ctrl+C again to force shutdown." and then hangs instead of exiting. I'm not sure what in my configuration is causing the hang.

Would adding this cleanup require changes to the scheduler and executor interfaces? Perhaps a new executor hook could be invoked when a job is rejected? Though I'm unclear on whether redun actually tries to reject jobs on keyboard interrupt.

Cheers!
Dan Spitz

@mattrasmus
Collaborator

Thanks @spitz-dan-l for posting this. The current behavior is opinionated. As you state, if the scheduler is killed (e.g. with Ctrl+C) the AWS Batch jobs do continue until completion. If you start the scheduler again, redun will attempt to reunite with the jobs or their final outputs in S3.

If you really want to kill all AWS Batch jobs, there is a lower level command to do that:

redun aws kill-jobs

There are also a few filters to kill a subset of jobs (e.g. by status).

There are some plans for adding job canceling / killing as a mechanism. It could be used during Ctrl+C if that's desired. It could also be used when one Job fails and there is no catch(), in which case all sibling jobs should be auto-canceled. We are thinking through how users can specify those different behaviors. Any ideas you have are welcome!
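
To make the Ctrl+C case concrete, here is a minimal sketch of what opt-in cancellation could look like when wrapped around a scheduler run. This is not redun's implementation or API: `terminate_jobs`, `run_with_cleanup`, and the `cancel_on_interrupt` flag are hypothetical names, and the caller is assumed to track the IDs of its in-flight jobs. The only real call it relies on is the boto3 Batch client's `terminate_job(jobId=..., reason=...)`, which cancels jobs that have not started yet and terminates ones that are already running.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def terminate_jobs(client, job_ids: Iterable[str],
                   reason: str = "Scheduler interrupted") -> List[str]:
    """Ask AWS Batch to terminate each job; return the IDs that could not be terminated.

    `client` is assumed to behave like boto3.client("batch").
    """
    failed = []
    for job_id in job_ids:
        try:
            client.terminate_job(jobId=job_id, reason=reason)
        except Exception:
            # Best effort: keep going so one failure does not strand the rest.
            failed.append(job_id)
    return failed


def run_with_cleanup(run: Callable[[], T], client,
                     in_flight: Callable[[], Iterable[str]],
                     cancel_on_interrupt: bool = True) -> T:
    """Run `run()`; on Ctrl+C optionally terminate in-flight jobs, then re-raise.

    `in_flight` is a hypothetical callback returning the Batch job IDs
    that are still outstanding at the moment of the interrupt.
    """
    try:
        return run()
    except KeyboardInterrupt:
        if cancel_on_interrupt:
            terminate_jobs(client, list(in_flight()))
        raise
```

With boto3 installed, `client` would be `boto3.client("batch")`. Leaving `cancel_on_interrupt=False` keeps the current behavior, where jobs run to completion and can be picked up again on the next run.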
