Implement volumes force detach #2242

Merged
r4victor merged 8 commits into master from issue_2218_volumes_force_detach on Jan 30, 2025


r4victor
Collaborator

@r4victor r4victor commented Jan 29, 2025

Closes #2218

The PR:

  • Refactors job termination logic. Previously, process_terminating_job could be called from process_terminating_run (e.g. for non-running provisioned jobs), apparently to save a processing iteration. Now, process_terminating_run only marks jobs as TERMINATING, and only the process_termination_jobs background task performs job termination. This significantly simplifies lock management and the overall logic.
  • Fixes a locking bug where an instance's volumes might not be detached on job termination.
  • Adds backend method to check volume attach status.
  • Adds job termination logic that keeps checking volume attach status and force-detaches volumes that are stuck.
  • Adds stop_duration to the run configuration/profile to control the maximum duration a job waits before it is force-terminated.
  • Fixes parsing of *_duration parameters. E.g. the quoted string "off" was handled for max_duration but the unquoted off was not, and vice versa for idle_duration. true was accepted but parsed as 1.
  • Other small fixes.
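
The escalation from a regular detach to a force detach described in the list above can be sketched as a small decision function. This is only an illustrative sketch under stated assumptions, not the code from this PR: DetachAction, DetachState, next_detach_action, and the detach_timeout parameter are hypothetical names, and the actual timeout value and backend calls in dstack may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class DetachAction(Enum):
    """Hypothetical outcomes of one polling iteration."""
    DONE = "done"                  # volume is no longer attached
    WAIT = "wait"                  # check again on the next iteration
    FORCE_DETACH = "force_detach"  # regular detach looks stuck; escalate


@dataclass
class DetachState:
    """Hypothetical per-volume bookkeeping kept between iterations."""
    detach_requested_at: datetime
    force_detach_requested: bool = False


def next_detach_action(
    state: DetachState,
    is_attached: bool,
    now: datetime,
    detach_timeout: timedelta,  # assumed knob, not a documented dstack setting
) -> DetachAction:
    """Decide what the terminating-job task should do for one volume."""
    if not is_attached:
        # The backend reports the volume as detached: nothing left to do.
        return DetachAction.DONE
    if state.force_detach_requested:
        # Force detach was already issued; keep polling the attach status.
        return DetachAction.WAIT
    if now - state.detach_requested_at >= detach_timeout:
        # The regular detach has been pending for too long.
        return DetachAction.FORCE_DETACH
    return DetachAction.WAIT
```

Keeping the decision free of I/O means the background task only has to fetch the attach status from the backend, call the function, and act on the result, which also makes the timeout behaviour easy to unit test.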

Next:

  • Fix the dstack stop CLI command to wait until the run reaches a terminal status. Currently, the API is async, so a run may still be terminating for some time after the API call returns.
  • Handle attaching/detaching volumes for GPU blocks.
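
The first follow-up item amounts to polling the run status after the async stop call returns. A minimal sketch of such a wait loop is shown here, assuming a caller-supplied get_status callable; wait_until_terminal, TERMINAL_STATUSES, and the status strings are hypothetical names chosen for illustration, not the dstack CLI or API.

```python
import time
from typing import Callable

# Assumed set of terminal status names; the real set may differ.
TERMINAL_STATUSES = frozenset({"terminated", "failed", "done"})


def wait_until_terminal(
    get_status: Callable[[], str],
    timeout: float = 60.0,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal status or timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run is still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

The sleep and clock parameters are injectable so the loop can be tested without real delays.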

@r4victor r4victor marked this pull request as ready for review January 30, 2025 07:47
@r4victor r4victor requested review from un-def and jvstme January 30, 2025 07:49
@r4victor r4victor merged commit 2c3d83a into master Jan 30, 2025
22 checks passed
@r4victor r4victor deleted the issue_2218_volumes_force_detach branch January 30, 2025 10:25
Development

Successfully merging this pull request may close these issues.

Handle cases when a volume get stuck in the detaching state