Clarify behavioral guarantees for plugin api #2277
Comments
To checkpoint opened files, CRIU iterates through an array of file descriptors (FDs) in According to the note in
The handling is serialized. CRIU iterates through each task using depth-first search. See
I'm assuming that you are referring to the
RErrabolu: First of all, thank you for taking the time to comment and for pointing out several aspects, either via links or method names.
RErrabolu: Thanks for describing the high-level call sequence that is involved in checkpointing. Per this statement, making any assumptions regarding the order of calls is incorrect. A working implementation might break overnight should the order change.
RErrabolu: Thanks for noting that the CRIU framework employs a scheme that is serialized and depth-first (DFS) when checkpointing a process tree.
RErrabolu: In my experience, restore seems to involve concurrency. Multiple calls from the CRIU framework to restore device files are in flight concurrently. It is not clear whether restore traces its path from the leaf nodes to the root.
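To make the serialized, depth-first dump described above concrete, here is a minimal Python model. It is only a sketch: `Task`, `dump_tree`, and the process and file names are hypothetical, and the order in which files are visited within one task is arbitrary here and should not be read as a guarantee of the plugin API.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for one process in the tree being dumped."""
    name: str
    files: list = field(default_factory=list)
    children: list = field(default_factory=list)


def dump_tree(root):
    """Visit tasks depth-first, one at a time, recording each dump_file call."""
    calls = []
    stack = [root]
    while stack:
        task = stack.pop()
        # Serialized: all files of this task are handled before the next task.
        for f in task.files:
            calls.append(f"{task.name}::dump_file({f})")
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(task.children))
    return calls


p3 = Task("P3", ["F1"])
p2 = Task("P2", ["F1", "F2"], [p3])
p4 = Task("P4", ["F1"])
p1 = Task("P1", ["F1", "F2"], [p2, p4])
order = dump_tree(p1)
```

In this model no two dump calls overlap, and P3 (a child of P2) is visited before P4 (a sibling of P2).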
RErrabolu: All in all, my assessment is that CRIU should allow plugins to specify a rule base that guides the checkpoint and restore procedures.
@rerrabolu checkpoint/restore of processes that use GPUs is a relatively new CRIU feature. I also encountered a couple of problems when evaluating it (#2248). If you want to add a feature, fix a bug, or implement missing functionality, feel free to do so! Patches are welcome!
You are right. On restore, all tasks are restored concurrently and they are synchronized between stages: Line 256 in 5de9040
File descriptors are restored in open_fd(): Line 1142 in 5de9040
You can find that a file restore callback can return 1 if the file can't be restored yet because it depends on other files. In such a case, CRIU tries to restore the other files and then does another round to restore the files skipped on the previous iteration: Line 1233 in 5de9040
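The retry behavior described above can be modeled with a small Python sketch. This is not CRIU's code: `restore_files`, `restore_one`, the dependency table, and the "no progress" check are all assumptions made for illustration. The only part taken from the description is that a callback returning 1 means "skip for now and try again in a later round".

```python
def restore_files(files, restore_one):
    """Restore files in rounds; restore_one returns 0 when done, 1 to retry later."""
    pending = list(files)
    rounds = 0
    while pending:
        rounds += 1
        skipped = []
        for f in pending:
            rc = restore_one(f)
            if rc == 1:
                skipped.append(f)  # dependency not ready, retry next round
            elif rc != 0:
                raise RuntimeError(f"restore of {f} failed with {rc}")
        # Assumption: bail out if a whole round restored nothing.
        if len(skipped) == len(pending):
            raise RuntimeError(f"no progress, unresolved: {skipped}")
        pending = skipped
    return rounds


# Hypothetical dependency: "renderD" can only be restored after "kfd".
deps = {"renderD": {"kfd"}, "kfd": set()}
restored = []


def restore_one(name):
    if not deps[name] <= set(restored):
        return 1
    restored.append(name)
    return 0


rounds = restore_files(["renderD", "kfd"], restore_one)
```

Here the first round skips `renderD` and restores `kfd`; the second round restores `renderD`.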
We use cross-process mutexes and unix sockets for the required synchronization between processes.
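As a rough picture of "restored concurrently, synchronized between stages", the following Python sketch uses threads and a `threading.Barrier` in place of real processes and cross-process primitives. The stage names and `run_restore` are hypothetical; the sketch only shows the property that no task enters the next stage until every task has finished the current one.

```python
import threading

STAGES = ["prepare", "restore_files", "finish"]  # hypothetical stage names


def run_restore(task_names):
    """Run every task through all stages, with a barrier between stages."""
    barrier = threading.Barrier(len(task_names))
    lock = threading.Lock()
    log = []

    def worker(name):
        for stage in STAGES:
            with lock:
                log.append((stage, name))
            # Nobody starts the next stage until all tasks reach this point.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(n,)) for n in task_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run_restore(["P1", "P2", "P3"])
```

Within a stage the tasks may be logged in any order, which is the part a plugin cannot rely on; across stages the order is fixed by the barrier.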
It is unclear what exactly you need here. You can describe in detail the problem you are working on, and we will help to handle it in CRIU. Or you can propose changes to the plugin interface, and we will discuss them to find the right solution.
During dump/checkpoint, the first ioctl that the plugin issues to the amdgpu driver (PROCESS_INFO) causes the driver to pause all the queues associated with this process, so that the process is effectively frozen while it is checkpointed. Once the checkpoint is complete (or if the checkpoint fails), we need to issue the UNPAUSE ioctl to resume the queues.
In the current design, dumping the device state of a process is accomplished by the following event sequence:
The dump and unpause operations are opaque to the CRIU framework; CRIU rightfully does not care what is done in them. The current design becomes limiting if the event that dumps needs to handle more than one device { /dev/kfd and /dev/dri/renderD }. This is illustrated by the following event sequence:
How to encode the above event sequence for checkpoint, in a generic manner, should be thought out. @note: I am not sure if CRIU can handle C&R of a process that has state across two or more devices, for example an application that captures data from a camera (/dev/) and uses a GPU (/dev/GPU) to process it. I do not think it is possible to pause/continue a process on all devices (CPU, GPU, camera, etc.) in an atomic manner.
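One way to picture the multi-device case is to pause on the first device dump and defer the unpause until every device file of the process has been handled, including on failure. The Python sketch below is purely illustrative: `PausedProcess`, `checkpoint`, and the event strings are hypothetical and do not reflect how the amdgpu plugin or CRIU is implemented.

```python
class PausedProcess:
    """Hypothetical wrapper: pause on first device dump, unpause once at the end."""

    def __init__(self, pid, events):
        self.pid = pid
        self.events = events
        self.paused = False

    def dump_device(self, path):
        if not self.paused:
            # Only the first dump of this process triggers the pause.
            self.events.append(f"pause({self.pid})")
            self.paused = True
        self.events.append(f"dump({self.pid}, {path})")

    def unpause(self):
        if self.paused:
            self.events.append(f"unpause({self.pid})")
            self.paused = False


def checkpoint(pid, device_paths, events):
    proc = PausedProcess(pid, events)
    try:
        for path in device_paths:
            proc.dump_device(path)
    finally:
        # Runs whether the dumps succeeded or one of them raised.
        proc.unpause()


events = []
checkpoint(1234, ["/dev/kfd", "/dev/dri/renderD"], events)
```

The point of the sketch is that unpause is tied to the end of the whole per-process dump rather than to the end of any single dump_file() call.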
A friendly reminder that this issue had no activity for 30 days.
The description of the plugin API for external files is insufficient. It does not adequately describe the behavior when checkpointing and restoring a process tree consisting of multiple processes.
For example, consider a plugin binding that has multiple files per process: F1, F2, F3, etc. It is not clear what call sequence is guaranteed.
It is not clear from the existing API description whether there is a definite order in which these calls will be invoked. Without this baseline it is difficult to determine which artifacts can be obtained and passed between the different calls: dump_file(F1), dump_file(F2), dump_file(F3), etc.
@note: Going forward, dump_file(F1) will be referred to as F1(), and similarly for the others.
It is also not clear how checkpoint and restore work when a process tree has more than one process. Will the handling be serialized or concurrent? For example, is it legal for the checkpoint calls P1::F1() and P2::F1() to be concurrent?
@note: Ignore calls to the gating APIs unpause() and resume().
The documentation of the amdgpu plugin makes a brief reference to the context of dumping and restoring - LINK. However, this does not fully answer the questions raised above. Is it legal for the dumper process to call P1::F1() and P2::F2(), or any permutation of the six calls involved?
In the current scheme for the amdgpu plugin, the call to unpause() a checkpointed process occurs towards the end of the call to dump_file(), a plugin API. It is not clear how this is supposed to work when the process being checkpointed has more than one file that should be dumped. Shouldn't this be similar to how resume() works?
I think the plugin API description should clarify these aspects. This will enable implementations to build and cache artifacts that can be used in subsequent calls.