(I don't think this idea is worth doing anytime soon but I want to get my ideas written down.)
I think it's possible to build a library that encapsulates ptrace and hides most of the nastiness. This library would be based around an object representing a single ptraced task. This object would expose a rich "current state" that describes the last known state of the task, i.e. one of
Stopped in a signal stop (including signal number)
Stopped in a syscall-entry stop
Stopped in a syscall-exit stop
... etc ...
Stopped in a task-exit stop
Running freely
Running in a system call (possibly blocked)
Running to a task-exit stop (after we detect a task is dying due to unexpected SIGKILL)
Running from a task-exit stop to zombie
Reaped
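To make the state list concrete, here is a minimal sketch of it as a tagged value, in Python for brevity. This is not rr code: `Kind`, `TaskState` and `is_stopped` are made-up names, and the states elided by "... etc ..." above are left out rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Kind(Enum):
    # Only the states named in the list above; elided ones are omitted.
    SIGNAL_STOP = auto()
    SYSCALL_ENTRY_STOP = auto()
    SYSCALL_EXIT_STOP = auto()
    EXIT_STOP = auto()
    RUNNING = auto()
    RUNNING_IN_SYSCALL = auto()
    RUNNING_TO_EXIT_STOP = auto()
    RUNNING_TO_ZOMBIE = auto()
    REAPED = auto()


STOPPED_KINDS = frozenset({
    Kind.SIGNAL_STOP,
    Kind.SYSCALL_ENTRY_STOP,
    Kind.SYSCALL_EXIT_STOP,
    Kind.EXIT_STOP,
})


@dataclass(frozen=True)
class TaskState:
    """Last known state of one ptraced task (hypothetical type)."""
    kind: Kind
    signo: Optional[int] = None  # set only for SIGNAL_STOP

    def __post_init__(self):
        # A signal number must be present for signal stops and absent otherwise.
        if (self.kind is Kind.SIGNAL_STOP) != (self.signo is not None):
            raise ValueError("signo is required for, and only for, SIGNAL_STOP")

    @property
    def is_stopped(self) -> bool:
        return self.kind in STOPPED_KINDS
```

For example, `TaskState(Kind.SIGNAL_STOP, signo=11).is_stopped` is true, while `TaskState(Kind.RUNNING).is_stopped` is false.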
State would only change when calling a cont() method to resume a stopped task or a wait() method to wait for a state change on a running task. We'd also have to have a mass-wait() API to wait for a state change in one of a whole group of running tasks.
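A rough sketch of that contract, assuming a queue of pre-recorded stops stands in for what the kernel would report. `Task`, `wait_any` and the state strings are hypothetical names, and a real wait() would block instead of raising.

```python
from collections import deque

STOPPED = {"signal-stop", "syscall-entry-stop", "syscall-exit-stop", "exit-stop"}
RUNNING = {"running", "running-in-syscall"}


class Task:
    def __init__(self, tid, events):
        self.tid = tid
        # Assumption of this sketch: a freshly attached task starts stopped.
        self.state = "signal-stop"
        # Stand-in for the stops that waitpid() would eventually report.
        self._events = deque(events)

    def cont(self):
        """Resume a stopped task; one of the two places state may change."""
        if self.state not in STOPPED:
            raise RuntimeError("cont() on a task that is not stopped")
        self.state = "running"

    def has_event(self):
        return bool(self._events)

    def wait(self):
        """Observe the next state change of a running task."""
        if self.state not in RUNNING:
            raise RuntimeError("wait() on a task that is not running")
        if not self._events:
            raise RuntimeError("would block (no event queued in this sketch)")
        self.state = self._events.popleft()
        return self.state


def wait_any(tasks):
    """Mass-wait: return the first running task that has changed state."""
    for task in tasks:
        if task.state in RUNNING and task.has_event():
            task.wait()
            return task
    return None
```

The point of funnelling every transition through `cont()` and `wait()` is that callers can never observe a state the library has not explicitly acknowledged.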
The library would be responsible for hiding the complexity of tricky issues such as safe teardown of dying tasks (including ensuring correct order of waits when tearing down pid namespaces), tasks unexpectedly being kicked to PTRACE_EVENT_EXIT by SIGKILL (and how that can race with PTRACE_CONT), and (hopefully) the fiasco of task tid changes during execve(). It would encapsulate the WaitManager logic with its "SIGCHLD attention thread" for efficient multi-task waits.
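As one small piece of that encapsulation, here is a sketch of classifying a raw Linux wait status into the kinds of stop listed earlier. It assumes `PTRACE_O_TRACESYSGOOD` is set (so syscall stops report `SIGTRAP | 0x80`) and ignores group-stops and `PTRACE_EVENT_STOP`. `decode` and its return tuples are made-up names, and the constants are hardcoded Linux values.

```python
SIGTRAP = 5            # Linux value, hardcoded for this sketch
PTRACE_EVENT_EXIT = 6  # from <linux/ptrace.h>


def decode(status):
    """Classify a waitpid() status word. Returns (kind, detail)."""
    if status & 0xFF == 0x7F:
        # Stopped: the stop signal is in bits 8-15, the ptrace event above that.
        sig = (status >> 8) & 0xFF
        event = status >> 16
        if event == PTRACE_EVENT_EXIT and sig == SIGTRAP:
            return ("exit-stop", None)
        if event != 0:
            return ("event-stop", event)
        if sig == (SIGTRAP | 0x80):
            # Entry vs. exit is not encoded in the status; the library
            # would have to track which one comes next.
            return ("syscall-stop", None)
        return ("signal-stop", sig)
    termsig = status & 0x7F
    if termsig == 0:
        return ("exited", (status >> 8) & 0xFF)
    return ("killed", termsig)
```

So a status of `(6 << 16) | (5 << 8) | 0x7F` decodes as an exit stop, and a plain `9` decodes as death by SIGKILL. The hard parts mentioned above (races with `PTRACE_CONT`, tid changes during `execve()`) sit on top of this and are not modelled here.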
Right now rr handles all these concerns with code that's spread through Task and its subclasses (particularly RecordTask). Putting this behind a strong API would simplify rr and be useful to other ptrace users. rr doesn't need it, but for other users it would be cool if this were multithread-friendly.
Testing this library would be really hard so I think an aggressive and systematic approach would be needed. Ideally the entire kernel API surface used would be mocked so that unit tests can simulate a particular sequence of observed kernel states. Also you could simulate random valid kernel behaviors for testing. Currently I believe rr still intermittently hangs with pid namespace teardown in some cases; those bugs are super hard to diagnose with traditional testing and super duper hard to fix without regressing something else :-(.
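A toy version of that testing approach, to show the shape of it. Everything here is hypothetical: `MockKernel` replays a scripted sequence of stops instead of calling ptrace, it models a SIGKILL racing with a resume by making `cont()` fail with ESRCH (an assumption of this sketch), and `random_script` generates random sequences that this toy model treats as valid.

```python
import random


class MockKernel:
    def __init__(self, events, kill_before_cont=None):
        self.events = list(events)        # stops the fake tracee reports, in order
        self.kill_at = kill_before_cont   # index of the cont() that loses the race
        self.conts = 0

    def cont(self):
        n = self.conts
        self.conts += 1
        if n == self.kill_at:
            # The tracee was killed: drop its remaining stops and fail the resume.
            self.events = ["exit-stop"]
            raise ProcessLookupError("ESRCH")

    def wait(self):
        return self.events.pop(0)


def run_to_exit(kernel):
    """Code under test: drive one task until it reaches its exit stop."""
    state = "signal-stop"
    trace = [state]
    while state != "exit-stop":
        try:
            kernel.cont()
            state = "running"
        except ProcessLookupError:
            # Resume failed, so the task must be dying; wait for the exit stop.
            state = "running-to-exit-stop"
        trace.append(state)
        state = kernel.wait()
        trace.append(state)
    return trace


def random_script(rng):
    """Random event sequence plus an optional point where SIGKILL strikes."""
    events = []
    for _ in range(rng.randrange(0, 5)):
        if rng.random() < 0.5:
            events += ["syscall-entry-stop", "syscall-exit-stop"]
        else:
            events.append("signal-stop")
    events.append("exit-stop")
    kill_at = rng.choice([None] + list(range(len(events))))
    return events, kill_at
```

A unit test can then pin down one exact interleaving, e.g. `MockKernel(["syscall-entry-stop", "syscall-exit-stop", "exit-stop"], kill_before_cont=1)`, or loop over seeds with `random_script(random.Random(seed))` and check that `run_to_exit` always ends in `"exit-stop"`. Failures are reproducible from the seed, which is the part that is so hard to get from a real kernel.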
I know this might sound naive, but why can't you run fuzz-like/property-based tests on a real kernel, simulating random sequences of real processes/system events on an isolated box in a VM which can be restarted as part of the test?
> why can't you run fuzz-like/property-based tests on a real kernel, simulating random sequences of real processes/system events on an isolated box in a VM which can be restarted as part of the test?
Because there are nasty race conditions that are very difficult to create reliably in a real kernel.