You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
See log tail below. I believe what's happening is that we're interrupting a futex operation which sets the deferred patching bit and then in the interrupting signal handler, we're attempting to do a (syscall-buffered) read which gets descheduled. Upon exiting that syscall, we then see the deferred patching bit set, but get confused because we're attempting to patch the wrong syscall.
I see two related issues:
I don't think we actually check whether there are any other tasks in the same address space that happen to be inside the patching region, which I think could cause the tracee to crash.
The signal handler sigreturns to a different ip.
I think the proper fix for this is probably to record the patching retry request in the syscall event to make sure we look at the right one. We can then simultaneously check that no other tasks are in the region and the last task to exit can do the patching.
[INFO log_pending_events()] SYSCALL: read
[INFO log_pending_events()] DESCHED
[INFO log_pending_events()] SIGNAL_HANDLER: SIGUSR2(async)
[INFO log_pending_events()] SYSCALL_INTERRUPTION: futex
[INFO log_pending_events()] (none)
[RecordSession] EXEC_SYSCALL_DONE: status=0x857f (SYSCALL)
[RecordSession] original_syscallno:0 (read); return val:0x8
[record_syscall] 865053: processing: SYSCALL: read -- time: 75445
[Task] (refreshing extra-register cache using XSAVE)
[RecordTask] Wrote event SYSCALL: read for time 75445
[RecordSession] exiting desched critical section
[RecordSession] desched: IN_SYSCALL
[RecordTask] Wrote event SYSCALLBUF_ABORT_COMMIT for time 75446
[RecordSession] desched: DISARMING_DESCHED_EVENT
[Task] resuming execution of 865053 with PTRACE_SYSCALL tick_period -1 wait 2
[PerfCounters] Resetting counters with period 1152921504606846976
[Scheduler] Starting 865053
[Task] going into blocking wait for 865053 ...
[Task] Task 865053 changed status to 0x857f (SYSCALL)
[Task] (refreshing register cache)
[Scheduler] Stopping 865053
[Task] Requesting registers from tracee 865053
[Task] resuming execution of 865053 with PTRACE_SYSCALL tick_period -2 wait 2
[Task] Flushing registers for tid 865053 { ip:0x70000017 args:(0x64,0x2401,0,0,0,0) orig_syscall: 16 syscallno: -38 }
[Scheduler] Starting 865053
[Task] going into blocking wait for 865053 ...
[Task] Task 865053 changed status to 0x857f (SYSCALL)
[Task] (refreshing register cache)
[Scheduler] Stopping 865053
[Task] Requesting registers from tracee 865053
[RecordSession] Retrying deferred syscall patching
[FATAL src/Monkeypatcher.cc:1270:try_patch_syscall()]
(task 865053 (rec:865053) at time 75447)
-> Assertion `ip.to_data_ptr<void>() < AddressSpace::rr_page_start() || ip.to_data_ptr<void>() >= AddressSpace::rr_page_end()' failed to hold.
The text was updated successfully, but these errors were encountered:
This fixes#3880 by moving the deferred syscall patching flag from
Task to the corresponding syscall event. Both to avoid the assertion
in that issue and for performance (since unpatched syscalls are traced),
it is important that we (attempt to) patch the correct syscall ip.
Additionally, we add the ability for tests to make sure that a particular
syscall was actually patched. In particular, our logic for bailing out
when another task is in the patch region was accidentally preventing
all future attempts at patching the same region, even if doing so
would have succeeded. With that fixed up, the new test will make sure
this doesn't regress.
Keno
linked a pull request
Nov 15, 2024
that will
close
this issue
See log tail below. I believe what's happening is that we're interrupting a futex operation which sets the deferred patching bit and then in the interrupting signal handler, we're attempting to do a (syscall-buffered) read which gets descheduled. Upon exiting that syscall, we then see the deferred patching bit set, but get confused because we're attempting to patch the wrong syscall.
I see two related issues:
I think the proper fix for this is probably to record the patching retry request in the syscall event to make sure we look at the right one. We can then simultaneously check that no other tasks are in the region and the last task to exit can do the patching.
The text was updated successfully, but these errors were encountered: