Assertion failure: `ip.to_data_ptr<void>() < AddressSpace::rr_page_start() || ip.to_data_ptr<void>() >= AddressSpace::rr_page_end()` #3880

Keno · 2024-11-14T10:46:18Z

See log tail below. I believe what's happening is that we're interrupting a futex operation which sets the deferred patching bit and then in the interrupting signal handler, we're attempting to do a (syscall-buffered) read which gets descheduled. Upon exiting that syscall, we then see the deferred patching bit set, but get confused because we're attempting to patch the wrong syscall.

I see two related issues:

I don't think we actually check whether there are any other tasks in the same address space that happen to be inside the patching region, which I think could cause the tracee to crash.
The signal handler sigreturns to a different ip.

I think the proper fix for this is probably to record the patching retry request in the syscall event to make sure we look at the right one. We can then simultaneously check that no other tasks are in the region and the last task to exit can do the patching.

[INFO log_pending_events()] SYSCALL: read
[INFO log_pending_events()] DESCHED
[INFO log_pending_events()] SIGNAL_HANDLER: SIGUSR2(async)
[INFO log_pending_events()] SYSCALL_INTERRUPTION: futex
[INFO log_pending_events()] (none)
[RecordSession] EXEC_SYSCALL_DONE: status=0x857f (SYSCALL)
[RecordSession]   original_syscallno:0 (read); return val:0x8
[record_syscall] 865053: processing: SYSCALL: read -- time: 75445
[Task]   (refreshing extra-register cache using XSAVE)
[RecordTask] Wrote event SYSCALL: read for time 75445
[RecordSession]   exiting desched critical section
[RecordSession] desched: IN_SYSCALL
[RecordTask] Wrote event SYSCALLBUF_ABORT_COMMIT for time 75446
[RecordSession] desched: DISARMING_DESCHED_EVENT
[Task] resuming execution of 865053 with PTRACE_SYSCALL tick_period -1 wait 2
[PerfCounters] Resetting counters with period 1152921504606846976
[Scheduler] Starting 865053
[Task] going into blocking wait for 865053 ...
[Task]   Task 865053 changed status to 0x857f (SYSCALL)
[Task]   (refreshing register cache)
[Scheduler] Stopping 865053
[Task] Requesting registers from tracee 865053
[Task] resuming execution of 865053 with PTRACE_SYSCALL tick_period -2 wait 2
[Task] Flushing registers for tid 865053 { ip:0x70000017 args:(0x64,0x2401,0,0,0,0) orig_syscall: 16 syscallno: -38 }
[Scheduler] Starting 865053
[Task] going into blocking wait for 865053 ...
[Task]   Task 865053 changed status to 0x857f (SYSCALL)
[Task]   (refreshing register cache)
[Scheduler] Stopping 865053
[Task] Requesting registers from tracee 865053
[RecordSession] Retrying deferred syscall patching
[FATAL src/Monkeypatcher.cc:1270:try_patch_syscall()] 
 (task 865053 (rec:865053) at time 75447)
 -> Assertion `ip.to_data_ptr<void>() < AddressSpace::rr_page_start() || ip.to_data_ptr<void>() >= AddressSpace::rr_page_end()' failed to hold.

The text was updated successfully, but these errors were encountered:

This fixes #3880 by moving the deferred syscall patching flag from Task to the corresponding syscall event. Both to avoid the assertion in that issue and for performance (since unpatched syscalls are traced), it is important that we (attempt to) patch the correct syscall ip. Additionally, we add the ability for tests to make sure that a particular syscall was actually patched. In particular, our logic for bailing out when another task is in the patch region was accidentally preventing all future attempts at patching the same region, even if doing so would have succeeded. With that fixed up, the new test will make sure this doesn't regress.

Keno linked a pull request Nov 15, 2024 that will close this issue

Properly track deferred syscall patching with the syscall event #3881

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assertion failure: `ip.to_data_ptr<void>() < AddressSpace::rr_page_start() || ip.to_data_ptr<void>() >= AddressSpace::rr_page_end()` #3880

Assertion failure: `ip.to_data_ptr<void>() < AddressSpace::rr_page_start() || ip.to_data_ptr<void>() >= AddressSpace::rr_page_end()` #3880

Keno commented Nov 14, 2024

Assertion failure: ip.to_data_ptr<void>() < AddressSpace::rr_page_start() || ip.to_data_ptr<void>() >= AddressSpace::rr_page_end() #3880

Assertion failure: ip.to_data_ptr<void>() < AddressSpace::rr_page_start() || ip.to_data_ptr<void>() >= AddressSpace::rr_page_end() #3880

Comments

Keno commented Nov 14, 2024

Assertion failure: `ip.to_data_ptr<void>() < AddressSpace::rr_page_start() || ip.to_data_ptr<void>() >= AddressSpace::rr_page_end()` #3880

Assertion failure: `ip.to_data_ptr<void>() < AddressSpace::rr_page_start() || ip.to_data_ptr<void>() >= AddressSpace::rr_page_end()` #3880