rr --wait stuck not reaping zombie process #3882

Open
KJTsanaktsidis opened this issue Nov 16, 2024 · 3 comments · May be fixed by #3883

@KJTsanaktsidis (Contributor)

I've got a very occasional issue where rr --wait does not properly reap some child processes that have exited, and so rr itself gets stuck forever and never exits.

The situation looks like this. There's an rr process, pid 1566, which has no traced children except for the defunct zombie process 1675.

[root@2afe6f2c30c3 /]# ps auxf
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root        1759  0.0  0.0   4984  3760 ?        S    06:27   0:00 -bash
root        1788  0.0  0.0   5796  3648 ?        R+   06:27   0:00  \_ ps auxf
minipc-+       1  0.0  0.0   2416  1240 ?        Ss   Nov13   0:00 /usr/bin/dumb-init -- ../build-ruby.rb --test-tool --rr --asan --pernosco
minipc-+       7  0.0  0.3 1889412 198044 ?      Ssl  Nov13   0:53 /usr/bin/ruby-mri ../build-ruby.rb --test-tool --rr --asan --pernosco
minipc-+    1566  0.0  0.1 933848 126316 ?       tl   Nov13   0:08  \_ rr record --output-trace-dir test_output_dir/tool__test__testunit__test_parallel/rr_trace --asan --wait --disable-avx-512 --disable-cpuid-features 0x80050440,0x40140400 --disable-cpuid-features-ext 0xc405814,0xe73fa021,0x3eff8ef --disable-cpuid-
minipc-+    1675  0.0  0.0      0     0 ?        Zl   Nov13   0:00 [ruby] <defunct>

Process 1675 has two threads. The thread-group-leader is in a zombie state, but the second thread is in a ptrace-stop:

[root@2afe6f2c30c3 /]# cat /proc/1675/status
Name:   ruby
State:  Z (zombie)
...
[root@2afe6f2c30c3 /]# cat /proc/1675/task/1681/status 
Name:   parallel.rb:40
Umask:  0022
State:  t (tracing stop)

I attached gdb to rr itself to investigate why it got stuck in this state. rr is still tracking both of these threads:

(gdb) frame 5
#5  0x00000000004dab3c in rr::RecordSession::record_step (this=0x24892fa0) at /root/rr/src/RecordSession.cc:2608
warning: 2608   /root/rr/src/RecordSession.cc: No such file or directory
(gdb) print this->task_map
$1 = std::map with 2 elements = {[1675] = 0x24918580, [1681] = 0x24936c50}

The thread-group-leader cannot be reaped because rr needs to reap the other threads first (

rr/src/RecordTask.cc

Lines 2146 to 2151 in f7067f1

for (auto it : thread_group()->task_set()) {
  if (&*it != this) {
    LOG(debug) << "Declining to reap " << tid << "; leader of non-empty thread-group with active thread " << it->tid;
    return false;
  }
}
)

(gdb) print ((rr::RecordTask*)(0x24918580))->waiting_for_reap
$7 = true
(gdb) print ((rr::RecordTask*)(0x24918580))->may_reap()
$8 = false

So the question we need to investigate is: why has thread 1681 not yet been reaped? We can see it's in PTRACE_EVENT_EXIT - that's the tracing-stop it's currently in. (394623 >> 16) & 0xFF == 6, which is PTRACE_EVENT_EXIT - https://elixir.bootlin.com/linux/v6.11.6/source/include/uapi/linux/ptrace.h#L161

(gdb) print ((rr::RecordTask*)(0x24936c50))->wait_status
$13 = {status = 394623}
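
As a sanity check on that decoding, here's a minimal sketch (plain C++, not rr code) using the standard wait-status macros from <sys/wait.h> and <sys/ptrace.h>:

// Decode the raw wait status 394623 (0x6057f) printed by gdb above.
#include <cstdio>
#include <sys/ptrace.h>
#include <sys/wait.h>

int main() {
  int status = 394623;  // 0x6057f
  std::printf("stopped:       %d\n", WIFSTOPPED(status));     // 1 (low byte is 0x7f)
  std::printf("stop signal:   %d\n", WSTOPSIG(status));       // 5 == SIGTRAP
  std::printf("ptrace event:  %d\n", (status >> 16) & 0xff);  // 6
  std::printf("is exit event: %d\n", ((status >> 16) & 0xff) == PTRACE_EVENT_EXIT);  // 1
  return 0;
}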

If we look at some of its instance variables:

(gdb) print ((rr::RecordTask*)(0x24936c50))->seen_ptrace_exit_event_
$18 = true
(gdb) print ((rr::RecordTask*)(0x24936c50))->handled_ptrace_exit_event_
$19 = false
(gdb) print ((rr::RecordTask*)(0x24936c50))->was_reaped_
$20 = false
(gdb) print ((rr::RecordTask*)(0x24936c50))->in_unexpected_exit
$21 = false
(gdb) print (('rr::RecordTask'*)(0x24936c50))->emulated_stop_type
$3 = rr::NOT_STOPPED
(gdb) print (('rr::RecordTask'*)(0x24936c50))->is_stopped_
$4 = true
(gdb) print (('rr::RecordTask'*)(0x24936c50))->waiting_for_ptrace_exit
$5 = true

There are only two places that seen_ptrace_exit_event_ can be set:

  • rr/src/Task.cc

    Line 483 in f7067f1

    seen_ptrace_exit_event_ = true;
    - but if this line had been hit, then handled_ptrace_exit_event_ would also be true, which it's not
  • rr/src/Task.cc

    Line 2420 in f7067f1

    seen_ptrace_exit_event_ = true;
    - so it must have been set here (and hitting this line implies was_reaped_ is false too, which it is).

rr itself is blocked in this waitid(P_ALL) call:

(gdb) bt
#0  0x00007fa2c585807d in __waitid (idtype=P_ALL, id=4294967295, infop=0x7ffd00c63eb0, options=1073741830) at ../sysdeps/unix/sysv/linux/waitid.c:29
#1  0x00000000005bafef in rr::WaitState::do_wait (this=this@entry=0x77c2c0 <_ZZN2rrL10wait_stateEvE12static_state.lto_priv.0>, tid=-1, consume=<optimized out>, type=type@entry=6, block_seconds=<optimized out>, status=...) at /root/rr/src/WaitManager.cc:58
#2  0x00000000005bb3b3 in rr::WaitState::wait (this=0x77c2c0 <_ZZN2rrL10wait_stateEvE12static_state.lto_priv.0>, options=..., type=6) at /root/rr/src/WaitManager.cc:114
#3  0x000000000055f597 in rr::wait_any (tid=@0x7ffd00c640b0: 1095914576, status=..., timeout=<optimized out>) at /root/rr/src/Scheduler.cc:569
#4  0x0000000000560665 in rr::Scheduler::reschedule (this=0x248931f0, switchable=<optimized out>) at /root/rr/src/Scheduler.cc:926
#5  0x00000000004dab3c in rr::RecordSession::record_step (this=0x24892fa0) at /root/rr/src/RecordSession.cc:2608
#6  0x00000000004ccd33 in rr::record (args=..., flags=...) at /root/rr/src/RecordCommand.cc:721
#7  rr::RecordCommand::run (this=<optimized out>, args=...) at /root/rr/src/RecordCommand.cc:884
#8  0x000000000041cf90 in main (argc=<optimized out>, argv=<optimized out>) at /root/rr/src/main.cc:278
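
For what it's worth, options=1073741830 in that frame decodes to __WALL | WEXITED | WSTOPPED (0x40000000 | 0x4 | 0x2). A minimal sketch of an equivalent blocking call (not rr's WaitManager code):

// Wait for any child, for both exit and ptrace-stop notifications,
// including clone()d threads (__WALL).
#include <cstdio>
#include <csignal>
#include <sys/wait.h>

int main() {
  siginfo_t info {};
  if (waitid(P_ALL, 0, &info, __WALL | WEXITED | WSTOPPED) == 0)
    std::printf("notification from pid %d, si_code %d\n", info.si_pid, info.si_code);
  else
    std::perror("waitid");  // e.g. ECHILD when run with no children
  return 0;
}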

This can't make any forward progress, because what needs to happen (I think) is that thread 1681 needs to be continued so it leaves the ptrace-stop and reaches the zombie state, which will generate another wait notification with the status and allow everything to be cleaned up.
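
From outside rr, that missing step would look roughly like this (a sketch only, assuming the caller is the tracer of the stuck tid; rr's exit handling normally does this for us):

// Release a thread that is parked in its PTRACE_EVENT_EXIT stop: PTRACE_CONT lets
// it finish exiting and become a zombie, which produces the final wait notification
// so the thread (and then its thread-group-leader) can be reaped.
#include <cstdio>
#include <cstdlib>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char** argv) {
  if (argc != 2) return 1;          // usage: release-exit-stop <tid>
  pid_t tid = std::atoi(argv[1]);
  if (ptrace(PTRACE_CONT, tid, nullptr, nullptr) == -1) {
    std::perror("PTRACE_CONT");
    return 1;
  }
  int status = 0;
  // __WALL so the wait also sees CLONE_THREAD children; this reaps the thread
  // and reports its final exit status.
  if (waitpid(tid, &status, __WALL) == tid)
    std::printf("tid %d reaped with status 0x%x\n", tid, status);
  return 0;
}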

I'm not entirely sure how we got into this state, but I suspect the fact that waiting_for_ptrace_exit == true is part of the story - that gets set when a core-dumping signal is sent to the process (

((RecordTask *)ot)->waiting_for_ptrace_exit = true;
).

I'll try to debug this a bit further this weekend, but I wondered if this triggers any spidey-senses about what to look for here 🤔

@KJTsanaktsidis (Contributor, Author)

This is the ptrace-stop that the thread is blocked in, but it's probably not very surprising:

[<0>] ptrace_stop+0x127/0x330
[<0>] ptrace_do_notify+0x93/0xc0
[<0>] ptrace_notify+0x54/0x80
[<0>] do_exit+0x89f/0xab0
[<0>] do_group_exit+0x30/0x80
[<0>] get_signal+0x92c/0x950
[<0>] arch_do_signal_or_restart+0x3b/0x240
[<0>] syscall_exit_to_user_mode+0x1ba/0x220
[<0>] do_syscall_64+0x8e/0x160
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

@KJTsanaktsidis (Contributor, Author)

OK - I think I kind of see what's going on. I managed to make the problem happen again with --log all:debug turned on, and collected this (task 119055 is the thread-group-leader, and 119060 is the thread which gets stuck in the PTRACE_EVENT_EXIT ptrace-stop).

Line 185782:
----------------------------
[RecordSession] trace time 8274: Active task is 119055. Events:
[INFO log_pending_events()] SIGNAL: SIGSEGV(async)
[INFO log_pending_events()] SIGNAL_HANDLER: SIGSEGV(async)
[INFO log_pending_events()] SYSCALL_INTERRUPTION: futex
[INFO log_pending_events()] (none)
[Task]   (refreshing extra-register cache using XSAVE)
[RecordTask] Wrote event SIGNAL: SIGSEGV(async) for time 8274
[RecordSession]   119055: no user handler for SIGSEGV
[Scheduler] Scheduling next task (PREVENT_SWITCH)
[Scheduler]   (119055 is un-switchable at SIGNAL_DELIVERY: SIGSEGV(async))
[RecordSession] trace time 8275: Active task is 119055. Events:
[INFO log_pending_events()] SIGNAL_DELIVERY: SIGSEGV(async)
[INFO log_pending_events()] SIGNAL_HANDLER: SIGSEGV(async)
[INFO log_pending_events()] SYSCALL_INTERRUPTION: futex
[INFO log_pending_events()] (none)
[RecordTask] Wrote event SIGNAL_DELIVERY: SIGSEGV(async) for time 8275
[RecordSession]     in signal-stop for SIGSEGV
[RecordSession]     injecting signal SIGSEGV
[RecordTask] Refreshed sigmask, now 0
[Task] resuming execution of 119055 with PTRACE_CONT, signal SIGSEGV tick_period -2 wait 0
[Scheduler] Starting 119055
[Scheduler] Scheduling next task (ALLOW_SWITCH)
[Scheduler] Task event is SIGNAL_DELIVERY: SIGSEGV(async)
[Scheduler]   119055 is blocked on SIGNAL_DELIVERY: SIGSEGV(async); checking status ...
[Scheduler]   still blocked
[Scheduler]   need to reschedule

Line 187452:
-----------------------------
[Scheduler] Starting 119056
[Scheduler] Scheduling next task (PREVENT_SWITCH)
[Scheduler]   (119056 is un-switchable at SIGNAL_HANDLER: SIGPIPE(async))
[Scheduler]   and running; waiting for state change
[Scheduler] Using unlimited ticks mode
[Task]   Task 119060 changed status to 0x6057f (PTRACE_EVENT_EXIT)
[Task]   (refreshing register cache)
[Scheduler] Stopping 119060
[Task] Requesting registers from tracee 119060
[Scheduler]   new status is 0x57f (STOP-SIGTRAP)
[Scheduler]   But that's not our current task...
[Task] going into blocking wait for 119056 ...

Line 188702:
-------------------------------
[Scheduler] Task event is SIGNAL_DELIVERY: SIGSEGV(async)
[Scheduler]   119055 is blocked on SIGNAL_DELIVERY: SIGSEGV(async); checking status ...
[Scheduler] wait on 119055 returns 0x6057f (PTRACE_EVENT_EXIT)
[Task]   Task 119055 changed status to 0x6057f (PTRACE_EVENT_EXIT)
[Task]   (refreshing register cache)
[Scheduler] Stopping 119055
[Task] Requesting registers from tracee 119055
[Scheduler]   ready with status 0x6057f (PTRACE_EVENT_EXIT)
[Scheduler]   selecting task 119055
[RecordSession] trace time 8376: Active task is 119055. Events:
[INFO log_pending_events()] SIGNAL_DELIVERY: SIGSEGV(async)
[INFO log_pending_events()] SIGNAL_HANDLER: SIGSEGV(async)
[INFO log_pending_events()] SYSCALL_INTERRUPTION: futex
[INFO log_pending_events()] (none)
[RecordTask] Wrote event EXIT for time 8376
[Task] Advancing tid 119055 to exit; wait=0
[Scheduler] Scheduling next task (ALLOW_SWITCH)
[RecordTask] Declining to reap 119055; leader of non-empty thread-group with active thread 119060
[Scheduler]   119055 is waiting to be reaped, but can't be reaped yet
[Scheduler]   need to reschedule

Line 188719:
--------------------------------
[RecordTask] Declining to reap 119055; leader of non-empty thread-group with active thread 119060
[Scheduler]   119055 is waiting to be reaped, but can't be reaped yet
[Scheduler]   need to reschedule
[Scheduler] Task event is SYSCALL: poll
[Scheduler]   119060 is waiting to exit; checking status ...
[Scheduler]   still blocked

I think I can conclude from this:

  • Process 119055 has two threads, tids 119055 and 119060
  • The 119055 process receives a process-directed SIGSEGV asynchronously, by another process calling tgkill(2) on it (I confirmed the test-case I was running here actually does do that! [1]). We inject that signal and PTRACE_CONT it. This is "Line 185782" above.
  • This kind of takes a while, because I've got /proc/sys/kernel/core_pattern set to systemd-coredump so the kernel has to invoke all that machinery
  • In the meanwhile, we pick something else to run. Process 119056 is runnable - it's actually the only process/thread not blocked - so we run it in unlimited-ticks mode
  • While that's happening, rr is blocked in the waitid(P_ALL) syscall here:
    WaitResultCode wait_result = wait_any(tid, status, -1);
  • The kernel delivers the PTRACE_EVENT_EXIT notification to the thread 119060, before notifying the thread-group-leader 119055. This is "Line 187452" above.
  • So waited->did_waitpid(status) runs on it -
    if (!waited->did_waitpid(status)) {
    - and sets seen_ptrace_exit_event_. At this point, the thread 119060 is in a ptrace-stop after the exit event.
  • The kernel then sends PTRACE_EVENT_EXIT to the thread-group-leader 119055. We schedule that thread.
  • t->proceed_to_exit(may_wait_exit); runs and PTRACE_CONTs the thread-group-leader out of the exit stop and into the zombie state. This is "Line 188702" above.
  • We don't reap though, because we still have a non-zombie thread in the thread group - 119060 is still ptrace-stopped and nothing ever continued it to the zombie state! So we see the message from "Line 188719" continuously thereafter (until there are no more other processes left to run in the trace, after which rr just gets perma-blocked in waitid(P_ALL)).

[1] The program being traced here actually has a handler for SIGSEGV, but the handler calls signal(SIGSEGV, SIG_DFL); raise(SIGSEGV); at the end, so it's still an asynchronous SIGSEGV signal.
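
For context, here's a rough reconstruction of that pattern (illustrative only, not the actual test case):

// A multi-threaded process installs a SIGSEGV handler that resets the disposition
// and re-raises; the SIGSEGV arrives asynchronously (kill/tgkill) rather than from
// a hardware fault, so rr sees an "async" core-dumping signal.  Build with -pthread.
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static void segv_handler(int sig) {
  // The real handler does its own reporting first, then hands control back to the
  // default (core-dumping) disposition by re-raising the signal.
  signal(sig, SIG_DFL);
  raise(sig);
}

static void* worker(void*) {
  pause();  // second thread, so the thread group has a non-leader member
  return nullptr;
}

int main() {
  signal(SIGSEGV, segv_handler);
  pthread_t t;
  pthread_create(&t, nullptr, worker, nullptr);
  // In the real setup another process sends the signal via tgkill(2);
  // a plain process-directed kill() stands in for that here.
  kill(getpid(), SIGSEGV);
  pause();
  return 0;
}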

KJTsanaktsidis added a commit to KJTsanaktsidis/rr that referenced this issue Nov 16, 2024
If we see the PTRACE_EVENT_EXIT for a task while running a different
task in unlimited-ticks mode in `Scheduler::reschedule`, it looks like
nothing ever actually calls `handle_ptrace_exit_event` on it, and so
nothing ever PTRACE_CONT's the task out of the exit-stop and into the
zombie state.

This seems to manifest itself as rr not reaping processes properly when
they receive asynchronous core-dumping signals (e.g. SIGSEGV sent by
`raise` or `kill`).

Fix this issue by checking if there's a pending PTRACE_EVENT_EXIT to
deal with on the task in `Scheduler::is_task_runnable`, and allowing the
task to be executed if so.

Fixes rr-debugger#3882
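
Roughly, the shape of that check (a sketch with hypothetical accessor names; the actual diff is in #3883):

// Sketch only, inside Scheduler::is_task_runnable: a task that has seen its
// PTRACE_EVENT_EXIT but not yet handled it should be treated as runnable, so the
// scheduler picks it up, the exit event gets handled, and the thread is
// PTRACE_CONT'd out of the exit-stop into the zombie state.
if (t->seen_ptrace_exit_event() && !t->handled_ptrace_exit_event()) {
  LOG(debug) << "  " << t->tid << " has a pending PTRACE_EVENT_EXIT; running it";
  return true;
}
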
@KJTsanaktsidis (Contributor, Author) commented Nov 16, 2024

☝️ that PR seems to fix my problem, but I'll leave it running on my laptop overnight tonight to make sure :)

EDIT: Confirmed, this does seem to fix my problem.
