-
-
Notifications
You must be signed in to change notification settings - Fork 606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Running signal handler that uses thread local variables in statically linked executables crashes #1278
Comments
Yes :-( When I first implemented signals in OSv, ten years ago (see b9ed15c), we considered them one of those things that were important in a general-purpose multi-process Unix system, but not important on a single-process VM. On one hand, it appears this assumption wasn't entirely correct, but on the other hand, it was "mostly" correct (most application do not, indeed, care about signals). I guess we could rethink the whole signal thing to stop using a new thread... Perhaps a workaround just for this issue can be to set the new thread's TLS pointer to some existing thread (random thread with the "application" flag??). So it won't run in that thread but be able to read from them (if it writes, it can cause a big mess, but this is probably true for normal signal handlers as well). |
This patch changes logic around executing user signal handler in kill() to make new signal handler thread use app TLS of an application thread if available. Normally, application threads share thread local storage area (TLS) with kernel or use one created by kernel. But when running statically linked executables or dynamically ones with Linux dynamic linker, the application threads create TLS on its own and user signal handler may reference thread local variables which do not exist in the TLS of the new signal handler thread. As a result the app would crash like so: ``` 9 0x000000004030bd54 in page_fault (ef=0x40007fe83088) at arch/x64/mmu.cc:42 10 <signal handler called> 11 __GI___pthread_cleanup_upto (target=target@entry=0x200000205080 <sigend_jmp_buf>, targetframe=targetframe@entry=0x40007fe93fc8 "\005\"D\177") at ./nptl/pthread_cleanup_upto.c:32 12 0x000020007f44232c in _longjmp_unwind (env=env@entry=0x200000205080 <sigend_jmp_buf>, val=val@entry=1) at ../sysdeps/nptl/jmp-unwind.c:27 13 0x000020007f442205 in __libc_siglongjmp (env=0x200000205080 <sigend_jmp_buf>, val=1) at ../setjmp/longjmp.c:30 14 0x0000200000202543 in sigend_handler (sig=2) at main.c:121 15 0x000000004037f0ee in sched::thread::main (this=0x40007fe7e040) at core/sched.cc:1415 16 sched::thread_main_c (t=0x40007fe7e040) at arch/x64/arch-switch.hh:377 17 0x000000004030bc52 in thread_main () at arch/x64/entry.S:161 ``` This patch applies a bit of trickery (potentially dangerous if user handler tries to write to TLS) and select an application TLS of a current thread or one of the other application threads and makes it available to the new signal handler thread. The signal handler thread in such case would switch from kernel TLS to app TLS before executing handler routine. This patch makes following tests pass when running with Linux dynamic linker: - tst-kill - tst-sigwait - tst-sigaction Refers cloudius-systems#1278 Signed-off-by: Waldemar Kozaczuk <[email protected]>
This can be reproduced by running
iperf3
and then doing Ctrl-C which causes this:Connecting to gdb yields this:
One can see that OSv executes the handler on the thread that does not have the same thread-local variables (see https://github.com/lattera/glibc/blob/895ef79e04a953cac1493863bcae29ad85657ee1/nptl/pt-cleanup.c#L24C1-L38)/
The text was updated successfully, but these errors were encountered: