
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads #129536

Open
ShaneHarvey opened this issue Feb 1, 2025 · 6 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) pending The issue will be closed if no feedback is provided type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@ShaneHarvey
Contributor

ShaneHarvey commented Feb 1, 2025

Crash report

What happened?

Python crashes at interpreter shutdown when running this script, which starts a misconfigured SSL server:

import ssl
import socket
import sys
import threading
import time

SERVER_ADDR = ("127.0.0.1", 37017)
CA_FILE = "test/certificates/ca.pem"
SERVER_CERT = "test/certificates/server.pem"
CLIENT_CERT = "test/certificates/client.pem"


def run_server():
    # Intentionally omitting cafile/load_cert_chain causes CPython to crash
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)#, cafile=CA_FILE)
    # context.load_cert_chain(SERVER_CERT)
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server = context.wrap_socket(server, server_side=True)
    server.bind(SERVER_ADDR)
    server.listen(0)

    while True:
        connection, client_address = server.accept()
        t = threading.Thread(target=handle_server_connection, args=(connection, client_address), daemon=True)
        t.start()


def handle_server_connection(connection, client_address):
    client_address = f"{client_address[0]}:{client_address[1]}"
    print(f"server opened connection from {client_address}")
    while True:
        data = connection.recv(1024)
        if not data:
            print(f"server closed connection from {client_address}")
            return
        print(f"server got data from {client_address}: {data}")
        if data == b"CLOSE":
            print(f"server closing {client_address}")
            connection.close()
            return
        # Echo back
        connection.sendall(data)


def get_client():
    # Intentionally omitting cafile/load_cert_chain causes CPython to crash
    context = ssl.create_default_context() #cafile=CA_FILE)
    # context.load_cert_chain(CLIENT_CERT)
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    print("client connecting")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.connect(SERVER_ADDR)
    sock = context.wrap_socket(sock)
    return sock


def main():
    print(f"{sys.version=}\n{ssl.OPENSSL_VERSION=}")
    server = threading.Thread(target=run_server, daemon=True)
    server.start()
    time.sleep(1)
    client1 = get_client()


if __name__ == "__main__":
    main()
$ python repro-ssl-crash-bug.py
sys.version='3.13.0 (v3.13.0:60403a5409f, Oct  7 2024, 00:37:40) [Clang 15.0.0 (clang-1500.3.9.4)]'
ssl.OPENSSL_VERSION='OpenSSL 3.0.15 3 Sep 2024'
client connecting
Exception in thread Thread-1 (run_server):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/repro-pypy-ssl-bug.py", line 71, in <module>
    main()
    ~~~~^^
  File "/Users/shane/git/mongo-python-driver/repro-pypy-ssl-bug.py", line 67, in main
    client1 = get_client()
  File "/Users/shane/git/mongo-python-driver/repro-pypy-ssl-bug.py", line 58, in get_client
    sock = context.wrap_socket(sock)
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/ssl.py", line 455, in wrap_socket
    return self.sslsocket_class._create(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        sock=sock,
        ^^^^^^^^^^
    ...<5 lines>...
        session=session
        ^^^^^^^^^^^^^^^
    )
    ^
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/threading.py", line 1041, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/threading.py", line 992, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/ssl.py", line 1076, in _create
    self.do_handshake()
    ~~~~~~~~~~~~~~~~~^^
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/ssl.py", line 1372, in do_handshake
    self._sslobj.do_handshake()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/Users/shane/git/mongo-python-driver/repro-pypy-ssl-bug.py", line 26, in run_server
    connection, client_address = server.accept()
                                 ~~~~~~~~~~~~~^^
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/ssl.py", line 1418, in accept
    newsock = self.context.wrap_socket(newsock,
                do_handshake_on_connect=self.do_handshake_on_connect,
                suppress_ragged_eofs=self.suppress_ragged_eofs,
                server_side=True)
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/ssl.py", line 455, in wrap_socket
    return self.sslsocket_class._create(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        sock=sock,
        ^^^^^^^^^^
    ...<5 lines>...
        session=session
        ^^^^^^^^^^^^^^^
    )
    ^
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/ssl.py", line 1076, in _create
    self.do_handshake()
    ~~~~~~~~~~~~~~~~~^^
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/ssl.py", line 1372, in do_handshake
    self._sslobj.do_handshake()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
ssl.SSLError: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1020)
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x00000001021ec560)

Current thread 0x0000000205de8f80 (most recent call first):
  <no Python frame>
[1]    45370 abort      python repro-pypy-ssl-bug.py

Here's part of the Apple crash report:

Translated Report (Full Report Below)
-------------------------------------

Process:               Python [45370]
Path:                  /Library/Frameworks/Python.framework/Versions/3.13/Resources/Python.app/Contents/MacOS/Python
Identifier:            org.python.python
Version:               3.13.0 (3.13.0)
Code Type:             ARM-64 (Native)
Parent Process:        zsh [45114]
Responsible:           pycharm [65042]
User ID:               502

Date/Time:             2025-01-31 16:19:54.0206 -0800
OS Version:            macOS 14.7.2 (23H311)
Report Version:        12

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000

Termination Reason:    Namespace SIGNAL, Code 6 Abort trap: 6
Terminating Process:   Python [45370]

Application Specific Information:
abort() called


Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	       0x19df595d0 __pthread_kill + 8
1   libsystem_pthread.dylib       	       0x19df91c20 pthread_kill + 288
2   libsystem_c.dylib             	       0x19de9ea30 abort + 180
3   Python                        	       0x101f6e710 _Py_FatalErrorFormat + 40
4   Python                        	       0x101fd9418 _enter_buffered_busy + 288
5   Python                        	       0x101fdbd3c _io__Buffered_flush + 600
6   Python                        	       0x101d39f04 method_vectorcall_NOARGS + 120
7   Python                        	       0x101d298e4 PyObject_VectorcallMethod + 152
8   Python                        	       0x101fe1784 _io_TextIOWrapper_flush + 140
9   Python                        	       0x101d39f04 method_vectorcall_NOARGS + 120
10  Python                        	       0x101d298e4 PyObject_VectorcallMethod + 152
11  Python                        	       0x101f69e10 flush_std_files + 448
12  Python                        	       0x101f69724 fatal_error + 396
13  Python                        	       0x101f6e7cc _Py_FatalErrorFormat + 228
14  Python                        	       0x101fd9418 _enter_buffered_busy + 288
15  Python                        	       0x101fdb958 _io_BufferedWriter_write + 1240
16  Python                        	       0x101d3a14c method_vectorcall_O + 116
17  Python                        	       0x101d298e4 PyObject_VectorcallMethod + 152
18  Python                        	       0x101fe2a2c _textiowrapper_writeflush + 656
19  Python                        	       0x101fe1760 _io_TextIOWrapper_flush + 104
20  Python                        	       0x101d39f04 method_vectorcall_NOARGS + 120
21  Python                        	       0x101d298e4 PyObject_VectorcallMethod + 152
22  Python                        	       0x101f69e10 flush_std_files + 448
23  Python                        	       0x101f6a26c _Py_Finalize + 320
24  Python                        	       0x101fa0460 Py_RunMain + 620
25  Python                        	       0x101fa1dcc pymain_main + 500
26  Python                        	       0x101fa1f34 Py_BytesMain + 40
27  dyld                          	       0x19dc07154 start + 2476


Thread 0 crashed with ARM Thread State (64-bit):
    x0: 0x0000000000000000   x1: 0x0000000000000000   x2: 0x0000000000000000   x3: 0x0000000000000000
    x4: 0xfffffffffffb7d30   x5: 0x0000000000000020   x6: 0x000000000000003e   x7: 0x000000003b9ac618
    x8: 0x88027b3a413da6b8   x9: 0x88027b3844e32938  x10: 0x00000001022156f8  x11: 0x0000000000000000
   x12: 0x0000000000000000  x13: 0x0000000000000001  x14: 0x00000001021b0078  x15: 0x00000001021b0068
   x16: 0x0000000000000148  x17: 0x00000002104e6e40  x18: 0x0000000000000000  x19: 0x0000000000000006
   x20: 0x0000000205de8f80  x21: 0x0000000000000103  x22: 0x0000000205de9060  x23: 0x8000000000000001
   x24: 0x7fffffffffffffde  x25: 0x0000000100b9ec88  x26: 0x0000000000000000  x27: 0x0000000000000000
   x28: 0x0000000000000000   fp: 0x000000016f28a6e0   lr: 0x000000019df91c20
    sp: 0x000000016f28a6c0   pc: 0x000000019df595d0 cpsr: 0x40001000
   far: 0x0000000000000000  esr: 0x56000080  Address size fault

I also see the same crash on Python 3.9. Is this expected behavior?

CPython versions tested on:

3.13

Operating systems tested on:

macOS

Output from running 'python -VV' on the command line:

Python 3.13.0 (v3.13.0:60403a5409f, Oct 7 2024, 00:37:40) [Clang 15.0.0 (clang-1500.3.9.4)]

@ShaneHarvey ShaneHarvey added the type-crash A hard crash of the interpreter, possibly with a core dump label Feb 1, 2025
@picnixz picnixz added extension-modules C modules in the Modules dir topic-SSL interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed extension-modules C modules in the Modules dir topic-SSL labels Feb 1, 2025
@ZeroIntensity
Member

There's not much we can do; the error tells you exactly what's wrong. If a daemon thread is holding the stderr lock when it gets frozen at shutdown, the main thread can never acquire that lock to write to stderr, so instead of deadlocking, the interpreter bails out with a fatal error. The solution is simply to avoid using daemon threads.
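As a minimal sketch of what "avoid daemon threads" can look like, assuming the background work can be made to poll a stop flag: the worker below is an ordinary (non-daemon) thread that checks a `threading.Event`, and the main thread asks it to stop and joins it before exiting, so nothing is left holding the stderr lock during finalization. `worker` and `run_with_clean_shutdown` are hypothetical names, not part of the original repro.

```python
import sys
import threading


def worker(stop: threading.Event, interval: float) -> None:
    """Hypothetical worker: does periodic work until asked to stop."""
    iterations = 0
    while not stop.is_set():
        iterations += 1
        print(f"worker iteration {iterations}", file=sys.stderr)
        # Sleep between iterations, but wake up immediately once stop is set.
        stop.wait(interval)


def run_with_clean_shutdown() -> threading.Thread:
    stop = threading.Event()
    # daemon=False is the default; spelled out here for emphasis.
    t = threading.Thread(target=worker, args=(stop, 0.05), daemon=False)
    t.start()
    # ... the program's real work would happen here ...
    stop.set()  # ask the worker to finish its current iteration and return
    t.join()    # wait for it, so nothing is still writing to stderr at shutdown
    return t


if __name__ == "__main__":
    run_with_clean_shutdown()
```

For a loop like the repro's `server.accept()`, the blocking call would also need a timeout (for example via the socket's `settimeout()`) so the loop gets a chance to check the flag.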

@ZeroIntensity ZeroIntensity added the pending The issue will be closed if no feedback is provided label Feb 1, 2025
@picnixz
Member

picnixz commented Feb 1, 2025

Could we perhaps make SSL error reporting friendlier in these cases? Or is this something that can't be done on our side, or that we don't do for other modules?

Or does this have nothing to do with SSL in general, and this just happens to be the only reproducer we have? It's pretty common to use daemon threads as a dirty hack instead of writing two separate applications. (In this case, I would recommend two different scripts, one for the server and one for the client, or a single script that runs both in separate processes.)
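A minimal sketch of the separate-process layout, assuming a plain-TCP echo server stands in for the SSL server (so no certificates are needed): the server runs in its own interpreter via `subprocess`, binds to port 0 so the OS picks a free port, and reports that port on its first line of stdout. `SERVER_CODE` and `echo_via_separate_process` are hypothetical names.

```python
import socket
import subprocess
import sys

# Hypothetical stand-in for the repro's server: a plain-TCP (no SSL) echo
# server that serves a single connection and then exits.
SERVER_CODE = r"""
import socket
with socket.create_server(("127.0.0.1", 0)) as srv:
    print(srv.getsockname()[1], flush=True)  # tell the parent which port we got
    conn, _ = srv.accept()
    with conn:
        conn.sendall(conn.recv(1024))
"""


def echo_via_separate_process(message: bytes) -> bytes:
    # The server lives in its own interpreter process, so the client process
    # has no daemon threads left running when it shuts down.
    server = subprocess.Popen(
        [sys.executable, "-c", SERVER_CODE],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        port = int(server.stdout.readline())
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            return client.recv(1024)
    finally:
        # Reap the child; kill it if it did not exit on its own.
        try:
            server.wait(timeout=5)
        except subprocess.TimeoutExpired:
            server.kill()
            server.wait()
        server.stdout.close()
```

A single `recv(1024)` is enough for a short message on localhost; a real client would loop until it has the whole reply.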

@ZeroIntensity
Member

As far as I can tell, it's unrelated to SSL. A daemon thread, which is probably hung at this point in finalization, holds the lock on stderr. Python can't do anything at that point, so it just bails out with a fatal error.

@picnixz
Member

picnixz commented Feb 1, 2025

AFAICT, the server is trying to perform the handshake, but because of an exception, the program exits and the daemon thread then tries to finalize something (maybe it tries to report the error on stderr, but since it's a daemon thread and the program is about to exit, it cannot do so properly).

So maybe we should check that we don't report SSL failures while finalizing, since otherwise we'll need to acquire the lock on stderr. (Remember that the SSL path for creating exceptions is slow, which may also be why the thread dies before we can create and report that exception.)

Now, I don't think we need to dig further, since a simple workaround would be to use two different processes (one for the client and one for the server), which is what should be done in this situation IMO.

@cmaloney
Contributor

cmaloney commented Feb 1, 2025

I think some of these cases could be made not to need a lock (especially at shutdown). stderr is line buffered, and with print or logging, output is usually written a whole line at a time; at that point, letting the kernel order the write() calls is, from my perspective, as good as Python ordering them with locks (and in some ways preferable). That would take a lot of reworking of how BufferedIO works (and potentially some reworking of print()), which I don't think is in scope for this issue, since it has engineering workarounds.
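To illustrate the idea of letting the kernel order whole-line writes, here is a minimal sketch of a hypothetical helper, `write_line_unlocked`, that encodes one complete line and hands it to `os.write` on a raw file descriptor, bypassing Python's `BufferedWriter` and therefore its lock. This is not how CPython's `io` module works today; it only shows the write pattern being discussed. POSIX guarantees that writes of at most `PIPE_BUF` bytes to a pipe are atomic, so concurrent lines of that size would not interleave.

```python
import os


def write_line_unlocked(fd: int, text: str) -> int:
    """Hypothetical helper: emit one complete line with a single write() call."""
    data = (text.rstrip("\n") + "\n").encode("utf-8", "backslashreplace")
    written = os.write(fd, data)
    # os.write may write fewer bytes than requested; finish the rest, giving
    # up the single-write (atomicity) guarantee for this particular line.
    while written < len(data):
        written += os.write(fd, data[written:])
    return written
```

For stderr this would be called as `write_line_unlocked(2, "message")`, with fd 2 being the process's standard error.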

@ZeroIntensity
Member

So maybe we should check that we don't report SSL failures if we are finalizing as otherwise we'll need to acquire a lock on stderr (remember that the SSL path for creating exceptions is slow and that may also be the reason why the thread dies before we can create and report that exception).

That seems like a reasonable temporary fix.

IMO, avoiding locks isn't a great permanent solution. Really, we need a better way to shut down daemon threads than just hanging them when they try to re-acquire their thread state. The issue here isn't specific to I/O; it applies to any lock. For example:

Py_BEGIN_ALLOW_THREADS;
// tstate is detached
PyMutex_Lock(&whatever_lock); // or PyThread_acquire_lock
Py_END_ALLOW_THREADS; // Daemon thread gets hung with the lock held!

If the main thread then tries to acquire that lock, deadlock ensues. Something akin to how stop-the-world works in the free-threaded build would be better. Routine PyThreadState_MustExit checks by the eval loop, rather than threads simply disappearing at the nearest _PyThreadState_Attach, would probably be significantly more robust.
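A Python-level analogue of the C snippet above: a thread that stops making progress while holding a lock leaves every other thread unable to acquire it. In CPython the real freeze happens in C when a daemon thread tries to reattach its thread state during finalization, which this sketch cannot reproduce; here the worker just waits on an event. `main_thread_can_acquire` and `stuck_worker` are hypothetical names, and the `timeout` exists only so the sketch terminates instead of deadlocking.

```python
import threading


def main_thread_can_acquire(timeout: float = 0.2) -> bool:
    """Return True if this thread can take the lock while a worker sits on it."""
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def stuck_worker() -> None:
        with lock:
            holding.set()
            # Stand-in for a daemon thread frozen at finalization while it
            # still holds the lock: it simply stops making progress.
            release.wait()

    worker = threading.Thread(target=stuck_worker, daemon=True)
    worker.start()
    holding.wait()

    # A bare lock.acquire() here would block forever; the timeout lets the
    # sketch observe the would-be deadlock instead of hanging.
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        lock.release()

    # Unlike a genuinely frozen thread, this worker can be woken and joined.
    release.set()
    worker.join()
    return acquired
```

Calling `main_thread_can_acquire()` returns `False`: the lock stays unavailable for as long as its holder is stuck, which is exactly the situation the fatal error at shutdown guards against.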
