[Bug] maximum recursion depth exceeded #3518

kebe7jun · 2025-02-12T08:49:44Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
5. Please use English, otherwise it will be closed.

Describe the bug

The process hits the maximum recursion depth (RecursionError) while exiting after an exception:

    self._send_signal(sig)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 1266, in _send_signal
    os.kill(self.pid, sig)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 333, in sigquit_handler
    kill_process_tree(os.getpid())
  File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 492, in kill_process_tree
    children = itself.children(recursive=True)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 971, in children
    self._raise_if_pid_reused()
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 461, in _raise_if_pid_reused
    if self._pid_reused or (not self.is_running() and self._pid_reused):
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 636, in is_running
    self._pid_reused = self != Process(self.pid)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 319, in __init__
    self._init(pid)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 355, in _init
    self._ident = self._get_ident()
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 396, in _get_ident
    return (self.pid, self.create_time())
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 778, in create_time
    self._create_time = self._proc.create_time()
  File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1716, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1957, in create_time
    ctime = float(self._parse_stat_file()['create_time'])
  File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1716, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 508, in wrapper
    raise raise_from(err, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 506, in wrapper
    return fun(self)
  File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1784, in _parse_stat_file
    data = bcat("%s/%s/stat" % (self._procfs_path, self.pid))
  File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 851, in bcat
    return cat(fname, fallback=fallback, _open=open_binary)
  File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 839, in cat
    with _open(fname) as f:
  File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 799, in open_binary
    return open(fname, "rb", buffering=FILE_READ_BUFFER_SIZE)
RecursionError: maximum recursion depth exceeded while calling a Python object

Reproduction

N/A

Environment

root@g1805:/sgl-workspace# python3 -m sglang.check_env
INFO 02-12 08:49:13 __init__.py:190] Automatically detected platform cuda.
Python: 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 4090
GPU 0,1,2,3,4,5,6,7 Compute Capability: 8.9
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 550.78
PyTorch: 2.5.1+cu124
sgl_kernel: 0.0.3.post3
flashinfer: 0.2.0.post2+cu124torch2.5
triton: 3.1.0
transformers: 4.48.3
torchao: 0.8.0
numpy: 1.26.4
aiohttp: 3.11.12
fastapi: 0.115.8
hf_transfer: 0.1.9
huggingface_hub: 0.28.1
interegular: 0.3.3
modelscope: 1.22.3
orjson: 3.10.15
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.6
multipart: 0.0.20
zmq: 26.2.1
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.7.2
openai: 1.61.1
tiktoken: 0.8.0
anthropic: 0.45.2
decord: 0.6.0
NVIDIA Topology:
        GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0    X     PHB   SYS   SYS   SYS   SYS   SYS   SYS   0-27,56-83    0              N/A
GPU1    PHB   X     SYS   SYS   SYS   SYS   SYS   SYS   0-27,56-83    0              N/A
GPU2    SYS   SYS   X     PHB   SYS   SYS   SYS   SYS   0-27,56-83    0              N/A
GPU3    SYS   SYS   PHB   X     SYS   SYS   SYS   SYS   0-27,56-83    0              N/A
GPU4    SYS   SYS   SYS   SYS   X     PHB   SYS   SYS   28-55,84-111  1              N/A
GPU5    SYS   SYS   SYS   SYS   PHB   X     SYS   SYS   28-55,84-111  1              N/A
GPU6    SYS   SYS   SYS   SYS   SYS   SYS   X     PHB   28-55,84-111  1              N/A
GPU7    SYS   SYS   SYS   SYS   SYS   SYS   PHB   X     28-55,84-111  1              N/A

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

ulimit soft: 65535


sk2011-ship-it · 2025-02-12T13:06:12Z

This issue also occurs in the latest Docker build.

jhinpan · 2025-02-12T18:17:20Z

This issue also occurs in the latest Docker build.

This issue is the same as #3525.

jhinpan · 2025-02-12T18:20:38Z

Thank you for pointing that out, @kebe7jun. We will review your PR quickly and let you know. cc @zhaochenyang20

zhaochenyang20 · 2025-02-13T06:17:35Z

Do you have a PR that fixes this? @kebe7jun @sk2011-ship-it @jhinpan

jhinpan · 2025-02-13T19:22:19Z

@zhaochenyang20 I believe @kebe7jun's PR fixing this issue is #3519; it is waiting for review.

zhaochenyang20 · 2025-02-14T00:18:50Z

@jhinpan I will take a look. Thanks!

robscc · 2025-02-17T01:24:33Z

I have installed datasets and the issue still occurs, so it does not seem to be a dependency problem.

robscc · 2025-02-17T01:28:00Z

This issue also occurs in the latest Docker build.

How can I enable subprocess logging or view the subprocess logs? Any tips are welcome.

zwdgit · 2025-02-17T04:22:13Z

I encountered the same problem. @zhaochenyang20

zhaochenyang20 · 2025-02-17T17:28:26Z

#3519

I will merge this PR today. @zwdgit @jhinpan @kebe7jun Ping me if it is not finished today.

zwdgit · 2025-02-19T05:58:56Z

#3519

I will merge this PR today. @zwdgit @jhinpan @kebe7jun Ping me if it is not finished today.

@zhaochenyang20 The merge failed.

zhaochenyang20 · 2025-02-19T08:00:47Z

@zwdgit There are too many PRs to merge. Please remind me. Thanks!

issaccv · 2025-02-25T07:09:12Z

Is there any new progress? @zhaochenyang20

zhaochenyang20 · 2025-02-25T08:49:33Z

@issaccv I am trying to get it through CI and merge it.

kebe7jun mentioned this issue Feb 12, 2025

Fix maximum recursion depth triggered on exception exit #3519

Merged


jhinpan self-assigned this Feb 12, 2025

zhaochenyang20 closed this as completed in #3519 Feb 25, 2025
