[Bug] maximum recursion depth exceeded #3518

Closed
2 of 5 tasks
kebe7jun opened this issue Feb 12, 2025 · 14 comments · Fixed by #3519

@kebe7jun
Contributor

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

The maximum recursion depth is exceeded when the process exits after an exception. The tail of the traceback:

    self._send_signal(sig)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 1266, in _send_signal
    os.kill(self.pid, sig)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 333, in sigquit_handler
    kill_process_tree(os.getpid())
  File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 492, in kill_process_tree
    children = itself.children(recursive=True)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 971, in children
    self._raise_if_pid_reused()
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 461, in _raise_if_pid_reused
    if self._pid_reused or (not self.is_running() and self._pid_reused):
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 636, in is_running
    self._pid_reused = self != Process(self.pid)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 319, in __init__
    self._init(pid)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 355, in _init
    self._ident = self._get_ident()
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 396, in _get_ident
    return (self.pid, self.create_time())
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 778, in create_time
    self._create_time = self._proc.create_time()
  File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1716, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1957, in create_time
    ctime = float(self._parse_stat_file()['create_time'])
  File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1716, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 508, in wrapper
    raise raise_from(err, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 506, in wrapper
    return fun(self)
  File "/usr/local/lib/python3.10/dist-packages/psutil/_pslinux.py", line 1784, in _parse_stat_file
    data = bcat("%s/%s/stat" % (self._procfs_path, self.pid))
  File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 851, in bcat
    return cat(fname, fallback=fallback, _open=open_binary)
  File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 839, in cat
    with _open(fname) as f:
  File "/usr/local/lib/python3.10/dist-packages/psutil/_common.py", line 799, in open_binary
    return open(fname, "rb", buffering=FILE_READ_BUFFER_SIZE)
RecursionError: maximum recursion depth exceeded while calling a Python object
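
Reading the traceback, `sigquit_handler` appears to be entered from inside `os.kill(self.pid, sig)`, and it then calls `kill_process_tree(os.getpid())` again, which would explain the unbounded recursion. Below is a minimal sketch of one generic way to stop such a loop, a re-entrancy guard around the handler. It is not the change made in #3519. `non_reentrant`, `FakeProcess`, `run_unguarded`, and `run_guarded` are hypothetical names, and signal delivery is simulated with a direct call so the example runs without sending real signals.

```python
import functools


def non_reentrant(func):
    """Turn nested calls to func into no-ops while an outer call is running."""
    active = False

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal active
        if active:
            return None  # already cleaning up; ignore the nested call
        active = True
        try:
            return func(*args, **kwargs)
        finally:
            active = False

    return wrapper


class FakeProcess:
    """Hypothetical stand-in: 'sending a signal' calls the handler synchronously."""

    def __init__(self, handler):
        self.handler = handler

    def send_signal(self):
        self.handler()


def run_unguarded():
    """The handler signals its own process, so it re-enters itself without bound."""
    def handler():
        proc.send_signal()

    proc = FakeProcess(handler)
    try:
        handler()
    except RecursionError:
        return "RecursionError"
    return "ok"


def run_guarded():
    """Same handler with the guard: the nested delivery is ignored."""
    log = []

    @non_reentrant
    def handler():
        log.append("cleanup")
        proc.send_signal()

    proc = FakeProcess(handler)
    handler()
    return log


if __name__ == "__main__":
    print(run_unguarded())
    print(run_guarded())
```

The guard here is a plain flag, which is enough for a handler that only runs on the main thread; it is not a thread-safe lock.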

Reproduction

N/A

Environment

root@g1805:/sgl-workspace# python3 -m sglang.check_env
INFO 02-12 08:49:13 __init__.py:190] Automatically detected platform cuda.
Python: 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 4090
GPU 0,1,2,3,4,5,6,7 Compute Capability: 8.9
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 550.78
PyTorch: 2.5.1+cu124
sgl_kernel: 0.0.3.post3
flashinfer: 0.2.0.post2+cu124torch2.5
triton: 3.1.0
transformers: 4.48.3
torchao: 0.8.0
numpy: 1.26.4
aiohttp: 3.11.12
fastapi: 0.115.8
hf_transfer: 0.1.9
huggingface_hub: 0.28.1
interegular: 0.3.3
modelscope: 1.22.3
orjson: 3.10.15
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.6
multipart: 0.0.20
zmq: 26.2.1
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.7.2
openai: 1.61.1
tiktoken: 0.8.0
anthropic: 0.45.2
decord: 0.6.0
NVIDIA Topology:
      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  CPU Affinity   NUMA Affinity  GPU NUMA ID
GPU0  X     PHB   SYS   SYS   SYS   SYS   SYS   SYS   0-27,56-83     0              N/A
GPU1  PHB   X     SYS   SYS   SYS   SYS   SYS   SYS   0-27,56-83     0              N/A
GPU2  SYS   SYS   X     PHB   SYS   SYS   SYS   SYS   0-27,56-83     0              N/A
GPU3  SYS   SYS   PHB   X     SYS   SYS   SYS   SYS   0-27,56-83     0              N/A
GPU4  SYS   SYS   SYS   SYS   X     PHB   SYS   SYS   28-55,84-111   1              N/A
GPU5  SYS   SYS   SYS   SYS   PHB   X     SYS   SYS   28-55,84-111   1              N/A
GPU6  SYS   SYS   SYS   SYS   SYS   SYS   X     PHB   28-55,84-111   1              N/A
GPU7  SYS   SYS   SYS   SYS   SYS   SYS   PHB   X     28-55,84-111   1              N/A

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

ulimit soft: 65535

@sk2011-ship-it

[Image] This is the issue in the latest Docker build.

@jhinpan
Collaborator

jhinpan commented Feb 12, 2025

> [Image] This is the issue in the latest Docker build.

This issue is a duplicate of #3525.

@jhinpan jhinpan self-assigned this Feb 12, 2025
@jhinpan
Collaborator

jhinpan commented Feb 12, 2025

Thank you for pointing that out, @kebe7jun. We will review your PR promptly and let you know. cc @zhaochenyang20

@zhaochenyang20
Collaborator

Do you have a PR to fix this? @kebe7jun @sk2011-ship-it @jhinpan

@jhinpan
Collaborator

jhinpan commented Feb 13, 2025

@zhaochenyang20 I believe @kebe7jun's PR to fix this issue is #3519; it is waiting for review.

@zhaochenyang20
Collaborator

@jhinpan I will take a look. Thanks!

@robscc

robscc commented Feb 17, 2025

I have installed `datasets` and the issue still exists, so it does not seem to be a dependency problem.

@robscc

robscc commented Feb 17, 2025

> [Image] This is the issue in the latest Docker build.

How can I enable subprocess logging or watch the subprocess log? Any tips are welcome.

@zwdgit

zwdgit commented Feb 17, 2025

I encountered the same problem. @zhaochenyang20

@zhaochenyang20
Collaborator

#3519

I will merge this PR today. @zwdgit @jhinpan @kebe7jun Ping me if it is not done by the end of the day.

@zwdgit

zwdgit commented Feb 19, 2025

> #3519
>
> I will merge this PR today. @zwdgit @jhinpan @kebe7jun Ping me if not finished today.

@zhaochenyang20 The merge failed.

@zhaochenyang20
Collaborator

@zwdgit Too many to merge. Please remind me. Thanks!

@issaccv

issaccv commented Feb 25, 2025

Is there any new progress? @zhaochenyang20

@zhaochenyang20
Collaborator

@issaccv I am trying to get CI to pass and then merge it.
