updated #50

robertgshaw2-redhat · 2025-01-03T16:15:16Z

SUMMARY:

Handle exceptions raised in the multiproc worker busy loop (currently, vLLM hangs if one occurs).
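A minimal sketch of the idea, under simplifying assumptions. Threads and `queue.Queue` stand in for the worker processes and their message queues, and `worker_loop`, `call_worker`, and `execute_model` are hypothetical names, not vLLM's API. The worker catches any exception raised by the requested method and sends it back as the response. The caller then re-raises it, which mirrors the `raise result` frame in `collective_rpc` in the traceback that follows.

```python
import queue
import threading
from typing import Any, Callable, Dict


def worker_loop(requests: queue.Queue, responses: queue.Queue,
                methods: Dict[str, Callable[..., Any]]) -> None:
    """Serve RPC requests until a ``None`` sentinel arrives (hypothetical).

    An exception raised by a method is sent back as the response instead of
    escaping the loop, so the caller is never left blocked on ``get()``.
    """
    while True:
        msg = requests.get()
        if msg is None:  # shutdown sentinel
            return
        name, args = msg
        try:
            responses.put(methods[name](*args))
        except Exception as exc:  # report the failure instead of dying silently
            responses.put(exc)


def call_worker(requests: queue.Queue, responses: queue.Queue,
                name: str, *args: Any) -> Any:
    """Send one request and re-raise any exception the worker reported."""
    requests.put((name, args))
    result = responses.get()
    if isinstance(result, Exception):
        raise result
    return result


def execute_model(x: int) -> int:
    # Hypothetical stand-in for a model step; negative input simulates a crash.
    if x < 0:
        raise ValueError("SIMULATE CUDA EXCEPTION")
    return x * 2


requests_q: queue.Queue = queue.Queue()
responses_q: queue.Queue = queue.Queue()
worker = threading.Thread(target=worker_loop,
                          args=(requests_q, responses_q,
                                {"execute_model": execute_model}),
                          daemon=True)
worker.start()

ok = call_worker(requests_q, responses_q, "execute_model", 21)
print(ok)  # 42
try:
    call_worker(requests_q, responses_q, "execute_model", -1)
except ValueError as exc:
    err = exc
    print(f"caught: {err}")  # caught: SIMULATE CUDA EXCEPTION

requests_q.put(None)  # ask the worker to exit
worker.join(timeout=5)
```

If the `except` branch were removed, the exception would end the worker thread and `call_worker` would block on `responses.get()` forever, which is the kind of hang the summary describes.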

This is what the stack trace looks like after this PR:

INFO:     Started server process [4028321]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)
INFO:     127.0.0.1:58610 - "GET /v1/models HTTP/1.1" 200 OK
INFO 01-03 16:21:03 logger.py:37] Received request cmpl-ac2ac919986b405fb3528f1a0e8e1613-0: prompt: 'Hello my name is', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=100, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None), prompt_token_ids: [128000, 9906, 856, 836, 374], lora_request: None, prompt_adapter_request: None.
INFO:     127.0.0.1:58610 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 01-03 16:21:04 async_llm.py:191] Added request cmpl-ac2ac919986b405fb3528f1a0e8e1613-0.
ERROR 01-03 16:21:05 core.py:200] EngineCore hit an exception: Traceback (most recent call last):
ERROR 01-03 16:21:05 core.py:200]   File "/home/rshaw/vllm/vllm/v1/engine/core.py", line 193, in run_engine_core
ERROR 01-03 16:21:05 core.py:200]     engine_core.run_busy_loop()
ERROR 01-03 16:21:05 core.py:200]   File "/home/rshaw/vllm/vllm/v1/engine/core.py", line 231, in run_busy_loop
ERROR 01-03 16:21:05 core.py:200]     outputs = self.step()
ERROR 01-03 16:21:05 core.py:200]               ^^^^^^^^^^^
ERROR 01-03 16:21:05 core.py:200]   File "/home/rshaw/vllm/vllm/v1/engine/core.py", line 124, in step
ERROR 01-03 16:21:05 core.py:200]     output = self.model_executor.execute_model(scheduler_output)
ERROR 01-03 16:21:05 core.py:200]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-03 16:21:05 core.py:200]   File "/home/rshaw/vllm/vllm/v1/executor/multiproc_executor.py", line 163, in execute_model
ERROR 01-03 16:21:05 core.py:200]     model_output = self.collective_rpc("execute_model",
ERROR 01-03 16:21:05 core.py:200]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-03 16:21:05 core.py:200]   File "/home/rshaw/vllm/vllm/v1/executor/multiproc_executor.py", line 157, in collective_rpc
ERROR 01-03 16:21:05 core.py:200]     raise e
ERROR 01-03 16:21:05 core.py:200]   File "/home/rshaw/vllm/vllm/v1/executor/multiproc_executor.py", line 146, in collective_rpc
ERROR 01-03 16:21:05 core.py:200]     raise result
ERROR 01-03 16:21:05 core.py:200] ValueError: SIMULATE CUDA EXCEPTION
ERROR 01-03 16:21:05 core.py:200] 
CRITICAL 01-03 16:21:05 async_llm.py:53] AsyncLLM got SIGQUIT from worker processes, shutting down. See stack trace above for root cause issue.
Killed
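The `CRITICAL` line indicates that the failing process notifies the AsyncLLM front-end with a signal, and that the front-end's handler logs the message and shuts down. Below is a minimal, POSIX-only sketch of that notification pattern. It is not vLLM's implementation: `handle_worker_failure` and the `shutdown_requested` flag are hypothetical. It uses SIGUSR1 because the commit list below includes "switch to SIGUSR1"; the log above shows SIGQUIT, presumably because it was captured before that change.

```python
import os
import signal
from types import FrameType
from typing import Optional

shutdown_requested = False


def handle_worker_failure(signum: int, frame: Optional[FrameType]) -> None:
    # Hypothetical handler. Python runs signal handlers in the main thread,
    # so this only records the request; a serving loop would check the flag
    # and perform the actual shutdown.
    global shutdown_requested
    name = signal.Signals(signum).name
    print(f"got {name} from worker processes, shutting down")
    shutdown_requested = True


# Parent (front-end) process: install the handler before starting children.
signal.signal(signal.SIGUSR1, handle_worker_failure)

# A failing child would call os.kill(os.getppid(), signal.SIGUSR1) after
# logging its traceback; to stay self-contained, this process signals itself.
os.kill(os.getpid(), signal.SIGUSR1)
```

Keeping the handler this small is deliberate. It runs asynchronously with respect to whatever the main thread was doing, so heavy cleanup is better left to regular code that checks the flag.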

…11632) Signed-off-by: Roger Wang <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: Isotr0py <[email protected]> Co-authored-by: DarkLight1337 <[email protected]> Co-authored-by: Isotr0py <[email protected]>


github-actions · 2025-01-03T16:15:29Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add the ready label to the PR.
Enable auto-merge.

🚀

robertgshaw2-redhat · 2025-01-03T16:16:09Z

vllm/v1/executor/multiproc_executor.py

@@ -34,10 +35,25 @@
 class MultiprocExecutor(Executor):

    def __init__(self, vllm_config: VllmConfig) -> None:
-        # Call self.shutdown at exit to clean up


NOTE: I removed this because it creates a circular reference that can prevent the executor from being garbage-collected (the finalizer's shutdown function cannot be a bound method of self). In addition, EngineCore already calls executor.shutdown() when it exits.
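The Python documentation for `weakref.finalize` warns that the finalizer's `func`, `args`, and `kwargs` must not reference the object being finalized, and in particular that `func` should not be a bound method of it. The following minimal sketch shows the difference. `LeakyExecutor`, `SafeExecutor`, and `_shutdown_workers` are hypothetical names that illustrate the pattern; they are not the vLLM classes.

```python
import gc
import weakref


class LeakyExecutor:
    """Registers a bound method as the finalizer. The finalizer registry then
    holds a strong reference to self, so the object is never collected."""

    def __init__(self) -> None:
        self._finalizer = weakref.finalize(self, self.shutdown)

    def shutdown(self) -> None:
        pass


def _shutdown_workers(workers: list) -> None:
    # Plain function: it receives only the resources it cleans up, not self.
    workers.clear()


class SafeExecutor:
    """Registers a plain function plus the resources it needs, so the
    finalizer does not keep self alive."""

    def __init__(self) -> None:
        self.workers = ["worker-0", "worker-1"]
        self._finalizer = weakref.finalize(self, _shutdown_workers, self.workers)


def is_collected(cls: type) -> bool:
    """Create an instance, drop the only local reference, and check whether
    the object was actually freed."""
    obj = cls()
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    return ref() is None


print(is_collected(LeakyExecutor))  # False
print(is_collected(SafeExecutor))   # True
```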


ywang96 and others added 23 commits December 31, 2024 21:17

[Benchmark] Add benchmark script for CPU offloading (vllm-project#11533)

0c6f998

Signed-off-by: ApostaC <[email protected]> Co-authored-by: KuntaiDu <[email protected]>

[Bugfix][Refactor] Unify model management in frontend (vllm-project#1…

4db72e5

…1660) Signed-off-by: Joe Runde <[email protected]>

[VLM] Add max-count checking in data parser for single image models (v…

365801f

…llm-project#11661) Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: Roger Wang <[email protected]> Co-authored-by: Roger Wang <[email protected]>

[Misc] Optimize Qwen2-VL LoRA test (vllm-project#11663)

11d8a09

Signed-off-by: Jee Jee Li <[email protected]>

[Misc] Replace space with - in the file names (vllm-project#11667)

f962f42

Signed-off-by: Lu Fang <[email protected]>

[Doc] Fix typo (vllm-project#11666)

6d70198

Signed-off-by: Kazuhiro Serizawa <[email protected]>

[V1] Implement Cascade Attention (vllm-project#11635)

7300144

Signed-off-by: Woosuk Kwon <[email protected]>

[VLM] Move supported limits and max tokens to merged multi-modal proc…

a115ac4

…essor (vllm-project#11669) Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: Isotr0py <[email protected]> Co-authored-by: Isotr0py <[email protected]>

[VLM][Bugfix] Multi-modal processor compatible with V1 multi-input (v…

23c1b10

…llm-project#11674) Signed-off-by: DarkLight1337 <[email protected]>

[mypy] Pass type checking in vllm/inputs (vllm-project#11680)

b6087a6

Signed-off-by: Tobias Pitters <[email protected]>

[VLM] Merged multi-modal processor for LLaVA-NeXT (vllm-project#11682)

8c38ee7

Signed-off-by: DarkLight1337 <[email protected]>

According to vllm.EngineArgs, the name should be distributed_executor…

84c35c3

…_backend (vllm-project#11689)

[Bugfix] Free cross attention block table for preempted-for-recompute…

2f38518

… sequence group. (vllm-project#10013) Signed-off-by: Kathy Yu <[email protected]>

[V1][Minor] Optimize token_ids_cpu copy (vllm-project#11692)

b55ed6e

Signed-off-by: Woosuk Kwon <[email protected]>

[Bugfix] Change kv scaling factor by param json on nvidia gpu (vllm-p…

187e329

…roject#11688) Signed-off-by: bjmsong <[email protected]> Co-authored-by: bjmsong <[email protected]>

Resolve race conditions in Marlin kernel (vllm-project#11493)

5dba257

Signed-off-by: wchen61 <[email protected]>

[Misc] Minimum requirements for SageMaker compatibility (vllm-project…

68d3780

…#11576)

Update default max_num_batch_tokens for chunked prefill (vllm-project…

2f1e8e8

…#11694)

[Bugfix] Check chain_speculative_sampling before calling it (vllm-pro…

07064cb

…ject#11673) Signed-off-by: Lu Fang <[email protected]>

[perf-benchmark] Fix dependency for steps in benchmark pipeline (vllm…

fd3a62a

…-project#11710)

[Model] Whisper model implementation (vllm-project#11280)

e1a5c2f

Co-authored-by: Aurick Qiao <[email protected]>

updated

1c4b92a

robertgshaw2-redhat requested review from njhill and alexm-neuralmagic as code owners January 3, 2025 16:15

robertgshaw2-redhat commented Jan 3, 2025


robertgshaw2-redhat and others added 3 commits January 3, 2025 16:26

stash

eb9b00b

[V1] Simplify Shutdown (vllm-project#11659)

80c751e

updated

1da99a8

Merge branch 'main' into tp-shutdown

ca7b92d

robertgshaw2-redhat requested review from tlrmchlsmth and youkaichao as code owners January 3, 2025 17:58

robertgshaw2-redhat and others added 17 commits January 3, 2025 17:59

updated

2743166

stash

8e257c1

revert spurious change

b7c50dc

updated

dcfd3b8

stash

6e0e0d4

updated

55a6195

updated

aa6954f

remove cruft

1d15ae0

Update vllm/v1/executor/multiproc_executor.py

0347baa

Co-authored-by: Tyler Michael Smith <[email protected]>

stash

20b8fa2

Merge branch 'tp-shutdown' of https://github.com/neuralmagic/vllm int…

32840f2

…o tp-shutdown

switch to SIGUSR1

884879a

updated

bb86a03

Update vllm/v1/engine/core_client.py

405bcc1

Co-authored-by: Tyler Michael Smith <[email protected]>

update message

25e0fea

updated

efd6270

fixed!

a5a306e
