Update README.md (#12877)
gc-fu authored Feb 24, 2025
1 parent 10400ab commit 02ec313
Showing 1 changed file with 7 additions and 0 deletions: `python/llm/example/GPU/vLLM-Serving/README.md`
@@ -241,3 +241,10 @@ llm = LLM(model="DeepSeek-R1-Distill-Qwen-7B", # Unquantized model path on disk
When execution finishes, the low-bit model has been saved at `/llm/fp8-model-path`.

Later, we can pass `--low-bit-model-path /llm/fp8-model-path` to load the saved low-bit model directly.
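As a minimal sketch, the flag is appended to the serving command from the earlier sections of this README. The entrypoint module, model path, and other flags below are illustrative assumptions; substitute the exact command your setup uses:

```shell
# Serve the previously saved FP8 low-bit model instead of quantizing at
# startup. Entrypoint, model path, and ports here are placeholders; use
# the serving command shown earlier in this README for your environment.
python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --model /llm/models/DeepSeek-R1-Distill-Qwen-7B \
  --load-in-low-bit fp8 \
  --low-bit-model-path /llm/fp8-model-path \
  --port 8000
```

Loading from the saved low-bit path skips the on-the-fly quantization step, so startup should be noticeably faster on subsequent runs.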


### 5. Known issues

#### Runtime memory

If runtime memory is a concern, you can set `--swap-space 0.5` to reduce memory consumption during execution. The default value of `--swap-space` is 4, which means that by default vLLM reserves 4 GiB of CPU memory per GPU as swap space for when GPU memory is insufficient.
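For example, a hedged sketch of adding the flag to the serving command (the entrypoint and model path are illustrative placeholders; reuse the exact command from the earlier sections of this README):

```shell
# Shrink the reserved CPU swap space from the default 4 GiB per GPU to
# 0.5 GiB. Entrypoint and model path are placeholders for your own setup.
python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --model /llm/models/DeepSeek-R1-Distill-Qwen-7B \
  --swap-space 0.5 \
  --port 8000
```

Note that shrinking the swap space trades lower host-memory usage for less headroom when requests are preempted and their KV cache must be swapped out of GPU memory.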
