From 89941ed43b2c8117f9a5faccb7aa65d7f94ee222 Mon Sep 17 00:00:00 2001
From: xffxff <1247714429@qq.com>
Date: Wed, 13 Nov 2024 07:53:04 +0000
Subject: [PATCH] docs: add a note on potential errors when enabling tensor
 parallelism for vLLM

---
 docs/inference.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/inference.md b/docs/inference.md
index 319e318..9bba970 100644
--- a/docs/inference.md
+++ b/docs/inference.md
@@ -83,6 +83,12 @@ pip install -e .[vllm]
 ```
 
 ### How to Use:
+
+> **NOTE:** If you encounter a "RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method" error when enabling tensor parallelism, try setting the following environment variable:
+> ```bash
+> export VLLM_WORKER_MULTIPROC_METHOD="spawn"
+> ```
+
 ```python
 from PIL import Image
 from transformers import AutoTokenizer