System Info / 系統信息
Ubuntu 22.04, Python 3.11.10

Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?

Version info / 版本信息
xinference[vllm] 0.16.2

The command used to start Xinference / 用以启动 xinference 的命令
GRADIO_DEFAULT_CONCURRENCY_LIMIT=10 xinference-local --host 0.0.0.0 --port 10860

Reproduction / 复现过程
Launching a 7B model directly with the vLLM framework on two 16 GB V100s works fine, and chat runs normally. When I launch the same 7B model on the two V100s through xinference[vllm], it fails with an error. My parameter settings and the error output are below:
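The reporter's exact launch parameters and error log were not captured in this thread. As a purely hypothetical reconstruction (the model name qwen2-instruct and all values below are illustrative placeholders, not the reporter's actual settings), a launch that passes vLLM's tensor_parallel_size option explicitly might look like this; as far as I know, the xinference launch CLI forwards extra --key value options on to the engine:

# hypothetical reconstruction -- all values are placeholders
xinference launch --endpoint http://0.0.0.0:10860 --model-engine vllm --model-name qwen2-instruct --size-in-billions 7 --model-format pytorch --n-gpu 2 --tensor_parallel_size 2

The explicit --tensor_parallel_size 2 at the end is the setting the first reply below suggests removing.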
Expected behavior / 期待表现
I am currently using the RAGFlow framework with Xinference as its inference service. This is important to my setup, and I hope the maintainers will give this bug priority.
You don't need to set tensor_parallel_size; try removing it.
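For context: as far as I know, Xinference's vLLM backend derives tensor_parallel_size itself from the number of GPUs assigned to the model, so passing it again by hand can conflict with that computed value. A minimal sketch of the suggested launch, assuming a placeholder 7B model name:

# let Xinference derive tensor_parallel_size from --n-gpu
xinference launch --endpoint http://0.0.0.0:10860 --model-engine vllm --model-name qwen2-instruct --size-in-billions 7 --model-format pytorch --n-gpu 2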
> You don't need to set tensor_parallel_size; try removing it.

I removed the tensor_parallel_size setting, and now it fails with a different error; the error message is above. I hope you can fix this bug, thanks!
This issue is stale because it has been open for 7 days with no activity.
same issue
> same issue

I have updated the version of xinference to 1.0.0, and it worked.
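For anyone hitting the same error on 0.16.2, the fix reported above is an upgrade, e.g.:

pip install -U "xinference[vllm]==1.0.0"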