OK, normally this should be set to cuda_count. A PR is welcome — when cuda_count is 0, it should be set to 1 instead.
This issue is stale because it has been open for 7 days with no activity.
System Info / 系統信息
cpu backend with no gpu
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
Version info / 版本信息
xinference-latest+vllm-0.6.2
The command used to start Xinference / 用以启动 xinference 的命令
Started from the web UI
Reproduction / 复现过程
When using the vllm backend on CPU (no GPU hardware), tensor_parallel_size should default to 1 instead of cuda_count (which is 0).
Otherwise vllm's internal code raises an error:

    if total_num_attention_heads % tensor_parallel_size != 0:
       ^^^^^^^^^^^^^^^^^
    ZeroDivisionError: integer modulo by zero
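A minimal sketch of the suggested fix (function and variable names here are illustrative assumptions, not Xinference's actual code): fall back to 1 whenever the detected CUDA device count is 0, so the modulo check inside vllm never divides by zero.

```python
def default_tensor_parallel_size(cuda_count: int) -> int:
    """Hypothetical helper: pick a safe default for tensor_parallel_size.

    vllm divides total_num_attention_heads by tensor_parallel_size,
    so the value must be at least 1 even on CPU-only machines.
    """
    return cuda_count if cuda_count > 0 else 1


# On a CPU-only machine, cuda_count is 0 but we still get a valid default.
total_num_attention_heads = 32  # example value
tp = default_tensor_parallel_size(0)
assert total_num_attention_heads % tp == 0  # no ZeroDivisionError
```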
Expected behavior / 期待表现
tensor_parallel_size should default to 1 when no GPU is present. (Then again, maybe nobody runs this on CPU...)