
With the vllm + CPU backend (no GPU hardware), tensor_parallel_size should default to 1 rather than cuda_count (which is 0) #2552

Diffizle opened this issue Nov 14, 2024 · 2 comments

@Diffizle

System Info

CPU backend with no GPU

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

xinference-latest+vllm-0.6.2

The command used to start Xinference

Started via the web UI

Reproduction

When using the vllm + CPU backend (no GPU hardware), tensor_parallel_size should default to 1 rather than cuda_count (which equals 0).
Otherwise this check inside vLLM fails:

    if total_num_attention_heads % tensor_parallel_size != 0:
                                 ^^^^^^^^^^^^^^^^^
    ZeroDivisionError: integer modulo by zero
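The failure mode can be reproduced in isolation. A minimal sketch, with hypothetical values standing in for the model config (vLLM performs a divisibility check of this shape during config validation):

```python
# Hypothetical values: 32 attention heads, and tensor_parallel_size taken
# from cuda_count on a machine with no GPUs, i.e. 0.
total_num_attention_heads = 32
tensor_parallel_size = 0

try:
    # The `%` operator with a zero right-hand side raises before the
    # divisibility comparison is ever evaluated.
    if total_num_attention_heads % tensor_parallel_size != 0:
        raise ValueError("heads must be divisible by tensor_parallel_size")
except ZeroDivisionError as exc:
    print(f"ZeroDivisionError: {exc}")
```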

Expected behavior

Maybe nobody runs this on CPU...

@XprobeBot XprobeBot added the gpu label Nov 14, 2024
@XprobeBot XprobeBot added this to the v0.16 milestone Nov 14, 2024
@qinxuye
Contributor

qinxuye commented Nov 14, 2024

OK, it is normally set to cuda_count. A PR is welcome: when cuda_count is 0, it should be set to 1.
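The suggested fix can be sketched as a small fallback helper. The function name is hypothetical (Xinference's actual code path differs); it only illustrates the rule from the comment above:

```python
def default_tensor_parallel_size(cuda_count: int) -> int:
    """Pick a default tensor_parallel_size for vLLM.

    On CPU-only hosts cuda_count is 0, which would make vLLM's
    `total_num_attention_heads % tensor_parallel_size` check divide
    by zero, so fall back to 1 in that case.
    """
    return cuda_count if cuda_count > 0 else 1
```

With this fallback, a CPU-only host gets tensor_parallel_size = 1 and a 4-GPU host keeps its full cuda_count of 4.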


This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Nov 21, 2024