vllm multi-GPU inference problem #18
Comments
vllm already supports telechat2. You can pull the latest vllm code from the official repository, install it, and run telechat2 with vllm.
I am using the 0.6.1.post2 version recommended in the documentation, but the problem described above still occurs.
When loading a model with vllm, the first multi-GPU startup works fine, but after the model process exits, the next startup hangs. See "NCCL GPU P2P communication problem when starting vLLM".
Can telechat also run on sglang?
Has this problem been solved? I ran into the same issue deploying qwen and ds with vllm: the first startup is fine, but after exiting, subsequent startups hang once about 500 MB of the model has been loaded, with GPU utilization stuck at 100%.
My command:
After running it, loading hangs at one step and the model never finishes loading:
Each GPU has only loaded about 400 MB.

I don't know what the problem is yet. I'd like to know whether my parameters are wrong, or whether some configuration needs to be changed.
P.S. Single-GPU loading and inference works fine.
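Since the thread points to NCCL P2P communication as the likely culprit, a common first diagnostic step is to disable P2P and turn on NCCL logging when launching the multi-GPU server. This is a hedged sketch, not a confirmed fix: the model path is a placeholder, and `--tensor-parallel-size 2` assumes a two-GPU setup like the one described above.

```shell
# Sketch of a multi-GPU vLLM launch with NCCL P2P disabled.
# NCCL_P2P_DISABLE=1 forces NCCL to fall back to shared-memory/PCIe
# transport instead of direct GPU peer-to-peer access, which is a
# frequent workaround when startup hangs on P2P initialization.
# NCCL_DEBUG=INFO prints NCCL's transport selection to the log so you
# can see where initialization stalls.
export NCCL_P2P_DISABLE=1
export NCCL_DEBUG=INFO

# /path/to/model is a placeholder; adjust --tensor-parallel-size to
# the number of GPUs you are using.
vllm serve /path/to/model \
    --tensor-parallel-size 2
```

If the server starts cleanly with `NCCL_P2P_DISABLE=1` but hangs without it, that points to the P2P/IOMMU issue mentioned above rather than a vllm parameter error; the hang only on the second launch also suggests checking for leftover processes still holding the GPUs (e.g. with `nvidia-smi`) before restarting.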