
Quantized model does not use GPU resources on a T4 server, what is the cause? #331

Open
hithepeng opened this issue Jun 28, 2024 · 1 comment

hithepeng commented Jun 28, 2024

GPU acceleration was enabled on the T4 server:
CMAKE_ARGS="-DGGML_CUDA=ON" pip install -U chatglm-cpp
CMAKE_ARGS="-DGGML_CUDA=ON" pip install 'chatglm-cpp[api]'
Then the quantized model was started:
MODEL=/home/ops/chatglm/chatglm.cpp/models/chatglm3-q8-0-ggml.bin uvicorn chatglm_cpp.openai_api:app --host xx.xx.xx.xx --port 8000
After startup, the application does not use any GPU resources. What could be the cause, and is there a good way to fix it?
uvicorn chatglm_cpp.openai_api:app --host xx.xx.xx.xx --port 8000
INFO: Started server process [6821]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://xx.xx.xx.xx:8000 (Press CTRL+C to quit)
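
For reference, one way to confirm whether the server process is actually touching the GPU is to watch nvidia-smi in a second terminal while sending a test request. The /v1/chat/completions path and the request fields below are assumptions based on the OpenAI-compatible API; adjust the host, port, and model name to your setup:

# in another terminal: watch GPU memory and utilization while the server handles a request
watch -n 1 nvidia-smi

# send a test request to the running server (path and body are assumed, adjust as needed)
curl http://xx.xx.xx.xx:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "default", "messages": [{"role": "user", "content": "hello"}]}'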

To add:
GPU resources are used when running with the following command:
./build/bin/main -m /home/ops/chatglm/chatglm.cpp/models/chatglm3-q8-0-ggml.bin -i --top_p 0.8 --temp 0.8
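
Since the compiled main binary does use the GPU while the pip-installed Python module does not, one possible (unconfirmed) explanation is that pip reused a previously built or cached CPU-only wheel, so the CMAKE_ARGS setting never reached the build. A sketch of forcing a clean rebuild from source, using only standard pip flags:

# remove the existing install, then rebuild so -DGGML_CUDA=ON is actually applied
pip uninstall -y chatglm-cpp
CMAKE_ARGS="-DGGML_CUDA=ON" pip install -U --force-reinstall --no-cache-dir chatglm-cpp
CMAKE_ARGS="-DGGML_CUDA=ON" pip install --no-cache-dir 'chatglm-cpp[api]'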

@shendingjun

I am seeing the same thing.
