
Quantized model does not use GPU resources on a T4 server, what is the cause? #331

Open
hithepeng opened this issue Jun 28, 2024 · 1 comment

hithepeng commented Jun 28, 2024

GPU acceleration was enabled on the T4 server:
CMAKE_ARGS="-DGGML_CUDA=ON" pip install -U chatglm-cpp
CMAKE_ARGS="-DGGML_CUDA=ON" pip install 'chatglm-cpp[api]'
Then the quantized model was started:
MODEL=/home/ops/chatglm/chatglm.cpp/models/chatglm3-q8-0-ggml.bin uvicorn chatglm_cpp.openai_api:app --host xx.xx.xx.xx --port 8000
After startup, the application does not use any GPU resources. What could be the cause, and is there a good way to fix it?
uvicorn chatglm_cpp.openai_api:app --host xx.xx.xx.xx --port 8000
INFO: Started server process [6821]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://xx.xx.xx.xx:8000 (Press CTRL+C to quit)
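
For reference, one way to confirm whether the server process is actually touching the GPU is to watch nvidia-smi in a second terminal while sending a test request. The /v1/chat/completions path and the request fields below are assumptions based on the OpenAI-compatible API; adjust the host, port, and model name to your setup:

# in another terminal: watch GPU memory and utilization while the server handles a request
watch -n 1 nvidia-smi

# send a test request to the running server (path and body are assumed, adjust as needed)
curl http://xx.xx.xx.xx:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "default", "messages": [{"role": "user", "content": "hello"}]}'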

To add:
GPU resources are used when running with the following command:
./build/bin/main -m /home/ops/chatglm/chatglm.cpp/models/chatglm3-q8-0-ggml.bin -i --top_p 0.8 --temp 0.8
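
Since the compiled main binary does use the GPU while the pip-installed Python module does not, one possible (unconfirmed) explanation is that pip reused a previously built or cached CPU-only wheel, so the CMAKE_ARGS setting never reached the build. A sketch of forcing a clean rebuild from source, using only standard pip flags:

# remove the existing install, then rebuild so -DGGML_CUDA=ON is actually applied
pip uninstall -y chatglm-cpp
CMAKE_ARGS="-DGGML_CUDA=ON" pip install -U --force-reinstall --no-cache-dir chatglm-cpp
CMAKE_ARGS="-DGGML_CUDA=ON" pip install --no-cache-dir 'chatglm-cpp[api]'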

@shendingjun

I am seeing the same thing.
