- Configure the "nvidia" runtime
vim /etc/docker/daemon.json
Add the following content:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
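A malformed daemon.json (e.g. a trailing comma) will prevent Docker from starting. Before restarting, you can sanity-check the file with Python's built-in JSON tool:

```shell
# Prints a parse error and exits non-zero if the JSON is invalid
python3 -m json.tool /etc/docker/daemon.json > /dev/null \
  && echo "daemon.json is valid JSON"
```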
Restart docker
systemctl daemon-reload
systemctl restart docker
- Install nvidia-container-runtime and nvidia-docker2
Run
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo
Run
yum install nvidia-container-runtime nvidia-docker2 -y
Restart docker
systemctl restart docker
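After the restart you can confirm that Docker has registered the runtime by inspecting `docker info` (the exact output format varies by Docker version); "nvidia" should appear among the listed runtimes:

```shell
# Should show "nvidia" among the available runtimes; the fallback message
# covers the case where the Docker daemon is not running yet
docker info 2>/dev/null | grep -i 'nvidia' \
  || echo "nvidia runtime not visible (is Docker running?)"
```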
- Start the vLLM server
docker run -d --runtime nvidia --gpus all \
    -v /root/SuperAdapters/output/llama3.1-combined:/root/SuperAdapters/output/llama3.1-combined \
    -p 8000:8000 --ipc=host \
    vllm/vllm-openai:latest \
    --model /root/SuperAdapters/output/llama3.1-combined \
    --trust-remote-code
P.S. If you use a V100, you should add the "--max-model-len" option, like "--max-model-len 30000", because the GPU's memory cannot hold the KV cache for the model's full default context length; you may also need "--dtype half", since V100 GPUs do not support bfloat16.
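Once the container is up, it exposes an OpenAI-compatible API on port 8000. A minimal smoke test with curl might look like the following (the "model" field must match the --model path used above; the fallback message covers the case where the server has not finished loading yet):

```shell
# Send a completion request to the OpenAI-compatible endpoint
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "/root/SuperAdapters/output/llama3.1-combined",
        "prompt": "Hello, my name is",
        "max_tokens": 16
      }' \
  || echo "server not reachable yet"
```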