This project allows easy deployment of any built-in LLM of Xinference using Docker and Docker Compose (available here). Note that this setup only works on Linux machines with a dedicated Nvidia graphics card. For other solutions, check the Xinference docs; for instance, you can run the xinference library natively on Mac machines.
The pre-built image is available on Docker Hub under the name `biocypher/xinference-builtin` as a multi-arch image. You can pull it using `docker pull biocypher/xinference-builtin`. The image is built for the amd64 and arm64 architectures. If you want to build the image yourself, you can use the Dockerfile in this repository (step 2).
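For example (the `:local` tag is just an illustrative choice, and the build command assumes the Dockerfile sits at the repository root):

```bash
# Pull the pre-built multi-arch image from Docker Hub
docker pull biocypher/xinference-builtin

# ...or build it yourself from the Dockerfile in this repository
# (equivalent to step 2 below)
docker build -t biocypher/xinference-builtin:local .
```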
1. Install the nvidia-docker libraries (find details about the Nvidia Container Toolkit here); a sketch of the install commands is shown after this list.
2. Run `docker compose pull` to use the pre-built image, or `docker compose build` to build it locally.
3. Run `docker compose up -d`. This should start a container in the background that downloads and runs the zephyr-7b model. To change the model, change the `env_file` parameter in the `docker-compose.yml` file, for instance to `llama-2-13b.env` (see the compose sketch after this list).
4. Optional: there are two example environment files that can be commented and un-commented in the `docker-compose.yml`. The llama-2-chat file shows you how to use models that require a Hugging Face access token (if the token is placed in the `.env` file).
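For step 1, a minimal sketch of the toolkit install on an apt-based distribution is shown below; it assumes NVIDIA's apt repository is already configured and is not a substitute for the official Nvidia Container Toolkit instructions linked above:

```bash
# Install the NVIDIA Container Toolkit (assumes NVIDIA's apt repository is set up)
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```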
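For step 3, here is a sketch of how the model swap might look inside `docker-compose.yml`; the service name and the GPU reservation block are illustrative assumptions, so refer to the actual file in this repository:

```yaml
services:
  xinference:                          # illustrative service name
    image: biocypher/xinference-builtin
    # Swap the env file to change the model that is downloaded and served,
    # e.g. zephyr-7b.env -> llama-2-13b.env
    env_file: llama-2-13b.env
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```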
You can find a list of the available LLMs in two ways:
- Set the environment variable `LIST=1` in the active .env file. Run `docker compose up`, which will run the container attached until it prints a list of all available LLMs (a sketch follows below).
- Find a (possibly not up-to-date) list in the xinference documentation here.
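A sketch of the first approach (adjust the env file name to whichever file your `docker-compose.yml` currently references):

```bash
# Append LIST=1 to the active env file
echo "LIST=1" >> .env

# Run attached (no -d) so the container stays in the foreground
# and prints the list of available built-in LLMs to the console
docker compose up
```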