Example deployment of the Ollama server for OpenShift.
This deployment uses a UBI9 image defined in the Containerfile and available at quay.io/rh-aiservices-bu/ollama-ubi9 (check for the latest version available).
The image is compiled for the avx2 instruction set, which is supported by virtually all Intel and AMD processors released after 2016. To keep its size small, the image does not include GPU acceleration; it is meant to be used as a simple LLM server for tests when you don't have access to GPUs.
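If you are not sure whether your worker nodes support avx2, a quick way to check is to look at the CPU flags directly on a node. This is only a sketch; the node name is a placeholder to replace with one of your own:

oc debug node/<node-name> -- chroot /host grep -m1 -o avx2 /proc/cpuinfo
# prints "avx2" if the instruction set is supported, nothing otherwise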
A notebook example using Langchain is available here.
The default installation deploys the Mistral-7B-Instruct-v0.2 model, in its quantized Ollama version. See Advanced installation for instructions on how to change the model and adjust various settings.
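Independently of the Advanced installation instructions, note that a running Ollama server can also pull additional models at runtime through its standard /api/pull endpoint. A minimal sketch, with a placeholder service hostname:

curl http://service:11434/api/pull \
  -H "Content-Type: application/json" \
  -d '{"name": "mistral"}'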
Automated Deployment:
- Use the OpenShift GitOps/ArgoCD Application definition at
gitops/ollama-app.yaml
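For example, assuming the Application is managed by the default Argo CD instance in the openshift-gitops namespace (adjust to your own GitOps setup), this boils down to:

oc apply -n openshift-gitops -f gitops/ollama-app.yaml
# Argo CD then creates and syncs the Ollama resources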
Manual Deployment (from the gitops folder; the corresponding oc commands are sketched after this list):
- Create the PVC using the file pvc.yaml; it will hold the model cache.
- Create the Deployment using the file deployment.yaml.
- Create the Service using the file service.yaml.
- If you want to expose the server outside of your OpenShift cluster, create the Route using the file route.yaml.
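As a sketch, assuming you are logged in with oc and have already switched to the target project:

oc apply -f pvc.yaml
oc apply -f deployment.yaml
oc apply -f service.yaml
oc apply -f route.yaml   # optional, only needed for external access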
The API is now accessible at the following endpoints (a quick connectivity check is sketched after this list):
- defined by your Service, accessible internally over http on port 11434, e.g.
http://ollama.your-project.svc.cluster.local:11434/
- defined by your Route, accessible externally through https, e.g.
https://ollama.your-project.your-cluster/
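To quickly verify that the server is reachable, you can query the root endpoint or list the available models (the hostname is a placeholder, as above):

curl http://ollama.your-project.svc.cluster.local:11434/
# expected answer: "Ollama is running"
curl http://ollama.your-project.svc.cluster.local:11434/api/tags
# returns the list of models currently available on the server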
You can query the model directly using the /api/generate endpoint:
curl http://service:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "prompt": "Why is the sky blue?"
  }'
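Note that /api/generate streams its answer by default as a sequence of JSON objects. If you prefer a single JSON response, you can disable streaming:

curl http://service:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'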
You can also query it from Python using the ollama client library:
pip install ollama
import ollama

# Point the client at the Ollama Service (placeholder hostname)
client = ollama.Client(host='http://service:11434')

# Request a streamed chat completion
stream = client.chat(
    model='mistral',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

# Print the response token by token as it arrives
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
You can also find a notebook example using Langchain to query Ollama in this repo, here.