From 9083aea378c44722fe4a2d1eba1fee94667a7577 Mon Sep 17 00:00:00 2001
From: Neko Ayaka
Date: Tue, 16 Apr 2024 17:40:32 +0800
Subject: [PATCH] docs: updated README.md with getting started docs

Signed-off-by: Neko Ayaka
---
 README.md | 146 ++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 93 insertions(+), 53 deletions(-)

diff --git a/README.md b/README.md
index 9309669..3c5b778 100644
--- a/README.md
+++ b/README.md
@@ -33,56 +33,25 @@ The journey to large language models, AIGC, localized agents, [🦜🔗 Langchai
 - ✅ Easy to expose with existing Kubernetes services, ingress, etc.
 - ✅ Doesn't require any additional dependencies, just Kubernetes
 
-## Description
+## Getting started
 
-Unlock the abilities to run the following models with the Ollama Operator over Kubernetes:
-
-> [!TIP]
-> By the power of [`Modelfile`](https://github.com/ollama/ollama/blob/main/docs/modelfile.md) backed by Ollama, you can create and bundle any of your own model. **As long as it's a GGUF formatted model.**
-
-| Model              | Parameters | Size  | Model image         | Full model image URL                           |
-| ------------------ | ---------- | ----- | ------------------- | ---------------------------------------------- |
-| Llama 2            | 7B         | 3.8GB | `llama2`            | `registry.ollama.ai/library/llama2`            |
-| Mistral            | 7B         | 4.1GB | `mistral`           | `registry.ollama.ai/library/mistral`           |
-| Dolphin Phi        | 2.7B       | 1.6GB | `dolphin-phi`       | `registry.ollama.ai/library/dolphin-phi`       |
-| Phi-2              | 2.7B       | 1.7GB | `phi`               | `registry.ollama.ai/library/phi`               |
-| Neural Chat        | 7B         | 4.1GB | `neural-chat`       | `registry.ollama.ai/library/neural-chat`       |
-| Starling           | 7B         | 4.1GB | `starling-lm`       | `registry.ollama.ai/library/starling-lm`       |
-| Code Llama         | 7B         | 3.8GB | `codellama`         | `registry.ollama.ai/library/codellama`         |
-| Llama 2 Uncensored | 7B         | 3.8GB | `llama2-uncensored` | `registry.ollama.ai/library/llama2-uncensored` |
-| Llama 2 13B        | 13B        | 7.3GB | `llama2:13b`        | `registry.ollama.ai/library/llama2:13b`        |
-| Llama 2 70B        | 70B        | 39GB  | `llama2:70b`        | `registry.ollama.ai/library/llama2:70b`        |
-| Orca Mini          | 3B         | 1.9GB | `orca-mini`         | `registry.ollama.ai/library/orca-mini`         |
-| Vicuna             | 7B         | 3.8GB | `vicuna`            | `registry.ollama.ai/library/vicuna`            |
-| LLaVA              | 7B         | 4.5GB | `llava`             | `registry.ollama.ai/library/llava`             |
-| Gemma              | 2B         | 1.4GB | `gemma:2b`          | `registry.ollama.ai/library/gemma:2b`          |
-| Gemma              | 7B         | 4.8GB | `gemma:7b`          | `registry.ollama.ai/library/gemma:7b`          |
-
-Full list of available images can be found at [Ollama Library](https://ollama.com/library).
-
-> [!WARNING]
-> You should have at least 8 GB of RAM available on your node to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
+### Install the operator
 
-> [!WARNING]
-> The actual size of downloaded large language models are huge by comparing to the size of general container images.
->
-> 1. Fast and stable network connection is recommended to download the models.
-> 2. Efficient storage is required to store the models if you want to run models larger than 13B.
+```shell
+kubectl apply -f https://raw.githubusercontent.com/nekomeowww/ollama-operator/main/dist/install.yaml
+```
 
-## Getting Started
+### Wait for the operator to be ready
 
-```yaml
-apiVersion: ollama.ayaka.io/v1
-kind: Model
-metadata:
-  name: phi
-spec:
-  image: phi
+```shell
+kubectl wait --for=jsonpath='{.status.readyReplicas}'=2 deployment/ollama-operator-controller-manager -n ollama-operator-system
 ```
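+
+To double-check that the controller is actually running, you can list the pods in the operator's namespace (the same namespace used by the `kubectl wait` command above):
+
+```shell
+# Both controller-manager pods should show STATUS Running
+kubectl get pods -n ollama-operator-system
+```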
+
+### Create a model
+
 > [!IMPORTANT]
 > Working with `kind`?
-> 
+>
 > The default provisioned `StorageClass` in `kind` is `standard`, and will only work with `ReadWriteOnce` access mode, therefore if you would need to run the operator with `kind`, you should specify `persistentVolume` with `accessMode: ReadWriteOnce` in the `Model` CRD:
 > ```yaml
 > apiVersion: ollama.ayaka.io/v1
@@ -95,6 +64,41 @@ spec:
 >     accessMode: ReadWriteOnce
 > ```
 
+Define the following `Model` resource:
+
+```yaml
+apiVersion: ollama.ayaka.io/v1
+kind: Model
+metadata:
+  name: phi
+spec:
+  image: phi
+```
+
+Save it as `ollama-model-phi.yaml`, then apply it to your Kubernetes cluster:
+
+```shell
+kubectl apply -f ollama-model-phi.yaml
+```
+
+Wait for the model to be ready:
+
+```shell
+kubectl wait --for=jsonpath='{.status.readyReplicas}'=1 deployment/ollama-model-phi
+```
+
+### Access the model
+
+1. Ready! Now let's forward the port to access the model:
+
+```shell
+kubectl port-forward svc/ollama-model-phi ollama
+```
+
+2. Interact with the model:
+
+```shell
+ollama run phi
+```
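+
+While the port-forward is active, you can also query the model directly over HTTP, since the service in front of it speaks the standard Ollama API. A minimal sketch, assuming the service's `ollama` port resolves to Ollama's default port 11434 (which is also what lets `ollama run phi` above reach the forwarded service):
+
+```shell
+# Ask the phi model for a completion through the forwarded port
+curl http://localhost:11434/api/generate -d '{
+  "model": "phi",
+  "prompt": "Why is the sky blue?"
+}'
+```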
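+
+When you are done experimenting, the model can be removed by deleting the `Model` resource you applied earlier:
+
+```shell
+kubectl delete -f ollama-model-phi.yaml
+```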
+
 ### Full options
 
 ```yaml
 apiVersion: ollama.ayaka.io/v1
 kind: Model
 metadata:
   name: phi
 spec:
   image: phi
@@ -116,6 +120,42 @@ spec:
     accessMode: ReadWriteOnce
 ```
 
+## Supported models
+
+Unlock the ability to run the following models with the Ollama Operator over Kubernetes:
+
+> [!TIP]
+> With the power of Ollama's [`Modelfile`](https://github.com/ollama/ollama/blob/main/docs/modelfile.md), you can create and bundle any model of your own, **as long as it is a GGUF formatted model.**
+
+| Model              | Parameters | Size  | Model image         | Full model image URL                           |
+| ------------------ | ---------- | ----- | ------------------- | ---------------------------------------------- |
+| Llama 2            | 7B         | 3.8GB | `llama2`            | `registry.ollama.ai/library/llama2`            |
+| Mistral            | 7B         | 4.1GB | `mistral`           | `registry.ollama.ai/library/mistral`           |
+| Dolphin Phi        | 2.7B       | 1.6GB | `dolphin-phi`       | `registry.ollama.ai/library/dolphin-phi`       |
+| Phi-2              | 2.7B       | 1.7GB | `phi`               | `registry.ollama.ai/library/phi`               |
+| Neural Chat        | 7B         | 4.1GB | `neural-chat`       | `registry.ollama.ai/library/neural-chat`       |
+| Starling           | 7B         | 4.1GB | `starling-lm`       | `registry.ollama.ai/library/starling-lm`       |
+| Code Llama         | 7B         | 3.8GB | `codellama`         | `registry.ollama.ai/library/codellama`         |
+| Llama 2 Uncensored | 7B         | 3.8GB | `llama2-uncensored` | `registry.ollama.ai/library/llama2-uncensored` |
+| Llama 2 13B        | 13B        | 7.3GB | `llama2:13b`        | `registry.ollama.ai/library/llama2:13b`        |
+| Llama 2 70B        | 70B        | 39GB  | `llama2:70b`        | `registry.ollama.ai/library/llama2:70b`        |
+| Orca Mini          | 3B         | 1.9GB | `orca-mini`         | `registry.ollama.ai/library/orca-mini`         |
+| Vicuna             | 7B         | 3.8GB | `vicuna`            | `registry.ollama.ai/library/vicuna`            |
+| LLaVA              | 7B         | 4.5GB | `llava`             | `registry.ollama.ai/library/llava`             |
+| Gemma              | 2B         | 1.4GB | `gemma:2b`          | `registry.ollama.ai/library/gemma:2b`          |
+| Gemma              | 7B         | 4.8GB | `gemma:7b`          | `registry.ollama.ai/library/gemma:7b`          |
+
+The full list of available images can be found at the [Ollama Library](https://ollama.com/library).
+
+> [!WARNING]
+> You should have at least 8 GB of RAM available on your node to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
+
+> [!WARNING]
+> Downloaded large language models are huge compared to typical container images.
+>
+> 1. A fast and stable network connection is recommended for downloading the models.
+> 2. Sufficient, performant storage is required to hold the models if you want to run models larger than 13B.
+
 ## Architecture Overview
 
 There are two major components that the Ollama Operator will create for:
 
 The detailed resources it creates, and the relationships between them are shown in the following diagram:
 
 ## Contributing