- ✅ Easy to expose with existing Kubernetes services, ingress, etc.
- ✅ Doesn't require any additional dependencies, just Kubernetes

## Getting started

### Install operator

```shell
kubectl apply -f https://raw.githubusercontent.com/nekomeowww/ollama-operator/main/dist/install.yaml
```
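
To confirm the operator components came up, listing the pods in its namespace is a quick check; a sketch assuming the default `ollama-operator-system` namespace used by the install manifest:

```shell
kubectl get pods -n ollama-operator-system
```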

### Wait for the operator to be ready

```shell
kubectl wait --for=jsonpath='{.status.replicas}'=2 deployment/ollama-operator-controller-manager -n ollama-operator-system
```
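
An equivalent rollout-based check, if you prefer one; a sketch using the same deployment name as above:

```shell
kubectl -n ollama-operator-system rollout status deployment/ollama-operator-controller-manager
```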

### Create model

> [!IMPORTANT]
> Working with `kind`?
>
> The default `StorageClass` provisioned in `kind` is `standard`, which only supports the `ReadWriteOnce` access mode. If you run the operator on a `kind` cluster, specify `persistentVolume` with `accessMode: ReadWriteOnce` in the `Model` CRD:
>
> ```yaml
> apiVersion: ollama.ayaka.io/v1
> kind: Model
> metadata:
>   name: phi
> spec:
>   image: phi
>   persistentVolume:
>     accessMode: ReadWriteOnce
> ```
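
If you are experimenting on `kind`, creating a throwaway cluster first is a one-liner; a sketch using the standard `kind` CLI (the cluster name is an arbitrary choice):

```shell
kind create cluster --name ollama-operator
```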

Define a `Model` resource for the model you want to run (here, `phi`), and save it as `ollama-model-phi.yaml`:

```yaml
apiVersion: ollama.ayaka.io/v1
kind: Model
metadata:
  name: phi
spec:
  image: phi
```

Apply the `Model` CRD to your Kubernetes cluster:

```shell
kubectl apply -f ollama-model-phi.yaml
```

Wait for the model to be ready:

```shell
kubectl wait --for=jsonpath='{.status.readyReplicas}'=1 deployment/ollama-model-phi
```
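
To inspect progress by hand, listing the custom resources should also work; a sketch assuming the CRD registers the plural name `models`:

```shell
kubectl get models
kubectl describe model phi
```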

### Access the model

1. Ready! Now let's forward the ports to access the model:

   ```shell
   kubectl port-forward svc/ollama-model-phi 11434:11434
   ```

2. Interact with the model:

   ```shell
   ollama run phi
   ```
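
Since the forwarded port exposes the standard Ollama HTTP API, you can also talk to the model directly over HTTP; a minimal sketch, assuming the port-forward from step 1 is still running on `localhost:11434`:

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "phi",
  "prompt": "Why is the sky blue?"
}'
```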
### Full options

A `Model` spec combining the options shown above:

```yaml
apiVersion: ollama.ayaka.io/v1
kind: Model
metadata:
  name: phi
spec:
  image: phi
  persistentVolume:
    accessMode: ReadWriteOnce
```
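
Because each model is backed by a plain Kubernetes `Service` (the `svc/ollama-model-phi` used above), exposing it follows the usual patterns. A sketch of an `Ingress`, assuming the service listens on Ollama's default port `11434`, an NGINX ingress controller is installed, and `phi.example.com` is a hypothetical hostname:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ollama-model-phi
spec:
  ingressClassName: nginx
  rules:
    - host: phi.example.com # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ollama-model-phi
                port:
                  number: 11434
```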

## Supported models

Unlock the ability to run the following models with the Ollama Operator over Kubernetes:

> [!TIP]
> By the power of [`Modelfile`](https://github.com/ollama/ollama/blob/main/docs/modelfile.md) backed by Ollama, you can create and bundle any model of your own, **as long as it is a GGUF-formatted model.**

| Model              | Parameters | Size  | Model image         | Full model image URL                            |
| ------------------ | ---------- | ----- | ------------------- | ----------------------------------------------- |
| Llama 2            | 7B         | 3.8GB | `llama2`            | `registry.ollama.ai/library/llama2`             |
| Mistral            | 7B         | 4.1GB | `mistral`           | `registry.ollama.ai/library/mistral`            |
| Dolphin Phi        | 2.7B       | 1.6GB | `dolphin-phi`       | `registry.ollama.ai/library/dolphin-phi`        |
| Phi-2              | 2.7B       | 1.7GB | `phi`               | `registry.ollama.ai/library/phi`                |
| Neural Chat        | 7B         | 4.1GB | `neural-chat`       | `registry.ollama.ai/library/neural-chat`        |
| Starling           | 7B         | 4.1GB | `starling-lm`       | `registry.ollama.ai/library/starling-lm`        |
| Code Llama         | 7B         | 3.8GB | `codellama`         | `registry.ollama.ai/library/codellama`          |
| Llama 2 Uncensored | 7B         | 3.8GB | `llama2-uncensored` | `registry.ollama.ai/library/llama2-uncensored`  |
| Llama 2 13B        | 13B        | 7.3GB | `llama2:13b`        | `registry.ollama.ai/library/llama2:13b`         |
| Llama 2 70B        | 70B        | 39GB  | `llama2:70b`        | `registry.ollama.ai/library/llama2:70b`         |
| Orca Mini          | 3B         | 1.9GB | `orca-mini`         | `registry.ollama.ai/library/orca-mini`          |
| Vicuna             | 7B         | 3.8GB | `vicuna`            | `registry.ollama.ai/library/vicuna`             |
| LLaVA              | 7B         | 4.5GB | `llava`             | `registry.ollama.ai/library/llava`              |
| Gemma              | 2B         | 1.4GB | `gemma:2b`          | `registry.ollama.ai/library/gemma:2b`           |
| Gemma              | 7B         | 4.8GB | `gemma:7b`          | `registry.ollama.ai/library/gemma:7b`           |

The full list of available images can be found at [Ollama Library](https://ollama.com/library).

> [!WARNING]
> You should have at least 8 GB of RAM available on your node to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.

> [!WARNING]
> The actual sizes of downloaded large language models are huge compared to those of typical container images.
>
> 1. A fast and stable network connection is recommended for downloading the models.
> 2. Efficient storage is required to store the models if you want to run models larger than 13B.
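
The model image column plugs straight into the `Model` CRD from the getting started section; for instance, a sketch pinning the 13B Llama 2 tag (the resource name `llama2-13b` is an arbitrary choice):

```yaml
apiVersion: ollama.ayaka.io/v1
kind: Model
metadata:
  name: llama2-13b
spec:
  image: llama2:13b
```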
## Architecture Overview

There are two major components that the Ollama Operator creates for each model. The detailed resources it creates, and the relationships between them, are shown in the following diagram:

<picture>
<source
srcset="./docs/public/architecture-theme-night.png"
media="(prefers-color-scheme: dark)"
/>
<source
srcset="./docs/public/architecture-theme-day.png"
media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)"
/>
<img src="./docs/public/architecture-theme-day.png" />
</picture>

## Contributing