Some rewording/restructuring of docs
tomusher committed Dec 18, 2023
1 parent ad56b86 commit d9703d5
Showing 6 changed files with 148 additions and 161 deletions.
3 changes: 1 addition & 2 deletions docs/.pages
@@ -1,6 +1,5 @@
nav:
    - installation.md
    - editor-integration.md
    - "Backends":
        - llm-backend.md
    - ai-backends.md
    - text-splitting.md
109 changes: 109 additions & 0 deletions docs/ai-backends.md
@@ -0,0 +1,109 @@
# AI Backends

Wagtail AI can be configured to use different backends to support different AI services.

Currently, the only (and default) backend available in Wagtail AI is the [LLM Backend](#llm-backend).

## LLM Backend

This backend uses the [llm library](https://llm.datasette.io/en/stable/) which offers support for many AI services through plugins.

By default, it is configured to use OpenAI's `gpt-3.5-turbo` model.

### Using other models

You can use the command line interface to see the llm models installed in your environment:

```sh
llm models
```

Then you can swap `MODEL_ID` in the configuration to use a different model. For example, to use GPT-4:

```python
WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.llm.LLMBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-4",
            },
        }
    }
}
```

!!! info

The `llm` package comes with OpenAI models installed by default.

You can install other models using [`llm`'s plugin functionality](https://llm.datasette.io/en/stable/plugins/index.html).
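
For example, you could add support for locally-run models with the `llm-gpt4all` plugin (one of several plugins listed in the `llm` documentation):

```sh
llm install llm-gpt4all
llm models  # the newly installed models should now appear in this list
```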

### Customisations

There are two settings that you can use with the LLM backend:

- `INIT_KWARGS`
- `PROMPT_KWARGS`

#### `INIT_KWARGS`

These are passed to `llm` as ["Model Options"](https://llm.datasette.io/en/stable/python-api.html#model-options). You can use them to customise the model's initialisation.

For example, for OpenAI models you can set a custom API key. By default the `openai` library will use the value of the `OPENAI_API_KEY` environment variable.

```python
WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.llm.LLMBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-3.5-turbo",  # Model ID recognizable by the llm package.
                "INIT_KWARGS": {"key": "your-custom-api-key"},
            },
        }
    }
}
```

#### `PROMPT_KWARGS`

Using `PROMPT_KWARGS` you can pass arguments to [`llm`'s `prompt` method](https://llm.datasette.io/en/stable/python-api.html#system-prompts), e.g. a system prompt that is passed with every request.

```python
WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.llm.LLMBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-3.5-turbo",  # Model ID recognizable by the llm package.
                "PROMPT_KWARGS": {"system": "A custom, global system prompt."},
            },
        }
    }
}
```

#### Specify the token limit for a model

!!! info

The token limit is also referred to as the "context window": the maximum number of tokens that a given chat model can process in a single request.

While Wagtail AI knows the token limits of some models (see [`tokens.py`](https://github.com/wagtail/wagtail-ai/blob/main/src/wagtail_ai/tokens.py)), you might choose to use a model that isn't in this mapping, or you might want to set a lower token limit for an existing model.

You can do this by setting `TOKEN_LIMIT`.

```python
WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.llm.LLMBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-3.5-turbo",
                "TOKEN_LIMIT": 4096,
            },
        }
    }
}
```
13 changes: 8 additions & 5 deletions docs/editor-integration.md
@@ -1,14 +1,17 @@
# Editor Integration

Wagtail AI integrates with Wagtail's Draftail rich text editor to provide tools to help write content.
Wagtail AI integrates with Wagtail's Draftail rich text editor to provide tools to help write content. To use it, highlight some text and click the 'magic wand' icon in the toolbar.

By default, it includes tools to:
By default, it includes prompts that:

* Run AI-assisted spelling/grammar checks on your content
* Generate additional content based on what you're writing

You can also define your own prompts:

### Adding Your Own Prompts

Explore the `AI Prompts` settings, accessible via the Wagtail settings menu. Here you'll be able to view, edit and add new prompts.
You can add your own prompts and customise existing prompts from the Wagtail admin under Settings -> Prompts.

When creating prompts, you can provide a label and description to help describe the prompt to your editors, specify the full prompt that will be passed to the AI along with your text, and choose a 'method', which can be one of:

- 'Append after existing content' - keep your existing content intact and add the response from the AI to the end (useful for completions/suggestions).
- 'Replace content' - replace the content in the editor with the response from the AI (useful for corrections, rewrites and translations).
59 changes: 12 additions & 47 deletions docs/installation.md
@@ -1,13 +1,17 @@
# Installation

At this moment in time the only backend that ships by default with wagtail-ai is [llm](https://llm.datasette.io/en/stable/)
that lets you use a number of different chat models, including OpenAI's.

1. Install the package along with the relevant client libraries for the AI Backend you want to use:
- For [llm](https://llm.datasette.io/en/stable/) which includes OpenAI chat models,
`python -m pip install wagtail-ai[llm]`
1. Install the package along with the relevant client libraries for the default [AI Backend](ai-backends.md):
```bash
python -m pip install wagtail-ai[llm]
```
2. Add `wagtail_ai` to your `INSTALLED_APPS`:
3. Add an AI chat model and backend configuration (any model supported by [llm](https://llm.datasette.io/en/stable/)).
```python
INSTALLED_APPS = [
    "wagtail_ai",
    # ...
]
```
3. Add an AI chat model and backend configuration (by default, `MODEL_ID` can be any model supported by [llm](https://llm.datasette.io/en/stable/)).
```python
WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.llm.LLMBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-3.5-turbo",
            },
        }
    }
}
```

The openai package can be provided with the API key via the `OPENAI_API_KEY`
environment variable. If you want to provide a custom API key for
each backend please read the llm backend's documentation page.

Read more about the [llm backend here](llm-backend.md).


## Specify the token limit for a backend

!!! info

Token limit is referred to as "context window" which is the maximum amount
of tokens in a single context that a specific chat model supports.

If you want to use a chat model that does not have a default token limit configured
or want to change the default token limit, you can do so by adding the `TOKEN_LIMIT`
setting.

```python
WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.llm.LLMBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-3.5-turbo",
                "TOKEN_LIMIT": 4096,
            },
        }
    }
}
```

This `TOKEN_LIMIT` value depend on the chat model you select as each of them support
a different token limit, e.g. `gpt-3.5-turbo` supports up to 4096 tokens,
`gpt-3.5-turbo-16k` supports up to 16384 tokens.

!!! info "Text splitting"

[Read more about text splitting and Wagtail AI customization options here](text-splitting.md).
4. If you're using an OpenAI model, specify an API key using the `OPENAI_API_KEY` environment variable, or by setting it as a key in [`INIT_KWARGS`](ai-backends.md#init-kwargs).
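
    For example, in a Unix-like shell (the key value below is a placeholder):

```sh
export OPENAI_API_KEY="sk-your-key-here"
```
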
72 changes: 0 additions & 72 deletions docs/llm-backend.md

This file was deleted.

53 changes: 18 additions & 35 deletions docs/text-splitting.md
@@ -1,35 +1,22 @@
# Text splitting

Using chat models requires splitting the text into smaller chunks that the model can process.
Sometimes we need to send an AI model more text than it can process in one go. When this happens, Wagtail AI splits the text you provide into smaller chunks.

There are two components to this:
Wagtail AI provides two components that help with this:

- Splitter length calculator
- Splitter

The splitter needs the length calculator to know when to split for each different chat model.

This can be controlled with the `TOKEN_LIMIT` in the backend configuration.
- Splitter length calculator - which decides how many characters will fit inside a model's context window based on the `TOKEN_LIMIT` specified in your backend configuration.
- Splitter - which splits your text into sensible chunks.

## Defaults

By default, Wagtail AI comes with:

- Langchain `RecursiveCharacterTextSplitter` class that is vendored in Wagtail AI.
- A naive splitter length calculator that does not actually do a proper text splitting,
only estimates how many tokens there are in the supplied text.

By default Wagtail AI does not require you to use any third-party dependencies to
achieve the text splitting required for most chat models. That's why we've vendored
the Langchain splitter so it avoids relying on big external packages for a single task.

In the future development of Wagtail AI we might add support for more precise
optional backends in addition to the default ones.
- A naive splitter length calculator that conservatively estimates how many characters will fit, without requiring any additional dependencies.
- A recursive text splitter vendored from Langchain that tries to split on paragraphs, then new lines, then spaces.
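
As a rough sketch of what the naive estimate looks like (the four-characters-per-token ratio below is a common rule of thumb for English text, not necessarily the exact ratio Wagtail AI uses):

```python
def estimate_token_count(text: str) -> int:
    # Assume roughly 4 characters per token, rounding up so the estimate
    # errs on the side of producing smaller chunks.
    return -(-len(text) // 4)
```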

## Customisation

Wagtail AI allows you to customize the splitter and the splitter length calculator logic
for each backend so that then you can tailor them to the specific chat model you want to use.
You may wish to create your own splitters or length calculators. To do this, you can override the default classes with your own as follows:

```python
WAGTAIL_AI = {
    # ... (rest of this configuration is collapsed in this diff view)
}
```
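
As an illustration, a complete override might look like the sketch below. The `TEXT_SPLITTER_CLASS` and `TEXT_SPLITTER_LENGTH_CALCULATOR_CLASS` key names are assumptions based on the class roles described here, and the `myapp.text` dotted paths are placeholders for your own code:

```python
WAGTAIL_AI = {
    "BACKENDS": {
        "default": {
            "CLASS": "wagtail_ai.ai.llm.LLMBackend",
            "CONFIG": {
                "MODEL_ID": "gpt-3.5-turbo",
                # Assumed key names; the dotted paths point at your classes.
                "TEXT_SPLITTER_CLASS": "myapp.text.HTMLHeaderTextSplitter",
                "TEXT_SPLITTER_LENGTH_CALCULATOR_CLASS": "myapp.text.GPT35TurboLengthCalculator",
            },
        }
    }
}
```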

### Custom text splitter

The spliter class must implement the `TextSplitterProtocol`
([source](https://github.com/wagtail/wagtail-ai/blob/main/src/wagtail_ai/types.py)).

The splitter class must implement the [`TextSplitterProtocol`](https://github.com/wagtail/wagtail-ai/blob/main/src/wagtail_ai/types.py).

E.g. if you wanted to use the actual Langchain dependency, you could specify
a custom class like this:
For example, if you wanted to use a different splitter from Langchain:

```python
from collections.abc import Callable
from typing import Any

from langchain.text_splitter import (
    HTMLHeaderTextSplitter as LangchainHTMLHeaderTextSplitter,
)

from wagtail_ai.types import TextSplitterProtocol


class HTMLHeaderTextSplitter(TextSplitterProtocol):
    def __init__(
        self, *, chunk_size: int, length_function: Callable[[str], int], **kwargs: Any
    ) -> None:
        # HTMLHeaderTextSplitter splits on HTML header tags rather than by
        # character count, so chunk_size and length_function go unused here.
        # The headers listed below are an illustrative choice.
        self.splitter = LangchainHTMLHeaderTextSplitter(
            headers_to_split_on=[("h2", "Header 2"), ("h3", "Header 3")],
        )

    def split_text(self, text: str) -> list[str]:
        # Langchain returns Document objects; the protocol expects strings.
        return [document.page_content for document in self.splitter.split_text(text)]
```

### Custom splitter length calculator class

Each chat model comes with their own tokenizing logic. You would have to implement
a custom splitter for each model that you want to use if you want to use a more
precise length calculator, e.g. [tiktoken](https://github.com/openai/tiktoken)
for OpenAI models.
You may want to implement a custom length calculator to get a more accurate length estimate for your chosen model.

The splitter length calculator class must implement the [`TextSplitterLengthCalculatorProtocol`](https://github.com/wagtail/wagtail-ai/blob/main/src/wagtail_ai/types.py).

E.g. a custom calculator for the ChatGPT 3.5 Turbo chat model that uses
the proper tokenizer.
For example, using [tiktoken](https://github.com/openai/tiktoken) for OpenAI models:

```python
import tiktoken
# ... (rest of the example is collapsed in this diff view)
```
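
The body of the example above is collapsed in this view. As a sketch of the idea (the `get_splitter_length` method name is an assumption based on the protocol's role; check `wagtail_ai/types.py` for the exact signature):

```python
import tiktoken

from wagtail_ai.types import TextSplitterLengthCalculatorProtocol


class GPT35TurboLengthCalculator(TextSplitterLengthCalculatorProtocol):
    """Exact token counts for gpt-3.5-turbo, using its real tokenizer."""

    def get_splitter_length(self, text: str) -> int:
        # encoding_for_model returns the tokenizer used by the named model.
        encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
        return len(encoding.encode(text))
```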
