docs: Quickstart chapters 1 & 2 #212

Merged: 7 commits, Dec 3, 2024
5 changes: 5 additions & 0 deletions .github/workflows/ci.yml
@@ -82,6 +82,11 @@ jobs:
          source .venv/bin/activate
          ./check_licenses.sh

      - name: Check documentation builds correctly
        run: |
          source .venv/bin/activate
          mkdocs build --strict

      - name: Generate pip freeze
        run: |
          source .venv/bin/activate
2 changes: 2 additions & 0 deletions docs/api_reference/document_search/documents.md
@@ -2,6 +2,8 @@

::: ragbits.document_search.documents.document.Document

::: ragbits.document_search.documents.document.DocumentType

::: ragbits.document_search.documents.element.Element

::: ragbits.document_search.documents.sources.Source
44 changes: 44 additions & 0 deletions docs/how-to/prompts_lab.md
@@ -0,0 +1,44 @@
# How to Manage Prompts using GUI with Prompts Lab

Prompts Lab is a GUI tool that automatically detects prompts in your project and allows you to interact with them. You can use it to test your prompts with Large Language Models and see how the model responds to different prompts.

!!! note
    To follow this guide, ensure that the `ragbits` package is installed and that your terminal's working directory contains the Python files that define your Ragbits prompts (usually, this is the root directory of your project). If you haven't defined any prompts yet, you can use the `SongPrompt` example from [Ragbits' Quickstart Guide](../quickstart/quickstart1_prompts.md) and save it in a Python file whose name starts with "prompt_" in your project directory.

## Starting Prompts Lab

Start Prompts Lab by running the following command in your terminal:

```bash
ragbits prompts lab
```

The tool will open in your default web browser. You will see a list of prompts detected in your project.

!!! note
    By default, Prompts Lab assumes that prompts are defined in Python files with names starting with "prompt_". If you use a different naming convention, you can specify a different file name pattern using the `--file-pattern` option. For instance, if you want to search for prompts in all Python files in your project, run the following command:

    ```bash
    ragbits prompts lab --file-pattern "**/*.py"
    ```

You can also change the default pattern for your entire project by setting the `prompt_path_pattern` configuration option in the `[tool.ragbits]` section of your `pyproject.toml` file.
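For illustration, such a configuration entry might look like the following sketch (the key name `prompt_path_pattern` comes from the text above; the pattern value is just an example):

```toml
# pyproject.toml (sketch): scan every Python file in the project for prompts
[tool.ragbits]
prompt_path_pattern = "**/*.py"
```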

## Interacting with Prompts

To work with a specific prompt, select it from the list. The "Inputs" pane allows you to enter the values for the placeholders in the prompt. For the `SongPrompt` prompt example, this would be the subject, age group, and genre of the song:

![Prompts Lab](./prompts_lab_input.png){style="max-width: 300px; display: block; margin: 0 auto;"}

Then, click "Render prompt" to view the final prompt content, with all placeholders replaced with the values you provided. To check how the Large Language Model responds to the prompt, click "Send to LLM".

!!! note
    If there is no default LLM configured for your project, Prompts Lab will use OpenAI's `gpt-3.5-turbo`. Ensure that the `OPENAI_API_KEY` environment variable is set and contains your OpenAI API key.

    Alternatively, you can use your own custom LLM factory (a function that creates an instance of [Ragbits' LLM class][ragbits.core.llms.LLM]) by specifying the path to the factory function with the `--llm-factory` option of the `ragbits prompts lab` command.

<!-- TODO: link to the how-to on configuring default LLMs in pyproject.toml -->

## Conclusion

In this guide, you learned how to use the `ragbits` CLI to interact with prompts that you have defined in your project using the Prompts Lab tool. This tool enables you to test your prompts with Large Language Models and see how the model responds to different prompts.
Binary file added docs/how-to/prompts_lab_input.png
4 changes: 2 additions & 2 deletions docs/index.md
@@ -10,8 +10,8 @@ hide:
</style>

<div align="center" markdown="span">
![ragbits logo](./assets/ragbits.png#only-light){ width="50%" }
![ragbits logo](./assets/ragbits.png#only-dark){ width="50%" }
<!-- TODO: Shouldn't custom assets live in this repo too? -->
<img alt="ragbits logo" src="./assets/ragbits.png" width="50%">
</div>

<p align="center">
113 changes: 113 additions & 0 deletions docs/quickstart/quickstart1_prompts.md
@@ -0,0 +1,113 @@
# Quickstart 1: Working with Prompts and LLMs

In this Quickstart guide, you will learn how to define a dynamic prompt in Ragbits and how to use such a prompt with Large Language Models.

## Defining a Static Prompt
The most standard way to define a prompt in Ragbits is to create a class that inherits from the `Prompt` class and configure it by setting values for appropriate properties. Here is an example of a simple prompt that asks the model to write a song about Ragbits:

```python
from ragbits.core.prompt import Prompt

class SongPrompt(Prompt):
    user_prompt = """
    Write a song about a Python library called Ragbits.
    """
```

In this case, all you had to do was set the `user_prompt` property to the desired prompt. That's it! This prompt can now be used anytime you want to pass a prompt to Ragbits.

Next, we'll learn how to make this prompt more dynamic (e.g., by adding placeholders for user inputs). But first, let's see how to use this prompt with a Large Language Model.

## Testing the Prompt from the CLI
Even at this stage, you can test the prompt using the built-in `ragbits` CLI tool. To do this, you need to run the following command in your terminal:

```bash
uv run ragbits prompts exec path.within.your.project:SongPrompt
```

Here, `path.within.your.project` is the dotted path to the Python module where the prompt is defined. In the simplest case, when you run the command from the same directory as the file, it is simply the file name without the `.py` extension. For example, if the prompt is defined in a file named `song_prompt.py`, you would run:

```bash
uv run ragbits prompts exec song_prompt:SongPrompt
```

This command will send the prompt to the default Large Language Model and display the generated response in the terminal.

!!! note
    If there is no default LLM configured for your project, Ragbits will use OpenAI's `gpt-3.5-turbo`. Ensure that the `OPENAI_API_KEY` environment variable is set and contains your OpenAI API key.

    Alternatively, you can use your own custom LLM factory (a function that creates an instance of [Ragbits' LLM class][ragbits.core.llms.LLM]) by specifying the path to the factory function with the `--llm-factory` option of the `ragbits prompts exec` command.

<!-- TODO: link to the how-to on configuring default LLMs in pyproject.toml -->

## Using the Prompt in Python Code
To use the defined prompt with a Large Language Model in Python, you need to create an instance of the model and pass the prompt to it. For instance:

```python
import asyncio

from ragbits.core.llms.litellm import LiteLLM

async def main():
    llm = LiteLLM("gpt-4")
    prompt = SongPrompt()  # the prompt class defined above
    response = await llm.generate(prompt)
    print(f"Generated song: {response}")

asyncio.run(main())
```

In this code snippet, we create an instance of the `LiteLLM` class configured to use OpenAI's `gpt-4` model and generate a response by passing the prompt to it. Note that `generate` is a coroutine, so it must be awaited inside an async context. As a result, the model generates a song about Ragbits based on the provided prompt.

## Making the Prompt Dynamic
You can make the prompt dynamic by declaring a Pydantic model that serves as the prompt's input schema (i.e., declares the shape of the data that you will be able to use in the prompt). Here's an example:

```python
from pydantic import BaseModel

class SongIdea(BaseModel):
    subject: str
    age_group: int
    genre: str
```

The defined `SongIdea` model describes the desired song - its subject, the target age group, and the genre. This model can now be used to create a dynamic prompt:

```python
class SongPrompt(Prompt[SongIdea]):
    user_prompt = """
    Write a song about a {{subject}} for {{age_group}}-year-old {{genre}} fans.
    """
```

In addition to using placeholders in the prompt, you can also employ the robust features of the [Jinja2](https://jinja.palletsprojects.com/) templating language to create more intricate prompts. Here's an example that incorporates a condition based on the input:

```python
class SongPrompt(Prompt[SongIdea]):
    system_prompt = """
    You are a professional songwriter.
    {% if age_group < 18 %}
    You only use language that is appropriate for children.
    {% endif %}
    """

    user_prompt = """
    Write a song about a {{subject}} for {{age_group}}-year-old {{genre}} fans.
    """
```

This example illustrates how to set a system prompt and use conditional statements in the prompt.
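To make the conditional concrete, here is a plain-Python sketch (an illustration only, not the Ragbits API) that mirrors how Jinja2 assembles the system prompt for different age groups:

```python
def build_system_prompt(age_group: int) -> str:
    # Mirrors the {% if age_group < 18 %} conditional from the template above.
    lines = ["You are a professional songwriter."]
    if age_group < 18:
        lines.append("You only use language that is appropriate for children.")
    return "\n".join(lines)

print(build_system_prompt(12))  # includes the child-appropriate instruction
print(build_system_prompt(30))  # omits it
```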

## Testing the Dynamic Prompt in CLI
Besides using the dynamic prompt in Python, you can still test it using the `ragbits` CLI tool. The only difference is that now you need to provide the values for the placeholders in the prompt in JSON format. Here's an example:

```bash
uv run ragbits prompts exec song_prompt:SongPrompt --payload '{"subject": "unicorns", "age_group": 12, "genre": "pop"}'
```

Remember to change `song_prompt` to the name of the module where the prompt is defined and adjust the values of the placeholders to your liking.
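Because the payload is plain JSON, you can sanity-check it before invoking the CLI. The helper below is a hypothetical stdlib-only sketch (not part of Ragbits) that verifies the payload supplies every field of the `SongIdea` model:

```python
import json

REQUIRED_FIELDS = {"subject", "age_group", "genre"}

def check_payload(payload: str) -> dict:
    """Parse a --payload JSON string and ensure the SongIdea fields are present."""
    data = json.loads(payload)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"payload is missing fields: {sorted(missing)}")
    return data

data = check_payload('{"subject": "unicorns", "age_group": 12, "genre": "pop"}')
print(data["subject"])  # → unicorns
```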

!!! tip
    Ragbits also comes with a built-in GUI tool called Prompts Lab that allows you to manage and interact with prompts in a more user-friendly way. To learn more about using Prompts Lab, see the how-to article [How to Manage Prompts using GUI with Prompts Lab](../how-to/prompts_lab.md).

## Conclusion
You now know how to define a prompt in Ragbits and how to use it with Large Language Models. You've also learned to make the prompt dynamic by using Pydantic models and the Jinja2 templating language. To learn more about defining prompts, such as configuring the desired output format, refer to the how-to article [How to define and use Prompts in Ragbits](../how-to/use_prompting.md).

<!-- TODO: Add a link to the how-to articles on using images in prompts and on defining custom prompt sources -->

## Next Step
In the next Quickstart guide, you will learn how to use Ragbits's Document Search capabilities to retrieve relevant documents for your prompts: [Quickstart 2: Adding RAG Capabilities](quickstart2_rag.md).
157 changes: 157 additions & 0 deletions docs/quickstart/quickstart2_rag.md
@@ -0,0 +1,157 @@
# Quickstart 2: Adding RAG Capabilities

In this chapter, we will explore how to use Ragbits' Document Search capabilities to retrieve documents relevant to your prompts. This technique is based on the Retrieval-Augmented Generation (RAG) architecture, which lets the LLM generate responses informed by relevant information from your documents.

To work with document content, we first need to "ingest" them (i.e., process, embed, and store them in a vector database). Afterwards, we can search for relevant documents based on the user's input and use the retrieved information to enhance the LLM's response.

We will continue with the example of generating custom songs. In the previous chapter, you learned how to define a prompt and interact with it using the `ragbits` CLI. We will now extend the prompt with document search so that the LLM receives additional context when generating a song on a given subject (in this case, inspirations drawn from children's stories).

## Getting the Documents

To leverage the RAG capabilities, you need to provide a set of documents that the model can use to generate responses. This guide uses an [open-licensed (CC-BY 4.0) collection of children's stories](https://github.com/global-asp/pb-source/tree/master) as examples. You should download these documents and place them next to your Python file:

```bash
git clone https://github.com/global-asp/pb-source.git
```

The short stories are in Markdown format. Ragbits supports [various document formats][ragbits.document_search.documents.document.DocumentType], including PDF and DOC, as well as non-textual files such as images.

## Defining the Document Search Object

The `DocumentSearch` class serves as the main entry point for working with documents in Ragbits. It requires an embedder and a vector store to work. This example uses the `LiteLLMEmbeddings` embedder and the `InMemoryVectorStore` vector store:

```python
from ragbits.core.embeddings.litellm import LiteLLMEmbeddings
from ragbits.core.vector_stores.in_memory import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

embedder = LiteLLMEmbeddings(
    model="text-embedding-3-small",
)
vector_store = InMemoryVectorStore()
document_search = DocumentSearch(
    embedder=embedder,
    vector_store=vector_store,
)
```

!!! note
    `InMemoryVectorStore` is a simple in-memory vector store suitable for demonstration purposes. In real-world scenarios, you would typically use one of the persistent vector stores like [`ChromaVectorStore`][ragbits.core.vector_stores.chroma.ChromaVectorStore] or [`QdrantVectorStore`][ragbits.core.vector_stores.qdrant.QdrantVectorStore].

## Defining the Source of the Documents

We first need to direct Ragbits to the location of the documents to load them. This code will load the first 100 documents from the `pb-source/en` directory:

```python
from pathlib import Path
from ragbits.document_search.documents.sources import LocalFileSource

# Path to the directory with markdown files to ingest
documents_path = Path(__file__).parent / "pb-source/en"
documents = LocalFileSource.list_sources(documents_path, file_pattern="*.md")[:100]
```

Because the documents are stored locally, we are using `LocalFileSource` here. Ragbits also supports a variety of other sources including Google Cloud Storage, Hugging Face, and custom sources.

## Ingesting the Documents

Having established the documents and the `DocumentSearch` object, we can now ingest the documents:

```python
import asyncio

async def main():
    await document_search.ingest(documents)

if __name__ == "__main__":
    asyncio.run(main())
```

This procedure will process, embed, and store the documents in the vector database.

Now, we can use the `document_search` object to find relevant documents. Let’s try a manual search:

```python
print(await document_search.search("school"))
```

This function will return fragments of ingested documents that semantically match the query “school.”
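Under the hood, semantic search ranks stored fragments by vector similarity. The following self-contained toy (with made-up three-dimensional "embeddings", not the 1536-dimensional vectors a real embedder such as `text-embedding-3-small` produces) illustrates the idea:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for three document fragments (illustration only).
store = {
    "The children walked to school.": [0.9, 0.1, 0.0],
    "The dragon guarded its gold.":   [0.0, 0.2, 0.9],
    "Lessons began at nine o'clock.": [0.8, 0.3, 0.1],
}

query_vec = [0.85, 0.2, 0.05]  # pretend embedding of the query "school"
ranked = sorted(store, key=lambda text: cosine(store[text], query_vec), reverse=True)
print(ranked[0])  # the fragment most similar to the query
```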

## Using the Documents in the Prompt

To include the retrieved documents in the prompt, we need to modify the prompt defined in [Quickstart 1](quickstart1_prompts.md).

First, we'll alter the data model of the prompt to include the retrieved documents:

```python
from pydantic import BaseModel

class SongIdea(BaseModel):
    subject: str
    age_group: int
    genre: str
    inspirations: list[str]
```

The updated model looks similar to the earlier model, but now incorporates a new field, `inspirations`. This field will contain inspirations for the song, retrieved from the documents.

Next, we need to adjust the prompt to include these inspirations in the prompt text:

```python
from ragbits.core.prompt.prompt import Prompt

class SongPrompt(Prompt[SongIdea]):
    system_prompt = """
    You are a professional songwriter.
    {% if age_group < 18 %}
    You only use language that is appropriate for children.
    {% endif %}
    """

    user_prompt = """
    Write a song about a {{subject}} for {{age_group}}-year-old {{genre}} fans.

    Here are some fragments of short stories for inspiration:
    {% for inspiration in inspirations %}
    # Fragment {{loop.index}}
    {{inspiration}}

    {% endfor %}
    """
```

The prompt looks similar to the previous one but now includes a section with inspirations sourced from the retrieved documents.

## Using the Prompt with the LLM

Now that we have a prompt that includes inspirations from the documents, we can create a function that uses the LLM to generate a song given a subject, age group, and genre. At the same time, this function will automatically supply inspirations from the ingested documents:

```python
from ragbits.core.llms.litellm import LiteLLM

llm = LiteLLM("gpt-4")

async def get_song_idea(subject: str, age_group: int, genre: str) -> str:
    elements = await document_search.search(subject)
    inspirations = [element.text_representation for element in elements if element.text_representation]
    prompt = SongPrompt(SongIdea(subject=subject, age_group=age_group, genre=genre, inspirations=inspirations))

    return await llm.generate(prompt)
```

This function searches for documents related to the subject, extracts the text representations of the found elements, and passes them to the prompt alongside the subject, age group, and genre. The LLM then generates a song based on the provided prompt.

We can now modify the `main` function to use the function we just created:

```python
async def main():
    await document_search.ingest(documents)
    print(await get_song_idea("school", 10, "pop"))
```

!!! note
    In real-world scenarios, you wouldn't ingest and search for documents in the same function. You would ingest the documents once (or periodically) and then use the `document_search` object to search for relevant documents as needed.

## Conclusion

In this guide, you learned how to use Ragbits' Document Search capabilities to find documents relevant to the user's question and utilize them to enhance the LLM's responses. By incorporating the RAG architecture with your prompts, you can provide the LLM with additional context and information to produce more accurate and relevant responses.
17 changes: 11 additions & 6 deletions mkdocs.yml
@@ -6,17 +6,23 @@ repo_url: https://github.com/deepsense-ai/ragbits
copyright: Copyright &copy; 2024 deepsense.ai
nav:
- rabgbits: index.md
- Quick Start:
- quickstart/quickstart1_prompts.md
- quickstart/quickstart2_rag.md
- How-to Guides:
- how-to/use_prompting.md
- how-to/prompts_lab.md
- how-to/optimize.md
- how-to/use_guardrails.md
- how-to/integrations/promptfoo.md
- how-to/use_prompting.md
- how-to/generate_dataset.md
- Document Search:
- how-to/document_search/async_processing.md
- how-to/document_search/create_custom_execution_strategy.md
- how-to/document-search/search_documents.md
- how-to/document-search/use_rephraser.md
- how-to/document-search/use_reranker.md
- how-to/document_search/search_documents.md
- how-to/document_search/use_rephraser.md
- how-to/document_search/use_reranker.md
- how-to/document_search/distributed_ingestion.md
- API Reference:
- Core:
- api_reference/core/prompt.md
@@ -84,8 +90,7 @@ markdown_extensions:
permalink: "#"
plugins:
- search
- autorefs:
enable: true
- autorefs
- mkdocstrings:
handlers:
python:
2 changes: 1 addition & 1 deletion packages/ragbits-core/src/ragbits/core/cli.py
@@ -39,7 +39,7 @@ def register(app: typer.Typer) -> None:
@prompts_app.command()
def lab(
    file_pattern: str = core_config.prompt_path_pattern,
    llm_factory: str | None = core_config.default_llm_factories[LLMType.TEXT],
    llm_factory: str = core_config.default_llm_factories[LLMType.TEXT],
) -> None:
"""
Launches the interactive application for listing, rendering, and testing prompts