docs: Quickstart chapters 1 & 2 #212

Merged: 7 commits, Dec 3, 2024
5 changes: 5 additions & 0 deletions .github/workflows/ci.yml
@@ -82,6 +82,11 @@ jobs:
          source .venv/bin/activate
          ./check_licenses.sh

      - name: Check documentation builds correctly
        run: |
          source .venv/bin/activate
          mkdocs build --strict

      - name: Generate pip freeze
        run: |
          source .venv/bin/activate
2 changes: 2 additions & 0 deletions docs/api_reference/document_search/documents.md
@@ -2,6 +2,8 @@

::: ragbits.document_search.documents.document.Document

::: ragbits.document_search.documents.document.DocumentType

::: ragbits.document_search.documents.element.Element

::: ragbits.document_search.documents.sources.Source
44 changes: 44 additions & 0 deletions docs/how-to/prompts_lab.md
@@ -0,0 +1,44 @@
# How to Manage Prompts using GUI with Prompts Lab

Prompts Lab is a GUI tool that automatically detects prompts in your project and allows you to interact with them. You can use it to test your prompts with Large Language Models and see how the model responds to different prompts.

!!! note
    To follow this guide, ensure that the `ragbits` package is installed and that your terminal's working directory contains the Python files that define your Ragbits prompts (usually, this is the root directory of your project). If you haven't defined any prompts yet, you can use the `SongPrompt` example from [Ragbits' Quickstart Guide](../quickstart/quickstart1_prompts.md) and save it in a Python file whose name starts with "prompt_" in your project directory.

## Starting Prompts Lab

Start Prompts Lab by running the following command in your terminal:

```bash
ragbits prompts lab
```

The tool will open in your default web browser. You will see a list of prompts detected in your project.

!!! note
    By default, Prompts Lab assumes that prompts are defined in Python files with names starting with "prompt_". If you use a different naming convention, you can specify a different file name pattern using the `--file-pattern` option. For instance, if you want to search for prompts in all Python files in your project, run the following command:

    ```bash
    ragbits prompts lab --file-pattern "**/*.py"
    ```

You can also change the default pattern for your entire project by setting the `prompt_path_pattern` configuration option in the `[tool.ragbits]` section of your `pyproject.toml` file.
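For illustration, such a configuration entry might look like the following sketch (the key name `prompt_path_pattern` comes from the text above; the pattern value is just an example):

```toml
# pyproject.toml (sketch): scan every Python file in the project for prompts
[tool.ragbits]
prompt_path_pattern = "**/*.py"
```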

## Interacting with Prompts

To work with a specific prompt, select it from the list. The "Inputs" pane allows you to enter the values for the placeholders in the prompt. For the `SongPrompt` prompt example, this would be the subject, age group, and genre of the song:

![Prompts Lab](./prompts_lab_input.png){style="max-width: 300px; display: block; margin: 0 auto;"}

Then, click "Render prompt" to view the final prompt content, with all placeholders replaced with the values you provided. To check how the Large Language Model responds to the prompt, click "Send to LLM".

!!! note
    If there is no default LLM configured for your project, Prompts Lab will use OpenAI's `gpt-3.5-turbo`. Ensure that the `OPENAI_API_KEY` environment variable is set and contains your OpenAI API key.

    Alternatively, you can use your own custom LLM factory (a function that creates an instance of [Ragbits' LLM class][ragbits.core.llms.LLM]) by specifying the path to the factory function with the `--llm-factory` option of the `ragbits prompts lab` command.

<!-- TODO: link to the how-to on configuring default LLMs in pyproject.toml -->

## Conclusion

In this guide, you learned how to use the `ragbits` CLI to interact with prompts that you have defined in your project using the Prompts Lab tool. This tool enables you to test your prompts with Large Language Models and see how the model responds to different prompts.
Binary file added docs/how-to/prompts_lab_input.png
4 changes: 2 additions & 2 deletions docs/index.md
@@ -10,8 +10,8 @@ hide:
</style>

<div align="center" markdown="span">
![ragbits logo](./assets/ragbits.png#only-light){ width="50%" }
![ragbits logo](./assets/ragbits.png#only-dark){ width="50%" }
<!-- TODO: Shouldn't custom assets live in this repo too? -->
<img alt="ragbits logo" src="./assets/ragbits.png" width="50%">
</div>

<p align="center">
113 changes: 113 additions & 0 deletions docs/quickstart/quickstart1_prompts.md
@@ -0,0 +1,113 @@
# Quickstart 1: Working with Prompts and LLMs

In this Quickstart guide, you will learn how to define a dynamic prompt in Ragbits and how to use such a prompt with Large Language Models.

## Defining a Static Prompt
The most standard way to define a prompt in Ragbits is to create a class that inherits from the `Prompt` class and configure it by setting values for appropriate properties. Here is an example of a simple prompt that asks the model to write a song about Ragbits:

```python
from ragbits.core.prompt import Prompt

class SongPrompt(Prompt):
    user_prompt = """
    Write a song about a Python library called Ragbits.
    """
```

In this case, all you had to do was set the `user_prompt` property to the desired prompt. That's it! This prompt can now be used anytime you want to pass a prompt to Ragbits.

Next, we'll learn how to make this prompt more dynamic (e.g., by adding placeholders for user inputs). But first, let's see how to use this prompt with a Large Language Model.

## Testing the Prompt from the CLI
Even at this stage, you can test the prompt using the built-in `ragbits` CLI tool. To do this, you need to run the following command in your terminal:

```bash
uv run ragbits prompts exec path.within.your.project:SongPrompt
```

Here, `path.within.your.project` is the dotted path to the Python module where the prompt is defined. In the simplest case, when you run the command from the same directory as the file, it is simply the file name without the `.py` extension. For example, if the prompt is defined in a file named `song_prompt.py`, you would run:

```bash
uv run ragbits prompts exec song_prompt:SongPrompt
```

This command will send the prompt to the default Large Language Model and display the generated response in the terminal.

!!! note
    If there is no default LLM configured for your project, Ragbits will use OpenAI's `gpt-3.5-turbo`. Ensure that the `OPENAI_API_KEY` environment variable is set and contains your OpenAI API key.

    Alternatively, you can use your own custom LLM factory (a function that creates an instance of [Ragbits' LLM class][ragbits.core.llms.LLM]) by specifying the path to the factory function with the `--llm-factory` option of the `ragbits prompts exec` command.

<!-- TODO: link to the how-to on configuring default LLMs in pyproject.toml -->

## Using the Prompt in Python Code
To use the defined prompt with a Large Language Model in Python, you need to create an instance of the model and pass the prompt to it. For instance:

```python
import asyncio

from ragbits.core.llms.litellm import LiteLLM

async def main():
    llm = LiteLLM("gpt-4")
    prompt = SongPrompt()  # the prompt class defined above
    response = await llm.generate(prompt)
    print(f"Generated song: {response}")

asyncio.run(main())
```

In this code snippet, we create an instance of the `LiteLLM` class configured to use OpenAI's `gpt-4` model and generate a response by passing the prompt to it. Note that `generate` is a coroutine, so it must be awaited inside an async context. As a result, the model generates a song about Ragbits based on the provided prompt.

## Making the Prompt Dynamic
You can make the prompt dynamic by declaring a Pydantic model that serves as the prompt's input schema (i.e., declares the shape of the data that you will be able to use in the prompt). Here's an example:

```python
from pydantic import BaseModel

class SongIdea(BaseModel):
    subject: str
    age_group: int
    genre: str
```

The defined `SongIdea` model describes the desired song - its subject, the target age group, and the genre. This model can now be used to create a dynamic prompt:

```python
class SongPrompt(Prompt[SongIdea]):
    user_prompt = """
    Write a song about a {{subject}} for {{age_group}}-year-old {{genre}} fans.
    """
```

In addition to using placeholders in the prompt, you can also employ the robust features of the [Jinja2](https://jinja.palletsprojects.com/) templating language to create more intricate prompts. Here's an example that incorporates a condition based on the input:

```python
class SongPrompt(Prompt[SongIdea]):
    system_prompt = """
    You are a professional songwriter.
    {% if age_group < 18 %}
    You only use language that is appropriate for children.
    {% endif %}
    """

    user_prompt = """
    Write a song about a {{subject}} for {{age_group}}-year-old {{genre}} fans.
    """
```

This example illustrates how to set a system prompt and use conditional statements in the prompt.
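To make the conditional concrete, here is a plain-Python sketch (an illustration only, not the Ragbits API) that mirrors how Jinja2 assembles the system prompt for different age groups:

```python
def build_system_prompt(age_group: int) -> str:
    # Mirrors the {% if age_group < 18 %} conditional from the template above.
    lines = ["You are a professional songwriter."]
    if age_group < 18:
        lines.append("You only use language that is appropriate for children.")
    return "\n".join(lines)

print(build_system_prompt(12))  # includes the child-appropriate instruction
print(build_system_prompt(30))  # omits it
```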

## Testing the Dynamic Prompt in CLI
Besides using the dynamic prompt in Python, you can still test it using the `ragbits` CLI tool. The only difference is that now you need to provide the values for the placeholders in the prompt in JSON format. Here's an example:

```bash
uv run ragbits prompts exec song_prompt:SongPrompt --payload '{"subject": "unicorns", "age_group": 12, "genre": "pop"}'
```

Remember to change `song_prompt` to the name of the module where the prompt is defined and adjust the values of the placeholders to your liking.
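Because the payload is plain JSON, you can sanity-check it before invoking the CLI. The helper below is a hypothetical stdlib-only sketch (not part of Ragbits) that verifies the payload supplies every field of the `SongIdea` model:

```python
import json

REQUIRED_FIELDS = {"subject", "age_group", "genre"}

def check_payload(payload: str) -> dict:
    """Parse a --payload JSON string and ensure the SongIdea fields are present."""
    data = json.loads(payload)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"payload is missing fields: {sorted(missing)}")
    return data

data = check_payload('{"subject": "unicorns", "age_group": 12, "genre": "pop"}')
print(data["subject"])  # → unicorns
```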

!!! tip
    Ragbits also comes with a built-in GUI tool called Prompts Lab that allows you to manage and interact with prompts in a more user-friendly way. To learn more about using Prompts Lab, see the how-to article [How to Manage Prompts using GUI with Prompts Lab](../how-to/prompts_lab.md).

## Conclusion
You now know how to define a prompt in Ragbits and how to use it with Large Language Models. You've also learned to make the prompt dynamic by using Pydantic models and the Jinja2 templating language. To learn more about defining prompts, such as configuring the desired output format, refer to the how-to article [How to define and use Prompts in Ragbits](../how-to/use_prompting.md).

<!-- TODO: Add a link to the how-to articles on using images in prompts and on defining custom prompt sources -->

## Next Step
In the next Quickstart guide, you will learn how to use Ragbits's Document Search capabilities to retrieve relevant documents for your prompts: [Quickstart 2: Adding RAG Capabilities](quickstart2_rag.md).
157 changes: 157 additions & 0 deletions docs/quickstart/quickstart2_rag.md
@@ -0,0 +1,157 @@
# Quickstart 2: Adding RAG Capabilities

In this chapter, we will explore how to use Ragbits' Document Search capabilities to retrieve documents relevant to your prompts. This technique is based on the Retrieval-Augmented Generation (RAG) architecture, which lets the LLM generate responses informed by relevant information from your documents.

To work with document content, we first need to "ingest" them (i.e., process, embed, and store them in a vector database). Afterwards, we can search for relevant documents based on the user's input and use the retrieved information to enhance the LLM's response.

We will continue with the example of generating custom songs. In the previous chapter, you learned how to define a prompt and interact with it using the `ragbits` CLI. We will now extend the prompt with document search so that the LLM receives additional context when generating a song on a given subject (in this case, inspirations drawn from children's stories).

## Getting the Documents

To leverage the RAG capabilities, you need to provide a set of documents that the model can use to generate responses. This guide uses an [open-licensed (CC-BY 4.0) collection of children's stories](https://github.com/global-asp/pb-source/tree/master) as examples. You should download these documents and place them next to your Python file:

```bash
git clone https://github.com/global-asp/pb-source.git
```

The short stories are in Markdown format. Ragbits supports [various document formats][ragbits.document_search.documents.document.DocumentType], including PDF and DOC, as well as non-textual files such as images.

## Defining the Document Search Object

The `DocumentSearch` class serves as the main entry point for working with documents in Ragbits. It requires an embedder and a vector store to work. This example uses the `LiteLLMEmbeddings` embedder and the `InMemoryVectorStore` vector store:

```python
from ragbits.core.embeddings.litellm import LiteLLMEmbeddings
from ragbits.core.vector_stores.in_memory import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

embedder = LiteLLMEmbeddings(
    model="text-embedding-3-small",
)
vector_store = InMemoryVectorStore()
document_search = DocumentSearch(
    embedder=embedder,
    vector_store=vector_store,
)
```

!!! note
    `InMemoryVectorStore` is a simple in-memory vector store suitable for demonstration purposes. In real-world scenarios, you would typically use one of the persistent vector stores like [`ChromaVectorStore`][ragbits.core.vector_stores.chroma.ChromaVectorStore] or [`QdrantVectorStore`][ragbits.core.vector_stores.qdrant.QdrantVectorStore].

## Defining the Source of the Documents

We first need to direct Ragbits to the location of the documents to load them. This code will load the first 100 documents from the `pb-source/en` directory:

```python
from pathlib import Path
from ragbits.document_search.documents.sources import LocalFileSource

# Path to the directory with markdown files to ingest
documents_path = Path(__file__).parent / "pb-source/en"
documents = LocalFileSource.list_sources(documents_path, file_pattern="*.md")[:100]
```

Because the documents are stored locally, we are using `LocalFileSource` here. Ragbits also supports a variety of other sources including Google Cloud Storage, Hugging Face, and custom sources.

## Ingesting the Documents

Having established the documents and the `DocumentSearch` object, we can now ingest the documents:

```python
import asyncio

async def main():
    await document_search.ingest(documents)

if __name__ == "__main__":
    asyncio.run(main())
```

This procedure will process, embed, and store the documents in the vector database.

Now, we can use the `document_search` object to find relevant documents. Let’s try a manual search:

```python
print(await document_search.search("school"))
```

This function will return fragments of ingested documents that semantically match the query “school.”
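Under the hood, semantic search ranks stored fragments by vector similarity. The following self-contained toy (with made-up three-dimensional "embeddings", not the 1536-dimensional vectors a real embedder such as `text-embedding-3-small` produces) illustrates the idea:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for three document fragments (illustration only).
store = {
    "The children walked to school.": [0.9, 0.1, 0.0],
    "The dragon guarded its gold.":   [0.0, 0.2, 0.9],
    "Lessons began at nine o'clock.": [0.8, 0.3, 0.1],
}

query_vec = [0.85, 0.2, 0.05]  # pretend embedding of the query "school"
ranked = sorted(store, key=lambda text: cosine(store[text], query_vec), reverse=True)
print(ranked[0])  # the fragment most similar to the query
```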

## Using the Documents in the Prompt

To include the retrieved documents in the prompt, we need to modify the prompt defined in [Quickstart 1](quickstart1_prompts.md).

First, we'll alter the data model of the prompt to include the retrieved documents:

```python
from pydantic import BaseModel

class SongIdea(BaseModel):
    subject: str
    age_group: int
    genre: str
    inspirations: list[str]
```

The updated model looks similar to the earlier model, but now incorporates a new field, `inspirations`. This field will contain inspirations for the song, retrieved from the documents.

Next, we need to adjust the prompt to include these inspirations in the prompt text:

```python
from ragbits.core.prompt.prompt import Prompt

class SongPrompt(Prompt[SongIdea]):
    system_prompt = """
    You are a professional songwriter.
    {% if age_group < 18 %}
    You only use language that is appropriate for children.
    {% endif %}
    """

    user_prompt = """
    Write a song about a {{subject}} for {{age_group}}-year-old {{genre}} fans.

    Here are some fragments of short stories for inspiration:
    {% for inspiration in inspirations %}
    # Fragment {{loop.index}}
    {{inspiration}}

    {% endfor %}
    """
```

The prompt looks similar to the previous one but now includes a section with inspirations sourced from the retrieved documents.

## Using the Prompt with the LLM

Now that we have a prompt that includes inspirations from the documents, we can create a function that uses the LLM to generate a song given a subject, age group, and genre. At the same time, this function will automatically supply inspirations from the ingested documents:

```python
from ragbits.core.llms.litellm import LiteLLM

llm = LiteLLM("gpt-4")

async def get_song_idea(subject: str, age_group: int, genre: str) -> str:
    elements = await document_search.search(subject)
    inspirations = [element.text_representation for element in elements if element.text_representation]
    prompt = SongPrompt(SongIdea(subject=subject, age_group=age_group, genre=genre, inspirations=inspirations))

    return await llm.generate(prompt)
```

This function searches for documents related to the subject, extracts the text representations of the found elements, and passes them to the prompt alongside the subject, age group, and genre. The LLM then generates a song based on the provided prompt.

We can now modify the `main` function to use the function we just created:

```python
async def main():
    await document_search.ingest(documents)
    print(await get_song_idea("school", 10, "pop"))
```

!!! note
    In real-world scenarios, you wouldn't ingest and search for documents in the same function. You would ingest the documents once (or periodically) and then use the `document_search` object to search for relevant documents as needed.

## Conclusion

In this guide, you learned how to use Ragbits' Document Search capabilities to find documents relevant to the user's question and utilize them to enhance the LLM's responses. By incorporating the RAG architecture with your prompts, you can provide the LLM with additional context and information to produce more accurate and relevant responses.
17 changes: 11 additions & 6 deletions mkdocs.yml
@@ -6,17 +6,23 @@ repo_url: https://github.com/deepsense-ai/ragbits
copyright: Copyright &copy; 2024 deepsense.ai
nav:
- rabgbits: index.md
- Quick Start:
- quickstart/quickstart1_prompts.md
- quickstart/quickstart2_rag.md
- How-to Guides:
- how-to/use_prompting.md
- how-to/prompts_lab.md
- how-to/optimize.md
- how-to/use_guardrails.md
- how-to/integrations/promptfoo.md
- how-to/use_prompting.md
- how-to/generate_dataset.md
- Document Search:
- how-to/document_search/async_processing.md
- how-to/document_search/create_custom_execution_strategy.md
- how-to/document-search/search_documents.md
- how-to/document-search/use_rephraser.md
- how-to/document-search/use_reranker.md
- how-to/document_search/search_documents.md
- how-to/document_search/use_rephraser.md
- how-to/document_search/use_reranker.md
- how-to/document_search/distributed_ingestion.md
- API Reference:
- Core:
- api_reference/core/prompt.md
@@ -84,8 +90,7 @@ markdown_extensions:
permalink: "#"
plugins:
- search
- autorefs:
enable: true
- autorefs
- mkdocstrings:
handlers:
python:
2 changes: 1 addition & 1 deletion packages/ragbits-core/src/ragbits/core/cli.py
@@ -39,7 +39,7 @@ def register(app: typer.Typer) -> None:
@prompts_app.command()
def lab(
    file_pattern: str = core_config.prompt_path_pattern,
    llm_factory: str | None = core_config.default_llm_factories[LLMType.TEXT],
    llm_factory: str = core_config.default_llm_factories[LLMType.TEXT],
) -> None:
"""
Launches the interactive application for listing, rendering, and testing prompts