Merge branch 'main' into 85-feat-end-to-end-support-for-images
ludwiktrammer committed Nov 6, 2024
2 parents d42e61b + e674abc commit 5766e73
Showing 44 changed files with 1,571 additions and 105 deletions.
6 changes: 3 additions & 3 deletions .gitignore
@@ -87,11 +87,11 @@ cmake-build-*/
**/.terraform.lock.hcl
**/.terraform

# benchmarks
benchmarks/sql/data/

# mkdocs generated files
site/

# build artifacts
dist/

# examples
chroma/
4 changes: 2 additions & 2 deletions README.md
@@ -19,11 +19,11 @@
- [X] **[Core](packages/ragbits-core)** - Fundamental tools for working with prompts and LLMs.
- [X] **[Document Search](packages/ragbits-document-search)** - Handles vector search to retrieve relevant documents.
- [X] **[CLI](packages/ragbits-cli)** - The `ragbits` shell command, enabling tools such as GUI prompt management.
- [x] **[Guardrails](packages/ragbits-guardrails)** - Ensures response safety and relevance.
- [x] **[Evaluation](packages/ragbits-evaluate)** - Unified evaluation framework for Ragbits components.
- [ ] **Flow Controls** - Manages multi-stage chat flows for performing advanced actions *(coming soon)*.
- [ ] **Structured Querying** - Queries structured data sources in a predictable manner *(coming soon)*.
- [ ] **Caching** - Adds a caching layer to reduce costs and response times *(coming soon)*.
- [ ] **Observability & Audit** - Tracks user queries and events for easier troubleshooting *(coming soon)*.
- [ ] **Guardrails** - Ensures response safety and relevance *(coming soon)*.

## Installation

48 changes: 48 additions & 0 deletions docs/how-to/use_guardrails.md
@@ -0,0 +1,48 @@
# How-To: Use Guardrails

Ragbits offers an extensible guardrails system. You can use one of the available guardrails or create your own to prevent toxic language, PII leaks, and similar problems.

In this guide we will show you how to use a guardrail based on OpenAI moderation and how to create your own guardrail.


## Using existing guardrail
To use one of the existing guardrails, import it together with `GuardrailManager`. Next, pass a list of guardrails to the manager
and call its `verify()` method, which checks the input (`str` or `Prompt`) against all provided guardrails asynchronously.

```python
import asyncio
from ragbits.guardrails.base import GuardrailManager, GuardrailVerificationResult
from ragbits.guardrails.openai_moderation import OpenAIModerationGuardrail


async def verify_message(message: str) -> list[GuardrailVerificationResult]:
    manager = GuardrailManager([OpenAIModerationGuardrail()])
    return await manager.verify(message)


if __name__ == '__main__':
    print(asyncio.run(verify_message("Test message")))
```

The expected output is a list of `GuardrailVerificationResult` objects, one per guardrail, each with the following properties:
```python
guardrail_name: str
succeeded: bool
fail_reason: str | None
```
Each result shows which guardrail was used, whether the check succeeded and, if it failed, an optional reason.
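
As a rough sketch of how these results might be consumed, the snippet below collects a readable message for every failed check. To stay runnable without Ragbits installed, it defines a stand-in `GuardrailVerificationResult` dataclass that only mirrors the three properties listed above; the helper `collect_failures` and the guardrail names in the sample data are hypothetical and not part of the library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GuardrailVerificationResult:
    """Stand-in with the three documented fields (not the real ragbits class)."""

    guardrail_name: str
    succeeded: bool
    fail_reason: str | None


def collect_failures(results: list[GuardrailVerificationResult]) -> list[str]:
    """Return one readable line per failed check (hypothetical helper)."""
    return [
        f"{result.guardrail_name}: {result.fail_reason or 'no reason given'}"
        for result in results
        if not result.succeeded
    ]


# Sample data for illustration only; the names are made up.
sample_results = [
    GuardrailVerificationResult("moderation_check", True, None),
    GuardrailVerificationResult("pii_check", False, "email address detected"),
]
failures = collect_failures(sample_results)

if __name__ == "__main__":
    print(failures)
```

In an application, an empty list from such a helper could be treated as "safe to continue", while a non-empty list could be logged or returned to the caller.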

## Implementing custom guardrail
To implement a custom guardrail, create a new class that inherits from `Guardrail` and implements the abstract `verify` method.

```python
from ragbits.core.prompt import Prompt
from ragbits.guardrails.base import Guardrail, GuardrailVerificationResult

class CustomGuardrail(Guardrail):

    async def verify(self, input_to_verify: Prompt | str) -> GuardrailVerificationResult:
        pass
```

With that you can pass your `CustomGuardrail` to the `GuardrailManager` as shown in the [Using existing guardrail](#using-existing-guardrail) section.
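
To make the shape of a `verify` implementation more concrete, here is a minimal sketch of a blocklist-style guardrail. It is an illustration under assumptions, not the library's implementation: the `Guardrail` and `GuardrailVerificationResult` classes below are simplified stand-ins so the snippet runs without Ragbits installed, the stand-in accepts only `str` (the real interface also accepts `Prompt`), and `BlocklistGuardrail` and `verify_all` are hypothetical names.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class GuardrailVerificationResult:
    """Stand-in with the three documented fields (not the real ragbits class)."""

    guardrail_name: str
    succeeded: bool
    fail_reason: str | None


class Guardrail(ABC):
    """Simplified stand-in for the ragbits base class; str input only."""

    @abstractmethod
    async def verify(self, input_to_verify: str) -> GuardrailVerificationResult:
        ...


class BlocklistGuardrail(Guardrail):
    """Hypothetical guardrail that fails when a blocked word appears."""

    def __init__(self, blocked_terms: set[str]) -> None:
        self._blocked_terms = {term.lower() for term in blocked_terms}

    async def verify(self, input_to_verify: str) -> GuardrailVerificationResult:
        # Case-insensitive match on whitespace-separated words.
        words = set(input_to_verify.lower().split())
        hits = sorted(words & self._blocked_terms)
        return GuardrailVerificationResult(
            guardrail_name=type(self).__name__,
            succeeded=not hits,
            fail_reason=f"blocked terms found: {', '.join(hits)}" if hits else None,
        )


async def verify_all(guardrails: list[Guardrail], message: str) -> list[GuardrailVerificationResult]:
    """Run every guardrail concurrently (hypothetical helper, not GuardrailManager)."""
    return list(await asyncio.gather(*(g.verify(message) for g in guardrails)))


if __name__ == "__main__":
    print(asyncio.run(verify_all([BlocklistGuardrail({"password"})], "my Password is hunter2")))
```

With the real package, the class would inherit from `ragbits.guardrails.base.Guardrail` and be passed to `GuardrailManager` instead of the `verify_all` helper used here.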
44 changes: 41 additions & 3 deletions examples/document-search/basic.py
@@ -1,10 +1,35 @@
"""
Ragbits Document Search Example: Basic
This example demonstrates how to use the `DocumentSearch` class to search for documents with a minimal setup.
We will use the `LiteLLMEmbeddings` class to embed the documents and the query and the `InMemoryVectorStore` class
to store the embeddings.
The script performs the following steps:
1. Create a list of documents.
2. Initialize the `LiteLLMEmbeddings` class with the OpenAI `text-embedding-3-small` embedding model.
3. Initialize the `InMemoryVectorStore` class.
4. Initialize the `DocumentSearch` class with the embedder and the vector store.
5. Ingest the documents into the `DocumentSearch` instance.
6. Search for documents using a query.
7. Print the search results.
To run the script, execute the following command:
```bash
uv run examples/document-search/basic.py
```
"""

# /// script
# requires-python = ">=3.10"
# dependencies = [
# "ragbits-document-search",
# "ragbits-core[litellm]",
# ]
# ///

import asyncio

from ragbits.core.embeddings.litellm import LiteLLMEmbeddings
@@ -13,12 +38,25 @@
from ragbits.document_search.documents.document import DocumentMeta

documents = [
DocumentMeta.create_text_document_from_literal("RIP boiled water. You will be mist."),
DocumentMeta.create_text_document_from_literal(
"Why doesn't James Bond fart in bed? Because it would blow his cover."
"""
RIP boiled water. You will be mist.
"""
),
DocumentMeta.create_text_document_from_literal(
"""
Why doesn't James Bond fart in bed? Because it would blow his cover.
"""
),
DocumentMeta.create_text_document_from_literal(
"""
Why programmers don't like to swim? Because they're scared of the floating points.
"""
),
DocumentMeta.create_text_document_from_literal(
"Why programmers don't like to swim? Because they're scared of the floating points."
"""
This one is completely unrelated.
"""
),
]

47 changes: 44 additions & 3 deletions examples/document-search/chroma.py
@@ -1,10 +1,36 @@
"""
Ragbits Document Search Example: Chroma
This example demonstrates how to use the `DocumentSearch` class to search for documents with a more advanced setup.
We will use the `LiteLLMEmbeddings` class to embed the documents and the query, and the `ChromaVectorStore` class to store
the embeddings.
The script performs the following steps:
1. Create a list of documents.
2. Initialize the `LiteLLMEmbeddings` class with the OpenAI `text-embedding-3-small` embedding model.
3. Initialize the `ChromaVectorStore` class with a `PersistentClient` instance and an index name.
4. Initialize the `DocumentSearch` class with the embedder and the vector store.
5. Ingest the documents into the `DocumentSearch` instance.
6. List all documents in the vector store.
7. Search for documents using a query.
8. Print the list of all documents and the search results.
To run the script, execute the following command:
```bash
uv run examples/document-search/chroma.py
```
"""

# /// script
# requires-python = ">=3.10"
# dependencies = [
# "ragbits-document-search",
# "ragbits-core[chroma,litellm]",
# ]
# ///

import asyncio

from chromadb import PersistentClient
@@ -15,11 +41,26 @@
from ragbits.document_search.documents.document import DocumentMeta

documents = [
DocumentMeta.create_text_document_from_literal("RIP boiled water. You will be mist."),
DocumentMeta.create_text_document_from_literal(
"Why programmers don't like to swim? Because they're scared of the floating points."
"""
RIP boiled water. You will be mist.
"""
),
DocumentMeta.create_text_document_from_literal(
"""
Why doesn't James Bond fart in bed? Because it would blow his cover.
"""
),
DocumentMeta.create_text_document_from_literal(
"""
Why programmers don't like to swim? Because they're scared of the floating points.
"""
),
DocumentMeta.create_text_document_from_literal(
"""
This one is completely unrelated.
"""
),
DocumentMeta.create_text_document_from_literal("This one is completely unrelated."),
]


137 changes: 137 additions & 0 deletions examples/document-search/chroma_otel.py
@@ -0,0 +1,137 @@
"""
Ragbits Document Search Example: Chroma x OpenTelemetry
This example demonstrates how to use the `DocumentSearch` class to search for documents with a more advanced setup.
We will use the `LiteLLMEmbeddings` class to embed the documents and the query, the `ChromaVectorStore` class to store
the embeddings, and the OpenTelemetry SDK to trace the operations.
The script performs the following steps:
1. Create a list of documents.
2. Initialize the `LiteLLMEmbeddings` class with the OpenAI `text-embedding-3-small` embedding model.
3. Initialize the `ChromaVectorStore` class with a `PersistentClient` instance and an index name.
4. Initialize the `DocumentSearch` class with the embedder and the vector store.
5. Ingest the documents into the `DocumentSearch` instance.
6. List all documents in the vector store.
7. Search for documents using a query.
8. Print the list of all documents and the search results.
To run the script, execute the following command:
```bash
uv run examples/document-search/chroma_otel.py
```
The script exports traces to the local OTLP collector running on `http://localhost:4317`. To visualize the traces,
you can use Jaeger. The recommended way to run it is using the official Docker image:
1. Run Jaeger Docker container:
```bash
docker run -d --rm --name jaeger \
-p 16686:16686 \
-p 4317:4317 \
jaegertracing/all-in-one:1.62.0
```
2. Open the Jaeger UI in your browser:
```
http://localhost:16686
```
"""

# /// script
# requires-python = ">=3.10"
# dependencies = [
# "ragbits-document-search",
# "ragbits-core[chroma,litellm,otel]",
# ]
# ///

import asyncio

from chromadb import PersistentClient
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

from ragbits.core import audit
from ragbits.core.embeddings.litellm import LiteLLMEmbeddings
from ragbits.core.vector_stores.chroma import ChromaVectorStore
from ragbits.document_search import DocumentSearch, SearchConfig
from ragbits.document_search.documents.document import DocumentMeta

provider = TracerProvider(resource=Resource({SERVICE_NAME: "ragbits"}))
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter("http://localhost:4317", insecure=True)))
trace.set_tracer_provider(provider)

audit.set_trace_handlers("otel")

documents = [
    DocumentMeta.create_text_document_from_literal(
        """
        RIP boiled water. You will be mist.
        """
    ),
    DocumentMeta.create_text_document_from_literal(
        """
        Why doesn't James Bond fart in bed? Because it would blow his cover.
        """
    ),
    DocumentMeta.create_text_document_from_literal(
        """
        Why programmers don't like to swim? Because they're scared of the floating points.
        """
    ),
    DocumentMeta.create_text_document_from_literal(
        """
        This one is completely unrelated.
        """
    ),
]


async def main() -> None:
    """
    Run the example.
    """
    embedder = LiteLLMEmbeddings(
        model="text-embedding-3-small",
    )
    vector_store = ChromaVectorStore(
        client=PersistentClient("./chroma"),
        index_name="jokes",
    )
    document_search = DocumentSearch(
        embedder=embedder,
        vector_store=vector_store,
    )

    await document_search.ingest(documents)

    all_documents = await vector_store.list()

    print()
    print("All documents:")
    print([doc.metadata["content"] for doc in all_documents])

    query = "I'm boiling my water and I need a joke"
    vector_store_kwargs = {
        "k": 2,
        "max_distance": None,
    }
    results = await document_search.search(
        query,
        config=SearchConfig(vector_store_kwargs=vector_store_kwargs),
    )

    print()
    print(f"Documents similar to: {query}")
    print([element.get_key() for element in results])


if __name__ == "__main__":
    asyncio.run(main())