feat(document-search): automatic configuration selection based on evaluation #177

Merged · 12 commits · Nov 8, 2024

178 changes: 178 additions & 0 deletions docs/how-to/optimize.md
@@ -0,0 +1,178 @@
# How to Autoconfigure Your Pipeline

Ragbits provides a feature that automatically configures the hyperparameters of a pipeline. This functionality is agnostic to the type of structure being optimized; the only requirements are the following (see the skeleton after this list):

- The optimized pipeline must inherit from `ragbits.evaluate.pipelines.base.EvaluationPipeline`.
- The definition of optimized metrics must adhere to the `ragbits.evaluate.metrics.base.Metric` interface.
- These metrics should be gathered into an instance of `ragbits.evaluate.metrics.base.MetricSet`.
- An instance of a class inheriting from `ragbits.evaluate.loaders.base.DataLoader` must be provided as the data source for optimization.
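
Taken together, these requirements form a small contract between your code and the optimizer. The skeleton below sketches that contract; the `My*` names are placeholders, and the exact signatures mirror the concrete example later in this guide rather than being authoritative:

```python
from dataclasses import dataclass

from ragbits.evaluate.loaders.base import DataLoader
from ragbits.evaluate.metrics.base import Metric, MetricSet, ResultT
from ragbits.evaluate.pipelines.base import EvaluationPipeline, EvaluationResult


@dataclass
class MyResult(EvaluationResult):
    answer: str  # whatever your pipeline produces for one sample


class MyPipeline(EvaluationPipeline):
    async def __call__(self, data: dict[str, str]) -> MyResult:
        ...  # run the system under optimization on a single sample


class MyDataLoader(DataLoader):
    async def load(self) -> list[dict[str, str]]:
        ...  # return the samples to evaluate on


class MyMetric(Metric):
    def compute(self, results: list[ResultT]) -> dict[str, float]:
        ...  # aggregate the collected results into named scores


metrics = MetricSet(MyMetric())
```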

## Supported Parameter Types

The optimized parameters can be of the following types:

- **Continuous**
- **Ordinal**
- **Categorical**

Continuous parameters take float values and ordinal parameters take integer values. Categorical parameters support more sophisticated structures, including nested parameters of the other types.

Each optimized variable should be marked with the `optimize=True` flag in the configuration.

For categorical variables, you must also provide the `choices` field, which lists all possible values to be considered during optimization. For continuous and ordinal variables, the `range` field should be specified as a two-element list defining the minimum and maximum values of interest. For continuous parameters, the elements must be floats, while for ordinal parameters, they must be integers.
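
For instance, a search space mixing all three parameter kinds could look like this (a minimal sketch with hypothetical parameter names; only the `optimize`, `choices`, and `range` keys carry meaning for the optimizer):

```python
from omegaconf import OmegaConf

search_space = OmegaConf.create(
    {
        # continuous: float endpoints
        "temperature": {"optimize": True, "range": [0.0, 1.0]},
        # ordinal: integer endpoints
        "top_k": {"optimize": True, "range": [1, 10]},
        # categorical: explicit choices, here with a nested ordinal parameter
        "embedder": {
            "optimize": True,
            "choices": [
                {"model": "small", "dimensions": {"optimize": True, "range": [32, 512]}},
                {"model": "large", "dimensions": 1024},
            ],
        },
    }
)
```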

## Example Usage

In this example, we will optimize the system prompt of a question-answering pipeline so that its answers use as few tokens as possible.

### Define the Optimized Pipeline Structure

```python
from dataclasses import dataclass
from ragbits.evaluate.pipelines.base import EvaluationResult, EvaluationPipeline
from ragbits.core.llms.litellm import LiteLLM
from ragbits.core.prompt import Prompt
from pydantic import BaseModel


@dataclass
class RandomQuestionPipelineResult(EvaluationResult):
    answer: str


class QuestionRespondPromptInput(BaseModel):
    system_prompt_content: str
    question: str


class QuestionRespondPrompt(Prompt[QuestionRespondPromptInput]):
    system_prompt = "{{ system_prompt_content }}"
    user_prompt = "{{ question }}"


class RandomQuestionRespondPipeline(EvaluationPipeline):
    async def __call__(self, data: dict[str, str]) -> RandomQuestionPipelineResult:
        llm = LiteLLM()
        # Build the prompt from the optimized config value and the current sample.
        input_prompt = QuestionRespondPrompt(
            QuestionRespondPromptInput(
                system_prompt_content=self.config.system_prompt_content,
                question=data["question"],
            )
        )
        answer = await llm.generate(prompt=input_prompt)
        return RandomQuestionPipelineResult(answer=answer)
```
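
Note how the pipeline reads `self.config.system_prompt_content`: during optimization, each trial instantiates the pipeline with a config in which every parameter marked `optimize` has been replaced by a sampled value. To smoke-test the pipeline outside the optimizer, you can pass a fixed config yourself (a sketch; it assumes that, like the data loader shown below, the pipeline takes its config as the constructor argument):

```python
import asyncio

from omegaconf import OmegaConf

# Hypothetical standalone run with a fixed, non-optimized config.
pipeline = RandomQuestionRespondPipeline(
    OmegaConf.create({"system_prompt_content": "Answer in as few words as possible."})
)
result = asyncio.run(pipeline({"question": "Why is the sky blue?"}))
print(result.answer)
```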

### Define the Data Loader

Next, we define the data loader. We'll use the Ragbits generation stack to create a synthetic dataset of questions:

```python
from ragbits.evaluate.loaders.base import DataLoader
from ragbits.core.llms.litellm import LiteLLM
from ragbits.core.prompt import Prompt
from pydantic import BaseModel
from omegaconf import OmegaConf


class DatasetGenerationPromptInput(BaseModel):
    topic: str


class DatasetGenerationPrompt(Prompt[DatasetGenerationPromptInput]):
    system_prompt = "Be a provider of random questions on a topic specified by the user."
    user_prompt = "Generate a question about {{ topic }}"


class RandomQuestionsDataLoader(DataLoader):
    async def load(self) -> list[dict[str, str]]:
        # Generate `num_questions` synthetic questions on the configured topic.
        questions = []
        llm = LiteLLM()
        for _ in range(self.config.num_questions):
            question = await llm.generate(
                DatasetGenerationPrompt(DatasetGenerationPromptInput(topic=self.config.question_topic))
            )
            questions.append({"question": question})
        return questions


dataloader_config = OmegaConf.create(
    {"num_questions": 10, "question_topic": "conspiracy theories"}
)
dataloader = RandomQuestionsDataLoader(dataloader_config)
```

### Define the Metrics and Run the Experiment

```python
from pprint import pp as pprint
import tiktoken
from ragbits.evaluate.optimizer import Optimizer
from ragbits.evaluate.metrics.base import Metric, MetricSet, ResultT
from omegaconf import OmegaConf


class TokenCountMetric(Metric):
    def compute(self, results: list[ResultT]) -> dict[str, float]:
        # Score each trial by the average token count of its answers.
        encoding = tiktoken.get_encoding("cl100k_base")
        num_tokens = [len(encoding.encode(out.answer)) for out in results]
        return {"num_tokens": sum(num_tokens) / len(num_tokens)}


metrics = MetricSet(TokenCountMetric())

optimization_cfg = OmegaConf.create(
    {"direction": "minimize", "n_trials": 4, "max_retries_for_trial": 3}
)
optimizer = Optimizer(optimization_cfg)

optimized_params = OmegaConf.create(
    {
        "system_prompt_content": {
            "optimize": True,
            "choices": [
                "Be a friendly bot answering user questions. Be as concise as possible",
                "Be a silly bot answering user questions. Use as few tokens as possible",
                "Be informative and straight to the point",
                "Respond to user questions in as few words as possible",
            ],
        }
    }
)

configs_with_scores = optimizer.optimize(
    pipeline_class=RandomQuestionRespondPipeline,
    config_with_params=optimized_params,
    metrics=metrics,
    dataloader=dataloader,
)
pprint(configs_with_scores)
```

After executing the code, your console should display an output structure similar to this:

```python
[({'system_prompt_content': 'Be a silly bot answering user questions. Use as few tokens as possible'},
  6.0,
  {'num_tokens': 6.0}),
 ({'system_prompt_content': 'Be a silly bot answering user questions. Use as few tokens as possible'},
  10.7,
  {'num_tokens': 10.7}),
 ({'system_prompt_content': 'Be a friendly bot answering user questions. Be as concise as possible'},
  37.8,
  {'num_tokens': 37.8}),
 ({'system_prompt_content': 'Be informative and straight to the point'},
  113.2,
  {'num_tokens': 113.2})]
```

This output consists of tuples, each containing three elements:

1. The configuration used in the trial.
2. The score achieved.
3. A dictionary of detailed metrics that contribute to the score.

The tuples are ordered from the best to the worst configuration based on the score.
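
Since the list is sorted best-first, picking the winning configuration is a one-liner (assuming the output structure shown above):

```python
# The first tuple holds the best configuration, its score, and its detailed metrics.
best_config, best_score, best_metrics = configs_with_scores[0]
print(best_config["system_prompt_content"], best_score)
```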

Please note that the details may vary between runs due to the non-deterministic nature of both the LLM and the optimization algorithm.
8 changes: 5 additions & 3 deletions examples/evaluation/document-search/config/data/qa.yaml
@@ -1,3 +1,5 @@
-name: "hf-docs-retrieval"
-path: "micpst/hf-docs-retrieval"
-split: "train"
+type: ragbits.evaluate.loaders.hf:HFDataLoader
+options:
+  name: "hf-docs-retrieval"
+  path: "micpst/hf-docs-retrieval"
+  split: "train"
@@ -3,18 +3,19 @@
 task:
   name: chunking-1000
 
-# used only for ingestion
-providers:
-  txt:
-    config:
-      chunking_kwargs:
-        max_characters: 1000
-  md:
-    config:
-      chunking_kwargs:
-        max_characters: 1000
+pipeline:
+  # used only for ingestion
+  providers:
+    txt:
+      config:
+        chunking_kwargs:
+          max_characters: 1000
+    md:
+      config:
+        chunking_kwargs:
+          max_characters: 1000
 
-# used for both ingestion and evaluation
-vector_store:
-  config:
-    index_name: chunk-1000
+  # used for both ingestion and evaluation
+  vector_store:
+    config:
+      index_name: chunk-1000
@@ -4,17 +4,18 @@ task:
   name: chunking-250
 
 # used only for ingestion
-providers:
-  txt:
-    config:
-      chunking_kwargs:
-        max_characters: 250
-  md:
-    config:
-      chunking_kwargs:
-        max_characters: 250
+pipeline:
+  providers:
+    txt:
+      config:
+        chunking_kwargs:
+          max_characters: 250
+    md:
+      config:
+        chunking_kwargs:
+          max_characters: 250
 
-# used for both ingestion and evaluation
-vector_store:
-  config:
-    index_name: chunk-250
+  # used for both ingestion and evaluation
+  vector_store:
+    config:
+      index_name: chunk-250
@@ -4,17 +4,17 @@ task:
   name: chunking-500
 
 # used only for ingestion
-providers:
-  txt:
-    config:
-      chunking_kwargs:
-        max_characters: 500
-  md:
-    config:
-      chunking_kwargs:
-        max_characters: 500
-
+pipeline:
+  providers:
+    txt:
+      config:
+        chunking_kwargs:
+          max_characters: 500
+    md:
+      config:
+        chunking_kwargs:
+          max_characters: 500
 # used for both ingestion and evaluation
-vector_store:
-  config:
-    index_name: chunk-500
+  vector_store:
+    config:
+      index_name: chunk-500
4 changes: 1 addition & 3 deletions examples/evaluation/document-search/config/ingestion.yaml
@@ -1,6 +1,4 @@
 defaults:
   - data: corpus
-  - embedder: litellm
-  - providers: unstructured
-  - vector_store: chroma
+  - pipeline: document_ingestion
   - _self_
26 changes: 26 additions & 0 deletions examples/evaluation/document-search/config/optimization.yaml
@@ -0,0 +1,26 @@
defaults:
  - pipeline: document_search_optimization
  - data: qa
  - _self_

task:
  name: default
  type: document-search

metrics:
  - type: ragbits.evaluate.metrics.document_search:DocumentSearchPrecisionRecallF1
    matching_strategy: RougeChunkMatch
    options:
      threshold: 0.5
  - type: ragbits.evaluate.metrics.document_search:DocumentSearchRankedRetrievalMetrics
    weight: -1.0
    matching_strategy: RougeChunkMatch
    options:
      threshold: 0.5


callbacks:
  - type: ragbits.evaluate.callbacks.neptune:NeptuneCallbackConfigurator
    args:
      callback_type: neptune.integrations.optuna:NeptuneCallback
      project: deepsense-ai/ragbits
@@ -0,0 +1,4 @@
name: "hf-docs"
path: "micpst/hf-docs"
split: "train"
num_docs: 5
@@ -0,0 +1,5 @@
defaults:
  - embedder: litellm
  - providers: unstructured
  - vector_store: chroma
  - _self_
@@ -0,0 +1,7 @@
defaults:
  - embedder: litellm
  - providers: unstructured
  - vector_store: chroma
  - rephraser: noop
  - reranker: noop
  - _self_
@@ -0,0 +1,10 @@
defaults:
  - embedder: litellm_opt_template
  - providers: unstructured_opt_template
  - vector_store: chroma
  - rephraser: noop
  - reranker: noop
  - answer_data_source: corpus
  - _self_

type: ragbits.evaluate.pipelines.document_search:DocumentSearchWithIngestionPipeline
@@ -0,0 +1,6 @@
type: ragbits.core.embeddings.litellm:LiteLLMEmbeddings
config:
  model: "text-embedding-3-small"
  options:
    dimensions: 768
    encoding_format: float
@@ -0,0 +1,20 @@
type: ragbits.core.embeddings.litellm:LiteLLMEmbeddings
config:
  optimize: true
  choices:
    - model: "text-embedding-3-small"
      options:
        dimensions:
          optimize: true
          range:
            - 32
            - 512
        encoding_format: float
    - model: "text-embedding-3-large"
      options:
        dimensions:
          optimize: true
          range:
            - 512
            - 1024
        encoding_format: float