feat(document-search): automatic configuration selection based on evaluation #177

Merged · 12 commits · Nov 8, 2024

178 changes: 178 additions & 0 deletions docs/how-to/optimize.md
@@ -0,0 +1,178 @@
# How to Autoconfigure Your Pipeline

Ragbits provides a feature that automatically configures the hyperparameters of a pipeline. This functionality is agnostic to the type of structure being optimized; the only requirements are the following (see the skeleton after this list):

- The optimized pipeline must inherit from `ragbits.evaluate.pipelines.base.EvaluationPipeline`.
- The definition of optimized metrics must adhere to the `ragbits.evaluate.metrics.base.Metric` interface.
- These metrics should be gathered into an instance of `ragbits.evaluate.metrics.base.MetricSet`.
- An instance of a class inheriting from `ragbits.evaluate.loaders.base.DataLoader` must be provided as the data source for optimization.
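
Taken together, these requirements form a small contract between your code and the optimizer. The skeleton below sketches that contract; the `My*` names are placeholders, and the exact signatures mirror the concrete example later in this guide rather than being authoritative:

```python
from dataclasses import dataclass

from ragbits.evaluate.loaders.base import DataLoader
from ragbits.evaluate.metrics.base import Metric, MetricSet, ResultT
from ragbits.evaluate.pipelines.base import EvaluationPipeline, EvaluationResult


@dataclass
class MyResult(EvaluationResult):
    answer: str  # whatever your pipeline produces for one sample


class MyPipeline(EvaluationPipeline):
    async def __call__(self, data: dict[str, str]) -> MyResult:
        ...  # run the system under optimization on a single sample


class MyDataLoader(DataLoader):
    async def load(self) -> list[dict[str, str]]:
        ...  # return the samples to evaluate on


class MyMetric(Metric):
    def compute(self, results: list[ResultT]) -> dict[str, float]:
        ...  # aggregate the collected results into named scores


metrics = MetricSet(MyMetric())
```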

## Supported Parameter Types

The optimized parameters can be of the following types:

- **Continuous**
- **Ordinal**
- **Categorical**

Continuous parameters take float values and ordinal parameters take integer values. Categorical parameters support more sophisticated structures, including nested parameters of the other types.

Each optimized variable should be marked with the `optimize=True` flag in the configuration.

For categorical variables, you must also provide the `choices` field, which lists all possible values to be considered during optimization. For continuous and ordinal variables, the `range` field should be specified as a two-element list defining the minimum and maximum values of interest. For continuous parameters, the elements must be floats, while for ordinal parameters, they must be integers.
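
For instance, a search space mixing all three parameter kinds could look like this (a minimal sketch with hypothetical parameter names; only the `optimize`, `choices`, and `range` keys carry meaning for the optimizer):

```python
from omegaconf import OmegaConf

search_space = OmegaConf.create(
    {
        # continuous: float endpoints
        "temperature": {"optimize": True, "range": [0.0, 1.0]},
        # ordinal: integer endpoints
        "top_k": {"optimize": True, "range": [1, 10]},
        # categorical: explicit choices, here with a nested ordinal parameter
        "embedder": {
            "optimize": True,
            "choices": [
                {"model": "small", "dimensions": {"optimize": True, "range": [32, 512]}},
                {"model": "large", "dimensions": 1024},
            ],
        },
    }
)
```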

## Example Usage

In this example, we will optimize the system prompt of a question-answering pipeline so that its answers use as few tokens as possible.

### Define the Optimized Pipeline Structure

```python
from dataclasses import dataclass
from ragbits.evaluate.pipelines.base import EvaluationResult, EvaluationPipeline
from ragbits.core.llms.litellm import LiteLLM
from ragbits.core.prompt import Prompt
from pydantic import BaseModel


@dataclass
class RandomQuestionPipelineResult(EvaluationResult):
    answer: str


class QuestionRespondPromptInput(BaseModel):
    system_prompt_content: str
    question: str


class QuestionRespondPrompt(Prompt[QuestionRespondPromptInput]):
    system_prompt = "{{ system_prompt_content }}"
    user_prompt = "{{ question }}"


class RandomQuestionRespondPipeline(EvaluationPipeline):
    async def __call__(self, data: dict[str, str]) -> RandomQuestionPipelineResult:
        llm = LiteLLM()
        # Build the prompt from the optimized config value and the current sample.
        input_prompt = QuestionRespondPrompt(
            QuestionRespondPromptInput(
                system_prompt_content=self.config.system_prompt_content,
                question=data["question"],
            )
        )
        answer = await llm.generate(prompt=input_prompt)
        return RandomQuestionPipelineResult(answer=answer)
```
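
Note how the pipeline reads `self.config.system_prompt_content`: during optimization, each trial instantiates the pipeline with a config in which every parameter marked `optimize` has been replaced by a sampled value. To smoke-test the pipeline outside the optimizer, you can pass a fixed config yourself (a sketch; it assumes that, like the data loader shown below, the pipeline takes its config as the constructor argument):

```python
import asyncio

from omegaconf import OmegaConf

# Hypothetical standalone run with a fixed, non-optimized config.
pipeline = RandomQuestionRespondPipeline(
    OmegaConf.create({"system_prompt_content": "Answer in as few words as possible."})
)
result = asyncio.run(pipeline({"question": "Why is the sky blue?"}))
print(result.answer)
```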

### Define the Data Loader

Next, we define the data loader. We'll use the Ragbits generation stack to create a synthetic dataset of questions:

```python
from ragbits.evaluate.loaders.base import DataLoader
from ragbits.core.llms.litellm import LiteLLM
from ragbits.core.prompt import Prompt
from pydantic import BaseModel
from omegaconf import OmegaConf


class DatasetGenerationPromptInput(BaseModel):
    topic: str


class DatasetGenerationPrompt(Prompt[DatasetGenerationPromptInput]):
    system_prompt = "Be a provider of random questions on a topic specified by the user."
    user_prompt = "Generate a question about {{ topic }}"


class RandomQuestionsDataLoader(DataLoader):
    async def load(self) -> list[dict[str, str]]:
        # Generate `num_questions` synthetic questions on the configured topic.
        questions = []
        llm = LiteLLM()
        for _ in range(self.config.num_questions):
            question = await llm.generate(
                DatasetGenerationPrompt(DatasetGenerationPromptInput(topic=self.config.question_topic))
            )
            questions.append({"question": question})
        return questions


dataloader_config = OmegaConf.create(
    {"num_questions": 10, "question_topic": "conspiracy theories"}
)
dataloader = RandomQuestionsDataLoader(dataloader_config)
```

### Define the Metrics and Run the Experiment

```python
from pprint import pp as pprint
import tiktoken
from ragbits.evaluate.optimizer import Optimizer
from ragbits.evaluate.metrics.base import Metric, MetricSet, ResultT
from omegaconf import OmegaConf


class TokenCountMetric(Metric):
    def compute(self, results: list[ResultT]) -> dict[str, float]:
        # Score each trial by the average token count of its answers.
        encoding = tiktoken.get_encoding("cl100k_base")
        num_tokens = [len(encoding.encode(out.answer)) for out in results]
        return {"num_tokens": sum(num_tokens) / len(num_tokens)}


metrics = MetricSet(TokenCountMetric())

optimization_cfg = OmegaConf.create(
    {"direction": "minimize", "n_trials": 4, "max_retries_for_trial": 3}
)
optimizer = Optimizer(optimization_cfg)

optimized_params = OmegaConf.create(
    {
        "system_prompt_content": {
            "optimize": True,
            "choices": [
                "Be a friendly bot answering user questions. Be as concise as possible",
                "Be a silly bot answering user questions. Use as few tokens as possible",
                "Be informative and straight to the point",
                "Respond to user questions in as few words as possible",
            ],
        }
    }
)

configs_with_scores = optimizer.optimize(
    pipeline_class=RandomQuestionRespondPipeline,
    config_with_params=optimized_params,
    metrics=metrics,
    dataloader=dataloader,
)
pprint(configs_with_scores)
```

After executing the code, your console should display an output structure similar to this:

```python
[({'system_prompt_content': 'Be a silly bot answering user questions. Use as few tokens as possible'},
  6.0,
  {'num_tokens': 6.0}),
 ({'system_prompt_content': 'Be a silly bot answering user questions. Use as few tokens as possible'},
  10.7,
  {'num_tokens': 10.7}),
 ({'system_prompt_content': 'Be a friendly bot answering user questions. Be as concise as possible'},
  37.8,
  {'num_tokens': 37.8}),
 ({'system_prompt_content': 'Be informative and straight to the point'},
  113.2,
  {'num_tokens': 113.2})]
```

This output consists of tuples, each containing three elements:

1. The configuration used in the trial.
2. The score achieved.
3. A dictionary of detailed metrics that contribute to the score.

The tuples are ordered from the best to the worst configuration based on the score.
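
Since the list is sorted best-first, picking the winning configuration is a one-liner (assuming the output structure shown above):

```python
# The first tuple holds the best configuration, its score, and its detailed metrics.
best_config, best_score, best_metrics = configs_with_scores[0]
print(best_config["system_prompt_content"], best_score)
```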

Please note that the details may vary between runs due to the non-deterministic nature of both the LLM and the optimization algorithm.
8 changes: 5 additions & 3 deletions examples/evaluation/document-search/config/data/qa.yaml
@@ -1,3 +1,5 @@
-name: "hf-docs-retrieval"
-path: "micpst/hf-docs-retrieval"
-split: "train"
+type: ragbits.evaluate.loaders.hf:HFDataLoader
+options:
+  name: "hf-docs-retrieval"
+  path: "micpst/hf-docs-retrieval"
+  split: "train"
@@ -3,18 +3,19 @@
 task:
   name: chunking-1000
 
-# used only for ingestion
-providers:
-  txt:
-    config:
-      chunking_kwargs:
-        max_characters: 1000
-  md:
-    config:
-      chunking_kwargs:
-        max_characters: 1000
+pipeline:
+  # used only for ingestion
+  providers:
+    txt:
+      config:
+        chunking_kwargs:
+          max_characters: 1000
+    md:
+      config:
+        chunking_kwargs:
+          max_characters: 1000
 
-# used for both ingestion and evaluation
-vector_store:
-  config:
-    index_name: chunk-1000
+  # used for both ingestion and evaluation
+  vector_store:
+    config:
+      index_name: chunk-1000
@@ -4,17 +4,18 @@ task:
   name: chunking-250
 
 # used only for ingestion
-providers:
-  txt:
-    config:
-      chunking_kwargs:
-        max_characters: 250
-  md:
-    config:
-      chunking_kwargs:
-        max_characters: 250
+pipeline:
+  providers:
+    txt:
+      config:
+        chunking_kwargs:
+          max_characters: 250
+    md:
+      config:
+        chunking_kwargs:
+          max_characters: 250
 
-# used for both ingestion and evaluation
-vector_store:
-  config:
-    index_name: chunk-250
+  # used for both ingestion and evaluation
+  vector_store:
+    config:
+      index_name: chunk-250
@@ -4,17 +4,17 @@ task:
   name: chunking-500
 
 # used only for ingestion
-providers:
-  txt:
-    config:
-      chunking_kwargs:
-        max_characters: 500
-  md:
-    config:
-      chunking_kwargs:
-        max_characters: 500
-
+pipeline:
+  providers:
+    txt:
+      config:
+        chunking_kwargs:
+          max_characters: 500
+    md:
+      config:
+        chunking_kwargs:
+          max_characters: 500
 # used for both ingestion and evaluation
-vector_store:
-  config:
-    index_name: chunk-500
+  vector_store:
+    config:
+      index_name: chunk-500
4 changes: 1 addition & 3 deletions examples/evaluation/document-search/config/ingestion.yaml
@@ -1,6 +1,4 @@
 defaults:
   - data: corpus
-  - embedder: litellm
-  - providers: unstructured
-  - vector_store: chroma
+  - pipeline: document_ingestion
   - _self_
26 changes: 26 additions & 0 deletions examples/evaluation/document-search/config/optimization.yaml
@@ -0,0 +1,26 @@
defaults:
  - pipeline: document_search_optimization
  - data: qa
  - _self_

task:
  name: default
  type: document-search

metrics:
  - type: ragbits.evaluate.metrics.document_search:DocumentSearchPrecisionRecallF1
    matching_strategy: RougeChunkMatch
    options:
      threshold: 0.5
  - type: ragbits.evaluate.metrics.document_search:DocumentSearchRankedRetrievalMetrics
    weight: -1.0
    matching_strategy: RougeChunkMatch
    options:
      threshold: 0.5


callbacks:
  - type: ragbits.evaluate.callbacks.neptune:NeptuneCallbackConfigurator
    args:
      callback_type: neptune.integrations.optuna:NeptuneCallback
      project: deepsense-ai/ragbits
@@ -0,0 +1,4 @@
name: "hf-docs"
path: "micpst/hf-docs"
split: "train"
num_docs: 5
@@ -0,0 +1,5 @@
defaults:
  - embedder: litellm
  - providers: unstructured
  - vector_store: chroma
  - _self_
@@ -0,0 +1,7 @@
defaults:
  - embedder: litellm
  - providers: unstructured
  - vector_store: chroma
  - rephraser: noop
  - reranker: noop
  - _self_
@@ -0,0 +1,10 @@
defaults:
  - embedder: litellm_opt_template
  - providers: unstructured_opt_template
  - vector_store: chroma
  - rephraser: noop
  - reranker: noop
  - answer_data_source: corpus
  - _self_

type: ragbits.evaluate.pipelines.document_search:DocumentSearchWithIngestionPipeline
@@ -0,0 +1,6 @@
type: ragbits.core.embeddings.litellm:LiteLLMEmbeddings
config:
  model: "text-embedding-3-small"
  options:
    dimensions: 768
    encoding_format: float
@@ -0,0 +1,20 @@
type: ragbits.core.embeddings.litellm:LiteLLMEmbeddings
config:
  optimize: true
  choices:
    - model: "text-embedding-3-small"
      options:
        dimensions:
          optimize: true
          range:
            - 32
            - 512
        encoding_format: float
    - model: "text-embedding-3-large"
      options:
        dimensions:
          optimize: true
          range:
            - 512
            - 1024
        encoding_format: float