chore: docs for eval (#203)
Co-authored-by: Mateusz Hordyński <[email protected]>
3 people authored Dec 9, 2024
1 parent 32ba29c commit 6dcb311
Showing 7 changed files with 74 additions and 3 deletions.
9 changes: 9 additions & 0 deletions docs/how-to/evaluate/custom_dataloader.md
@@ -0,0 +1,9 @@
# How to create custom DataLoader for Ragbits evaluation

Ragbits provides a base interface for data loading, `ragbits.evaluate.loaders.base.DataLoader`, designed specifically for evaluation purposes. A ready-to-use implementation, `ragbits.evaluate.loaders.hf.HFLoader`, is available for handling datasets in the Hugging Face format.

To create a custom DataLoader for your specific needs, you need to implement the `load` method in a class that inherits from the `DataLoader` interface.

Please find the [working example](optimize.md#define-the-data-loader) here.

**Note:** This interface is not to be confused with PyTorch's `DataLoader`, as it serves a distinct purpose within the Ragbits evaluation framework.
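
For illustration, below is a minimal sketch of a custom loader that reads evaluation records from a JSONL file. The class name, the constructor, and the exact `load` signature are assumptions made for this example; consult the `DataLoader` base class in your Ragbits version for the actual contract.

```python
import json
from pathlib import Path

from ragbits.evaluate.loaders.base import DataLoader


class JSONLinesLoader(DataLoader):
    """Hypothetical loader that reads evaluation records from a JSONL file."""

    def __init__(self, path: str) -> None:
        # The base class may expect a config object instead of a plain path;
        # adjust the constructor to match your Ragbits version.
        self.path = Path(path)

    async def load(self) -> list[dict]:
        # `load` is assumed here to be async and to return the collection of
        # records that the evaluation pipeline iterates over; check the
        # `DataLoader` base class for the exact signature and return type.
        with self.path.open() as file:
            return [json.loads(line) for line in file]
```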
8 changes: 8 additions & 0 deletions docs/how-to/evaluate/custom_evaluation_pipeline.md
@@ -0,0 +1,8 @@
# How to create custom Evaluation Pipeline for Ragbits evaluation

Ragbits provides a ready-to-use evaluation pipeline for document search, implemented as `ragbits.evaluate.document_search.DocumentSearchPipeline`.

To create a custom evaluation pipeline for your specific use case, implement the `__call__` method in a class that inherits from the `ragbits.evaluate.pipelines.base.EvaluationPipeline` interface.

Please find the [working example](optimize.md#define-the-optimized-pipeline-structure) here.
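
As a rough sketch, a custom pipeline could look like the following. The record format, the return value, and the `retriever` component are illustrative assumptions; the actual `__call__` contract is defined by the `EvaluationPipeline` base class.

```python
from ragbits.evaluate.pipelines.base import EvaluationPipeline


class SimpleRetrievalPipeline(EvaluationPipeline):
    """Hypothetical pipeline that wraps an arbitrary retrieval component."""

    def __init__(self, retriever) -> None:
        # `retriever` stands in for whatever component you want to evaluate.
        self.retriever = retriever

    async def __call__(self, data: dict) -> dict:
        # `__call__` is assumed to receive a single evaluation record and to
        # return an output that the metrics can score; check the base class
        # for the exact input and output types.
        retrieved = await self.retriever.search(data["question"])
        return {"question": data["question"], "retrieved": retrieved}
```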
7 changes: 7 additions & 0 deletions docs/how-to/evaluate/custom_metric.md
@@ -0,0 +1,7 @@
# How to create custom Metric for Ragbits evaluation

The `ragbits.evaluate` package provides metrics that measure the quality of a document search pipeline on your data, implemented in `ragbits.evaluate.metrics.document_search`. You are not limited to these: to implement a custom metric for your specific use case, inherit from the `ragbits.evaluate.metrics.base.Metric` abstract class and implement its `compute` method.

Please find the [working example](optimize.md#define-the-metrics-and-run-the-experiment) here.
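
A minimal sketch of such a metric is shown below. The structure of the pipeline results and the `compute` signature (which may be async in your Ragbits version) are assumptions; consult the `Metric` base class for the exact contract.

```python
from ragbits.evaluate.metrics.base import Metric


class ExactMatch(Metric):
    """Hypothetical metric: fraction of results whose prediction matches the reference."""

    def compute(self, results: list[dict]) -> dict:
        # Each result is assumed to carry a predicted and a reference answer;
        # the actual item structure depends on your evaluation pipeline.
        matches = [
            1.0 if item["predicted"].strip() == item["reference"].strip() else 0.0
            for item in results
        ]
        return {"exact_match": sum(matches) / len(matches) if matches else 0.0}
```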
41 changes: 41 additions & 0 deletions docs/how-to/evaluate/evaluate.md
@@ -0,0 +1,41 @@
# How to Evaluate with Ragbits

Ragbits provides an interface for evaluating pipelines using specified metrics. In general, you can plug in any evaluation pipeline and metrics that comply with this interface.

Before running the evaluation, ensure the following prerequisites are met:

1. Define the `EvaluationPipeline` structure class ([Example](optimize.md#define-the-optimized-pipeline-structure))
2. Define the `Metrics` and organize them into a `MetricSet` ([Example](optimize.md#define-the-metrics-and-run-the-experiment))
3. Define the `DataLoader` ([Example](optimize.md#define-the-data-loader))

The evaluator interface itself is straightforward and requires no additional configuration to instantiate. Once the three prerequisites are complete, running the evaluation is as simple as:


```python
import asyncio

from omegaconf import OmegaConf

from ragbits.evaluate.evaluator import Evaluator
from ragbits.evaluate.metrics.base import MetricSet


async def main():
    # YourPipelineClass, SomeMetric / your_metrics and YourDataLoaderClass are
    # placeholders for the pipeline, metric and data loader classes defined in
    # the prerequisites above; {...} stands for their respective configs.
    pipeline_config = OmegaConf.create({...})
    pipeline = YourPipelineClass(config=pipeline_config)

    metrics = [SomeMetric(OmegaConf.create({...})) for SomeMetric in your_metrics]
    metric_set = MetricSet(*metrics)

    dataloader = YourDataLoaderClass(OmegaConf.create({...}))

    evaluator = Evaluator()

    eval_results = await evaluator.compute(pipeline=pipeline, metrics=metric_set, dataloader=dataloader)
    print(eval_results)


asyncio.run(main())
```

After successful execution, your console should print a dictionary with keys corresponding to the components of each metric and values equal to the results aggregated over the data provided by the defined data loader.
@@ -1,4 +1,4 @@
# Generating a Dataset with Ragbits
# How to Generate a Dataset with Ragbits

Ragbits offers a convenient feature to generate artificial QA datasets for evaluating Retrieval-Augmented Generation (RAG) systems. You can choose between two different approaches:

File renamed without changes.
10 changes: 8 additions & 2 deletions mkdocs.yml
@@ -12,17 +12,23 @@ nav:
  - How-to Guides:
      - how-to/use_prompting.md
      - how-to/prompts_lab.md
      - how-to/optimize.md
      - how-to/use_guardrails.md
      - how-to/integrations/promptfoo.md
      - how-to/generate_dataset.md
      - Document Search:
          - how-to/document_search/async_processing.md
          - how-to/document_search/create_custom_execution_strategy.md
          - how-to/document_search/search_documents.md
          - how-to/document_search/use_rephraser.md
          - how-to/document_search/use_reranker.md
          - how-to/document_search/distributed_ingestion.md
      - Evaluate:
          - how-to/evaluate/optimize.md
          - how-to/evaluate/generate_dataset.md
          - how-to/evaluate/evaluate.md
          - how-to/evaluate/custom_metric.md
          - how-to/evaluate/custom_evaluation_pipeline.md
          - how-to/evaluate/custom_metric.md
          - how-to/evaluate/custom_dataloader.md
  - API Reference:
      - Core:
          - api_reference/core/prompt.md
