From 6dcb31150cda228b3d2d2d9ff9cf100b6e4265a5 Mon Sep 17 00:00:00 2001
From: kdziedzic68
Date: Mon, 9 Dec 2024 14:32:13 +0100
Subject: [PATCH] chore: docs for eval (#203)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Mateusz Hordyński <26008518+mhordynski@users.noreply.github.com>
Co-authored-by: Mateusz Hordyński
---
 docs/how-to/evaluate/custom_dataloader.md     |  9 ++++
 .../evaluate/custom_evaluation_pipeline.md    |  7 ++++
 docs/how-to/evaluate/custom_metric.md         | 26 ++++++++++++
 docs/how-to/evaluate/evaluate.md              | 42 +++++++++++++++++++
 .../how-to/{ => evaluate}/generate_dataset.md |  2 +-
 docs/how-to/{ => evaluate}/optimize.md        |  0
 mkdocs.yml                                    |  9 +++++--
 7 files changed, 92 insertions(+), 3 deletions(-)
 create mode 100644 docs/how-to/evaluate/custom_dataloader.md
 create mode 100644 docs/how-to/evaluate/custom_evaluation_pipeline.md
 create mode 100644 docs/how-to/evaluate/custom_metric.md
 create mode 100644 docs/how-to/evaluate/evaluate.md
 rename docs/how-to/{ => evaluate}/generate_dataset.md (99%)
 rename docs/how-to/{ => evaluate}/optimize.md (100%)

diff --git a/docs/how-to/evaluate/custom_dataloader.md b/docs/how-to/evaluate/custom_dataloader.md
new file mode 100644
index 000000000..abfb0ccfd
--- /dev/null
+++ b/docs/how-to/evaluate/custom_dataloader.md
@@ -0,0 +1,9 @@
+# How to create a custom DataLoader for Ragbits evaluation
+
+Ragbits provides a base interface for data loading, `ragbits.evaluate.loaders.base.DataLoader`, designed specifically for evaluation purposes. A ready-to-use implementation, `ragbits.evaluate.loaders.hf.HFLoader`, is available for handling datasets in the Hugging Face format.
+
+To create a custom DataLoader for your specific needs, implement the `load` method in a class that inherits from the `DataLoader` interface.
+
+See the [working example](optimize.md#define-the-data-loader) in the optimization guide.
+
+**Note:** This interface is not to be confused with PyTorch's `DataLoader`; it serves a distinct purpose within the Ragbits evaluation framework.
diff --git a/docs/how-to/evaluate/custom_evaluation_pipeline.md b/docs/how-to/evaluate/custom_evaluation_pipeline.md
new file mode 100644
index 000000000..4e380ad91
--- /dev/null
+++ b/docs/how-to/evaluate/custom_evaluation_pipeline.md
@@ -0,0 +1,7 @@
+# How to create a custom Evaluation Pipeline for Ragbits evaluation
+
+Ragbits provides a ready-to-use evaluation pipeline for document search, implemented as `ragbits.evaluate.document_search.DocumentSearchPipeline`.
+
+To create a custom evaluation pipeline for your specific use case, implement the `__call__` method in a class that inherits from the `ragbits.evaluate.pipelines.base.EvaluationPipeline` interface.
+
+See the [working example](optimize.md#define-the-optimized-pipeline-structure) in the optimization guide.
\ No newline at end of file
diff --git a/docs/how-to/evaluate/custom_metric.md b/docs/how-to/evaluate/custom_metric.md
new file mode 100644
index 000000000..0e277fd9c
--- /dev/null
+++ b/docs/how-to/evaluate/custom_metric.md
@@ -0,0 +1,26 @@
+# How to create a custom Metric for Ragbits evaluation
+
+The `ragbits.evaluate` package provides metrics that measure the quality of a document search pipeline on your data, implemented within `ragbits.evaluate.metrics.document_search`.
+You are not limited to these, however: to implement a custom metric for your specific use case, inherit from
+the `ragbits.evaluate.metrics.base.Metric` abstract class and implement the `compute` method.
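+
+For illustration, here is a minimal sketch of such a metric. The exact
+signature of `compute` and the shape of the results it receives are
+assumptions made for this example; check the `Metric` base class in your
+installed version of Ragbits for the exact interface:
+
+```python
+from ragbits.evaluate.metrics.base import Metric
+
+
+class ExactMatch(Metric):
+    """Hypothetical metric: fraction of answers that exactly match the reference."""
+
+    def compute(self, results: list) -> dict:
+        # Assumes each result exposes `predicted_answer` and `reference_answer`;
+        # adapt the attribute names to your pipeline's result type.
+        matches = sum(r.predicted_answer == r.reference_answer for r in results)
+        return {"exact_match": matches / len(results)}
+```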
+
+See the [working example](optimize.md#define-the-metrics-and-run-the-experiment) in the optimization guide.
\ No newline at end of file
diff --git a/docs/how-to/evaluate/evaluate.md b/docs/how-to/evaluate/evaluate.md
new file mode 100644
index 000000000..176bccf0d
--- /dev/null
+++ b/docs/how-to/evaluate/evaluate.md
@@ -0,0 +1,42 @@
+# How to Evaluate with Ragbits
+
+Ragbits provides an interface for evaluating pipelines with specified metrics. You can plug in any evaluation pipeline and metrics that comply with this interface.
+
+Before running the evaluation, ensure the following prerequisites are met:
+
+1. Define the `EvaluationPipeline` structure class ([Example](optimize.md#define-the-optimized-pipeline-structure))
+2. Define the metrics and organize them into a `MetricSet` ([Example](optimize.md#define-the-metrics-and-run-the-experiment))
+3. Define the `DataLoader` ([Example](optimize.md#define-the-data-loader))
+
+The evaluator itself is straightforward and requires no additional configuration to instantiate. Once the three prerequisites are in place, running the evaluation is as simple as:
+
+```python
+import asyncio
+
+from omegaconf import OmegaConf
+
+from ragbits.evaluate.evaluator import Evaluator
+from ragbits.evaluate.metrics.base import MetricSet
+
+
+async def main() -> None:
+    # Replace the placeholder classes and the `{...}` configs below with your
+    # own pipeline, metric, and dataloader implementations and their settings.
+    pipeline_config = OmegaConf.create({...})
+    pipeline = YourPipelineClass(config=pipeline_config)
+
+    # Instantiate each metric with its config and bundle them into a MetricSet.
+    metrics = [SomeMetric(OmegaConf.create({...})) for SomeMetric in your_metrics]
+    metric_set = MetricSet(*metrics)
+
+    dataloader = YourDataLoaderClass(OmegaConf.create({...}))
+
+    evaluator = Evaluator()
+    eval_results = await evaluator.compute(pipeline=pipeline, metrics=metric_set, dataloader=dataloader)
+    print(eval_results)
+
+
+asyncio.run(main())
+```
+
+After a successful execution, your console should print a dictionary with keys corresponding to the components of each metric and values equal to the results aggregated over the defined dataloader.
\ No newline at end of file
diff --git a/docs/how-to/generate_dataset.md b/docs/how-to/evaluate/generate_dataset.md
similarity index 99%
rename from docs/how-to/generate_dataset.md
rename to docs/how-to/evaluate/generate_dataset.md
index 0df1034b9..63cd7fce6 100644
--- a/docs/how-to/generate_dataset.md
+++ b/docs/how-to/evaluate/generate_dataset.md
@@ -1,4 +1,4 @@
-# Generating a Dataset with Ragbits
+# How to Generate a Dataset with Ragbits
 
 Ragbits offers a convenient feature to generate artificial QA datasets for evaluating Retrieval-Augmented Generation (RAG) systems.
 You can choose between two different approaches:
diff --git a/docs/how-to/optimize.md b/docs/how-to/evaluate/optimize.md
similarity index 100%
rename from docs/how-to/optimize.md
rename to docs/how-to/evaluate/optimize.md
diff --git a/mkdocs.yml b/mkdocs.yml
index 29fa2b500..75f9d05aa 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -12,10 +12,8 @@ nav:
   - How-to Guides:
     - how-to/use_prompting.md
    - how-to/prompts_lab.md
-    - how-to/optimize.md
     - how-to/use_guardrails.md
     - how-to/integrations/promptfoo.md
-    - how-to/generate_dataset.md
     - Document Search:
       - how-to/document_search/async_processing.md
       - how-to/document_search/create_custom_execution_strategy.md
@@ -23,6 +21,13 @@
       - how-to/document_search/use_rephraser.md
       - how-to/document_search/use_reranker.md
       - how-to/document_search/distributed_ingestion.md
+    - Evaluate:
+      - how-to/evaluate/optimize.md
+      - how-to/evaluate/generate_dataset.md
+      - how-to/evaluate/evaluate.md
+      - how-to/evaluate/custom_metric.md
+      - how-to/evaluate/custom_evaluation_pipeline.md
+      - how-to/evaluate/custom_dataloader.md
   - API Reference:
     - Core:
       - api_reference/core/prompt.md