diff --git a/notebooks/evaluating_ai_with_haystack.ipynb b/notebooks/evaluating_ai_with_haystack.ipynb
index 44fe5c0..6eaa888 100644
--- a/notebooks/evaluating_ai_with_haystack.ipynb
+++ b/notebooks/evaluating_ai_with_haystack.ipynb
@@ -22,40 +22,7 @@
"\n",
"## 📺 Watch Along\n",
"\n",
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "toc",
- "id": "WI3_y1HNGiqQ"
- },
- "source": [
- ">[Evaluating AI with Haystack](#scrollTo=uriHEO8pkgSo)\n",
- "\n",
- ">[Building your pipeline](#scrollTo=C_WUXQzEQWv8)\n",
- "\n",
- ">>[ARAGOG](#scrollTo=Dms5Ict6NGXq)\n",
- "\n",
- ">[Human Evaluation](#scrollTo=zTbmQzeXQY1F)\n",
- "\n",
- ">[Deciding on Metrics](#scrollTo=-U-QnCBqQcd6)\n",
- "\n",
- ">[Building an Evaluation Pipeline](#scrollTo=yLkAcM_5Qfat)\n",
- "\n",
- ">[Running Evaluation](#scrollTo=p76stWMQQmPD)\n",
- "\n",
- ">>>[Run the RAG Pipeline](#scrollTo=rUfQQzusXhgk)\n",
- "\n",
- ">>>[Run the Evaluation](#scrollTo=mfepD9HwXk4Q)\n",
- "\n",
- ">[Analyzing Results](#scrollTo=mC_mIqdMQqZG)\n",
- "\n",
- ">>[Evaluation Harness (Step 4, 5, and 6)](#scrollTo=OmkHqAsQZhFr)\n",
- "\n",
- ">[Evaluation Frameworks](#scrollTo=gKfrFf1CebJJ)\n",
- "\n"
+ ""
]
},
{
@@ -91,7 +58,7 @@
"id": "C_WUXQzEQWv8"
},
"source": [
- "# 1. Building your pipeline"
+ "## 1. Building your pipeline"
]
},
{
@@ -100,7 +67,7 @@
"id": "Dms5Ict6NGXq"
},
"source": [
- "## ARAGOG\n",
+ "### ARAGOG\n",
"\n",
"This dataset is based on the paper [Advanced Retrieval Augmented Generation Output Grading (ARAGOG)](https://arxiv.org/pdf/2404.01037). It's a\n",
"collection of papers from ArXiv covering topics around Transformers and Large Language Models, all in PDF format.\n",
@@ -113,7 +80,40 @@
"- ground-truth answers\n",
"- questions\n",
"\n",
- "Source: https://github.com/deepset-ai/haystack-evaluation/blob/main/datasets/README.md"
+ "Get the dataset [here](https://github.com/deepset-ai/haystack-evaluation/blob/main/datasets/README.md)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Indexing Pipeline"
]
},
{
@@ -276,7 +250,7 @@
"embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
"document_store = InMemoryDocumentStore()\n",
"\n",
- "files_path = \"/content/papers_for_questions\"\n",
+ "files_path = \"/content/papers_for_questions\" # \n",
"pipeline = Pipeline()\n",
"pipeline.add_component(\"converter\", PyPDFToDocument())\n",
"pipeline.add_component(\"cleaner\", DocumentCleaner())\n",
@@ -412,7 +386,7 @@
"id": "zTbmQzeXQY1F"
},
"source": [
- "# 2. Human Evaluation"
+ "## 2. Human Evaluation"
]
},
{
@@ -543,7 +517,7 @@
"id": "-U-QnCBqQcd6"
},
"source": [
- "# 3. Deciding on Metrics\n",
+ "## 3. Deciding on Metrics\n",
"\n",
"* **Semantic Answer Similarity**: SASEvaluator compares the embedding of a generated answer against a ground-truth answer based on a common embedding model.\n",
"* **ContextRelevanceEvaluator** will assess the relevancy of the retrieved context to answer the query question\n",
@@ -556,7 +530,37 @@
"id": "yLkAcM_5Qfat"
},
"source": [
- "# 4. Building an Evaluation Pipeline"
+ "## 4. Building an Evaluation Pipeline"
]
},
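+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Each of the metrics above maps to a Haystack component that can also be tried standalone before wiring it into a pipeline. A minimal sketch on made-up toy strings (not from the dataset); it reuses the `embedding_model` defined during indexing, and FaithfulnessEvaluator calls an LLM, so `OPENAI_API_KEY` must be set:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from haystack.components.evaluators import FaithfulnessEvaluator, SASEvaluator\n",
+ "\n",
+ "toy_questions = [\"What does the retriever return?\"]\n",
+ "toy_contexts = [[\"The retriever returns documents that are relevant to the query.\"]]\n",
+ "toy_answers = [\"It returns documents relevant to the query.\"]\n",
+ "toy_ground_truths = [\"Documents relevant to the query.\"]\n",
+ "\n",
+ "# Embedding-based comparison of predicted vs. ground-truth answers\n",
+ "sas = SASEvaluator(model=embedding_model)\n",
+ "sas.warm_up()\n",
+ "print(sas.run(ground_truth_answers=toy_ground_truths, predicted_answers=toy_answers)[\"score\"])\n",
+ "\n",
+ "# LLM-based check that the answer can be derived from the context (needs OPENAI_API_KEY)\n",
+ "faithfulness = FaithfulnessEvaluator()\n",
+ "print(faithfulness.run(questions=toy_questions, contexts=toy_contexts, predicted_answers=toy_answers)[\"score\"])"
+ ]
+ },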
{
@@ -582,7 +556,7 @@
"id": "p76stWMQQmPD"
},
"source": [
- "# 5. Running Evaluation"
+ "## 5. Running Evaluation"
]
},
{
@@ -663,7 +637,7 @@
"id": "mC_mIqdMQqZG"
},
"source": [
- "# 6. Analyzing Results"
+ "## 6. Analyzing Results"
]
},
{
@@ -3488,7 +3462,7 @@
"id": "gKfrFf1CebJJ"
},
"source": [
- "# Evaluation Frameworks"
+ "## Evaluation Frameworks"
]
},
{