diff --git a/notebooks/evaluating_ai_with_haystack.ipynb b/notebooks/evaluating_ai_with_haystack.ipynb
index 44fe5c0..6eaa888 100644
--- a/notebooks/evaluating_ai_with_haystack.ipynb
+++ b/notebooks/evaluating_ai_with_haystack.ipynb
@@ -22,40 +22,7 @@
     "\n",
     "## 📺 Watch Along\n",
     "\n",
-    ""
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "toc",
-    "id": "WI3_y1HNGiqQ"
-   },
-   "source": [
-    ">[Evaluating AI with Haystack](#scrollTo=uriHEO8pkgSo)\n",
-    "\n",
-    ">[Building your pipeline](#scrollTo=C_WUXQzEQWv8)\n",
-    "\n",
-    ">>[ARAGOG](#scrollTo=Dms5Ict6NGXq)\n",
-    "\n",
-    ">[Human Evaluation](#scrollTo=zTbmQzeXQY1F)\n",
-    "\n",
-    ">[Deciding on Metrics](#scrollTo=-U-QnCBqQcd6)\n",
-    "\n",
-    ">[Building an Evaluation Pipeline](#scrollTo=yLkAcM_5Qfat)\n",
-    "\n",
-    ">[Running Evaluation](#scrollTo=p76stWMQQmPD)\n",
-    "\n",
-    ">>>[Run the RAG Pipeline](#scrollTo=rUfQQzusXhgk)\n",
-    "\n",
-    ">>>[Run the Evaluation](#scrollTo=mfepD9HwXk4Q)\n",
-    "\n",
-    ">[Analyzing Results](#scrollTo=mC_mIqdMQqZG)\n",
-    "\n",
-    ">>[Evaluation Harness (Step 4, 5, and 6)](#scrollTo=OmkHqAsQZhFr)\n",
-    "\n",
-    ">[Evaluation Frameworks](#scrollTo=gKfrFf1CebJJ)\n",
-    "\n"
+    ""
    ]
   },
   {
@@ -91,7 +58,7 @@
     "id": "C_WUXQzEQWv8"
    },
    "source": [
-    "# 1. Building your pipeline"
+    "## 1. Building your pipeline"
    ]
   },
   {
@@ -100,7 +67,7 @@
     "id": "Dms5Ict6NGXq"
    },
    "source": [
-    "## ARAGOG\n",
+    "### ARAGOG\n",
     "\n",
     "This dataset is based on the paper [Advanced Retrieval Augmented Generation Output Grading (ARAGOG)](https://arxiv.org/pdf/2404.01037). It's a\n",
     "collection of papers from ArXiv covering topics around Transformers and Large Language Models, all in PDF format.\n",
@@ -113,7 +80,14 @@
     "- ground-truth answers\n",
     "- questions\n",
     "\n",
-    "Source: https://github.com/deepset-ai/haystack-evaluation/blob/main/datasets/README.md"
+    "Get the dataset [here](https://github.com/deepset-ai/haystack-evaluation/blob/main/datasets/README.md)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Indexing Pipeline"
    ]
   },
   {
@@ -276,7 +250,7 @@
     "embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
     "document_store = InMemoryDocumentStore()\n",
     "\n",
-    "files_path = \"/content/papers_for_questions\"\n",
+    "files_path = \"/content/papers_for_questions\"  # directory the ARAGOG papers were downloaded to\n",
     "pipeline = Pipeline()\n",
     "pipeline.add_component(\"converter\", PyPDFToDocument())\n",
     "pipeline.add_component(\"cleaner\", DocumentCleaner())\n",
@@ -412,7 +386,7 @@
     "id": "zTbmQzeXQY1F"
    },
    "source": [
-    "# 2. Human Evaluation"
+    "## 2. Human Evaluation"
    ]
   },
   {
@@ -543,7 +517,7 @@
     "id": "-U-QnCBqQcd6"
    },
    "source": [
-    "# 3. Deciding on Metrics\n",
+    "## 3. Deciding on Metrics\n",
     "\n",
     "* **Semantic Answer Similarity**: SASEvaluator compares the embedding of a generated answer against a ground-truth answer based on a common embedding model.\n",
-    "* **ContextRelevanceEvaluator** will assess the relevancy of the retrieved context to answer the query question\n",
+    "* **Context Relevance**: ContextRelevanceEvaluator assesses how relevant the retrieved context is to the query question.\n",
@@ -556,7 +530,7 @@
     "id": "yLkAcM_5Qfat"
    },
    "source": [
-    "# 4. Building an Evaluation Pipeline"
+    "## 4. Building an Evaluation Pipeline"
    ]
   },
   {
@@ -582,7 +556,7 @@
     "id": "p76stWMQQmPD"
    },
    "source": [
-    "# 5. Running Evaluation"
+    "## 5. Running Evaluation"
    ]
   },
   {
@@ -663,7 +637,7 @@
     "id": "mC_mIqdMQqZG"
    },
    "source": [
-    "# 6. Analyzing Results"
+    "## 6. Analyzing Results"
    ]
   },
   {
@@ -3488,7 +3462,7 @@
     "id": "gKfrFf1CebJJ"
    },
    "source": [
-    "# Evaluation Frameworks"
+    "## Evaluation Frameworks"
    ]
   },
   {
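
For reference, here is a minimal sketch (assumed Haystack 2.x API, not code taken from the notebook itself) of how the two metrics named in "3. Deciding on Metrics" can be wired into a single evaluation pipeline. The question, context, and answer lists are hypothetical stand-ins for the outputs collected from the RAG run in "5. Running Evaluation", and `ContextRelevanceEvaluator` is LLM-based, so it expects `OPENAI_API_KEY` to be set:

```python
# Minimal sketch: combining SASEvaluator and ContextRelevanceEvaluator
# in one Haystack 2.x evaluation pipeline.
from haystack import Pipeline
from haystack.components.evaluators import ContextRelevanceEvaluator, SASEvaluator

# Hypothetical stand-ins for the outputs of the RAG run (step 5).
questions = ["What is the context size of the LLaMA models?"]
retrieved_contexts = [["LLaMA models are trained on sequences of 2048 tokens."]]
predicted_answers = ["LLaMA models use a context of 2048 tokens."]
ground_truth_answers = ["The context size of the LLaMA models is 2048 tokens."]

eval_pipeline = Pipeline()
# LLM-as-judge metric: scores how relevant each retrieved context is to its question.
eval_pipeline.add_component("context_relevance", ContextRelevanceEvaluator())
# Embedding-based metric: compares predicted answers to ground-truth answers,
# here with the same embedding model the indexing pipeline used.
eval_pipeline.add_component("sas", SASEvaluator(model="sentence-transformers/all-MiniLM-L6-v2"))

results = eval_pipeline.run(
    {
        "context_relevance": {"questions": questions, "contexts": retrieved_contexts},
        "sas": {
            "predicted_answers": predicted_answers,
            "ground_truth_answers": ground_truth_answers,
        },
    }
)
print("SAS:", results["sas"]["score"])
print("Context relevance:", results["context_relevance"]["score"])
```

Both evaluators return an aggregate `score` alongside per-question `individual_scores`, which is the kind of breakdown the analysis in "6. Analyzing Results" builds on.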