diff --git a/gemini/evaluation/evaluate_langchain_chains.ipynb b/gemini/evaluation/evaluate_langchain_chains.ipynb index b642da7a84b..fe292c127d6 100644 --- a/gemini/evaluation/evaluate_langchain_chains.ipynb +++ b/gemini/evaluation/evaluate_langchain_chains.ipynb @@ -1,929 +1,953 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "OsXAs2gcIpbC" - }, - "outputs": [], - "source": [ - "# Copyright 2024 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "7ZX50cNFOFBt" - }, - "source": [ - " # Evaluate LangChain | Rapid Evaluation SDK Tutorial\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Google
Open in Colab\n", - "
\n", - "
\n", - " \n", - " \"Google
Open in Colab Enterprise\n", - "
\n", - "
\n", - " \n", - " \"Vertex
Open in Workbench\n", - "
\n", - "
\n", - " \n", - " \"GitHub
View on GitHub\n", - "
\n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "usd0d_LiOFBt" - }, - "source": [ - "| | |\n", - "|-|-|\n", - "|Author(s) | [Elia Secchi](https://github.com/eliasecchig) |" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "MjDmmmDaOFBt" - }, - "source": [ - "## Overview\n", - "\n", - "With this tutorial, you learn how to evaluate the performance of a conversational LangChain chain using the Vertex AI Rapid Evaluation SDK. The notebook utilizes a dummy chatbot designed to provide recipe suggestions.\n", - "\n", - "The tutorial goes trough:\n", - "1. Data preparation\n", - "2. Setting up the LangChain chain\n", - "3. Set-up a custom metric\n", - "4. Run evaluation with a combination of custom metrics and built-in metrics.\n", - "5. Log results into an experiment run and analyze different runs.\n", - "\n", - "### Costs\n", - "\n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "- Vertex AI\n", - "\n", - "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "w-OcPSC8_FUX" - }, - "source": [ - "## Getting Started" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-7Jso8-FO4N8" - }, - "source": [ - "### Install Vertex AI SDK for Rapid Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "tUat7NRq5JDC" - }, - "outputs": [], - "source": [ - "%pip install --quiet --upgrade nest_asyncio\n", - "%pip install --upgrade --user --quiet langchain-core langchain-google-vertexai langchain\n", - "%pip install --upgrade --user --quiet \"google-cloud-aiplatform[rapid_evaluation]\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "R5Xep4W9lq-Z" - }, - "source": [ - "### Restart runtime\n", - "\n", - "To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.\n", - "\n", - "The restart might take a minute or longer. After it's restarted, continue to the next step." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "XRvKdaPDTznN" - }, - "outputs": [], - "source": [ - "import IPython\n", - "\n", - "app = IPython.Application.instance()\n", - "app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "SbmM4z7FOBpM" - }, - "source": [ - "
\n", - "⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️\n", - "
\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dmWOrTJ3gx13" - }, - "source": [ - "### Authenticate your notebook environment (Colab only)\n", - "\n", - "If you're running this notebook on Google Colab, run the cell below to authenticate your environment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NyKGtVQjgx13" - }, - "outputs": [], - "source": [ - "# import sys\n", - "\n", - "# if \"google.colab\" in sys.modules:\n", - "# from google.colab import auth\n", - "\n", - "# auth.authenticate_user()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "DF4l8DTdWgPY" - }, - "source": [ - "### Set Google Cloud project information and initialize Vertex AI SDK\n", - "\n", - "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n", - "\n", - "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Nqwi-5ufWp_B" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}\n", - "LOCATION = \"us-central1\" # @param {type:\"string\"}\n", - "\n", - "\n", - "import vertexai\n", - "\n", - "vertexai.init(project=PROJECT_ID, location=LOCATION)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dvhI92xhQTzk" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "qP4ihOCkEBje" - }, - "outputs": [], - "source": [ - "from concurrent.futures import ThreadPoolExecutor\n", - "from functools import partial\n", - "import json\n", - "from json.decoder import JSONDecodeError\n", - "import logging\n", - "from typing import Any\n", - "import warnings\n", - "\n", - "from google.cloud import aiplatform\n", - "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n", - "from langchain_google_vertexai import ChatVertexAI, VertexAI\n", - "import nest_asyncio\n", - "import pandas as pd\n", - "from tqdm import tqdm\n", - "\n", - "# Main\n", - "import vertexai\n", - "from vertexai.preview.evaluation import EvalTask, make_metric\n", - "\n", - "# General\n", - "import yaml" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bDmLyf-K5nRz" - }, - "source": [ - "### Library settings" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KRkatYS95mLP" - }, - "outputs": [], - "source": [ - "logging.getLogger(\"urllib3.connectionpool\").setLevel(logging.ERROR)\n", - "nest_asyncio.apply()\n", - "warnings.filterwarnings(\"ignore\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "l26gX-cHOFBu" - }, - "source": [ - "### Helper functions" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "gT_OJBHfCg4Q" - }, - "outputs": [], - "source": [ - "def generate_multiturn_history(df: pd.DataFrame):\n", - " \"\"\"Processes a DataFrame of messages to add conversation history for each message.\n", - "\n", - " This function takes a DataFrame containing message data and iterates through each row.\n", - " For each message in a row, it constructs the conversation history up to that point by\n", - " accumulating previous user and AI messages. 
This conversation history is then added\n", - " to the message data, and the processed messages are returned as a new DataFrame.\n", - "\n", - " Args:\n", - " df: A DataFrame containing message data. It is expected to have a column named\n", - " \"messages\" where each entry is a list of dictionaries representing messages in\n", - " a conversation. Each message dictionary should have \"user\" and \"reference\" keys.\n", - "\n", - " Returns:\n", - " A DataFrame with the processed messages. Each message dictionary will now have an\n", - " additional \"conversation_history\" key containing a list of tuples representing the\n", - " conversation history leading up to that message. The tuples are of the form\n", - " (\"user\", message_text) or (\"ai\", message_text).\n", - " \"\"\"\n", - " processed_messages = []\n", - " for i, row in df.iterrows():\n", - " conversation_history = []\n", - " for message in row[\"messages\"]:\n", - " message[\"conversation_history\"] = conversation_history\n", - " processed_messages.append(message)\n", - " conversation_history = conversation_history + [\n", - " (\"user\", message[\"user\"]),\n", - " (\"ai\", message[\"reference\"]),\n", - " ]\n", - " return pd.DataFrame(processed_messages)\n", - "\n", - "\n", - "def pairwise(iterable):\n", - " \"\"\"Creates an iterable with tuples paired together\n", - " e.g s -> (s0, s1), (s2, s3), (s4, s5), ...\n", - " \"\"\"\n", - " a = iter(iterable)\n", - " return zip(a, a)\n", - "\n", - "\n", - "def batch_generate_message(row: dict, callable: Any) -> dict:\n", - " \"\"\"\n", - " Predicts a response from a chat agent.\n", - "\n", - " Args:\n", - " callable (ChatAgent): A chat agent.\n", - " row (dict): A message.\n", - " Returns:\n", - " dict: The predicted response.\n", - " \"\"\"\n", - " index, message = row\n", - "\n", - " messages = []\n", - " for user_message, ground_truth in pairwise(message.get(\"conversation_history\", [])):\n", - " messages.append((\"user\", user_message))\n", - " messages.append((\"ai\", ground_truth))\n", - " messages.append((\"user\", message[\"user\"]))\n", - " input_callable = {\"messages\": messages, **message.get(\"callable_kwargs\", {})}\n", - " response = callable.invoke(input_callable)\n", - " message[\"response\"] = response.content\n", - " message[\"response_obj\"] = response\n", - " return message\n", - "\n", - "\n", - "def batch_generate_messages(\n", - " messages: pd.DataFrame, callable: Any, max_workers: int = 4\n", - ") -> pd.DataFrame:\n", - " \"\"\"\n", - " Generates AI-powered responses to a series of user messages using a provided callable.\n", - "\n", - " This function efficiently processes a Pandas DataFrame containing user-AI message pairs,\n", - " utilizing the specified callable (either a LangChain Chain or a custom class with an\n", - " `invoke` method) to predict AI responses in parallel.\n", - "\n", - " Args:\n", - " callable (callable): A callable object (e.g., LangChain Chain, custom class) used\n", - " for response generation. Must have an `invoke(messages: dict) ->\n", - " langchain_core.messages.ai.AIMessage` method.\n", - " The `messages` dict follows this structure:\n", - " {\"messages\" [(\"user\", \"first\"),(\"ai\", \"a response\"), (\"user\", \"a follow up\")]}\n", - "\n", - " messages (pd.DataFrame): A DataFrame with one column named 'messages' containing\n", - " the list of user-AI message pairs as described above.\n", - "\n", - " max_workers (int, optional): The number of worker processes to use for parallel\n", - " prediction. 
Defaults to the number of available CPU cores.\n", - "\n", - " Returns:\n", - " pd.DataFrame: A DataFrame containing the original messages and a new column with the predicted AI responses.\n", - "\n", - " Example:\n", - " ```python\n", - " messages_df = pd.DataFrame({\n", - " \"messages\": [\n", - " [{\"user\": \"What's the weather today?\", \"reference\": \"It's sunny.\"}],\n", - " [{\"user\": \"Tell me a joke.\", \"reference\": \"Why did the scarecrow win an award?...\n", - " Because he was outstanding in his field!\"}]\n", - " ]\n", - " })\n", - "\n", - " responses_df = batch_generate_messages(my_callable, messages_df)\n", - " ```\n", - " \"\"\"\n", - " logging.info(\"Executing batch scoring\")\n", - " predicted_messages = []\n", - " with ThreadPoolExecutor(max_workers) as pool:\n", - " partial_func = partial(batch_generate_message, callable=callable)\n", - " for message in tqdm(\n", - " pool.map(partial_func, messages.iterrows()), total=len(messages)\n", - " ):\n", - " predicted_messages.append(message)\n", - " return pd.DataFrame(predicted_messages)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "SZMXiJhKD2TY" - }, - "source": [ - "## Import ground truth data for evaluation\n", - "\n", - "In this sample, we will use 2 conversations as a ground truth. Every message in the conversations, along with the relative history, will be used to produce a response with a foundational model.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "jwqlusT7OJ4J" - }, - "outputs": [], - "source": [ - "%%writefile chats.yaml\n", - "- messages:\n", - " - user: Hi\n", - " reference: Hi, how can I help you?\n", - " - user: I'm looking for a recipe for a healthy dinner. Do you have any recommendations?\n", - " reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n", - " - user: I'm not vegetarian or vegan, but I am gluten-free.\n", - " reference: 'Okay, I ll keep that in mind. Here are a few recipes that I think you might like:\n", - " * **Grilled Salmon with Roasted Vegetables:** This is a delicious and healthy recipe that is perfect for a weeknight meal. The salmon is grilled to perfection and the roasted vegetables add a touch of sweetness.\n", - " * **Chicken Stir-Fry:** This is a quick and easy stir-fry that is perfect for busy weeknights. The chicken is cooked with vegetables and a light sauce.\n", - " * **Lentil Soup:** This is a hearty and healthy soup that is perfect for a cold winter day. The lentils are packed with protein and fiber, and the soup is also gluten-free.'\n", - " - user: Those all sound great! I think I'm going to try the grilled salmon with roasted vegetables.\n", - " reference: That's a great choice! I hope you enjoy it.\n", - " - user: Thanks for your help!\n", - " reference: You're welcome! Is there anything else I can help you with today?\n", - " - user: No, that's all. Thanks again!\n", - " reference: You're welcome! Have a great day!\n", - "- messages:\n", - " - user: Hi\n", - " reference: Hi, how can I help you?\n", - " - user: I'm looking for a recipe for a romantic dinner. Do you have any recommendations?\n", - " reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n", - " - user: I'm vegetarian.\n", - " reference: 'Sure, I can help you find a healthy vegetarian dinner recipe. 
Here are a few ideas:\n", - " * **Burnt aubergine veggie chilli:** This is a hearty and flavorful dish that is packed with nutrients. The roasted aubergine gives it a smoky flavor, and the lentils and beans add protein and fiber.\n", - " * **Simple mushroom curry:** This is a quick and easy curry that is perfect for a weeknight meal. The mushrooms are cooked in a creamy sauce with spices, and the whole dish is ready in under 30 minutes.\n", - " * **Vegetarian enchiladas:** This is a classic Mexican dish that is easy to make vegetarian. The enchiladas are filled with a variety of vegetables, and they are topped with a delicious sauce.\n", - " * **Braised sesame tofu:** This is a flavorful and satisfying dish that is perfect for a cold night. The tofu is braised in a sauce with sesame, ginger, and garlic, and it is served over rice or noodles.\n", - " * **Roast garlic & tahini spinach:** This is a light and healthy dish that is perfect for a spring or summer meal. The spinach is roasted with garlic and tahini, and it is served with a side of pita bread.\n", - "\n", - " These are just a few ideas to get you started. There are many other great vegetarian dinner recipes out there, so you are sure to find something that you will enjoy.'\n", - " - user: Those all sound great! I like the Burnt aubergine veggie chilli\n", - " reference: That's a great choice! I hope you enjoy it." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ie7KTd_OjwdE" - }, - "source": [ - "Let's now load all the messages into a Pandas DataFrame:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "j0eSy_avGCW7" - }, - "outputs": [], - "source": [ - "y = yaml.safe_load(open(\"chats.yaml\"))\n", - "df = pd.DataFrame(y)\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "xA7G0bDTj84z" - }, - "source": [ - "**Decomposing the chat message input/output pairs**\n", - "\n", - "We are now ready for decomposing multi-turn history. This is essential to enable batch prediction.\n", - "\n", - "We decompose the `messages` list into single input/output pairs. The input is always composed by `user message`, `reference message` and `conversation_history`.\n", - "\n", - "**Example:**\n", - "\n", - "Given the following chat:\n", - "\n", - "```yaml\n", - "- messages:\n", - "- user: Hi\n", - "reference: Hi, how can I help you?\n", - "- user: I'm looking for a recipe for a healthy dinner. Do you have any recommendations?\n", - "reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n", - "```\n", - "\n", - "We can generate these two input/output samples:\n", - "\n", - "```yaml\n", - "- user: Hi\n", - " reference: Hi, how can I help you?\n", - " conversation_history: []\n", - "\n", - "- user: I'm looking for a recipe for a healthy dinner....\n", - " reference: Sure, I can help you with that. What are your ...\n", - " conversation_history:\n", - " - user: Hi\n", - " reference: Hi, how can I help you?\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "1FL0R3YXPj3e" - }, - "outputs": [], - "source": [ - "df_processed = generate_multiturn_history(df)\n", - "df_processed" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8OhQLR1lGsEq" - }, - "source": [ - "## Let's define our LangChain chain!\n", - "\n", - "We now need to define our LangChain Chain. 
For this tutorial, we will create a simple conversational chain capable of producing cooking recipes for users." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "u5B0ufczjfA_" - }, - "outputs": [], - "source": [ - "llm = ChatVertexAI(model_name=\"gemini-1.5-flash-001\", temperature=0)\n", - "\n", - "template = ChatPromptTemplate.from_messages(\n", - " [\n", - " (\n", - " \"system\",\n", - " \"\"\"You are a conversational bot that produce nice recipes for users based on a question.\"\"\",\n", - " ),\n", - " MessagesPlaceholder(variable_name=\"messages\"),\n", - " ]\n", - ")\n", - "\n", - "chain = template | llm" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "MS9rlK6eq5qY" - }, - "source": [ - "We can test our chain:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "qzMotN92qrqv" - }, - "outputs": [], - "source": [ - "chain.invoke([(\"human\", \"Hi there!\")])" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "NqdVpTMkq_pc" - }, - "source": [ - "## Batch scoring\n", - "\n", - "We are now ready to perform batch scoring. To perform batch scoring we will leverage the utility function `batch_generate_messages`\n", - "\n", - "Have a look at the definition to see the expected input format." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Qc4DiVYOKk6H" - }, - "outputs": [], - "source": [ - "help(batch_generate_messages)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Pip0WjopXMsu" - }, - "outputs": [], - "source": [ - "scored_data = batch_generate_messages(messages=df_processed, callable=chain)\n", - "scored_data" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "7Tf1bssiSBk7" - }, - "source": [ - "## Evaluation\n", - "\n", - "We'll utilize [Vertex AI Rapid Evaluation](https://cloud.google.com/vertex-ai/generative-ai/docs/models/rapid-evaluation) to assess our generative AI model's performance. This service within Vertex AI streamlines the evaluation process, integrates with [Vertex AI Experiments](https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments) for tracking, and offers a range of [pre-built metrics](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval#task-and-metrics) and the capability to define custom ones.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5ZSzjLI_rBSp" - }, - "source": [ - "#### Define a CustomMetric using Gemini model\n", - "\n", - "Define a customized Gemini model-based metric function, with explanations for the score. The registered custom metrics are computed on the client side, without using online evaluation service APIs." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "aT0uclHrSBlC" - }, - "outputs": [], - "source": [ - "evaluator_llm = VertexAI(model_name=\"gemini-1.5-flash-001\", temperature=0)\n", - "\n", - "\n", - "def custom_faithfulness(instance):\n", - " prompt = f\"\"\"You are examining written text content. 
Here is the text:\n", - "************\n", - "Written content: {instance[\"response\"]}\n", - "************\n", - "Original source data: {instance[\"reference\"]}\n", - "\n", - "Examine the text and determine whether the text is faithful or not.\n", - "Faithfulness refers to how accurately a generated summary reflects the essential information and key concepts present in the original source document.\n", - "A faithful summary stays true to the facts and meaning of the source text, without introducing distortions, hallucinations, or information that wasn't originally there.\n", - "\n", - "Your response must be an explanation of your thinking along with single integer number on a scale of 0-5, 0\n", - "the least faithful and 5 being the most faithful.\n", - "\n", - "Produce results in JSON\n", - "\n", - "Expected format:\n", - "\n", - "```json\n", - "{{\n", - " \"explanation\": \"< your explanation>\",\n", - " \"custom_faithfulness\": \n", - "}}\n", - "```\n", - "\"\"\"\n", - "\n", - " result = evaluator_llm.invoke(prompt)\n", - " try:\n", - " result = json.loads(result.replace(\"```\", \"\").replace(\"json\", \"\"))\n", - " except JSONDecodeError:\n", - " result = {\"explanation\": None, \"custom_faithfulness\": None}\n", - " return result\n", - "\n", - "\n", - "# Register Custom Metric\n", - "custom_faithfulness_metric = make_metric(\n", - " name=\"custom_faithfulness\",\n", - " metric_function=custom_faithfulness,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JpdRqYlLq53t" - }, - "source": [ - "### Run Evaluation with CustomMetric" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "w4iU_mIhoY93" - }, - "outputs": [], - "source": [ - "experiment_name = \"rapid-eval-langchain-eval\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U3r3NiKNqn3u" - }, - "source": [ - "We are now ready to run the evaluation. We will use different metrics, combining the custom metric we defined above with some pre-built metrics.\n", - "\n", - "Results of the evaluation will be automatically tagged into the experiment_name we defined.\n", - "\n", - "You can click `View Experiment`, to see the experiment in Google Cloud Console." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "zeVra-g1rAFV" - }, - "outputs": [], - "source": [ - "metrics = [\"fluency\", \"coherence\", \"safety\", custom_faithfulness_metric]\n", - "eval_task = EvalTask(dataset=scored_data, metrics=metrics, experiment=experiment_name)\n", - "eval_result = eval_task.evaluate()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ka3tZCL_uurD" - }, - "source": [ - "Once an eval result is produced, we are able to display summary metrics:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KheOvIvtiRlz" - }, - "outputs": [], - "source": [ - "eval_result.summary_metrics" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JcALGGlwu0p_" - }, - "source": [ - "We are also able to display a pandas dataframe containing a detailed summary of how our eval dataset performed and relative granular metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "9zJ686YYiWJC" - }, - "outputs": [], - "source": [ - "eval_result.metrics_table" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "L1NsUKA3vEu8" - }, - "source": [ - "## Iterating over the prompt\n", - "\n", - "Let's perform some simple changes to our chain to see how our evaluation results change." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "O_8PnvLSv7Nu" - }, - "outputs": [], - "source": [ - "template = ChatPromptTemplate.from_messages(\n", - " [\n", - " (\n", - " \"system\",\n", - " \"\"\"You are a conversational bot that produce nice recipes for users based on a question.\n", - "Before suggesting a recipe, you should ask for the dietary requirements..\"\"\",\n", - " ),\n", - " MessagesPlaceholder(variable_name=\"messages\"),\n", - " ]\n", - ")\n", - "\n", - "new_chain = template | llm" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JF_Po81twPbr" - }, - "outputs": [], - "source": [ - "scored_data = batch_generate_messages(messages=df_processed, callable=new_chain)\n", - "scored_data.rename(columns={\"text\": \"response\"}, inplace=True)\n", - "scored_data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "snIQ0itfwUZa" - }, - "outputs": [], - "source": [ - "metrics = [\"fluency\", \"coherence\", \"safety\", custom_faithfulness_metric]\n", - "eval_task = EvalTask(dataset=scored_data, metrics=metrics, experiment=experiment_name)\n", - "eval_result = eval_task.evaluate()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "CT6Ma5FLwfbF" - }, - "outputs": [], - "source": [ - "eval_result.summary_metrics" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "b42l5juJsyyY" - }, - "source": [ - "#### Let's compare both eval results\n", - "\n", - "We can do that by using the method `display_runs` for a given `eval task` object to see which prompt template performed best on our dataset." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "HP0Zcm1yvh95" - }, - "outputs": [], - "source": [ - "df = vertexai.preview.get_experiment_df(experiment_name).T\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# Delete Experiments\n", - "delete_experiments = True\n", - "if delete_experiments or os.getenv(\"IS_TESTING\"):\n", - " experiments_list = aiplatform.Experiment.list()\n", - " for experiment in experiments_list:\n", - " experiment.delete()" - ] - } - ], - "metadata": { - "colab": { - "name": "evaluate_langchain_chains.ipynb", - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OsXAs2gcIpbC" + }, + "outputs": [], + "source": [ + "# Copyright 2024 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7ZX50cNFOFBt" + }, + "source": [ + " # Evaluate LangChain | Rapid Evaluation SDK Tutorial\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Google
Open in Colab\n", + "
\n", + "
\n", + " \n", + " \"Google
Open in Colab Enterprise\n", + "
\n", + "
\n", + " \n", + " \"Vertex
Open in Workbench\n", + "
\n", + "
\n", + " \n", + " \"GitHub
View on GitHub\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "usd0d_LiOFBt" + }, + "source": [ + "| | |\n", + "|-|-|\n", + "|Author(s) | [Elia Secchi](https://github.com/eliasecchig) |" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MjDmmmDaOFBt" + }, + "source": [ + "## Overview\n", + "\n", + "With this tutorial, you learn how to evaluate the performance of a conversational LangChain chain using the Vertex AI Rapid Evaluation SDK. The notebook utilizes a dummy chatbot designed to provide recipe suggestions.\n", + "\n", + "The tutorial goes trough:\n", + "1. Data preparation\n", + "2. Setting up the LangChain chain\n", + "3. Set-up a custom metric\n", + "4. Run evaluation with a combination of custom metrics and built-in metrics.\n", + "5. Log results into an experiment run and analyze different runs.\n", + "\n", + "### Costs\n", + "\n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "- Vertex AI\n", + "\n", + "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "w-OcPSC8_FUX" + }, + "source": [ + "## Getting Started" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-7Jso8-FO4N8" + }, + "source": [ + "### Install Vertex AI SDK for Rapid Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tUat7NRq5JDC" + }, + "outputs": [], + "source": [ + "%pip install --quiet --upgrade nest_asyncio\n", + "%pip install --upgrade --user --quiet langchain-core langchain-google-vertexai langchain\n", + "%pip install --upgrade --user --quiet \"google-cloud-aiplatform[rapid_evaluation]\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "R5Xep4W9lq-Z" + }, + "source": [ + "### Restart runtime\n", + "\n", + "To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.\n", + "\n", + "The restart might take a minute or longer. After it's restarted, continue to the next step." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "XRvKdaPDTznN" + }, + "outputs": [], + "source": [ + "import IPython\n", + "\n", + "app = IPython.Application.instance()\n", + "app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SbmM4z7FOBpM" + }, + "source": [ + "
\n", + "⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dmWOrTJ3gx13" + }, + "source": [ + "### Authenticate your notebook environment (Colab only)\n", + "\n", + "If you're running this notebook on Google Colab, run the cell below to authenticate your environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NyKGtVQjgx13" + }, + "outputs": [], + "source": [ + "import sys\n", + "\n", + "if \"google.colab\" in sys.modules:\n", + " from google.colab import auth\n", + "\n", + " auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DF4l8DTdWgPY" + }, + "source": [ + "### Set Google Cloud project information and initialize Vertex AI SDK\n", + "\n", + "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n", + "\n", + "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Nqwi-5ufWp_B" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"genai-blackbelt-fishfooding\" # @param {type:\"string\"}\n", + "LOCATION = \"us-central1\" # @param {type:\"string\"}\n", + "\n", + "\n", + "import vertexai\n", + "\n", + "vertexai.init(project=PROJECT_ID, location=LOCATION)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dvhI92xhQTzk" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qP4ihOCkEBje" + }, + "outputs": [], + "source": [ + "# General\n", + "import yaml\n", + "import json\n", + "import logging\n", + "import warnings\n", + "from concurrent.futures import ThreadPoolExecutor\n", + "from functools import partial\n", + "from json.decoder import JSONDecodeError\n", + "from typing import Any\n", + "import nest_asyncio\n", + "import pandas as pd\n", + "\n", + "# Main\n", + "import vertexai\n", + "from google.cloud import aiplatform\n", + "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n", + "from langchain_google_vertexai import ChatVertexAI, VertexAI\n", + "from tqdm import tqdm\n", + "from vertexai.evaluation import EvalTask, CustomMetric" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bDmLyf-K5nRz" + }, + "source": [ + "### Library settings" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KRkatYS95mLP" + }, + "outputs": [], + "source": [ + "logging.getLogger(\"urllib3.connectionpool\").setLevel(logging.ERROR)\n", + "nest_asyncio.apply()\n", + "warnings.filterwarnings(\"ignore\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "l26gX-cHOFBu" + }, + "source": [ + "### Helper functions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gT_OJBHfCg4Q" + }, + "outputs": [], + "source": [ + "def generate_multiturn_history(df: pd.DataFrame):\n", + " \"\"\"Processes a DataFrame of messages to add conversation history for each message.\n", + "\n", + " This function takes a DataFrame containing message data and iterates through each row.\n", + " For each message in a row, it constructs the conversation history up to that point by\n", + " accumulating previous user and AI messages. 
+    "    to the message data, and the processed messages are returned as a new DataFrame.\n",
+    "\n",
+    "    Args:\n",
+    "        df: A DataFrame containing message data. It is expected to have a column named\n",
+    "            \"messages\" where each entry is a list of dictionaries representing messages in\n",
+    "            a conversation. Each message dictionary should have \"user\" and \"reference\" keys.\n",
+    "\n",
+    "    Returns:\n",
+    "        A DataFrame with the processed messages. Each message dictionary will now have an\n",
+    "        additional \"conversation_history\" key containing a list of tuples representing the\n",
+    "        conversation history leading up to that message. The tuples are of the form\n",
+    "        (\"user\", message_text) or (\"ai\", message_text).\n",
+    "    \"\"\"\n",
+    "    processed_messages = []\n",
+    "    for i, row in df.iterrows():\n",
+    "        conversation_history = []\n",
+    "        for message in row[\"messages\"]:\n",
+    "            message[\"conversation_history\"] = conversation_history\n",
+    "            processed_messages.append(message)\n",
+    "            conversation_history = conversation_history + [\n",
+    "                (\"user\", message[\"user\"]),\n",
+    "                (\"ai\", message[\"reference\"]),\n",
+    "            ]\n",
+    "    return pd.DataFrame(processed_messages)\n",
+    "\n",
+    "\n",
+    "def pairwise(iterable):\n",
+    "    \"\"\"Creates an iterable with tuples paired together\n",
+    "    e.g. s -> (s0, s1), (s2, s3), (s4, s5), ...\n",
+    "    \"\"\"\n",
+    "    a = iter(iterable)\n",
+    "    return zip(a, a)\n",
+    "\n",
+    "\n",
+    "def batch_generate_message(row: dict, callable: Any) -> dict:\n",
+    "    \"\"\"\n",
+    "    Predicts a response from a chat agent.\n",
+    "\n",
+    "    Args:\n",
+    "        callable (ChatAgent): A chat agent.\n",
+    "        row (dict): A message.\n",
+    "    Returns:\n",
+    "        dict: The predicted response.\n",
+    "    \"\"\"\n",
+    "    index, message = row\n",
+    "\n",
+    "    messages = []\n",
+    "    for user_message, ground_truth in pairwise(message.get(\"conversation_history\", [])):\n",
+    "        messages.append((\"user\", user_message))\n",
+    "        messages.append((\"ai\", ground_truth))\n",
+    "    messages.append((\"user\", message[\"user\"]))\n",
+    "    input_callable = {\"messages\": messages, **message.get(\"callable_kwargs\", {})}\n",
+    "    response = callable.invoke(input_callable)\n",
+    "    message[\"response\"] = response.content\n",
+    "    message[\"response_obj\"] = response\n",
+    "    return message\n",
+    "\n",
+    "\n",
+    "def batch_generate_messages(\n",
+    "    messages: pd.DataFrame, callable: Any, max_workers: int = 4\n",
+    ") -> pd.DataFrame:\n",
+    "    \"\"\"\n",
+    "    Generates AI-powered responses to a series of user messages using a provided callable.\n",
+    "\n",
+    "    This function efficiently processes a Pandas DataFrame containing user-AI message pairs,\n",
+    "    utilizing the specified callable (either a LangChain Chain or a custom class with an\n",
+    "    `invoke` method) to predict AI responses in parallel.\n",
+    "\n",
+    "    Args:\n",
+    "        callable (callable): A callable object (e.g., LangChain Chain, custom class) used\n",
+    "            for response generation. Must have an `invoke(messages: dict) ->\n",
+    "            langchain_core.messages.ai.AIMessage` method.\n",
+    "            The `messages` dict follows this structure:\n",
+    "            {\"messages\": [(\"user\", \"first\"), (\"ai\", \"a response\"), (\"user\", \"a follow up\")]}\n",
+    "\n",
+    "        messages (pd.DataFrame): A DataFrame with one column named 'messages' containing\n",
+    "            the list of user-AI message pairs as described above.\n",
+    "\n",
+    "        max_workers (int, optional): The number of worker threads to use for parallel\n",
+    "            prediction. Defaults to 4.\n",
+    "\n",
+    "    Returns:\n",
+    "        pd.DataFrame: A DataFrame containing the original messages and a new column with the predicted AI responses.\n",
+    "\n",
+    "    Example:\n",
+    "        ```python\n",
+    "        messages_df = pd.DataFrame({\n",
+    "            \"messages\": [\n",
+    "                [{\"user\": \"What's the weather today?\", \"reference\": \"It's sunny.\"}],\n",
+    "                [{\"user\": \"Tell me a joke.\", \"reference\": \"Why did the scarecrow win an award?...\n",
+    "                Because he was outstanding in his field!\"}]\n",
+    "            ]\n",
+    "        })\n",
+    "\n",
+    "        responses_df = batch_generate_messages(messages=messages_df, callable=my_callable)\n",
+    "        ```\n",
+    "    \"\"\"\n",
+    "    logging.info(\"Executing batch scoring\")\n",
+    "    predicted_messages = []\n",
+    "    with ThreadPoolExecutor(max_workers) as pool:\n",
+    "        partial_func = partial(batch_generate_message, callable=callable)\n",
+    "        for message in tqdm(\n",
+    "            pool.map(partial_func, messages.iterrows()), total=len(messages)\n",
+    "        ):\n",
+    "            predicted_messages.append(message)\n",
+    "    return pd.DataFrame(predicted_messages)"
+   ]
+  },
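+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick illustration of how `pairwise` re-assembles `(user, ai)` turns from a flat history list (a minimal sketch with made-up strings, not part of the evaluation flow):\n",
+    "\n",
+    "```python\n",
+    "# Pairs consecutive items: user turn followed by the AI reply\n",
+    "list(pairwise([\"Hi\", \"Hello!\", \"Any vegan recipes?\", \"Sure, here are a few...\"]))\n",
+    "# -> [('Hi', 'Hello!'), ('Any vegan recipes?', 'Sure, here are a few...')]\n",
+    "```"
+   ]
+  },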
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "SZMXiJhKD2TY"
+   },
+   "source": [
+    "## Import ground truth data for evaluation\n",
+    "\n",
+    "In this sample, we will use two conversations as ground truth. Every message in the conversations, along with its conversation history, will be used to produce a response with a foundation model.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "jwqlusT7OJ4J"
+   },
+   "outputs": [],
+   "source": [
+    "%%writefile chats.yaml\n",
+    "- messages:\n",
+    "  - user: Hi\n",
+    "    reference: Hi, how can I help you?\n",
+    "  - user: I'm looking for a recipe for a healthy dinner. Do you have any recommendations?\n",
+    "    reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n",
+    "  - user: I'm not vegetarian or vegan, but I am gluten-free.\n",
+    "    reference: 'Okay, I''ll keep that in mind. Here are a few recipes that I think you might like:\n",
+    "      * **Grilled Salmon with Roasted Vegetables:** This is a delicious and healthy recipe that is perfect for a weeknight meal. The salmon is grilled to perfection and the roasted vegetables add a touch of sweetness.\n",
+    "      * **Chicken Stir-Fry:** This is a quick and easy stir-fry that is perfect for busy weeknights. The chicken is cooked with vegetables and a light sauce.\n",
+    "      * **Lentil Soup:** This is a hearty and healthy soup that is perfect for a cold winter day. The lentils are packed with protein and fiber, and the soup is also gluten-free.'\n",
+    "  - user: Those all sound great! I think I'm going to try the grilled salmon with roasted vegetables.\n",
+    "    reference: That's a great choice! I hope you enjoy it.\n",
+    "  - user: Thanks for your help!\n",
+    "    reference: You're welcome! Is there anything else I can help you with today?\n",
+    "  - user: No, that's all. Thanks again!\n",
+    "    reference: You're welcome! Have a great day!\n",
+    "- messages:\n",
+    "  - user: Hi\n",
+    "    reference: Hi, how can I help you?\n",
+    "  - user: I'm looking for a recipe for a romantic dinner. Do you have any recommendations?\n",
+    "    reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n",
+    "  - user: I'm vegetarian.\n",
+    "    reference: 'Sure, I can help you find a healthy vegetarian dinner recipe. Here are a few ideas:\n",
+    "      * **Burnt aubergine veggie chilli:** This is a hearty and flavorful dish that is packed with nutrients. The roasted aubergine gives it a smoky flavor, and the lentils and beans add protein and fiber.\n",
+    "      * **Simple mushroom curry:** This is a quick and easy curry that is perfect for a weeknight meal. The mushrooms are cooked in a creamy sauce with spices, and the whole dish is ready in under 30 minutes.\n",
+    "      * **Vegetarian enchiladas:** This is a classic Mexican dish that is easy to make vegetarian. The enchiladas are filled with a variety of vegetables, and they are topped with a delicious sauce.\n",
+    "      * **Braised sesame tofu:** This is a flavorful and satisfying dish that is perfect for a cold night. The tofu is braised in a sauce with sesame, ginger, and garlic, and it is served over rice or noodles.\n",
+    "      * **Roast garlic & tahini spinach:** This is a light and healthy dish that is perfect for a spring or summer meal. The spinach is roasted with garlic and tahini, and it is served with a side of pita bread.\n",
+    "\n",
+    "      These are just a few ideas to get you started. There are many other great vegetarian dinner recipes out there, so you are sure to find something that you will enjoy.'\n",
+    "  - user: Those all sound great! I like the Burnt aubergine veggie chilli\n",
+    "    reference: That's a great choice! I hope you enjoy it."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "ie7KTd_OjwdE"
+   },
+   "source": [
+    "Let's now load all the messages into a Pandas DataFrame:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "j0eSy_avGCW7"
+   },
+   "outputs": [],
+   "source": [
+    "with open(\"chats.yaml\") as f:\n",
+    "    y = yaml.safe_load(f)\n",
+    "df = pd.DataFrame(y)\n",
+    "df"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "xA7G0bDTj84z"
+   },
+   "source": [
+    "**Decomposing the chat message input/output pairs**\n",
+    "\n",
+    "We are now ready to decompose the multi-turn history. This is essential to enable batch prediction.\n",
+    "\n",
+    "We decompose the `messages` list into single input/output pairs. The input is always composed of the `user` message, the `reference` message, and the `conversation_history`.\n",
+    "\n",
+    "**Example:**\n",
+    "\n",
+    "Given the following chat:\n",
+    "\n",
+    "```yaml\n",
+    "- messages:\n",
+    "  - user: Hi\n",
+    "    reference: Hi, how can I help you?\n",
+    "  - user: I'm looking for a recipe for a healthy dinner. Do you have any recommendations?\n",
+    "    reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n",
+    "```\n",
+    "\n",
+    "We can generate these two input/output samples:\n",
+    "\n",
+    "```yaml\n",
+    "- user: Hi\n",
+    "  reference: Hi, how can I help you?\n",
+    "  conversation_history: []\n",
+    "\n",
+    "- user: I'm looking for a recipe for a healthy dinner....\n",
+    "  reference: Sure, I can help you with that. What are your ...\n",
+    "  conversation_history:\n",
+    "  - user: Hi\n",
+    "    reference: Hi, how can I help you?\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "1FL0R3YXPj3e"
+   },
+   "outputs": [],
+   "source": [
+    "df_processed = generate_multiturn_history(df)\n",
+    "df_processed"
+   ]
+  },
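+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check, we can peek at the history attached to one decomposed row (a sketch; index `3` is an assumption tied to the two sample conversations above, pointing at the fourth message of the first conversation):\n",
+    "\n",
+    "```python\n",
+    "row = df_processed.iloc[3]\n",
+    "print(row[\"user\"])\n",
+    "print(row[\"conversation_history\"][:2])  # the first (user, ai) turns preceding it\n",
+    "```"
+   ]
+  },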
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "8OhQLR1lGsEq"
+   },
+   "source": [
+    "## Let's define our LangChain chain!\n",
+    "\n",
+    "We now need to define our LangChain chain. For this tutorial, we will create a simple conversational chain capable of producing cooking recipes for users."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "u5B0ufczjfA_"
+   },
+   "outputs": [],
+   "source": [
+    "llm = ChatVertexAI(model_name=\"gemini-1.5-flash-001\", temperature=0)\n",
+    "\n",
+    "template = ChatPromptTemplate.from_messages(\n",
+    "    [\n",
+    "        (\n",
+    "            \"system\",\n",
+    "            \"\"\"You are a conversational bot that produces nice recipes for users based on a question.\"\"\",\n",
+    "        ),\n",
+    "        MessagesPlaceholder(variable_name=\"messages\"),\n",
+    "    ]\n",
+    ")\n",
+    "\n",
+    "chain = template | llm"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "MS9rlK6eq5qY"
+   },
+   "source": [
+    "We can test our chain:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "qzMotN92qrqv"
+   },
+   "outputs": [],
+   "source": [
+    "chain.invoke([(\"human\", \"Hi there!\")])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "NqdVpTMkq_pc"
+   },
+   "source": [
+    "## Batch scoring\n",
+    "\n",
+    "We are now ready to perform batch scoring, leveraging the utility function `batch_generate_messages`.\n",
+    "\n",
+    "Have a look at its definition to see the expected input format."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "Qc4DiVYOKk6H"
+   },
+   "outputs": [],
+   "source": [
+    "help(batch_generate_messages)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "Pip0WjopXMsu"
+   },
+   "outputs": [],
+   "source": [
+    "scored_data = batch_generate_messages(messages=df_processed, callable=chain)\n",
+    "scored_data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "7Tf1bssiSBk7"
+   },
+   "source": [
+    "## Evaluation\n",
+    "\n",
+    "We'll utilize [Vertex AI Rapid Evaluation](https://cloud.google.com/vertex-ai/generative-ai/docs/models/rapid-evaluation) to assess our generative AI model's performance. This service within Vertex AI streamlines the evaluation process, integrates with [Vertex AI Experiments](https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments) for tracking, and offers a range of [pre-built metrics](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval#task-and-metrics) and the capability to define custom ones.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "5ZSzjLI_rBSp"
+   },
+   "source": [
+    "#### Define a CustomMetric using a Gemini model\n",
+    "\n",
+    "Define a customized Gemini model-based metric function, with explanations for the score. The registered custom metrics are computed on the client side, without using online evaluation service APIs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "aT0uclHrSBlC"
+   },
+   "outputs": [],
+   "source": [
+    "evaluator_llm = ChatVertexAI(\n",
+    "    model_name=\"gemini-1.5-flash-001\",\n",
+    "    temperature=0,\n",
+    "    response_mime_type=\"application/json\",\n",
+    ")\n",
+    "\n",
+    "\n",
+    "def custom_faithfulness(instance):\n",
+    "    prompt = f\"\"\"You are examining written text content. Here is the text:\n",
+    "************\n",
+    "Written content: {instance[\"response\"]}\n",
+    "************\n",
+    "Original source data: {instance[\"reference\"]}\n",
+    "\n",
+    "Examine the text and determine whether the text is faithful or not.\n",
+    "Faithfulness refers to how accurately a generated summary reflects the essential information and key concepts present in the original source document.\n",
+    "A faithful summary stays true to the facts and meaning of the source text, without introducing distortions, hallucinations, or information that wasn't originally there.\n",
+    "\n",
+    "Your response must be an explanation of your thinking along with a single integer on a scale of 0-5, with 0\n",
+    "being the least faithful and 5 being the most faithful.\n",
+    "\n",
+    "Produce the result in JSON.\n",
+    "\n",
+    "Expected format:\n",
+    "\n",
+    "```json\n",
+    "{{\n",
+    "  \"explanation\": \"<your explanation>\",\n",
+    "  \"custom_faithfulness\": <your score>\n",
+    "}}\n",
+    "```\n",
+    "\"\"\"\n",
+    "\n",
+    "    result = evaluator_llm.invoke([(\"human\", prompt)])\n",
+    "    result = json.loads(result.content)\n",
+    "    return result\n",
+    "\n",
+    "\n",
+    "# Register Custom Metric\n",
+    "custom_faithfulness_metric = CustomMetric(\n",
+    "    name=\"custom_faithfulness\",\n",
+    "    metric_function=custom_faithfulness,\n",
+    ")"
+   ]
+  },
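+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before wiring the metric into an evaluation, we can sanity-check it on a single instance (a minimal sketch; the `instance` dict only needs the `response` and `reference` keys the function reads):\n",
+    "\n",
+    "```python\n",
+    "sample = {\n",
+    "    \"response\": \"Grilled salmon with roasted vegetables is a healthy pick.\",\n",
+    "    \"reference\": \"Grilled Salmon with Roasted Vegetables: a delicious and healthy recipe.\",\n",
+    "}\n",
+    "custom_faithfulness(sample)\n",
+    "# e.g. {'explanation': '...', 'custom_faithfulness': 4}\n",
+    "```"
+   ]
+  },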
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "JpdRqYlLq53t"
+   },
+   "source": [
+    "### Run Evaluation with CustomMetric"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "w4iU_mIhoY93"
+   },
+   "outputs": [],
+   "source": [
+    "experiment_name = \"rapid-eval-langchain-eval\"  # @param {type:\"string\"}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "U3r3NiKNqn3u"
+   },
+   "source": [
+    "We are now ready to run the evaluation. We will use different metrics, combining the custom metric we defined above with some pre-built metrics.\n",
+    "\n",
+    "Our DataFrame stores the model input in the `user` column, so we use `metric_column_mapping` to map it to the `prompt` field that the pre-built metrics expect.\n",
+    "\n",
+    "Results of the evaluation will be automatically logged into the `experiment_name` we defined.\n",
+    "\n",
+    "You can click `View Experiment` to see the experiment in the Google Cloud console."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "zeVra-g1rAFV"
+   },
+   "outputs": [],
+   "source": [
+    "metrics = [\"fluency\", \"coherence\", \"safety\", custom_faithfulness_metric]\n",
+    "\n",
+    "eval_task = EvalTask(\n",
+    "    dataset=scored_data,\n",
+    "    metrics=metrics,\n",
+    "    experiment=experiment_name,\n",
+    "    metric_column_mapping={\"prompt\": \"user\"},\n",
+    ")\n",
+    "eval_result = eval_task.evaluate()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "ka3tZCL_uurD"
+   },
+   "source": [
+    "Once an eval result is produced, we can display its summary metrics:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "KheOvIvtiRlz"
+   },
+   "outputs": [],
+   "source": [
+    "eval_result.summary_metrics"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "JcALGGlwu0p_"
+   },
+   "source": [
+    "We can also display a pandas DataFrame with a detailed, per-instance breakdown of how our eval dataset performed on each metric."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "9zJ686YYiWJC"
+   },
+   "outputs": [],
+   "source": [
+    "eval_result.metrics_table"
+   ]
+  },
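+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For a quicker overview, we can narrow the table down to the score-bearing columns (a sketch; the column-name filter is an assumption about the SDK's naming convention, which may vary across versions):\n",
+    "\n",
+    "```python\n",
+    "metrics_df = eval_result.metrics_table\n",
+    "score_cols = [c for c in metrics_df.columns if c.endswith(\"/score\") or \"faithfulness\" in c]\n",
+    "metrics_df[score_cols].describe()\n",
+    "```"
+   ]
+  },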
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9zJ686YYiWJC" + }, + "outputs": [], + "source": [ + "eval_result.metrics_table" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "L1NsUKA3vEu8" + }, + "source": [ + "## Iterating over the prompt\n", + "\n", + "Let's perform some simple changes to our chain to see how our evaluation results change." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "O_8PnvLSv7Nu" + }, + "outputs": [], + "source": [ + "template = ChatPromptTemplate.from_messages(\n", + " [\n", + " (\n", + " \"system\",\n", + " \"\"\"You are a conversational bot that produce nice recipes for users based on a question.\n", + "Before suggesting a recipe, you should ask for the dietary requirements..\"\"\",\n", + " ),\n", + " MessagesPlaceholder(variable_name=\"messages\"),\n", + " ]\n", + ")\n", + "\n", + "new_chain = template | llm" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JF_Po81twPbr" + }, + "outputs": [], + "source": [ + "scored_data = batch_generate_messages(messages=df_processed, callable=new_chain)\n", + "scored_data.rename(columns={\"text\": \"response\"}, inplace=True)\n", + "scored_data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "snIQ0itfwUZa" + }, + "outputs": [], + "source": [ + "metrics = [\"fluency\", \"coherence\", \"safety\", custom_faithfulness_metric]\n", + "eval_task = EvalTask(\n", + " dataset=scored_data,\n", + " metrics=metrics,\n", + " experiment=experiment_name,\n", + " metric_column_mapping={\"prompt\": \"user\"},\n", + ")\n", + "eval_result = eval_task.evaluate()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "CT6Ma5FLwfbF" + }, + "outputs": [], + "source": [ + "eval_result.summary_metrics" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b42l5juJsyyY" + }, + "source": [ + "#### Let's compare both eval results\n", + "\n", + "We can do that by using the method `display_runs` for a given `eval task` object to see which prompt template performed best on our dataset." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "HP0Zcm1yvh95" + }, + "outputs": [], + "source": [ + "df = vertexai.preview.get_experiment_df(experiment_name).T\n", + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# Delete Experiments\n", + "delete_experiments = True\n", + "if delete_experiments or os.getenv(\"IS_TESTING\"):\n", + " experiments_list = aiplatform.Experiment.list()\n", + " for experiment in experiments_list:\n", + " experiment.delete()" + ] + } + ], + "metadata": { + "colab": { + "toc_visible": true, + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file