diff --git a/gemini/evaluation/evaluate_langchain_chains.ipynb b/gemini/evaluation/evaluate_langchain_chains.ipynb
index b642da7a84b..fe292c127d6 100644
--- a/gemini/evaluation/evaluate_langchain_chains.ipynb
+++ b/gemini/evaluation/evaluate_langchain_chains.ipynb
@@ -1,929 +1,953 @@
{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "OsXAs2gcIpbC"
- },
- "outputs": [],
- "source": [
- "# Copyright 2024 Google LLC\n",
- "#\n",
- "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
- "# you may not use this file except in compliance with the License.\n",
- "# You may obtain a copy of the License at\n",
- "#\n",
- "# https://www.apache.org/licenses/LICENSE-2.0\n",
- "#\n",
- "# Unless required by applicable law or agreed to in writing, software\n",
- "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
- "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
- "# See the License for the specific language governing permissions and\n",
- "# limitations under the License."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "7ZX50cNFOFBt"
- },
- "source": [
- " # Evaluate LangChain | Rapid Evaluation SDK Tutorial\n",
- "\n",
- "
\n",
- " \n",
- " \n",
- " ![\"Google](\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\") Open in Colab\n",
- " \n",
- " | \n",
- " \n",
- " \n",
- " ![\"Google](\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\") Open in Colab Enterprise\n",
- " \n",
- " | \n",
- " \n",
- " \n",
- " ![\"Vertex](\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\") Open in Workbench\n",
- " \n",
- " | \n",
- " \n",
- " \n",
- " ![\"GitHub](\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\") View on GitHub\n",
- " \n",
- " | \n",
- "
"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "usd0d_LiOFBt"
- },
- "source": [
- "| | |\n",
- "|-|-|\n",
- "|Author(s) | [Elia Secchi](https://github.com/eliasecchig) |"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "MjDmmmDaOFBt"
- },
- "source": [
- "## Overview\n",
- "\n",
- "With this tutorial, you learn how to evaluate the performance of a conversational LangChain chain using the Vertex AI Rapid Evaluation SDK. The notebook utilizes a dummy chatbot designed to provide recipe suggestions.\n",
- "\n",
- "The tutorial goes trough:\n",
- "1. Data preparation\n",
- "2. Setting up the LangChain chain\n",
- "3. Set-up a custom metric\n",
- "4. Run evaluation with a combination of custom metrics and built-in metrics.\n",
- "5. Log results into an experiment run and analyze different runs.\n",
- "\n",
- "### Costs\n",
- "\n",
- "This tutorial uses billable components of Google Cloud:\n",
- "\n",
- "- Vertex AI\n",
- "\n",
- "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "w-OcPSC8_FUX"
- },
- "source": [
- "## Getting Started"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "-7Jso8-FO4N8"
- },
- "source": [
- "### Install Vertex AI SDK for Rapid Evaluation"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "tUat7NRq5JDC"
- },
- "outputs": [],
- "source": [
- "%pip install --quiet --upgrade nest_asyncio\n",
- "%pip install --upgrade --user --quiet langchain-core langchain-google-vertexai langchain\n",
- "%pip install --upgrade --user --quiet \"google-cloud-aiplatform[rapid_evaluation]\""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "R5Xep4W9lq-Z"
- },
- "source": [
- "### Restart runtime\n",
- "\n",
- "To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.\n",
- "\n",
- "The restart might take a minute or longer. After it's restarted, continue to the next step."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "XRvKdaPDTznN"
- },
- "outputs": [],
- "source": [
- "import IPython\n",
- "\n",
- "app = IPython.Application.instance()\n",
- "app.kernel.do_shutdown(True)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "SbmM4z7FOBpM"
- },
- "source": [
- "\n",
- "⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️\n",
- "
\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "dmWOrTJ3gx13"
- },
- "source": [
- "### Authenticate your notebook environment (Colab only)\n",
- "\n",
- "If you're running this notebook on Google Colab, run the cell below to authenticate your environment."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "NyKGtVQjgx13"
- },
- "outputs": [],
- "source": [
- "# import sys\n",
- "\n",
- "# if \"google.colab\" in sys.modules:\n",
- "# from google.colab import auth\n",
- "\n",
- "# auth.authenticate_user()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "DF4l8DTdWgPY"
- },
- "source": [
- "### Set Google Cloud project information and initialize Vertex AI SDK\n",
- "\n",
- "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n",
- "\n",
- "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "Nqwi-5ufWp_B"
- },
- "outputs": [],
- "source": [
- "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}\n",
- "LOCATION = \"us-central1\" # @param {type:\"string\"}\n",
- "\n",
- "\n",
- "import vertexai\n",
- "\n",
- "vertexai.init(project=PROJECT_ID, location=LOCATION)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "dvhI92xhQTzk"
- },
- "source": [
- "### Import libraries"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "qP4ihOCkEBje"
- },
- "outputs": [],
- "source": [
- "from concurrent.futures import ThreadPoolExecutor\n",
- "from functools import partial\n",
- "import json\n",
- "from json.decoder import JSONDecodeError\n",
- "import logging\n",
- "from typing import Any\n",
- "import warnings\n",
- "\n",
- "from google.cloud import aiplatform\n",
- "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
- "from langchain_google_vertexai import ChatVertexAI, VertexAI\n",
- "import nest_asyncio\n",
- "import pandas as pd\n",
- "from tqdm import tqdm\n",
- "\n",
- "# Main\n",
- "import vertexai\n",
- "from vertexai.preview.evaluation import EvalTask, make_metric\n",
- "\n",
- "# General\n",
- "import yaml"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "bDmLyf-K5nRz"
- },
- "source": [
- "### Library settings"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "KRkatYS95mLP"
- },
- "outputs": [],
- "source": [
- "logging.getLogger(\"urllib3.connectionpool\").setLevel(logging.ERROR)\n",
- "nest_asyncio.apply()\n",
- "warnings.filterwarnings(\"ignore\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "l26gX-cHOFBu"
- },
- "source": [
- "### Helper functions"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "gT_OJBHfCg4Q"
- },
- "outputs": [],
- "source": [
- "def generate_multiturn_history(df: pd.DataFrame):\n",
- " \"\"\"Processes a DataFrame of messages to add conversation history for each message.\n",
- "\n",
- " This function takes a DataFrame containing message data and iterates through each row.\n",
- " For each message in a row, it constructs the conversation history up to that point by\n",
- " accumulating previous user and AI messages. This conversation history is then added\n",
- " to the message data, and the processed messages are returned as a new DataFrame.\n",
- "\n",
- " Args:\n",
- " df: A DataFrame containing message data. It is expected to have a column named\n",
- " \"messages\" where each entry is a list of dictionaries representing messages in\n",
- " a conversation. Each message dictionary should have \"user\" and \"reference\" keys.\n",
- "\n",
- " Returns:\n",
- " A DataFrame with the processed messages. Each message dictionary will now have an\n",
- " additional \"conversation_history\" key containing a list of tuples representing the\n",
- " conversation history leading up to that message. The tuples are of the form\n",
- " (\"user\", message_text) or (\"ai\", message_text).\n",
- " \"\"\"\n",
- " processed_messages = []\n",
- " for i, row in df.iterrows():\n",
- " conversation_history = []\n",
- " for message in row[\"messages\"]:\n",
- " message[\"conversation_history\"] = conversation_history\n",
- " processed_messages.append(message)\n",
- " conversation_history = conversation_history + [\n",
- " (\"user\", message[\"user\"]),\n",
- " (\"ai\", message[\"reference\"]),\n",
- " ]\n",
- " return pd.DataFrame(processed_messages)\n",
- "\n",
- "\n",
- "def pairwise(iterable):\n",
- " \"\"\"Creates an iterable with tuples paired together\n",
- " e.g s -> (s0, s1), (s2, s3), (s4, s5), ...\n",
- " \"\"\"\n",
- " a = iter(iterable)\n",
- " return zip(a, a)\n",
- "\n",
- "\n",
- "def batch_generate_message(row: dict, callable: Any) -> dict:\n",
- " \"\"\"\n",
- " Predicts a response from a chat agent.\n",
- "\n",
- " Args:\n",
- " callable (ChatAgent): A chat agent.\n",
- " row (dict): A message.\n",
- " Returns:\n",
- " dict: The predicted response.\n",
- " \"\"\"\n",
- " index, message = row\n",
- "\n",
- " messages = []\n",
- " for user_message, ground_truth in pairwise(message.get(\"conversation_history\", [])):\n",
- " messages.append((\"user\", user_message))\n",
- " messages.append((\"ai\", ground_truth))\n",
- " messages.append((\"user\", message[\"user\"]))\n",
- " input_callable = {\"messages\": messages, **message.get(\"callable_kwargs\", {})}\n",
- " response = callable.invoke(input_callable)\n",
- " message[\"response\"] = response.content\n",
- " message[\"response_obj\"] = response\n",
- " return message\n",
- "\n",
- "\n",
- "def batch_generate_messages(\n",
- " messages: pd.DataFrame, callable: Any, max_workers: int = 4\n",
- ") -> pd.DataFrame:\n",
- " \"\"\"\n",
- " Generates AI-powered responses to a series of user messages using a provided callable.\n",
- "\n",
- " This function efficiently processes a Pandas DataFrame containing user-AI message pairs,\n",
- " utilizing the specified callable (either a LangChain Chain or a custom class with an\n",
- " `invoke` method) to predict AI responses in parallel.\n",
- "\n",
- " Args:\n",
- " callable (callable): A callable object (e.g., LangChain Chain, custom class) used\n",
- " for response generation. Must have an `invoke(messages: dict) ->\n",
- " langchain_core.messages.ai.AIMessage` method.\n",
- " The `messages` dict follows this structure:\n",
- " {\"messages\" [(\"user\", \"first\"),(\"ai\", \"a response\"), (\"user\", \"a follow up\")]}\n",
- "\n",
- " messages (pd.DataFrame): A DataFrame with one column named 'messages' containing\n",
- " the list of user-AI message pairs as described above.\n",
- "\n",
- " max_workers (int, optional): The number of worker processes to use for parallel\n",
- " prediction. Defaults to the number of available CPU cores.\n",
- "\n",
- " Returns:\n",
- " pd.DataFrame: A DataFrame containing the original messages and a new column with the predicted AI responses.\n",
- "\n",
- " Example:\n",
- " ```python\n",
- " messages_df = pd.DataFrame({\n",
- " \"messages\": [\n",
- " [{\"user\": \"What's the weather today?\", \"reference\": \"It's sunny.\"}],\n",
- " [{\"user\": \"Tell me a joke.\", \"reference\": \"Why did the scarecrow win an award?...\n",
- " Because he was outstanding in his field!\"}]\n",
- " ]\n",
- " })\n",
- "\n",
- " responses_df = batch_generate_messages(my_callable, messages_df)\n",
- " ```\n",
- " \"\"\"\n",
- " logging.info(\"Executing batch scoring\")\n",
- " predicted_messages = []\n",
- " with ThreadPoolExecutor(max_workers) as pool:\n",
- " partial_func = partial(batch_generate_message, callable=callable)\n",
- " for message in tqdm(\n",
- " pool.map(partial_func, messages.iterrows()), total=len(messages)\n",
- " ):\n",
- " predicted_messages.append(message)\n",
- " return pd.DataFrame(predicted_messages)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "SZMXiJhKD2TY"
- },
- "source": [
- "## Import ground truth data for evaluation\n",
- "\n",
- "In this sample, we will use 2 conversations as a ground truth. Every message in the conversations, along with the relative history, will be used to produce a response with a foundational model.\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "jwqlusT7OJ4J"
- },
- "outputs": [],
- "source": [
- "%%writefile chats.yaml\n",
- "- messages:\n",
- " - user: Hi\n",
- " reference: Hi, how can I help you?\n",
- " - user: I'm looking for a recipe for a healthy dinner. Do you have any recommendations?\n",
- " reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n",
- " - user: I'm not vegetarian or vegan, but I am gluten-free.\n",
- " reference: 'Okay, I ll keep that in mind. Here are a few recipes that I think you might like:\n",
- " * **Grilled Salmon with Roasted Vegetables:** This is a delicious and healthy recipe that is perfect for a weeknight meal. The salmon is grilled to perfection and the roasted vegetables add a touch of sweetness.\n",
- " * **Chicken Stir-Fry:** This is a quick and easy stir-fry that is perfect for busy weeknights. The chicken is cooked with vegetables and a light sauce.\n",
- " * **Lentil Soup:** This is a hearty and healthy soup that is perfect for a cold winter day. The lentils are packed with protein and fiber, and the soup is also gluten-free.'\n",
- " - user: Those all sound great! I think I'm going to try the grilled salmon with roasted vegetables.\n",
- " reference: That's a great choice! I hope you enjoy it.\n",
- " - user: Thanks for your help!\n",
- " reference: You're welcome! Is there anything else I can help you with today?\n",
- " - user: No, that's all. Thanks again!\n",
- " reference: You're welcome! Have a great day!\n",
- "- messages:\n",
- " - user: Hi\n",
- " reference: Hi, how can I help you?\n",
- " - user: I'm looking for a recipe for a romantic dinner. Do you have any recommendations?\n",
- " reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n",
- " - user: I'm vegetarian.\n",
- " reference: 'Sure, I can help you find a healthy vegetarian dinner recipe. Here are a few ideas:\n",
- " * **Burnt aubergine veggie chilli:** This is a hearty and flavorful dish that is packed with nutrients. The roasted aubergine gives it a smoky flavor, and the lentils and beans add protein and fiber.\n",
- " * **Simple mushroom curry:** This is a quick and easy curry that is perfect for a weeknight meal. The mushrooms are cooked in a creamy sauce with spices, and the whole dish is ready in under 30 minutes.\n",
- " * **Vegetarian enchiladas:** This is a classic Mexican dish that is easy to make vegetarian. The enchiladas are filled with a variety of vegetables, and they are topped with a delicious sauce.\n",
- " * **Braised sesame tofu:** This is a flavorful and satisfying dish that is perfect for a cold night. The tofu is braised in a sauce with sesame, ginger, and garlic, and it is served over rice or noodles.\n",
- " * **Roast garlic & tahini spinach:** This is a light and healthy dish that is perfect for a spring or summer meal. The spinach is roasted with garlic and tahini, and it is served with a side of pita bread.\n",
- "\n",
- " These are just a few ideas to get you started. There are many other great vegetarian dinner recipes out there, so you are sure to find something that you will enjoy.'\n",
- " - user: Those all sound great! I like the Burnt aubergine veggie chilli\n",
- " reference: That's a great choice! I hope you enjoy it."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "ie7KTd_OjwdE"
- },
- "source": [
- "Let's now load all the messages into a Pandas DataFrame:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "j0eSy_avGCW7"
- },
- "outputs": [],
- "source": [
- "y = yaml.safe_load(open(\"chats.yaml\"))\n",
- "df = pd.DataFrame(y)\n",
- "df"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "xA7G0bDTj84z"
- },
- "source": [
- "**Decomposing the chat message input/output pairs**\n",
- "\n",
- "We are now ready for decomposing multi-turn history. This is essential to enable batch prediction.\n",
- "\n",
- "We decompose the `messages` list into single input/output pairs. The input is always composed by `user message`, `reference message` and `conversation_history`.\n",
- "\n",
- "**Example:**\n",
- "\n",
- "Given the following chat:\n",
- "\n",
- "```yaml\n",
- "- messages:\n",
- "- user: Hi\n",
- "reference: Hi, how can I help you?\n",
- "- user: I'm looking for a recipe for a healthy dinner. Do you have any recommendations?\n",
- "reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n",
- "```\n",
- "\n",
- "We can generate these two input/output samples:\n",
- "\n",
- "```yaml\n",
- "- user: Hi\n",
- " reference: Hi, how can I help you?\n",
- " conversation_history: []\n",
- "\n",
- "- user: I'm looking for a recipe for a healthy dinner....\n",
- " reference: Sure, I can help you with that. What are your ...\n",
- " conversation_history:\n",
- " - user: Hi\n",
- " reference: Hi, how can I help you?\n",
- "```"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "1FL0R3YXPj3e"
- },
- "outputs": [],
- "source": [
- "df_processed = generate_multiturn_history(df)\n",
- "df_processed"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "8OhQLR1lGsEq"
- },
- "source": [
- "## Let's define our LangChain chain!\n",
- "\n",
- "We now need to define our LangChain Chain. For this tutorial, we will create a simple conversational chain capable of producing cooking recipes for users."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "u5B0ufczjfA_"
- },
- "outputs": [],
- "source": [
- "llm = ChatVertexAI(model_name=\"gemini-1.5-flash-001\", temperature=0)\n",
- "\n",
- "template = ChatPromptTemplate.from_messages(\n",
- " [\n",
- " (\n",
- " \"system\",\n",
- " \"\"\"You are a conversational bot that produce nice recipes for users based on a question.\"\"\",\n",
- " ),\n",
- " MessagesPlaceholder(variable_name=\"messages\"),\n",
- " ]\n",
- ")\n",
- "\n",
- "chain = template | llm"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "MS9rlK6eq5qY"
- },
- "source": [
- "We can test our chain:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "qzMotN92qrqv"
- },
- "outputs": [],
- "source": [
- "chain.invoke([(\"human\", \"Hi there!\")])"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "NqdVpTMkq_pc"
- },
- "source": [
- "## Batch scoring\n",
- "\n",
- "We are now ready to perform batch scoring. To perform batch scoring we will leverage the utility function `batch_generate_messages`\n",
- "\n",
- "Have a look at the definition to see the expected input format."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "Qc4DiVYOKk6H"
- },
- "outputs": [],
- "source": [
- "help(batch_generate_messages)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "Pip0WjopXMsu"
- },
- "outputs": [],
- "source": [
- "scored_data = batch_generate_messages(messages=df_processed, callable=chain)\n",
- "scored_data"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "7Tf1bssiSBk7"
- },
- "source": [
- "## Evaluation\n",
- "\n",
- "We'll utilize [Vertex AI Rapid Evaluation](https://cloud.google.com/vertex-ai/generative-ai/docs/models/rapid-evaluation) to assess our generative AI model's performance. This service within Vertex AI streamlines the evaluation process, integrates with [Vertex AI Experiments](https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments) for tracking, and offers a range of [pre-built metrics](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval#task-and-metrics) and the capability to define custom ones.\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "5ZSzjLI_rBSp"
- },
- "source": [
- "#### Define a CustomMetric using Gemini model\n",
- "\n",
- "Define a customized Gemini model-based metric function, with explanations for the score. The registered custom metrics are computed on the client side, without using online evaluation service APIs."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "aT0uclHrSBlC"
- },
- "outputs": [],
- "source": [
- "evaluator_llm = VertexAI(model_name=\"gemini-1.5-flash-001\", temperature=0)\n",
- "\n",
- "\n",
- "def custom_faithfulness(instance):\n",
- " prompt = f\"\"\"You are examining written text content. Here is the text:\n",
- "************\n",
- "Written content: {instance[\"response\"]}\n",
- "************\n",
- "Original source data: {instance[\"reference\"]}\n",
- "\n",
- "Examine the text and determine whether the text is faithful or not.\n",
- "Faithfulness refers to how accurately a generated summary reflects the essential information and key concepts present in the original source document.\n",
- "A faithful summary stays true to the facts and meaning of the source text, without introducing distortions, hallucinations, or information that wasn't originally there.\n",
- "\n",
- "Your response must be an explanation of your thinking along with single integer number on a scale of 0-5, 0\n",
- "the least faithful and 5 being the most faithful.\n",
- "\n",
- "Produce results in JSON\n",
- "\n",
- "Expected format:\n",
- "\n",
- "```json\n",
- "{{\n",
- " \"explanation\": \"< your explanation>\",\n",
- " \"custom_faithfulness\": \n",
- "}}\n",
- "```\n",
- "\"\"\"\n",
- "\n",
- " result = evaluator_llm.invoke(prompt)\n",
- " try:\n",
- " result = json.loads(result.replace(\"```\", \"\").replace(\"json\", \"\"))\n",
- " except JSONDecodeError:\n",
- " result = {\"explanation\": None, \"custom_faithfulness\": None}\n",
- " return result\n",
- "\n",
- "\n",
- "# Register Custom Metric\n",
- "custom_faithfulness_metric = make_metric(\n",
- " name=\"custom_faithfulness\",\n",
- " metric_function=custom_faithfulness,\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "JpdRqYlLq53t"
- },
- "source": [
- "### Run Evaluation with CustomMetric"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "w4iU_mIhoY93"
- },
- "outputs": [],
- "source": [
- "experiment_name = \"rapid-eval-langchain-eval\" # @param {type:\"string\"}"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "U3r3NiKNqn3u"
- },
- "source": [
- "We are now ready to run the evaluation. We will use different metrics, combining the custom metric we defined above with some pre-built metrics.\n",
- "\n",
- "Results of the evaluation will be automatically tagged into the experiment_name we defined.\n",
- "\n",
- "You can click `View Experiment`, to see the experiment in Google Cloud Console."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "zeVra-g1rAFV"
- },
- "outputs": [],
- "source": [
- "metrics = [\"fluency\", \"coherence\", \"safety\", custom_faithfulness_metric]\n",
- "eval_task = EvalTask(dataset=scored_data, metrics=metrics, experiment=experiment_name)\n",
- "eval_result = eval_task.evaluate()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "ka3tZCL_uurD"
- },
- "source": [
- "Once an eval result is produced, we are able to display summary metrics:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "KheOvIvtiRlz"
- },
- "outputs": [],
- "source": [
- "eval_result.summary_metrics"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "JcALGGlwu0p_"
- },
- "source": [
- "We are also able to display a pandas dataframe containing a detailed summary of how our eval dataset performed and relative granular metrics."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "9zJ686YYiWJC"
- },
- "outputs": [],
- "source": [
- "eval_result.metrics_table"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "L1NsUKA3vEu8"
- },
- "source": [
- "## Iterating over the prompt\n",
- "\n",
- "Let's perform some simple changes to our chain to see how our evaluation results change."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "O_8PnvLSv7Nu"
- },
- "outputs": [],
- "source": [
- "template = ChatPromptTemplate.from_messages(\n",
- " [\n",
- " (\n",
- " \"system\",\n",
- " \"\"\"You are a conversational bot that produce nice recipes for users based on a question.\n",
- "Before suggesting a recipe, you should ask for the dietary requirements..\"\"\",\n",
- " ),\n",
- " MessagesPlaceholder(variable_name=\"messages\"),\n",
- " ]\n",
- ")\n",
- "\n",
- "new_chain = template | llm"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "JF_Po81twPbr"
- },
- "outputs": [],
- "source": [
- "scored_data = batch_generate_messages(messages=df_processed, callable=new_chain)\n",
- "scored_data.rename(columns={\"text\": \"response\"}, inplace=True)\n",
- "scored_data"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "snIQ0itfwUZa"
- },
- "outputs": [],
- "source": [
- "metrics = [\"fluency\", \"coherence\", \"safety\", custom_faithfulness_metric]\n",
- "eval_task = EvalTask(dataset=scored_data, metrics=metrics, experiment=experiment_name)\n",
- "eval_result = eval_task.evaluate()"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "CT6Ma5FLwfbF"
- },
- "outputs": [],
- "source": [
- "eval_result.summary_metrics"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "b42l5juJsyyY"
- },
- "source": [
- "#### Let's compare both eval results\n",
- "\n",
- "We can do that by using the method `display_runs` for a given `eval task` object to see which prompt template performed best on our dataset."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "HP0Zcm1yvh95"
- },
- "outputs": [],
- "source": [
- "df = vertexai.preview.get_experiment_df(experiment_name).T\n",
- "df"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "TpV-iwP9qw9c"
- },
- "source": [
- "## Cleaning up\n",
- "\n",
- "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n",
- "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n",
- "\n",
- "Otherwise, you can delete the individual resources you created in this tutorial."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "sx_vKniMq9ZX"
- },
- "outputs": [],
- "source": [
- "import os\n",
- "\n",
- "# Delete Experiments\n",
- "delete_experiments = True\n",
- "if delete_experiments or os.getenv(\"IS_TESTING\"):\n",
- " experiments_list = aiplatform.Experiment.list()\n",
- " for experiment in experiments_list:\n",
- " experiment.delete()"
- ]
- }
- ],
- "metadata": {
- "colab": {
- "name": "evaluate_langchain_chains.ipynb",
- "toc_visible": true
- },
- "kernelspec": {
- "display_name": "Python 3",
- "name": "python3"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "OsXAs2gcIpbC"
+ },
+ "outputs": [],
+ "source": [
+ "# Copyright 2024 Google LLC\n",
+ "#\n",
+ "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "7ZX50cNFOFBt"
+ },
+ "source": [
+ " # Evaluate LangChain | Rapid Evaluation SDK Tutorial\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ " ![\"Google](\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\") Open in Colab\n",
+ " \n",
+ " | \n",
+ " \n",
+ " \n",
+ " ![\"Google](\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\") Open in Colab Enterprise\n",
+ " \n",
+ " | \n",
+ " \n",
+ " \n",
+ " ![\"Vertex](\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\") Open in Workbench\n",
+ " \n",
+ " | \n",
+ " \n",
+ " \n",
+ " ![\"GitHub](\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\") View on GitHub\n",
+ " \n",
+ " | \n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "usd0d_LiOFBt"
+ },
+ "source": [
+ "| | |\n",
+ "|-|-|\n",
+ "|Author(s) | [Elia Secchi](https://github.com/eliasecchig) |"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "MjDmmmDaOFBt"
+ },
+ "source": [
+ "## Overview\n",
+ "\n",
+ "With this tutorial, you learn how to evaluate the performance of a conversational LangChain chain using the Vertex AI Rapid Evaluation SDK. The notebook utilizes a dummy chatbot designed to provide recipe suggestions.\n",
+ "\n",
+ "The tutorial goes trough:\n",
+ "1. Data preparation\n",
+ "2. Setting up the LangChain chain\n",
+ "3. Set-up a custom metric\n",
+ "4. Run evaluation with a combination of custom metrics and built-in metrics.\n",
+ "5. Log results into an experiment run and analyze different runs.\n",
+ "\n",
+ "### Costs\n",
+ "\n",
+ "This tutorial uses billable components of Google Cloud:\n",
+ "\n",
+ "- Vertex AI\n",
+ "\n",
+ "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "w-OcPSC8_FUX"
+ },
+ "source": [
+ "## Getting Started"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-7Jso8-FO4N8"
+ },
+ "source": [
+ "### Install Vertex AI SDK for Rapid Evaluation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "tUat7NRq5JDC"
+ },
+ "outputs": [],
+ "source": [
+ "%pip install --quiet --upgrade nest_asyncio\n",
+ "%pip install --upgrade --user --quiet langchain-core langchain-google-vertexai langchain\n",
+ "%pip install --upgrade --user --quiet \"google-cloud-aiplatform[rapid_evaluation]\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "R5Xep4W9lq-Z"
+ },
+ "source": [
+ "### Restart runtime\n",
+ "\n",
+ "To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.\n",
+ "\n",
+ "The restart might take a minute or longer. After it's restarted, continue to the next step."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "XRvKdaPDTznN"
+ },
+ "outputs": [],
+ "source": [
+ "import IPython\n",
+ "\n",
+ "app = IPython.Application.instance()\n",
+ "app.kernel.do_shutdown(True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SbmM4z7FOBpM"
+ },
+ "source": [
+ "\n",
+ "⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️\n",
+ "
\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dmWOrTJ3gx13"
+ },
+ "source": [
+ "### Authenticate your notebook environment (Colab only)\n",
+ "\n",
+ "If you're running this notebook on Google Colab, run the cell below to authenticate your environment."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "NyKGtVQjgx13"
+ },
+ "outputs": [],
+ "source": [
+ "import sys\n",
+ "\n",
+ "if \"google.colab\" in sys.modules:\n",
+ " from google.colab import auth\n",
+ "\n",
+ " auth.authenticate_user()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "DF4l8DTdWgPY"
+ },
+ "source": [
+ "### Set Google Cloud project information and initialize Vertex AI SDK\n",
+ "\n",
+ "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n",
+ "\n",
+ "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Nqwi-5ufWp_B"
+ },
+ "outputs": [],
+ "source": [
+ "PROJECT_ID = \"genai-blackbelt-fishfooding\" # @param {type:\"string\"}\n",
+ "LOCATION = \"us-central1\" # @param {type:\"string\"}\n",
+ "\n",
+ "\n",
+ "import vertexai\n",
+ "\n",
+ "vertexai.init(project=PROJECT_ID, location=LOCATION)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dvhI92xhQTzk"
+ },
+ "source": [
+ "### Import libraries"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "qP4ihOCkEBje"
+ },
+ "outputs": [],
+ "source": [
+ "# General\n",
+ "import yaml\n",
+ "import json\n",
+ "import logging\n",
+ "import warnings\n",
+ "from concurrent.futures import ThreadPoolExecutor\n",
+ "from functools import partial\n",
+ "from json.decoder import JSONDecodeError\n",
+ "from typing import Any\n",
+ "import nest_asyncio\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Main\n",
+ "import vertexai\n",
+ "from google.cloud import aiplatform\n",
+ "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
+ "from langchain_google_vertexai import ChatVertexAI, VertexAI\n",
+ "from tqdm import tqdm\n",
+ "from vertexai.evaluation import EvalTask, CustomMetric"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "bDmLyf-K5nRz"
+ },
+ "source": [
+ "### Library settings"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "KRkatYS95mLP"
+ },
+ "outputs": [],
+ "source": [
+ "logging.getLogger(\"urllib3.connectionpool\").setLevel(logging.ERROR)\n",
+ "nest_asyncio.apply()\n",
+ "warnings.filterwarnings(\"ignore\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "l26gX-cHOFBu"
+ },
+ "source": [
+ "### Helper functions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "gT_OJBHfCg4Q"
+ },
+ "outputs": [],
+ "source": [
+ "def generate_multiturn_history(df: pd.DataFrame):\n",
+ " \"\"\"Processes a DataFrame of messages to add conversation history for each message.\n",
+ "\n",
+ " This function takes a DataFrame containing message data and iterates through each row.\n",
+ " For each message in a row, it constructs the conversation history up to that point by\n",
+ " accumulating previous user and AI messages. This conversation history is then added\n",
+ " to the message data, and the processed messages are returned as a new DataFrame.\n",
+ "\n",
+ " Args:\n",
+ " df: A DataFrame containing message data. It is expected to have a column named\n",
+ " \"messages\" where each entry is a list of dictionaries representing messages in\n",
+ " a conversation. Each message dictionary should have \"user\" and \"reference\" keys.\n",
+ "\n",
+ " Returns:\n",
+ " A DataFrame with the processed messages. Each message dictionary will now have an\n",
+ " additional \"conversation_history\" key containing a list of tuples representing the\n",
+ " conversation history leading up to that message. The tuples are of the form\n",
+ " (\"user\", message_text) or (\"ai\", message_text).\n",
+ " \"\"\"\n",
+ " processed_messages = []\n",
+ " for i, row in df.iterrows():\n",
+ " conversation_history = []\n",
+ " for message in row[\"messages\"]:\n",
+ " message[\"conversation_history\"] = conversation_history\n",
+ " processed_messages.append(message)\n",
+ " conversation_history = conversation_history + [\n",
+ " (\"user\", message[\"user\"]),\n",
+ " (\"ai\", message[\"reference\"]),\n",
+ " ]\n",
+ " return pd.DataFrame(processed_messages)\n",
+ "\n",
+ "\n",
+ "def pairwise(iterable):\n",
+ " \"\"\"Creates an iterable with tuples paired together\n",
+ " e.g s -> (s0, s1), (s2, s3), (s4, s5), ...\n",
+ " \"\"\"\n",
+ " a = iter(iterable)\n",
+ " return zip(a, a)\n",
+ "\n",
+ "\n",
+ "def batch_generate_message(row: dict, callable: Any) -> dict:\n",
+ " \"\"\"\n",
+ " Predicts a response from a chat agent.\n",
+ "\n",
+ " Args:\n",
+ " callable (ChatAgent): A chat agent.\n",
+ " row (dict): A message.\n",
+ " Returns:\n",
+ " dict: The predicted response.\n",
+ " \"\"\"\n",
+ " index, message = row\n",
+ "\n",
+ " messages = []\n",
+ " for user_message, ground_truth in pairwise(message.get(\"conversation_history\", [])):\n",
+ " messages.append((\"user\", user_message))\n",
+ " messages.append((\"ai\", ground_truth))\n",
+ " messages.append((\"user\", message[\"user\"]))\n",
+ " input_callable = {\"messages\": messages, **message.get(\"callable_kwargs\", {})}\n",
+ " response = callable.invoke(input_callable)\n",
+ " message[\"response\"] = response.content\n",
+ " message[\"response_obj\"] = response\n",
+ " return message\n",
+ "\n",
+ "\n",
+ "def batch_generate_messages(\n",
+ " messages: pd.DataFrame, callable: Any, max_workers: int = 4\n",
+ ") -> pd.DataFrame:\n",
+ " \"\"\"\n",
+ " Generates AI-powered responses to a series of user messages using a provided callable.\n",
+ "\n",
+ " This function efficiently processes a Pandas DataFrame containing user-AI message pairs,\n",
+ " utilizing the specified callable (either a LangChain Chain or a custom class with an\n",
+ " `invoke` method) to predict AI responses in parallel.\n",
+ "\n",
+ " Args:\n",
+ " callable (callable): A callable object (e.g., LangChain Chain, custom class) used\n",
+ " for response generation. Must have an `invoke(messages: dict) ->\n",
+ " langchain_core.messages.ai.AIMessage` method.\n",
+ " The `messages` dict follows this structure:\n",
+ " {\"messages\" [(\"user\", \"first\"),(\"ai\", \"a response\"), (\"user\", \"a follow up\")]}\n",
+ "\n",
+ " messages (pd.DataFrame): A DataFrame with one column named 'messages' containing\n",
+ " the list of user-AI message pairs as described above.\n",
+ "\n",
+ " max_workers (int, optional): The number of worker processes to use for parallel\n",
+ " prediction. Defaults to the number of available CPU cores.\n",
+ "\n",
+ " Returns:\n",
+ " pd.DataFrame: A DataFrame containing the original messages and a new column with the predicted AI responses.\n",
+ "\n",
+ " Example:\n",
+ " ```python\n",
+ " messages_df = pd.DataFrame({\n",
+ " \"messages\": [\n",
+ " [{\"user\": \"What's the weather today?\", \"reference\": \"It's sunny.\"}],\n",
+ " [{\"user\": \"Tell me a joke.\", \"reference\": \"Why did the scarecrow win an award?...\n",
+ " Because he was outstanding in his field!\"}]\n",
+ " ]\n",
+ " })\n",
+ "\n",
+ " responses_df = batch_generate_messages(my_callable, messages_df)\n",
+ " ```\n",
+ " \"\"\"\n",
+ " logging.info(\"Executing batch scoring\")\n",
+ " predicted_messages = []\n",
+ " with ThreadPoolExecutor(max_workers) as pool:\n",
+ " partial_func = partial(batch_generate_message, callable=callable)\n",
+ " for message in tqdm(\n",
+ " pool.map(partial_func, messages.iterrows()), total=len(messages)\n",
+ " ):\n",
+ " predicted_messages.append(message)\n",
+ " return pd.DataFrame(predicted_messages)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SZMXiJhKD2TY"
+ },
+ "source": [
+ "## Import ground truth data for evaluation\n",
+ "\n",
+ "In this sample, we will use 2 conversations as a ground truth. Every message in the conversations, along with the relative history, will be used to produce a response with a foundational model.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "jwqlusT7OJ4J"
+ },
+ "outputs": [],
+ "source": [
+ "%%writefile chats.yaml\n",
+ "- messages:\n",
+ " - user: Hi\n",
+ " reference: Hi, how can I help you?\n",
+ " - user: I'm looking for a recipe for a healthy dinner. Do you have any recommendations?\n",
+ " reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n",
+ " - user: I'm not vegetarian or vegan, but I am gluten-free.\n",
+ " reference: 'Okay, I ll keep that in mind. Here are a few recipes that I think you might like:\n",
+ " * **Grilled Salmon with Roasted Vegetables:** This is a delicious and healthy recipe that is perfect for a weeknight meal. The salmon is grilled to perfection and the roasted vegetables add a touch of sweetness.\n",
+ " * **Chicken Stir-Fry:** This is a quick and easy stir-fry that is perfect for busy weeknights. The chicken is cooked with vegetables and a light sauce.\n",
+ " * **Lentil Soup:** This is a hearty and healthy soup that is perfect for a cold winter day. The lentils are packed with protein and fiber, and the soup is also gluten-free.'\n",
+ " - user: Those all sound great! I think I'm going to try the grilled salmon with roasted vegetables.\n",
+ " reference: That's a great choice! I hope you enjoy it.\n",
+ " - user: Thanks for your help!\n",
+ " reference: You're welcome! Is there anything else I can help you with today?\n",
+ " - user: No, that's all. Thanks again!\n",
+ " reference: You're welcome! Have a great day!\n",
+ "- messages:\n",
+ " - user: Hi\n",
+ " reference: Hi, how can I help you?\n",
+ " - user: I'm looking for a recipe for a romantic dinner. Do you have any recommendations?\n",
+ " reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n",
+ " - user: I'm vegetarian.\n",
+ " reference: 'Sure, I can help you find a healthy vegetarian dinner recipe. Here are a few ideas:\n",
+ " * **Burnt aubergine veggie chilli:** This is a hearty and flavorful dish that is packed with nutrients. The roasted aubergine gives it a smoky flavor, and the lentils and beans add protein and fiber.\n",
+ " * **Simple mushroom curry:** This is a quick and easy curry that is perfect for a weeknight meal. The mushrooms are cooked in a creamy sauce with spices, and the whole dish is ready in under 30 minutes.\n",
+ " * **Vegetarian enchiladas:** This is a classic Mexican dish that is easy to make vegetarian. The enchiladas are filled with a variety of vegetables, and they are topped with a delicious sauce.\n",
+ " * **Braised sesame tofu:** This is a flavorful and satisfying dish that is perfect for a cold night. The tofu is braised in a sauce with sesame, ginger, and garlic, and it is served over rice or noodles.\n",
+ " * **Roast garlic & tahini spinach:** This is a light and healthy dish that is perfect for a spring or summer meal. The spinach is roasted with garlic and tahini, and it is served with a side of pita bread.\n",
+ "\n",
+ " These are just a few ideas to get you started. There are many other great vegetarian dinner recipes out there, so you are sure to find something that you will enjoy.'\n",
+ " - user: Those all sound great! I like the Burnt aubergine veggie chilli\n",
+ " reference: That's a great choice! I hope you enjoy it."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ie7KTd_OjwdE"
+ },
+ "source": [
+ "Let's now load all the messages into a Pandas DataFrame:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "j0eSy_avGCW7"
+ },
+ "outputs": [],
+ "source": [
+ "y = yaml.safe_load(open(\"chats.yaml\"))\n",
+ "df = pd.DataFrame(y)\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xA7G0bDTj84z"
+ },
+ "source": [
+ "**Decomposing the chat message input/output pairs**\n",
+ "\n",
+ "We are now ready for decomposing multi-turn history. This is essential to enable batch prediction.\n",
+ "\n",
+ "We decompose the `messages` list into single input/output pairs. The input is always composed by `user message`, `reference message` and `conversation_history`.\n",
+ "\n",
+ "**Example:**\n",
+ "\n",
+ "Given the following chat:\n",
+ "\n",
+ "```yaml\n",
+ "- messages:\n",
+ "- user: Hi\n",
+ "reference: Hi, how can I help you?\n",
+ "- user: I'm looking for a recipe for a healthy dinner. Do you have any recommendations?\n",
+ "reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n",
+ "```\n",
+ "\n",
+ "We can generate these two input/output samples:\n",
+ "\n",
+ "```yaml\n",
+ "- user: Hi\n",
+ " reference: Hi, how can I help you?\n",
+ " conversation_history: []\n",
+ "\n",
+ "- user: I'm looking for a recipe for a healthy dinner....\n",
+ " reference: Sure, I can help you with that. What are your ...\n",
+ " conversation_history:\n",
+ " - user: Hi\n",
+ " reference: Hi, how can I help you?\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "1FL0R3YXPj3e"
+ },
+ "outputs": [],
+ "source": [
+ "df_processed = generate_multiturn_history(df)\n",
+ "df_processed"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "8OhQLR1lGsEq"
+ },
+ "source": [
+ "## Let's define our LangChain chain!\n",
+ "\n",
+ "We now need to define our LangChain Chain. For this tutorial, we will create a simple conversational chain capable of producing cooking recipes for users."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "u5B0ufczjfA_"
+ },
+ "outputs": [],
+ "source": [
+ "llm = ChatVertexAI(model_name=\"gemini-1.5-flash-001\", temperature=0)\n",
+ "\n",
+ "template = ChatPromptTemplate.from_messages(\n",
+ " [\n",
+ " (\n",
+ " \"system\",\n",
+ " \"\"\"You are a conversational bot that produce nice recipes for users based on a question.\"\"\",\n",
+ " ),\n",
+ " MessagesPlaceholder(variable_name=\"messages\"),\n",
+ " ]\n",
+ ")\n",
+ "\n",
+ "chain = template | llm"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "MS9rlK6eq5qY"
+ },
+ "source": [
+ "We can test our chain:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "qzMotN92qrqv"
+ },
+ "outputs": [],
+ "source": [
+ "chain.invoke([(\"human\", \"Hi there!\")])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "NqdVpTMkq_pc"
+ },
+ "source": [
+ "## Batch scoring\n",
+ "\n",
+ "We are now ready to perform batch scoring. To perform batch scoring we will leverage the utility function `batch_generate_messages`\n",
+ "\n",
+ "Have a look at the definition to see the expected input format."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Qc4DiVYOKk6H"
+ },
+ "outputs": [],
+ "source": [
+ "help(batch_generate_messages)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Pip0WjopXMsu"
+ },
+ "outputs": [],
+ "source": [
+ "scored_data = batch_generate_messages(messages=df_processed, callable=chain)\n",
+ "# scored_data.rename(columns={\"user\": \"prompt\"}, inplace=True)\n",
+ "scored_data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "7Tf1bssiSBk7"
+ },
+ "source": [
+ "## Evaluation\n",
+ "\n",
+ "We'll utilize [Vertex AI Rapid Evaluation](https://cloud.google.com/vertex-ai/generative-ai/docs/models/rapid-evaluation) to assess our generative AI model's performance. This service within Vertex AI streamlines the evaluation process, integrates with [Vertex AI Experiments](https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments) for tracking, and offers a range of [pre-built metrics](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval#task-and-metrics) and the capability to define custom ones.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5ZSzjLI_rBSp"
+ },
+ "source": [
+ "#### Define a CustomMetric using Gemini model\n",
+ "\n",
+ "Define a customized Gemini model-based metric function, with explanations for the score. The registered custom metrics are computed on the client side, without using online evaluation service APIs."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "aT0uclHrSBlC"
+ },
+ "outputs": [],
+ "source": [
+ "evaluator_llm = ChatVertexAI(\n",
+ " model_name=\"gemini-1.5-flash-001\",\n",
+ " temperature=0,\n",
+ " response_mime_type=\"application/json\",\n",
+ ")\n",
+ "\n",
+ "\n",
+ "def custom_faithfulness(instance):\n",
+ " prompt = f\"\"\"You are examining written text content. Here is the text:\n",
+ "************\n",
+ "Written content: {instance[\"response\"]}\n",
+ "************\n",
+ "Original source data: {instance[\"reference\"]}\n",
+ "\n",
+ "Examine the text and determine whether the text is faithful or not.\n",
+ "Faithfulness refers to how accurately a generated summary reflects the essential information and key concepts present in the original source document.\n",
+ "A faithful summary stays true to the facts and meaning of the source text, without introducing distortions, hallucinations, or information that wasn't originally there.\n",
+ "\n",
+ "Your response must be an explanation of your thinking along with single integer number on a scale of 0-5, 0\n",
+ "the least faithful and 5 being the most faithful.\n",
+ "\n",
+ "Produce results in JSON\n",
+ "\n",
+ "Expected format:\n",
+ "\n",
+ "```json\n",
+ "{{\n",
+ " \"explanation\": \"< your explanation>\",\n",
+ " \"custom_faithfulness\": \n",
+ "}}\n",
+ "```\n",
+ "\"\"\"\n",
+ "\n",
+ " result = evaluator_llm.invoke([(\"human\", prompt)])\n",
+ " result = json.loads(result.content)\n",
+ " return result\n",
+ "\n",
+ "\n",
+ "# Register Custom Metric\n",
+ "custom_faithfulness_metric = CustomMetric(\n",
+ " name=\"custom_faithfulness\",\n",
+ " metric_function=custom_faithfulness,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "JpdRqYlLq53t"
+ },
+ "source": [
+ "### Run Evaluation with CustomMetric"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "w4iU_mIhoY93"
+ },
+ "outputs": [],
+ "source": [
+ "experiment_name = \"rapid-eval-langchain-eval\" # @param {type:\"string\"}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "U3r3NiKNqn3u"
+ },
+ "source": [
+ "We are now ready to run the evaluation. We will use different metrics, combining the custom metric we defined above with some pre-built metrics.\n",
+ "\n",
+ "Results of the evaluation will be automatically tagged into the experiment_name we defined.\n",
+ "\n",
+ "You can click `View Experiment`, to see the experiment in Google Cloud Console."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "zeVra-g1rAFV"
+ },
+ "outputs": [],
+ "source": [
+ "metrics = [\"fluency\", \"coherence\", \"safety\", custom_faithfulness_metric]\n",
+ "\n",
+ "eval_task = EvalTask(\n",
+ " dataset=scored_data,\n",
+ " metrics=metrics,\n",
+ " experiment=experiment_name,\n",
+ " metric_column_mapping={\"prompt\": \"user\"},\n",
+ ")\n",
+ "eval_result = eval_task.evaluate()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ka3tZCL_uurD"
+ },
+ "source": [
+ "Once an eval result is produced, we are able to display summary metrics:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "KheOvIvtiRlz"
+ },
+ "outputs": [],
+ "source": [
+ "eval_result.summary_metrics"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "JcALGGlwu0p_"
+ },
+ "source": [
+ "We are also able to display a pandas dataframe containing a detailed summary of how our eval dataset performed and relative granular metrics."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "9zJ686YYiWJC"
+ },
+ "outputs": [],
+ "source": [
+ "eval_result.metrics_table"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "L1NsUKA3vEu8"
+ },
+ "source": [
+ "## Iterating over the prompt\n",
+ "\n",
+ "Let's perform some simple changes to our chain to see how our evaluation results change."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "O_8PnvLSv7Nu"
+ },
+ "outputs": [],
+ "source": [
+ "template = ChatPromptTemplate.from_messages(\n",
+ " [\n",
+ " (\n",
+ " \"system\",\n",
+ " \"\"\"You are a conversational bot that produce nice recipes for users based on a question.\n",
+ "Before suggesting a recipe, you should ask for the dietary requirements..\"\"\",\n",
+ " ),\n",
+ " MessagesPlaceholder(variable_name=\"messages\"),\n",
+ " ]\n",
+ ")\n",
+ "\n",
+ "new_chain = template | llm"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "JF_Po81twPbr"
+ },
+ "outputs": [],
+ "source": [
+ "scored_data = batch_generate_messages(messages=df_processed, callable=new_chain)\n",
+ "scored_data.rename(columns={\"text\": \"response\"}, inplace=True)\n",
+ "scored_data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "snIQ0itfwUZa"
+ },
+ "outputs": [],
+ "source": [
+ "metrics = [\"fluency\", \"coherence\", \"safety\", custom_faithfulness_metric]\n",
+ "eval_task = EvalTask(\n",
+ " dataset=scored_data,\n",
+ " metrics=metrics,\n",
+ " experiment=experiment_name,\n",
+ " metric_column_mapping={\"prompt\": \"user\"},\n",
+ ")\n",
+ "eval_result = eval_task.evaluate()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "CT6Ma5FLwfbF"
+ },
+ "outputs": [],
+ "source": [
+ "eval_result.summary_metrics"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "b42l5juJsyyY"
+ },
+ "source": [
+ "#### Let's compare both eval results\n",
+ "\n",
+ "We can do that by using the method `display_runs` for a given `eval task` object to see which prompt template performed best on our dataset."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "HP0Zcm1yvh95"
+ },
+ "outputs": [],
+ "source": [
+ "df = vertexai.preview.get_experiment_df(experiment_name).T\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "TpV-iwP9qw9c"
+ },
+ "source": [
+ "## Cleaning up\n",
+ "\n",
+ "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n",
+ "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n",
+ "\n",
+ "Otherwise, you can delete the individual resources you created in this tutorial."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "sx_vKniMq9ZX"
+ },
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "# Delete Experiments\n",
+ "delete_experiments = True\n",
+ "if delete_experiments or os.getenv(\"IS_TESTING\"):\n",
+ " experiments_list = aiplatform.Experiment.list()\n",
+ " for experiment in experiments_list:\n",
+ " experiment.delete()"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "toc_visible": true,
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.9"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file