Add LangSmith Run Chat Loader (#11458)
Showing 7 changed files with 882 additions and 39 deletions.
1 change: 1 addition & 0 deletions
docs/docs_skeleton/docs/integrations/chat_loaders/example_data/langsmith_chat_dataset.json
Large diffs are not rendered by default.
279 changes: 279 additions & 0 deletions
docs/docs_skeleton/docs/integrations/chat_loaders/langsmith_dataset.ipynb
@@ -0,0 +1,279 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a9ab2a39-7c2d-4119-9dc7-8035fdfba3cb",
"metadata": {},
"source": [
"# Fine-Tuning on LangSmith Chat Datasets\n",
"\n",
"This notebook demonstrates an easy way to load a LangSmith chat dataset and fine-tune a model on that data.\n",
"The process is simple and comprises 3 steps.\n",
"\n",
"1. Create the chat dataset.\n",
"2. Use the LangSmithDatasetChatLoader to load examples.\n",
"3. Fine-tune your model.\n",
"\n",
"Then you can use the fine-tuned model in your LangChain app.\n",
"\n",
"Before diving in, let's install our prerequisites.\n",
"\n",
"## Prerequisites\n",
"\n",
"Ensure you've installed langchain >= 0.0.311 and have configured your environment with your LangSmith API key."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef488003-514a-48b4-93f1-7de4417abf5d",
"metadata": {},
"outputs": [],
"source": [
"%pip install -U langchain openai"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9fba5c30-9e72-48aa-9535-80f2b3d18ead",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import uuid\n",
"\n",
"uid = uuid.uuid4().hex[:6]\n",
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"os.environ[\"LANGCHAIN_API_KEY\"] = \"YOUR API KEY\""
]
},
{
"cell_type": "markdown",
"id": "8533ab63-d437-492a-aaec-ccca31167bf2",
"metadata": {},
"source": [
"## 1. Select dataset\n",
"\n",
"This notebook fine-tunes a model directly on a LangSmith chat dataset, so the main decision is selecting which runs to fine-tune on. You will often curate these from traced runs. You can learn more about LangSmith datasets in the [docs](https://docs.smith.langchain.com/evaluation/datasets).\n",
"\n",
"For the sake of this tutorial, we will upload an existing dataset here that you can use."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "462515e0-872a-446e-abbd-6166d73d7414",
"metadata": {},
"outputs": [],
"source": [
"from langsmith.client import Client\n",
"\n",
"client = Client()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d384e4ac-5e8f-42a2-8bb5-7d3c9a8a540d",
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"url = \"https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/docs_skeleton/docs/integrations/chat_loaders/example_data/langsmith_chat_dataset.json\"\n",
"response = requests.get(url)\n",
"response.raise_for_status()\n",
"data = response.json()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b0d8ae47-2d3f-4b01-b15f-da58bd750fb4",
"metadata": {},
"outputs": [],
"source": [
"dataset_name = f\"Extraction Fine-tuning Dataset {uid}\"\n",
"ds = client.create_dataset(dataset_name=dataset_name, data_type=\"chat\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "87f085b7-71e1-4ff4-a622-e4e1248aa94a",
"metadata": {},
"outputs": [],
"source": [
"_ = client.create_examples(\n",
"    inputs=[e[\"inputs\"] for e in data],\n",
"    outputs=[e[\"outputs\"] for e in data],\n",
"    dataset_id=ds.id,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f365a359-52f7-47ff-8c36-aadc1070b409",
"metadata": {},
"source": [
"## 2. Prepare Data\n",
"\n",
"Now we can create an instance of LangSmithDatasetChatLoader and load the chat sessions using its lazy_load() method."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "817bc077-c18a-473b-94a4-a7d810d583a8",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_loaders.langsmith import LangSmithDatasetChatLoader\n",
"\n",
"loader = LangSmithDatasetChatLoader(dataset_name=dataset_name)\n",
"\n",
"chat_sessions = loader.lazy_load()"
]
},
{
"cell_type": "markdown",
"id": "f21a3bbd-1ed4-481b-9640-206b8bf0d751",
"metadata": {},
"source": [
"With the chat sessions loaded, convert them into a format suitable for fine-tuning."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "9e5ac127-b094-4584-9159-5a6d3d7315c7",
"metadata": {},
"outputs": [],
"source": [
"from langchain.adapters.openai import convert_messages_for_finetuning\n",
"\n",
"training_data = convert_messages_for_finetuning(chat_sessions)"
]
},
{
"cell_type": "markdown",
"id": "188c4978-d85e-4984-a008-a50f6cd6bb84",
"metadata": {},
"source": [
"## 3. Fine-tune the Model\n",
"\n",
"Now, initiate the fine-tuning process using the OpenAI library."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11d19e28-be49-4801-8065-1a58d13cd192",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Status=[running]... 302.42s. 143.85s\r"
]
}
],
"source": [
"import json\n",
"import time\n",
"from io import BytesIO\n",
"\n",
"import openai\n",
"\n",
"my_file = BytesIO()\n",
"for dialog in training_data:\n",
"    my_file.write((json.dumps({\"messages\": dialog}) + \"\\n\").encode(\"utf-8\"))\n",
"\n",
"my_file.seek(0)\n",
"training_file = openai.File.create(file=my_file, purpose=\"fine-tune\")\n",
"\n",
"job = openai.FineTuningJob.create(\n",
"    training_file=training_file.id,\n",
"    model=\"gpt-3.5-turbo\",\n",
")\n",
"\n",
"# Wait for the fine-tuning to complete (this may take some time),\n",
"# stopping the loop if the job reaches a terminal failure state.\n",
"status = openai.FineTuningJob.retrieve(job.id).status\n",
"start_time = time.time()\n",
"while status not in (\"succeeded\", \"failed\", \"cancelled\"):\n",
"    print(f\"Status=[{status}]... {time.time() - start_time:.2f}s\", end=\"\\r\", flush=True)\n",
"    time.sleep(5)\n",
"    status = openai.FineTuningJob.retrieve(job.id).status\n",
"\n",
"if status != \"succeeded\":\n",
"    raise RuntimeError(f\"Fine-tuning ended with status={status}\")\n",
"\n",
"# Now your model is fine-tuned!"
]
},
{
"cell_type": "markdown",
"id": "54c4cead-500d-41dd-8333-2defde634396",
"metadata": {},
"source": [
"## 4. Use in LangChain\n",
"\n",
"After fine-tuning, use the resulting model ID with the ChatOpenAI model class in your LangChain app."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f472ca4-fa9b-485d-bd37-8ce3c59c44db",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"# Get the fine-tuned model ID\n",
"job = openai.FineTuningJob.retrieve(job.id)\n",
"model_id = job.fine_tuned_model\n",
"\n",
"# Use the fine-tuned model in LangChain\n",
"model = ChatOpenAI(\n",
"    model=model_id,\n",
"    temperature=1,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d3b5845-6385-42d1-9f7d-5ea798dc2cd9",
"metadata": {},
"outputs": [],
"source": [
"model.invoke(\"There were three ravens sat on a tree.\")"
]
},
{
"cell_type": "markdown",
"id": "5b8c2c79-ce27-4f37-b1b2-5977db8c4e84",
"metadata": {},
"source": [
"Now you have successfully fine-tuned a model using data from LangSmith LLM runs!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
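As a standalone reference, the upload step in the notebook serializes each dialog as one JSON object per line, the JSONL chat format the OpenAI fine-tuning endpoint consumes. Here is a minimal sketch of that record shape; the dialog below is a hypothetical stand-in for one converted LangSmith example, not data from the actual dataset:

```python
import json
from io import BytesIO

# Hypothetical dialog standing in for one entry of training_data, i.e. the
# list-of-message-dicts shape that convert_messages_for_finetuning emits.
dialog = [
    {"role": "system", "content": "You extract entities from text."},
    {"role": "user", "content": "There were three ravens sat on a tree."},
    {"role": "assistant", "content": '{"animal": "raven", "count": 3}'},
]

# One {"messages": [...]} object per line, newline-terminated (JSONL),
# matching what the notebook writes into my_file before uploading it.
buf = BytesIO()
buf.write((json.dumps({"messages": dialog}) + "\n").encode("utf-8"))
buf.seek(0)

# Round-trip one line to confirm the record parses back intact.
line = buf.readline().decode("utf-8")
record = json.loads(line)
print(record["messages"][1]["content"])
```

Writing into an in-memory `BytesIO` buffer, as the notebook does, avoids touching disk; the same bytes could equally be written to a `.jsonl` file.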
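The status-polling loop in step 3 can also be factored into a small reusable helper that surfaces terminal failure states and enforces a timeout instead of spinning forever. This is a sketch under the assumption that the caller supplies the status lookup as a zero-argument callable (for example a lambda wrapping `openai.FineTuningJob.retrieve`); `wait_for_job` and `get_status` are illustrative names, not part of any library:

```python
import time

def wait_for_job(get_status, poll_interval=5.0, timeout=3600.0):
    """Poll get_status() until the job succeeds.

    get_status: zero-argument callable returning a status string.
    Raises RuntimeError on terminal failure states and TimeoutError
    if the job does not finish within `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        status = get_status()
        if status == "succeeded":
            return status
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"fine-tuning ended with status={status}")
        if time.monotonic() - start > timeout:
            raise TimeoutError(f"job still {status!r} after {timeout}s")
        time.sleep(poll_interval)

# Simulated status sequence in place of live API calls:
statuses = iter(["validating_files", "running", "succeeded"])
result = wait_for_job(lambda: next(statuses), poll_interval=0.0)
print(result)  # succeeded
```

In the notebook, the callable would be `lambda: openai.FineTuningJob.retrieve(job.id).status`.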