Add LangSmith Run Chat Loader (#11458)
Showing 7 changed files with 882 additions and 39 deletions.
1 change: 1 addition & 0 deletions
docs/docs_skeleton/docs/integrations/chat_loaders/example_data/langsmith_chat_dataset.json
Large diffs are not rendered by default.
279 changes: 279 additions & 0 deletions
docs/docs_skeleton/docs/integrations/chat_loaders/langsmith_dataset.ipynb
@@ -0,0 +1,279 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a9ab2a39-7c2d-4119-9dc7-8035fdfba3cb",
"metadata": {},
"source": [
"# Fine-Tuning on LangSmith Chat Datasets\n",
"\n",
"This notebook demonstrates an easy way to load a LangSmith chat dataset and fine-tune a model on that data.\n",
"The process is simple and comprises 3 steps.\n",
"\n",
"1. Create the chat dataset.\n",
"2. Use the LangSmithDatasetChatLoader to load examples.\n",
"3. Fine-tune your model.\n",
"\n",
"Then you can use the fine-tuned model in your LangChain app.\n",
"\n",
"Before diving in, let's install our prerequisites.\n",
"\n",
"## Prerequisites\n",
"\n",
"Ensure you've installed langchain >= 0.0.311 and have configured your environment with your LangSmith API key."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef488003-514a-48b4-93f1-7de4417abf5d",
"metadata": {},
"outputs": [],
"source": [
"%pip install -U langchain openai"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9fba5c30-9e72-48aa-9535-80f2b3d18ead",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import uuid\n",
"\n",
"uid = uuid.uuid4().hex[:6]\n",
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"os.environ[\"LANGCHAIN_API_KEY\"] = \"YOUR API KEY\""
]
},
{
"cell_type": "markdown",
"id": "8533ab63-d437-492a-aaec-ccca31167bf2",
"metadata": {},
"source": [
"## 1. Select dataset\n",
"\n",
"This notebook fine-tunes a model directly on a LangSmith chat dataset, so the main decision is selecting which runs to fine-tune on. You will often curate these from traced runs. You can learn more about LangSmith datasets in the [docs](https://docs.smith.langchain.com/evaluation/datasets).\n",
"\n",
"For the sake of this tutorial, we will upload an existing dataset here that you can use."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "462515e0-872a-446e-abbd-6166d73d7414",
"metadata": {},
"outputs": [],
"source": [
"from langsmith.client import Client\n",
"\n",
"client = Client()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d384e4ac-5e8f-42a2-8bb5-7d3c9a8a540d",
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"url = \"https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/docs_skeleton/docs/integrations/chat_loaders/example_data/langsmith_chat_dataset.json\"\n",
"response = requests.get(url)\n",
"response.raise_for_status()\n",
"data = response.json()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b0d8ae47-2d3f-4b01-b15f-da58bd750fb4",
"metadata": {},
"outputs": [],
"source": [
"dataset_name = f\"Extraction Fine-tuning Dataset {uid}\"\n",
"ds = client.create_dataset(dataset_name=dataset_name, data_type=\"chat\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "87f085b7-71e1-4ff4-a622-e4e1248aa94a",
"metadata": {},
"outputs": [],
"source": [
"_ = client.create_examples(\n",
"    inputs=[e[\"inputs\"] for e in data],\n",
"    outputs=[e[\"outputs\"] for e in data],\n",
"    dataset_id=ds.id,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f365a359-52f7-47ff-8c36-aadc1070b409",
"metadata": {},
"source": [
"## 2. Prepare Data\n",
"\n",
"Now we can create an instance of LangSmithDatasetChatLoader and load the chat sessions using its lazy_load() method."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "817bc077-c18a-473b-94a4-a7d810d583a8",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_loaders.langsmith import LangSmithDatasetChatLoader\n",
"\n",
"loader = LangSmithDatasetChatLoader(dataset_name=dataset_name)\n",
"\n",
"chat_sessions = loader.lazy_load()"
]
},
{
"cell_type": "markdown",
"id": "f21a3bbd-1ed4-481b-9640-206b8bf0d751",
"metadata": {},
"source": [
"With the chat sessions loaded, convert them into a format suitable for fine-tuning."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "9e5ac127-b094-4584-9159-5a6d3d7315c7",
"metadata": {},
"outputs": [],
"source": [
"from langchain.adapters.openai import convert_messages_for_finetuning\n",
"\n",
"training_data = convert_messages_for_finetuning(chat_sessions)"
]
},
{
"cell_type": "markdown",
"id": "188c4978-d85e-4984-a008-a50f6cd6bb84",
"metadata": {},
"source": [
"## 3. Fine-tune the Model\n",
"\n",
"Now, initiate the fine-tuning process using the OpenAI library."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11d19e28-be49-4801-8065-1a58d13cd192",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Status=[running]... 302.42s. 143.85s\r"
]
}
],
"source": [
"import json\n",
"import time\n",
"from io import BytesIO\n",
"\n",
"import openai\n",
"\n",
"my_file = BytesIO()\n",
"for dialog in training_data:\n",
"    my_file.write((json.dumps({\"messages\": dialog}) + \"\\n\").encode(\"utf-8\"))\n",
"\n",
"my_file.seek(0)\n",
"training_file = openai.File.create(file=my_file, purpose=\"fine-tune\")\n",
"\n",
"job = openai.FineTuningJob.create(\n",
"    training_file=training_file.id,\n",
"    model=\"gpt-3.5-turbo\",\n",
")\n",
"\n",
"# Wait for the fine-tuning to complete (this may take some time),\n",
"# stopping the loop if the job reaches a terminal failure state.\n",
"status = openai.FineTuningJob.retrieve(job.id).status\n",
"start_time = time.time()\n",
"while status not in (\"succeeded\", \"failed\", \"cancelled\"):\n",
"    print(f\"Status=[{status}]... {time.time() - start_time:.2f}s\", end=\"\\r\", flush=True)\n",
"    time.sleep(5)\n",
"    status = openai.FineTuningJob.retrieve(job.id).status\n",
"\n",
"if status != \"succeeded\":\n",
"    raise RuntimeError(f\"Fine-tuning ended with status={status}\")\n",
"\n",
"# Now your model is fine-tuned!"
]
},
{
"cell_type": "markdown",
"id": "54c4cead-500d-41dd-8333-2defde634396",
"metadata": {},
"source": [
"## 4. Use in LangChain\n",
"\n",
"After fine-tuning, use the resulting model ID with the ChatOpenAI model class in your LangChain app."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f472ca4-fa9b-485d-bd37-8ce3c59c44db",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"# Get the fine-tuned model ID\n",
"job = openai.FineTuningJob.retrieve(job.id)\n",
"model_id = job.fine_tuned_model\n",
"\n",
"# Use the fine-tuned model in LangChain\n",
"model = ChatOpenAI(\n",
"    model=model_id,\n",
"    temperature=1,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d3b5845-6385-42d1-9f7d-5ea798dc2cd9",
"metadata": {},
"outputs": [],
"source": [
"model.invoke(\"There were three ravens sat on a tree.\")"
]
},
{
"cell_type": "markdown",
"id": "5b8c2c79-ce27-4f37-b1b2-5977db8c4e84",
"metadata": {},
"source": [
"Now you have successfully fine-tuned a model using data from LangSmith LLM runs!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
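As a standalone reference, the upload step in the notebook serializes each dialog as one JSON object per line, the JSONL chat format the OpenAI fine-tuning endpoint consumes. Here is a minimal sketch of that record shape; the dialog below is a hypothetical stand-in for one converted LangSmith example, not data from the actual dataset:

```python
import json
from io import BytesIO

# Hypothetical dialog standing in for one entry of training_data, i.e. the
# list-of-message-dicts shape that convert_messages_for_finetuning emits.
dialog = [
    {"role": "system", "content": "You extract entities from text."},
    {"role": "user", "content": "There were three ravens sat on a tree."},
    {"role": "assistant", "content": '{"animal": "raven", "count": 3}'},
]

# One {"messages": [...]} object per line, newline-terminated (JSONL),
# matching what the notebook writes into my_file before uploading it.
buf = BytesIO()
buf.write((json.dumps({"messages": dialog}) + "\n").encode("utf-8"))
buf.seek(0)

# Round-trip one line to confirm the record parses back intact.
line = buf.readline().decode("utf-8")
record = json.loads(line)
print(record["messages"][1]["content"])
```

Writing into an in-memory `BytesIO` buffer, as the notebook does, avoids touching disk; the same bytes could equally be written to a `.jsonl` file.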
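The status-polling loop in step 3 can also be factored into a small reusable helper that surfaces terminal failure states and enforces a timeout instead of spinning forever. This is a sketch under the assumption that the caller supplies the status lookup as a zero-argument callable (for example a lambda wrapping `openai.FineTuningJob.retrieve`); `wait_for_job` and `get_status` are illustrative names, not part of any library:

```python
import time

def wait_for_job(get_status, poll_interval=5.0, timeout=3600.0):
    """Poll get_status() until the job succeeds.

    get_status: zero-argument callable returning a status string.
    Raises RuntimeError on terminal failure states and TimeoutError
    if the job does not finish within `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        status = get_status()
        if status == "succeeded":
            return status
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"fine-tuning ended with status={status}")
        if time.monotonic() - start > timeout:
            raise TimeoutError(f"job still {status!r} after {timeout}s")
        time.sleep(poll_interval)

# Simulated status sequence in place of live API calls:
statuses = iter(["validating_files", "running", "succeeded"])
result = wait_for_job(lambda: next(statuses), poll_interval=0.0)
print(result)  # succeeded
```

In the notebook, the callable would be `lambda: openai.FineTuningJob.retrieve(job.id).status`.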