diff --git a/docs/evaluation/how_to_guides/evaluate_with_attachments.mdx b/docs/evaluation/how_to_guides/evaluate_with_attachments.mdx
new file mode 100644
index 00000000..4d770c87
--- /dev/null
+++ b/docs/evaluation/how_to_guides/evaluate_with_attachments.mdx
@@ -0,0 +1,510 @@
+import {
+  python,
+  CodeTabs,
+  PythonBlock,
+  TypeScriptBlock,
+} from "@site/src/components/InstructionsWithCode";
+
+# Run an evaluation with large file inputs
+
+In addition to supporting [file attachments with traces](../../../observability/how_to_guides/tracing/upload_files_with_traces), LangSmith supports arbitrary file attachments on your examples, which you can consume when you run experiments.
+
+This is particularly useful when working with LLM applications that require multimodal inputs or outputs.
+
+:::tip Using separate attachments outside of inputs/outputs
+When dealing with large files, it's recommended to upload them as attachments rather than embedding them in your JSON inputs or outputs via base64 encoding.
+
+Attachments are more efficient because base64 encoding increases data size, leading to slower uploads and downloads.
+Attachments also avoid potential performance bottlenecks when parsing large JSON payloads.
+
+Finally, attachments are more user-friendly in the LangSmith UI, as they are rendered as files with previews rather than as base64-encoded strings.
+:::
+
+## Add examples with attachments to a LangSmith dataset
+
+### SDK
+
+To upload examples with attachments using the SDK, use the `upload_examples_multipart`/`update_examples_multipart` Python methods or the `uploadExamplesMultipart`/`updateExamplesMultipart` TypeScript methods.
+These methods let you pass in a list of examples with attachments.
+
+:::note Minimum SDK Versions
+All of the features below are available in the following SDK versions:
+
+- Python SDK: >=0.2.3
+- JS/TS SDK: >=0.2.13
+  :::
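+
+For reference, here is a minimal Python sketch of the same upload workflow. Treat it as illustrative rather than definitive: the `ExampleUploadWithAttachments` helper and the `(mime_type, data)` tuple format for attachments are assumptions based on recent SDK versions, so verify them against your installed client. A full TypeScript example follows below.
+
+```python
+from uuid import uuid4
+
+from langsmith import Client
+from langsmith.schemas import ExampleUploadWithAttachments  # assumed import path; check your SDK version
+
+langsmith_client = Client()
+
+# Create a uniquely named dataset to hold the example
+dataset_name = "attachment-test-dataset:" + uuid4().hex[:8]
+dataset = langsmith_client.create_dataset(
+    dataset_name=dataset_name,
+    description="Test dataset for evals with attachments",
+)
+
+# Read a local file as raw bytes ("my_image.png" is a placeholder path)
+with open("my_image.png", "rb") as f:
+    png_bytes = f.read()
+
+example = ExampleUploadWithAttachments(
+    inputs={"image_question": "What is in this image?"},
+    outputs={"image_answer": "A mug with a blanket over it."},
+    # Each attachment is assumed to be a (mime_type, data) pair
+    attachments={"my_img": ("image/png", png_bytes)},
+)
+
+langsmith_client.upload_examples_multipart(dataset_id=dataset.id, uploads=[example])
+```
+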
+<CodeTabs
+  tabs={[
+    TypeScriptBlock(`import { Client } from "langsmith";
+import { v4 as uuid4 } from "uuid";\n
+// Publicly available test files (replace with your own URLs)
+const pdfUrl = "...";
+const wavUrl = "...";
+const pngUrl = "...";\n
+async function fetchArrayBuffer(url: string): Promise<ArrayBuffer> {
+  const response = await fetch(url);
+  if (!response.ok) {
+    throw new Error(\`Failed to fetch \${url}: \${response.statusText}\`);
+  }
+  return response.arrayBuffer();
+}\n
+// Fetch files as ArrayBuffer
+const pdfArrayBuffer = await fetchArrayBuffer(pdfUrl);
+const wavArrayBuffer = await fetchArrayBuffer(wavUrl);
+const pngArrayBuffer = await fetchArrayBuffer(pngUrl);\n
+// Create the LangSmith client (ensure LANGCHAIN_API_KEY is set in env)
+const langsmithClient = new Client();\n
+// Create a unique dataset name
+const datasetName = "attachment-test-dataset:" + uuid4().substring(0, 8);\n
+// Create the dataset
+const dataset = await langsmithClient.createDataset(datasetName, {
+  description: "Test dataset for evals with publicly available attachments",
+});\n
+// Define the example with attachments
+const exampleId = uuid4();
+const example = {
+  id: exampleId,
+  inputs: {
+    audio_question: "What is in this audio clip?",
+    image_question: "What is in this image?",
+  },
+  outputs: {
+    audio_answer: "The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years.",
+    image_answer: "A mug with a blanket over it.",
+  },
+  attachments: {
+    my_pdf: {
+      mimeType: "application/pdf",
+      data: pdfArrayBuffer,
+    },
+    my_wav: {
+      mimeType: "audio/wav",
+      data: wavArrayBuffer,
+    },
+    my_img: {
+      mimeType: "image/png",
+      data: pngArrayBuffer,
+    },
+  },
+};
+
+// Upload the example with attachments to the dataset
+await langsmithClient.uploadExamplesMultipart(dataset.id, [example]);`,
+    `In the TypeScript SDK, you can use the \`uploadExamplesMultipart\` method to upload examples with attachments.\n
+Note that this is a different method from the standard \`createExamples\` method, which currently does not support attachments.
+Each attachment requires either a \`Uint8Array\` or an \`ArrayBuffer\` as the data type.\n
+
+- \`Uint8Array\`: Useful for handling binary data directly.
+- \`ArrayBuffer\`: Represents fixed-length binary data, which can be converted to \`Uint8Array\` as needed.\n`),
+  ]}
+  groupId="client-language"
+/>
+
+Once you upload examples with attachments, you can view them in the LangSmith UI. Each attachment is rendered as a file with a preview, making it easy to inspect the contents.
+
+![](./static/attachments_with_examples.png)
+
+### From existing runs
+
+When adding runs to a LangSmith dataset, attachments can be selectively propagated from the source run to the destination example.
+To learn more, please see [this guide](./manage_datasets_in_application#add-runs-from-the-tracing-project-ui).
+
+![](./static/add_trace_with_attachments_to_dataset.png)
+
+### From the LangSmith UI
+
+You can also upload examples with attachments directly from the LangSmith UI, by clicking the `+ Example` button in the `Examples` tab of the dataset UI.
+Then upload the attachments you want using the "Upload Files" button:
+
+![](./static/create_example_with_attachments.png)
+
+## Running evaluations with attachments
+
+Once you have a dataset that contains examples with file attachments, you can run evaluations that process these attachments.
+
+### Define a target function with attachments
+
+With the dataset in place, we can define a target function to run over its examples.
+The following example uses OpenAI's GPT-4o models to answer questions about an image and an audio clip.
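+
+As a sketch of what this can look like in Python, the target function below answers both questions using the example's attachments. It assumes the Python SDK passes an `attachments` dict (mapping each attachment name to a dict containing a `presigned_url`) to target functions that declare an `attachments` parameter, mirroring the evaluator behavior shown later in this guide; verify the exact signature against your SDK version. A TypeScript version follows.
+
+```python
+import base64
+
+import requests
+from openai import OpenAI
+
+client = OpenAI()
+
+
+# Assumes LangSmith supplies attachments={name: {"presigned_url": ...}}
+# when the target function declares an `attachments` parameter
+def file_qa(inputs: dict, attachments: dict) -> dict:
+    # Download the audio attachment via its presigned URL and base64-encode it
+    wav_url = attachments["my_wav"]["presigned_url"]
+    audio_b64 = base64.b64encode(requests.get(wav_url).content).decode("utf-8")
+
+    audio_completion = client.chat.completions.create(
+        model="gpt-4o-audio-preview",
+        messages=[
+            {
+                "role": "user",
+                "content": [
+                    {"type": "text", "text": inputs["audio_question"]},
+                    {
+                        "type": "input_audio",
+                        "input_audio": {"data": audio_b64, "format": "wav"},
+                    },
+                ],
+            }
+        ],
+    )
+
+    # The vision model accepts the presigned URL directly
+    image_url = attachments["my_img"]["presigned_url"]
+    image_completion = client.chat.completions.create(
+        model="gpt-4o-mini",
+        messages=[
+            {
+                "role": "user",
+                "content": [
+                    {"type": "text", "text": inputs["image_question"]},
+                    {"type": "image_url", "image_url": {"url": image_url}},
+                ],
+            }
+        ],
+    )
+
+    return {
+        "audio_answer": audio_completion.choices[0].message.content,
+        "image_answer": image_completion.choices[0].message.content,
+    }
+```
+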
+<CodeTabs
+  tabs={[
+    TypeScriptBlock(`import OpenAI from "openai";\n
+const client = new OpenAI();\n
+async function fileQA(inputs: Record<string, any>, config?: Record<string, any>) {
+  const presignedUrl = config?.attachments?.["my_wav"]?.presigned_url;
+  if (!presignedUrl) {
+    throw new Error("No presigned URL provided for audio.");
+  }
+
+  const response = await fetch(presignedUrl);
+  if (!response.ok) {
+    throw new Error(\`Failed to fetch audio: \${response.statusText}\`);
+  }
+
+  const arrayBuffer = await response.arrayBuffer();
+  const uint8Array = new Uint8Array(arrayBuffer);
+  const audioB64 = Buffer.from(uint8Array).toString("base64");
+
+  const audioCompletion = await client.chat.completions.create({
+    model: "gpt-4o-audio-preview",
+    messages: [
+      {
+        role: "user",
+        content: [
+          { type: "text", text: inputs["audio_question"] },
+          {
+            type: "input_audio",
+            input_audio: {
+              data: audioB64,
+              format: "wav",
+            },
+          },
+        ],
+      },
+    ],
+  });
+
+  const imageUrl = config?.attachments?.["my_img"]?.presigned_url;
+  const imageCompletion = await client.chat.completions.create({
+    model: "gpt-4o-mini",
+    messages: [
+      {
+        role: "user",
+        content: [
+          { type: "text", text: inputs["image_question"] },
+          {
+            type: "image_url",
+            image_url: {
+              url: imageUrl,
+            },
+          },
+        ],
+      },
+    ],
+  });
+
+  return {
+    audio_answer: audioCompletion.choices[0].message.content,
+    image_answer: imageCompletion.choices[0].message.content,
+  };
+}`,
+    `In the TypeScript SDK, the \`config\` argument is used to pass the attachments to the target function if \`includeAttachments\` is set to \`true\`.\n
+The \`config\` will contain \`attachments\`, an object mapping each attachment name to an object of the form:\n
+\`\`\`
+{
+  presigned_url: string,
+}
+\`\`\``),
+  ]}
+  groupId="client-language"
+/>
+
+### Define custom evaluators with attachments
+
+In addition to using attachments inside your target function, you can also use them inside your evaluators, as follows.
+The same rules as above determine whether the evaluator receives attachments.
+
+The evaluator below uses an LLM to judge whether the image description produced by the target function is consistent with the image.
+To learn more about how to define LLM-based evaluators, please see [this guide](./llm_as_judge).
+
+<CodeTabs
+  tabs={[
+    PythonBlock(`from langsmith import Client
+from openai import OpenAI
+from pydantic import BaseModel\n
+langsmith_client = Client()
+client = OpenAI()\n
+def valid_image_description(outputs: dict, attachments: dict) -> bool:
+    """Use an LLM to judge if the image description and image are consistent."""\n
+    instructions = """
+    Does the description of the following image make sense?
+    Please carefully review the image and the description to determine if the description is valid."""\n
+    class Response(BaseModel):
+        description_is_valid: bool\n
+    image_url = attachments["my_img"]["presigned_url"]
+    response = client.beta.chat.completions.parse(
+        model="gpt-4o",
+        messages=[
+            {
+                "role": "system",
+                "content": instructions
+            },
+            {
+                "role": "user",
+                "content": [
+                    {"type": "image_url", "image_url": {"url": image_url}},
+                    {"type": "text", "text": outputs["image_answer"]}
+                ]
+            }
+        ],
+        response_format=Response
+    )\n
+    return response.choices[0].message.parsed.description_is_valid\n
+langsmith_client.evaluate(
+    file_qa,
+    data=dataset_name,
+    evaluators=[valid_image_description],
+)
+`),
+    TypeScriptBlock(`import { zodResponseFormat } from 'openai/helpers/zod';
+import { z } from 'zod';
+import { evaluate } from "langsmith/evaluation";
+import OpenAI from "openai";
+
+const client = new OpenAI();
+
+const DescriptionResponse = z.object({
+  description_is_valid: z.boolean(),
+});
+
+async function validImageDescription({
+  outputs,
+  attachments,
+}: {
+  outputs?: any;
+  attachments?: any;
+}): Promise<{ key: string; score: boolean }> {
+  const instructions = \`Does the description of the following image make sense?
+Please carefully review the image and the description to determine if the description is valid.\`;
+
+  const imageUrl = attachments?.["my_img"]?.presigned_url;
+
+  const completion = await client.beta.chat.completions.parse({
+    model: "gpt-4o",
+    messages: [
+      {
+        role: "system",
+        content: instructions,
+      },
+      {
+        role: "user",
+        content: [
+          { type: "image_url", image_url: { url: imageUrl } },
+          { type: "text", text: outputs?.image_answer },
+        ],
+      },
+    ],
+    response_format: zodResponseFormat(DescriptionResponse, 'imageResponse'),
+  });
+
+  const score: boolean = completion.choices[0]?.message?.parsed?.description_is_valid ?? false;
+  return { key: "valid_image_description", score };
+}
+
+const resp = await evaluate(fileQA, {
+  data: datasetName,
+  // Need to pass this flag to include attachments
+  includeAttachments: true,
+  evaluators: [validImageDescription],
+  client: langsmithClient,
+});`),
+  ]}
+  groupId="client-language"
+/>
+
+## Manage datasets with attachments
+
+### Manage programmatically
+
+In the code [above](#add-examples-with-attachments-to-a-langsmith-dataset), we showed how to add examples with attachments to a dataset.
+It is also possible to update these same examples using the SDK.
+
+As with any example update, datasets are versioned when you update them with attachments. You can therefore navigate to the dataset version history to see the changes made to each example.
+To learn more, please see [this guide](./manage_datasets_in_application).
+
+When updating an example with attachments, you can update the attachments in a few different ways, as shown in the sketch below:
+
+- Pass in new attachments
+- Rename existing attachments
+- Delete existing attachments
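+
+The Python sketch below updates an existing example. The `ExampleUpdateWithAttachments` and `AttachmentsOperations` names are assumptions based on recent SDK versions; verify them against your installed client:
+
+```python
+from langsmith.schemas import (  # assumed import path; check your SDK version
+    AttachmentsOperations,
+    ExampleUpdateWithAttachments,
+)
+
+example_update = ExampleUpdateWithAttachments(
+    id=example_id,  # ID of the example to update
+    # Pass in a new attachment
+    attachments={"my_new_file": ("text/plain", b"foo bar")},
+    attachments_operations=AttachmentsOperations(
+        # Rename an existing attachment
+        rename={"my_img": "my_picture"},
+        # Keep an existing attachment; anything not renamed or retained is deleted
+        retain=["my_pdf"],
+    ),
+)
+
+langsmith_client.update_examples_multipart(
+    dataset_id=dataset.id, updates=[example_update]
+)
+```
+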
+:::warning Attachment Operations
+Currently, errors are not thrown if you pass the wrong attachment name to `rename` or `retain`.
+New attachments **always** take precedence over existing attachments: if you upload a new attachment
+named "foo" and try to retain or rename an existing attachment to "foo", the new attachment will be used instead.
+
+Anything not in `rename` or `retain` will be deleted.
+:::
+
+### From the LangSmith UI
+
+:::note Attachment Size Limit
+Attachments are limited to 20MB in size in the UI.
+:::
+
+When editing an example in the UI, you can upload new attachments, rename and delete attachments,
+and there is a quick reset button to restore the attachments to what previously existed on the example.
+No changes are saved until you click submit.
+
+![](./static/attachment_editing.gif)
diff --git a/docs/evaluation/how_to_guides/index.md b/docs/evaluation/how_to_guides/index.md
index 120ee0c1..f79293a1 100644
--- a/docs/evaluation/how_to_guides/index.md
+++ b/docs/evaluation/how_to_guides/index.md
@@ -20,6 +20,7 @@ Evaluate and improve your application before deploying it.
 - [Evaluate an existing experiment (Python only)](./how_to_guides/evaluate_existing_experiment)
 - [Run an evaluation from the UI](./how_to_guides/run_evaluation_from_prompt_playground)
 - [Run an evaluation via the REST API](./how_to_guides/run_evals_api_only)
+- [Run an evaluation with large file inputs](./how_to_guides/evaluate_with_attachments)
 
 ### Define an evaluator
 
@@ -43,6 +44,7 @@ Evaluate and improve your application before deploying it.
 - [Handle model rate limits](./how_to_guides/rate_limiting)
 - [Print detailed logs (Python only)](../../observability/how_to_guides/tracing/output_detailed_logs)
 - [Run an evaluation locally (beta, Python only)](./how_to_guides/local)
 
 ## Unit testing
 
diff --git a/docs/evaluation/how_to_guides/static/add_trace_with_attachments_to_dataset.png b/docs/evaluation/how_to_guides/static/add_trace_with_attachments_to_dataset.png
new file mode 100644
index 00000000..52378645
Binary files /dev/null and b/docs/evaluation/how_to_guides/static/add_trace_with_attachments_to_dataset.png differ
diff --git a/docs/evaluation/how_to_guides/static/attachment_editing.gif b/docs/evaluation/how_to_guides/static/attachment_editing.gif
new file mode 100644
index 00000000..223b7953
Binary files /dev/null and b/docs/evaluation/how_to_guides/static/attachment_editing.gif differ
diff --git a/docs/evaluation/how_to_guides/static/attachments_with_examples.png b/docs/evaluation/how_to_guides/static/attachments_with_examples.png
new file mode 100644
index 00000000..bf11aff1
Binary files /dev/null and b/docs/evaluation/how_to_guides/static/attachments_with_examples.png differ
diff --git a/docs/evaluation/how_to_guides/static/create_example_with_attachments.png b/docs/evaluation/how_to_guides/static/create_example_with_attachments.png
new file mode 100644
index 00000000..c2a56329
Binary files /dev/null and b/docs/evaluation/how_to_guides/static/create_example_with_attachments.png differ