add docs for evaluating with attachments #540

Merged: 33 commits, Dec 12, 2024

Commits
77cbaf9  draft (isahers1, Nov 19, 2024)
3896271  bagatur comments (isahers1, Nov 19, 2024)
fd60625  Merge branch 'main' into isaac/examplewithattachmentdocs (isahers1, Nov 20, 2024)
fa6e7f6  update docs (agola11, Nov 20, 2024)
d4e5d8f  add screenshot (agola11, Nov 20, 2024)
96b121f  fmt (agola11, Nov 20, 2024)
b8d8462  edits (isahers1, Nov 20, 2024)
5c537cf  light edits (agola11, Nov 20, 2024)
4f44045  pass attachments to evaluate (isahers1, Nov 21, 2024)
52e57f0  add link (isahers1, Nov 21, 2024)
ca0fb14  edits (isahers1, Dec 2, 2024)
6da15bd  Update docs/evaluation/how_to_guides/evaluation/evaluate_with_attachm… (agola11, Dec 4, 2024)
c73b6dc  Update docs/evaluation/how_to_guides/evaluation/evaluate_with_attachm… (agola11, Dec 4, 2024)
feb7722  Update docs/evaluation/how_to_guides/evaluation/evaluate_with_attachm… (agola11, Dec 4, 2024)
4a6c877  Update docs/evaluation/how_to_guides/evaluation/evaluate_with_attachm… (agola11, Dec 4, 2024)
ccd44e4  wip (isahers1, Dec 11, 2024)
667caca  nits (isahers1, Dec 11, 2024)
9b56217  nits (isahers1, Dec 11, 2024)
f6d44f6  wip (isahers1, Dec 12, 2024)
f4e8477  merge (agola11, Dec 12, 2024)
3945996  fix a couple of issues (agola11, Dec 12, 2024)
75c55e9  checkpoint (agola11, Dec 12, 2024)
2f783dc  fix (agola11, Dec 12, 2024)
a409b87  add missing image (agola11, Dec 12, 2024)
74b4485  title (agola11, Dec 12, 2024)
7adc63c  title (agola11, Dec 12, 2024)
83589a5  fix (agola11, Dec 12, 2024)
50bd13b  fix (agola11, Dec 12, 2024)
85dd829  fix (agola11, Dec 12, 2024)
9c5b168  fix (agola11, Dec 12, 2024)
d75e677  fix (agola11, Dec 12, 2024)
f4371d0  fix (agola11, Dec 12, 2024)
40feefe  fix (agola11, Dec 12, 2024)
103 changes: 103 additions & 0 deletions docs/evaluation/how_to_guides/evaluation/evaluate_with_attachments.mdx
@@ -0,0 +1,103 @@
import {
  CodeTabs,
  python,
  typescript,
  PythonBlock,
  TypeScriptBlock,
} from "@site/src/components/InstructionsWithCode";

# Evaluate an LLM application with attachments

Attachments allow you to associate large files with your examples. This lets you evaluate RAG applications
over large internal documents, benchmark image analysis tools, and more.

Contributor: maybe worth adding a sentence about why attachments are better than storing file in example inputs

Contributor Author: tried my best, but someone more well versed with the benefits should probably chime in

## Create a dataset with attachments

To create a dataset with attachments, you need to use the `upsert_examples_multipart` method of the LangSmith client:

```python
from pathlib import Path

from langsmith.client import Client
from langsmith.schemas import ExampleUpsertWithAttachments

# Pass in your API key directly, or define it in the LANGCHAIN_API_KEY environment variable
langchain_client = Client(api_key="...")

dataset = langchain_client.create_dataset(
    dataset_name="attachment-test-dataset",
    description="Test dataset for evals with attachments",
)

# Define the example
example = ExampleUpsertWithAttachments(
    dataset_id=dataset.id,
    inputs={"question": "What were the cumulative earnings earned from online orders in the midwest during Q2?"},
    outputs={"answer": "$123456"},
    attachments={
        # Each attachment is just a name mapped to a tuple of the file's MIME type and its bytes content
        "pdf": ("application/pdf", Path("./foo_earnings.pdf").read_bytes()),
        # We can pass multiple attachments (of different types!), as long as they have different names
        "pptx": ("application/pptx", Path("./foo_earnings.pptx").read_bytes()),
    },
)

# Upsert the examples
langchain_client.upsert_examples_multipart(upserts=[example])
```

Contributor: not related to this pr, but is it too late to change tuple -> dict? imo naming the values improves usability a lot, eg

    attachments = [
        {
            "name": "my_pdf",
            "mime_type": "applications/pdf",
            "bytes": Path(...).read_bytes()
        },
        ...
    ]

is easier to read for me

Contributor: related to this pr, would be nice to download some real public pdf here to use. could just be https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf
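As a minimal sketch of that suggestion (assuming the `requests` package is installed), the public sample PDF could be downloaded to the path the example above reads from:

```python
import requests
from pathlib import Path

# Fetch the public sample PDF suggested in the review and save it where the
# dataset-creation example expects to find it
url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
Path("./foo_earnings.pdf").write_bytes(requests.get(url).content)
```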

## Define a target function with attachments

Now that we have a dataset that includes examples with attachments, we can define a target function to run our LLM application with the attachments.
The target function must have two positional arguments: the first must be called `inputs` and the second must be called `attachments`.

```python
import base64
from io import BytesIO

import fitz  # PyMuPDF
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")

def pdf_to_image_bytes(pdf_bytes, image_format='PNG'):
    # Render each page of the PDF to an image and return the images as base64 strings
    pdf_document = fitz.open(stream=pdf_bytes, filetype="pdf")
    images = []
    for page in pdf_document:
        pix = page.get_pixmap()
        img_bytes = BytesIO()
        pix.pil_save(img_bytes, format=image_format)
        # Encode the bytes in base64
        base64_bytes = base64.b64encode(img_bytes.getvalue()).decode('utf-8')
        images.append(base64_bytes)
    pdf_document.close()
    return images


def target(inputs, attachments):
    system_message = SystemMessage(
        content="The images are of the pdf that the question is referencing. Use the images to generate your answer."
    )
    # Each attachment value is a tuple containing the S3 URL first and then a reader over the file bytes
    pdf_s3_url, pdf_reader = attachments['pdf']
    images = pdf_to_image_bytes(pdf_reader.read())
    human_message = HumanMessage(
        content=[
            {"type": "text", "text": inputs["question"]},
        ]
        + [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image}"}}
            for image in images
        ],
    )
    messages = [system_message, human_message]
    return {"answer": model.invoke(messages).content}
```
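As a quick, hypothetical local check (the placeholder URL and in-memory reader below simply stand in for the S3 URL and reader tuple described above), the target function can be invoked directly before running a full evaluation:

```python
from io import BytesIO
from pathlib import Path

# Placeholder attachments dict mimicking the (S3 URL, reader) tuple format;
# the URL value is a stand-in and is not used by the target function
fake_attachments = {
    "pdf": ("placeholder-s3-url", BytesIO(Path("./foo_earnings.pdf").read_bytes())),
}

# Requires OPENAI_API_KEY to be set, since the target calls the model
result = target({"question": "What were the cumulative earnings?"}, fake_attachments)
print(result["answer"])
```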

## Run an evaluation

We can then run an evaluation as usual by passing the target function to the `evaluate` method:

```python
from langsmith import evaluate

# We can optionally define an evaluator to use
def evaluator(run, example):
    score = int(str(example.outputs["answer"]) in run.outputs["answer"])
    return {"key": "correctness", "score": score}

evaluate(
    target,
    data="attachment-test-dataset",
    evaluators=[evaluator],
    client=langchain_client,
)
```