
[python] support uploading examples with attachments and running evals on examples with attachments #1209

Merged: 102 commits, Dec 10, 2024

Commits
e9e2131
wip
isahers1 Nov 13, 2024
ff30541
unit test
isahers1 Nov 13, 2024
152ec59
integration test skeleton
isahers1 Nov 13, 2024
27b1546
integration test passing
isahers1 Nov 13, 2024
53a0f14
wip
isahers1 Nov 14, 2024
025aa6d
wip
isahers1 Nov 14, 2024
4208b6e
Update python/langsmith/client.py
isahers1 Nov 14, 2024
fd16baa
more edits
isahers1 Nov 14, 2024
28a4677
nit
isahers1 Nov 14, 2024
816302d
nit
isahers1 Nov 14, 2024
aa947a6
remove dev endpoint in test
isahers1 Nov 18, 2024
a82063b
typo
isahers1 Nov 18, 2024
b18df6b
Merge branch 'main' into isaac/multipartstuff
isahers1 Nov 18, 2024
ad19daf
fmt
isahers1 Nov 18, 2024
390ac66
yml changes
isahers1 Nov 18, 2024
523e5d1
fmt
isahers1 Nov 18, 2024
ed3aa1c
example search restoration
isahers1 Nov 18, 2024
ce73afc
fmt
isahers1 Nov 18, 2024
460b16b
list -> List
isahers1 Nov 18, 2024
4e9edf4
dict -> Dict
isahers1 Nov 18, 2024
b6b9d79
fmt
isahers1 Nov 18, 2024
bc9ec6f
undo yml changes
isahers1 Nov 18, 2024
15708dc
unit test fix
isahers1 Nov 18, 2024
527174a
unit test fix
isahers1 Nov 18, 2024
81f5249
unit test fix
isahers1 Nov 18, 2024
0b476e8
Merge branch 'main' into isaac/multipartstuff
isahers1 Nov 19, 2024
f36a0cb
make evaluate function compatible with attachments (#1218)
isahers1 Nov 19, 2024
ddbe2f5
file path update
isahers1 Nov 19, 2024
c1ba615
add benchmarks
jakerachleff Nov 19, 2024
3544171
better error message
jakerachleff Nov 19, 2024
3cc32c5
aevaluate
isahers1 Nov 19, 2024
161e0d1
Merge branch 'isaac/multipartstuff' of https://github.com/langchain-a…
isahers1 Nov 19, 2024
08a6f34
unit test for _include_attachments
isahers1 Nov 20, 2024
8e2e704
test that adding examples without attachments still lets you run evals
isahers1 Nov 20, 2024
cfa0e4c
fmt
isahers1 Nov 20, 2024
de38a37
fmt
isahers1 Nov 20, 2024
2e74735
fmt
isahers1 Nov 20, 2024
f26c996
attempt fix
isahers1 Nov 20, 2024
095aae9
fix test
isahers1 Nov 20, 2024
a99da23
add unit test
isahers1 Nov 20, 2024
ee9d968
Merge branch 'main' into isaac/multipartstuff
agola11 Nov 20, 2024
b9dd0f2
Bump version (rc)
hinthornw Nov 20, 2024
01ef4d0
repetitions
isahers1 Nov 27, 2024
3715c30
nit
isahers1 Nov 27, 2024
49442d7
added upload endpoint
isahers1 Dec 2, 2024
9a70f70
Merge branch 'main' into isaac/multipartstuff
isahers1 Dec 2, 2024
484f2a5
comments
isahers1 Dec 6, 2024
f57b4bd
Merge branch 'main' into isaac/multipartstuff
isahers1 Dec 6, 2024
28fe5d1
fmt
isahers1 Dec 6, 2024
1e5eebf
fmt
isahers1 Dec 6, 2024
e013d72
fmt
isahers1 Dec 6, 2024
bc2d4b6
fmt
isahers1 Dec 6, 2024
96f4246
fix test
isahers1 Dec 6, 2024
887782e
x
isahers1 Dec 6, 2024
66228e8
defaults
isahers1 Dec 9, 2024
a5ee599
refactor
isahers1 Dec 9, 2024
2f1e6be
fmt
isahers1 Dec 9, 2024
c9ade2e
fmt
isahers1 Dec 9, 2024
4576779
fmt
isahers1 Dec 9, 2024
578a715
changes
isahers1 Dec 9, 2024
e4e3068
fmt
isahers1 Dec 9, 2024
6e91e05
x
isahers1 Dec 9, 2024
020d074
fmt
isahers1 Dec 9, 2024
1abe4f9
flag
isahers1 Dec 9, 2024
39be3c7
flags in tests
isahers1 Dec 9, 2024
5c2c74d
attachment_urls -> attachments
isahers1 Dec 9, 2024
2b385b6
x
isahers1 Dec 9, 2024
0daf245
fmt
baskaryan Dec 9, 2024
04a5496
Merge branch 'isaac/multipartstuff' of github.com:langchain-ai/langsm…
baskaryan Dec 9, 2024
c8a2b01
undo
isahers1 Dec 9, 2024
8033b7e
undo
isahers1 Dec 9, 2024
114a79d
fix
isahers1 Dec 9, 2024
b524f72
fix
isahers1 Dec 9, 2024
23187f1
test fix
isahers1 Dec 9, 2024
5471e88
fmt
isahers1 Dec 9, 2024
49246d0
fmt
baskaryan Dec 9, 2024
9b7c36a
Merge branch 'isaac/multipartstuff' of github.com:langchain-ai/langsm…
baskaryan Dec 9, 2024
b0921e0
tests
isahers1 Dec 9, 2024
70c3f3c
tests
isahers1 Dec 10, 2024
8bb0826
update examples multipart (#1310)
isahers1 Dec 10, 2024
c841ec6
add attachments to evaluate (#1237)
isahers1 Dec 10, 2024
eeeb375
Merge branch 'main' into isaac/multipartstuff
agola11 Dec 10, 2024
f3cc56f
update to 0.2.2
agola11 Dec 10, 2024
bf00aa6
fix spelling
agola11 Dec 10, 2024
c63b92c
fix update_examples issue
agola11 Dec 10, 2024
76e003e
fix test
isahers1 Dec 10, 2024
cf9e58c
Merge branch 'isaac/multipartstuff' of https://github.com/langchain-a…
isahers1 Dec 10, 2024
ed73f1a
test fix
isahers1 Dec 10, 2024
4887a99
attempt to fix test_update_examples_multipart
agola11 Dec 10, 2024
87d2a33
Merge branch 'isaac/multipartstuff' of github.com:langchain-ai/langsm…
agola11 Dec 10, 2024
6b9a026
fix tests
isahers1 Dec 10, 2024
1f25f55
x
isahers1 Dec 10, 2024
7573691
x
isahers1 Dec 10, 2024
cf85e56
fix test
isahers1 Dec 10, 2024
5c74829
fix test_bulk_update_examples_with_attachments_operations
agola11 Dec 10, 2024
9795e6e
Merge branch 'isaac/multipartstuff' of github.com:langchain-ai/langsm…
agola11 Dec 10, 2024
266272d
lint and fmt
agola11 Dec 10, 2024
61b28f5
fix tests
isahers1 Dec 10, 2024
ca3ec28
Merge branch 'isaac/multipartstuff' of https://github.com/langchain-a…
isahers1 Dec 10, 2024
34e8bb9
fmt
isahers1 Dec 10, 2024
e043a7d
fmt
isahers1 Dec 10, 2024
d77bd0e
remove blanket try/except
agola11 Dec 10, 2024
124 changes: 124 additions & 0 deletions python/bench/upload_examples_bench.py
@@ -0,0 +1,124 @@
import statistics

[GitHub Actions / benchmark — results]
create_5_000_run_trees: 617 ms +- 46 ms
create_10_000_run_trees: 1.19 sec +- 0.05 sec
create_20_000_run_trees: 1.18 sec +- 0.06 sec
dumps_class_nested_py_branch_and_leaf_200x400: 717 us +- 15 us
dumps_class_nested_py_leaf_50x100: 25.0 ms +- 0.2 ms
dumps_class_nested_py_leaf_100x200: 103 ms +- 2 ms
dumps_dataclass_nested_50x100: 25.4 ms +- 0.3 ms
dumps_pydantic_nested_50x100: 66.9 ms +- 15.6 ms (warning: unstable, stdev is 23% of mean)
dumps_pydanticv1_nested_50x100: 220 ms +- 30 ms (warning: unstable, stdev is 14% of mean)

[GitHub Actions / benchmark — comparison against main]
create_20_000_run_trees: 1.20 sec -> 1.18 sec (1.02x faster)
create_10_000_run_trees: 1.21 sec -> 1.19 sec (1.01x faster)
dumps_class_nested_py_leaf_100x200: 105 ms -> 103 ms (1.01x faster)
dumps_class_nested_py_leaf_50x100: 25.1 ms -> 25.0 ms (1.01x faster)
dumps_class_nested_py_branch_and_leaf_200x400: 703 us -> 717 us (1.02x slower)
Geometric mean: 1.01x faster
Hidden because not significant (4): create_5_000_run_trees, dumps_pydanticv1_nested_50x100, dumps_pydantic_nested_50x100, dumps_dataclass_nested_50x100
import time
from typing import Dict
from uuid import uuid4

from langsmith.client import Client
from langsmith.schemas import DataType, ExampleCreateWithAttachments


def create_large_json(length: int) -> Dict:
    """Create a large JSON object for benchmarking purposes."""
    large_array = [
        {
            "index": i,
            "data": f"This is element number {i}",
            "nested": {"id": i, "value": f"Nested value for element {i}"},
        }
        for i in range(length)
    ]

    return {
        "name": "Huge JSON" + str(uuid4()),
        "description": "This is a very large JSON object for benchmarking purposes.",
        "array": large_array,
        "metadata": {
            "created_at": "2024-10-22T19:00:00Z",
            "author": "Python Program",
            "version": 1.0,
        },
    }


def create_example_data(dataset_id: str, json_size: int) -> ExampleCreateWithAttachments:
    """Create a single example payload."""
    return ExampleCreateWithAttachments(
        dataset_id=dataset_id,
        inputs=create_large_json(json_size),
        outputs=create_large_json(json_size),
    )

DATASET_NAME = "TEST DATASET"


def benchmark_example_uploading(
    num_examples: int, json_size: int, samples: int = 1
) -> Dict:
    """Benchmark example uploading with the specified parameters.

    Returns timing statistics.
    """
    multipart_timings, old_timings = [], []
    for _ in range(samples):
        client = Client(api_url="https://dev.api.smith.langchain.com")

        if client.has_dataset(dataset_name=DATASET_NAME):
            client.delete_dataset(dataset_name=DATASET_NAME)

        dataset = client.create_dataset(
            DATASET_NAME,
            description="Test dataset for multipart example upload",
            data_type=DataType.kv,
        )
        examples = [
            create_example_data(dataset.id, json_size) for _ in range(num_examples)
        ]

        # Old method
        old_start = time.perf_counter()
        inputs = [e.inputs for e in examples]
        outputs = [e.outputs for e in examples]
        try:
            # the create_examples endpoint fails above 20 MB
            client.create_examples(inputs=inputs, outputs=outputs, dataset_id=dataset.id)
            old_elapsed = time.perf_counter() - old_start
        except Exception:
            # Sentinel: count a failed upload as a very large elapsed time.
            old_elapsed = 1_000_000

        # New method
        multipart_start = time.perf_counter()
        client.upsert_examples_multipart(upserts=examples)
        multipart_elapsed = time.perf_counter() - multipart_start

        multipart_timings.append(multipart_elapsed)
        old_timings.append(old_elapsed)

    return {
        "old": {
            "mean": statistics.mean(old_timings),
            "median": statistics.median(old_timings),
            "stdev": statistics.stdev(old_timings) if len(old_timings) > 1 else 0,
            "min": min(old_timings),
            "max": max(old_timings),
        },
        "new": {
            "mean": statistics.mean(multipart_timings),
            "median": statistics.median(multipart_timings),
            "stdev": statistics.stdev(multipart_timings)
            if len(multipart_timings) > 1
            else 0,
            "min": min(multipart_timings),
            "max": max(multipart_timings),
        },
    }

json_size = 1000
num_examples = 1000


def main(json_size: int, num_examples: int):
    """Run benchmarks with different combinations of parameters and report results."""
    results = benchmark_example_uploading(num_examples=num_examples, json_size=json_size)

    print(f"\nBenchmark Results for {num_examples} examples with JSON size {json_size}:")
    print("-" * 60)
    print(f"{'Metric':<15} {'Old Method':>20} {'New Method':>20}")
    print("-" * 60)

    metrics = ["mean", "median", "stdev", "min", "max"]
    for metric in metrics:
        print(
            f"{metric:<15} {results['old'][metric]:>20.4f} "
            f"{results['new'][metric]:>20.4f}"
        )

    print("-" * 60)
    print(
        f"{'Throughput':<15} "
        f"{num_examples / results['old']['mean']:>20.2f} "
        f"{num_examples / results['new']['mean']:>20.2f}"
    )
    print("(examples/second)")


if __name__ == "__main__":
    main(json_size, num_examples)
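The summary-statistics pattern in the benchmark above can be exercised on its own; a minimal sketch with synthetic timings (no client or network access; `summarize_timings` is an illustrative helper, not part of the SDK):

```python
import statistics
from typing import Dict, List


def summarize_timings(timings: List[float]) -> Dict[str, float]:
    """Summarize elapsed times the same way the benchmark's return dict does."""
    return {
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
        "stdev": statistics.stdev(timings) if len(timings) > 1 else 0,
        "min": min(timings),
        "max": max(timings),
    }


# Synthetic timings standing in for the "old" and "new" upload paths.
old = summarize_timings([2.0, 4.0, 6.0])
new = summarize_timings([1.0, 1.0, 1.0])
print(old["mean"], new["mean"])  # 4.0 1.0
```

Note the single-sample guard: `statistics.stdev` raises on fewer than two data points, so the benchmark (and this sketch) falls back to 0.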
129 changes: 129 additions & 0 deletions python/langsmith/client.py
@@ -82,6 +82,7 @@
    _SIZE_LIMIT_BYTES,
)
from langsmith._internal._multipart import (
    MultipartPart,
    MultipartPartsAndContext,
    join_multipart_parts_and_context,
)
@@ -3369,6 +3370,134 @@ def create_example_from_run(
            created_at=created_at,
        )

    def upsert_examples_multipart(
        self,
        *,
        upserts: Optional[List[ls_schemas.ExampleCreateWithAttachments]] = None,
    ) -> dict:  # TODO: consider a dedicated return type, e.g. UpsertExamplesResponse
        """Upsert examples."""
        if not (self.info.instance_flags or {}).get(
            "examples_multipart_enabled", False
        ):
            raise ValueError(
                "Your LangSmith instance does not support the multipart examples "
                "endpoint; please update to the latest version."
            )

        if upserts is None:
            upserts = []
        parts: List[MultipartPart] = []

        for example in upserts:
            if example.id is not None:
                example_id = str(example.id)
            else:
                example_id = str(uuid.uuid4())

            example_body = {
                "dataset_id": example.dataset_id,
                "created_at": example.created_at,
            }
            if example.metadata is not None:
                example_body["metadata"] = example.metadata
            if example.split is not None:
                example_body["split"] = example.split
            valb = _dumps_json(example_body)

            parts.append(
                (
                    f"{example_id}",
                    (None, valb, "application/json", {}),
                )
            )

            inputsb = _dumps_json(example.inputs)
            outputsb = _dumps_json(example.outputs)

            parts.append(
                (
                    f"{example_id}.inputs",
                    (None, inputsb, "application/json", {}),
                )
            )
            parts.append(
                (
                    f"{example_id}.outputs",
                    (None, outputsb, "application/json", {}),
                )
            )

            if example.attachments:
                for name, attachment in example.attachments.items():
                    if isinstance(attachment, tuple):
                        mime_type, data = attachment
                    else:
                        mime_type, data = attachment.mime_type, attachment.data
                    parts.append(
                        (
                            f"{example_id}.attachment.{name}",
                            (None, data, f"{mime_type}; length={len(data)}", {}),
                        )
                    )

        encoder = rqtb_multipart.MultipartEncoder(parts, boundary=BOUNDARY)
        # Encode in memory below ~20 MB; stream larger payloads.
        if encoder.len <= 20_000_000:
            data = encoder.to_string()
        else:
            data = encoder

        response = self.request_with_retries(
            "POST",
            "/v1/platform/examples/multipart",
            request_kwargs={
                "data": data,
                "headers": {
                    **self._headers,
                    "Content-Type": encoder.content_type,
                },
            },
        )
        ls_utils.raise_for_status_with_text(response)
        return response.json()

def create_examples(
self,
*,
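The part-naming scheme in `upsert_examples_multipart` above — one JSON part per example, plus `.inputs`, `.outputs`, and `.attachment.<name>` parts — can be sketched standalone; `build_part_names` is an illustrative helper, not part of the SDK:

```python
from typing import Dict, List, Optional
from uuid import uuid4


def build_part_names(
    example_id: Optional[str], attachments: Dict[str, bytes]
) -> List[str]:
    """Return the multipart field names one example would produce."""
    # As in the method above, a missing id gets a fresh UUID.
    eid = example_id or str(uuid4())
    names = [eid, f"{eid}.inputs", f"{eid}.outputs"]
    names += [f"{eid}.attachment.{name}" for name in attachments]
    return names


names = build_part_names("abc123", {"test_file": b"test content"})
print(names)
# ['abc123', 'abc123.inputs', 'abc123.outputs', 'abc123.attachment.test_file']
```

The server can then group parts by the id prefix, so one request carries many examples plus their binary attachments.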
8 changes: 8 additions & 0 deletions python/langsmith/schemas.py
@@ -89,6 +89,12 @@ class ExampleCreate(ExampleBase):
    split: Optional[Union[str, List[str]]] = None


class ExampleCreateWithAttachments(ExampleCreate):
    """Example create with attachments."""

    attachments: Optional[Attachments] = None


class Example(ExampleBase):
    """Example model."""

@@ -695,6 +701,8 @@ class LangSmithInfo(BaseModel):
    license_expiration_time: Optional[datetime] = None
    """The time the license will expire."""
    batch_ingest_config: Optional[BatchIngestConfig] = None
    instance_flags: Optional[Dict[str, Any]] = None
    """The instance flags."""


Example.update_forward_refs()
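The schema additions above mean an upsert payload maps attachment names to `(mime_type, raw bytes)` pairs. A dataclass sketch of that shape (the real classes are Pydantic models in `langsmith.schemas`; the name `ExampleUpsertSketch` is illustrative):

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ExampleUpsertSketch:
    """Illustrative stand-in for ExampleCreateWithAttachments (really Pydantic)."""

    dataset_id: str
    inputs: Dict[str, str]
    outputs: Dict[str, str]
    # attachment name -> (mime_type, raw bytes)
    attachments: Optional[Dict[str, Tuple[str, bytes]]] = None


ex = ExampleUpsertSketch(
    dataset_id="d1",
    inputs={"text": "hello world"},
    outputs={"response": "greeting"},
    attachments={"test_file": ("text/plain", b"test content")},
)
# The client turns each pair into a multipart content type like
# "text/plain; length=12", matching the part construction in client.py.
mime, data = ex.attachments["test_file"]
print(f"{mime}; length={len(data)}")  # text/plain; length=12
```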
74 changes: 73 additions & 1 deletion python/tests/integration_tests/test_client.py
@@ -20,8 +20,9 @@
from requests_toolbelt import MultipartEncoder, MultipartEncoderMonitor

from langsmith.client import ID_TYPE, Client
from langsmith.schemas import DataType, ExampleCreateWithAttachments
from langsmith.utils import (
    LangSmithConnectionError,
    LangSmithError,
    LangSmithNotFoundError,
    get_env_var,
@@ -368,6 +369,77 @@ def test_error_surfaced_invalid_uri(uri: str) -> None:
    with pytest.raises(LangSmithConnectionError):
        client.create_run("My Run", inputs={"text": "hello world"}, run_type="llm")

# NEED TO FIX ONCE CHANGES PUSH TO PROD
def test_upsert_examples_multipart() -> None:
    """Test upserting examples with attachments via the multipart endpoint."""
    dataset_name = "__test_upsert_examples_multipart" + uuid4().hex[:4]
    langchain_client = Client(api_url="https://dev.api.smith.langchain.com")
    if langchain_client.has_dataset(dataset_name=dataset_name):
        langchain_client.delete_dataset(dataset_name=dataset_name)

    dataset = langchain_client.create_dataset(
        dataset_name,
        description="Test dataset for multipart example upload",
        data_type=DataType.kv,
    )

    # Test example with all fields
    example_id = uuid4()
    example_1 = ExampleCreateWithAttachments(
        id=example_id,
        dataset_id=dataset.id,
        inputs={"text": "hello world"},
        outputs={"response": "greeting"},
        attachments={
            "test_file": ("text/plain", b"test content"),
        },
    )
    # Test example without id
    example_2 = ExampleCreateWithAttachments(
        dataset_id=dataset.id,
        inputs={"text": "foo bar"},
        outputs={"response": "baz"},
        attachments={
            "my_file": ("text/plain", b"more test content"),
        },
    )

    created_examples = langchain_client.upsert_examples_multipart(
        upserts=[example_1, example_2]
    )
    assert created_examples["count"] == 2

    created_example_1 = langchain_client.read_example(created_examples["example_ids"][0])
    assert created_example_1.inputs["text"] == "hello world"
    assert created_example_1.outputs["response"] == "greeting"

    created_example_2 = langchain_client.read_example(created_examples["example_ids"][1])
    assert created_example_2.inputs["text"] == "foo bar"
    assert created_example_2.outputs["response"] == "baz"

    # Make sure the examples were sent to the correct dataset
    all_examples_in_dataset = list(langchain_client.list_examples(dataset_id=dataset.id))
    assert len(all_examples_in_dataset) == 2

    # Adding an invalid example fails, even alongside valid examples
    example_3 = ExampleCreateWithAttachments(
        dataset_id=uuid4(),  # not a real dataset
        inputs={"text": "foo bar"},
        outputs={"response": "baz"},
        attachments={
            "my_file": ("text/plain", b"more test content"),
        },
    )

    with pytest.raises(LangSmithNotFoundError):
        langchain_client.upsert_examples_multipart(upserts=[example_3])

    all_examples_in_dataset = list(langchain_client.list_examples(dataset_id=dataset.id))
    assert len(all_examples_in_dataset) == 2

    # Type errors are raised when not passing ExampleCreateWithAttachments
    with pytest.raises(AttributeError):
        langchain_client.upsert_examples_multipart(upserts=[{"foo": "bar"}])

    langchain_client.delete_dataset(dataset_name=dataset_name)

def test_create_dataset(langchain_client: Client) -> None:
dataset_name = "__test_create_dataset" + uuid4().hex[:4]