add update examples multipart #1305

Merged: 6 commits, Dec 9, 2024
59 changes: 58 additions & 1 deletion python/langsmith/client.py
@@ -1,4 +1,4 @@
"""Client for interacting with the LangSmith API.

Check notice (GitHub Actions / benchmark): Benchmark results

create_5_000_run_trees:                        Mean +- std dev: 720 ms +- 96 ms      (unstable: std dev is 13% of mean)
create_10_000_run_trees:                       Mean +- std dev: 1.46 sec +- 0.19 sec (unstable: std dev is 13% of mean)
create_20_000_run_trees:                       Mean +- std dev: 1.45 sec +- 0.17 sec (unstable: std dev is 11% of mean)
dumps_class_nested_py_branch_and_leaf_200x400: Mean +- std dev: 696 us +- 10 us
dumps_class_nested_py_leaf_50x100:             Mean +- std dev: 25.1 ms +- 0.3 ms
dumps_class_nested_py_leaf_100x200:            Mean +- std dev: 105 ms +- 3 ms
dumps_dataclass_nested_50x100:                 Mean +- std dev: 25.7 ms +- 0.3 ms
dumps_pydantic_nested_50x100:                  Mean +- std dev: 74.9 ms +- 18.9 ms  (unstable: std dev is 25% of mean)
dumps_pydanticv1_nested_50x100:                Mean +- std dev: 201 ms +- 3 ms

For the unstable results, pyperf suggests rerunning with more runs, values, and/or loops, and running `python -m pyperf system tune` to reduce system jitter.

Check notice (GitHub Actions / benchmark): Comparison against main

| Benchmark                                     | main     | changes                |
|-----------------------------------------------|----------|------------------------|
| dumps_pydanticv1_nested_50x100                | 221 ms   | 201 ms: 1.10x faster   |
| dumps_class_nested_py_branch_and_leaf_200x400 | 705 us   | 696 us: 1.01x faster   |
| create_5_000_run_trees                        | 724 ms   | 720 ms: 1.01x faster   |
| dumps_class_nested_py_leaf_100x200            | 105 ms   | 105 ms: 1.00x faster   |
| dumps_class_nested_py_leaf_50x100             | 25.1 ms  | 25.1 ms: 1.00x slower  |
| dumps_dataclass_nested_50x100                 | 25.6 ms  | 25.7 ms: 1.00x slower  |
| create_20_000_run_trees                       | 1.39 sec | 1.45 sec: 1.04x slower |
| create_10_000_run_trees                       | 1.40 sec | 1.46 sec: 1.04x slower |
| dumps_pydantic_nested_50x100                  | 66.2 ms  | 74.9 ms: 1.13x slower  |
| Geometric mean                                | (ref)    | 1.01x slower           |


Use the client to customize API keys / workspace connections, SSL certs,
etc. for tracing.
@@ -3464,6 +3464,7 @@
examples: Union[
List[ls_schemas.ExampleUploadWithAttachments]
| List[ls_schemas.ExampleUpsertWithAttachments]
| List[ls_schemas.ExampleUpdateWithAttachments],
],
include_dataset_id: bool = False,
) -> Tuple[Any, bytes]:
@@ -3575,6 +3576,23 @@
)
)

if (
isinstance(example, ls_schemas.ExampleUpdateWithAttachments)
and example.attachments_operations
):
attachments_operationsb = _dumps_json(example.attachments_operations)
parts.append(
(
f"{example_id}.attachments_operations",
(
None,
attachments_operationsb,
"application/json",
{},
),
)
)

encoder = rqtb_multipart.MultipartEncoder(parts, boundary=BOUNDARY)
if encoder.len <= 20_000_000: # ~20 MB
data = encoder.to_string()
@@ -3583,6 +3601,38 @@

return encoder, data
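The `attachments_operations` hunk above serializes the operations to JSON and keys the multipart part by the example ID. A minimal stdlib sketch of that part's shape (the names and values here are illustrative, not from the PR):

```python
import json
import uuid

# Hypothetical values; real IDs come from the example being updated.
example_id = str(uuid.uuid4())
attachments_operations = {
    "rename": {"old_image": "new_image"},  # old attachment name -> new name
    "retain": ["transcript"],              # attachments to keep unchanged
}

# Each field becomes one multipart part:
# (field_name, (filename, body, content_type, headers))
part = (
    f"{example_id}.attachments_operations",
    (None, json.dumps(attachments_operations).encode("utf-8"), "application/json", {}),
)
```

This tuple layout matches what `requests_toolbelt`'s `MultipartEncoder` accepts for its parts.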

def update_examples_multipart(
self,
*,
dataset_id: ID_TYPE,
updates: Optional[List[ls_schemas.ExampleUpdateWithAttachments]] = None,
) -> ls_schemas.UpsertExamplesResponse:
"""Update examples."""
if not (self.info.instance_flags or {}).get(
"examples_multipart_enabled", False
):
raise ValueError(
"Your LangSmith version does not allow using the multipart examples endpoint, please update to the latest version."
)
if updates is None:
updates = []

encoder, data = self._prepate_multipart_data(updates, include_dataset_id=False)

response = self.request_with_retries(
"PATCH",
f"/v1/platform/datasets/{dataset_id}/examples",
request_kwargs={
"data": data,
"headers": {
**self._headers,
"Content-Type": encoder.content_type,
},
},
)
ls_utils.raise_for_status_with_text(response)
return response.json()
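For illustration, the shape of a single update this endpoint accepts can be sketched as a plain dict. The field names follow the `ExampleUpdateWithAttachments` schema added in this PR; all values are made up:

```python
import uuid

# Illustrative update payload; with the SDK this would be an
# ls_schemas.ExampleUpdateWithAttachments instance rather than a dict.
update = {
    "id": str(uuid.uuid4()),
    "inputs": {"question": "What is shown in the image?"},
    "outputs": {"answer": "A cat."},
    "split": "train",
    "attachments_operations": {
        "rename": {"img_v1": "img_v2"},
        "retain": ["audio"],
    },
}

# Hypothetical call site (requires a configured Client and dataset_id):
# client.update_examples_multipart(dataset_id=dataset_id, updates=[...])
```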

def upload_examples_multipart(
self,
*,
@@ -4067,6 +4117,7 @@
metadata: Optional[Dict] = None,
split: Optional[str | List[str]] = None,
dataset_id: Optional[ID_TYPE] = None,
attachments_operations: Optional[ls_schemas.AttachmentsOperations] = None,
) -> Dict[str, Any]:
"""Update a specific example.

@@ -4097,6 +4148,7 @@
dataset_id=dataset_id,
metadata=metadata,
split=split,
attachments_operations=attachments_operations,
)
response = self.request_with_retries(
"PATCH",
@@ -4116,6 +4168,9 @@
metadata: Optional[Sequence[Optional[Dict]]] = None,
splits: Optional[Sequence[Optional[str | List[str]]]] = None,
dataset_ids: Optional[Sequence[Optional[ID_TYPE]]] = None,
attachments_operations: Optional[
Sequence[Optional[ls_schemas.AttachmentsOperations]]
] = None,
) -> Dict[str, Any]:
"""Update multiple examples.

@@ -4146,6 +4201,7 @@
"metadata": metadata,
"splits": splits,
"dataset_ids": dataset_ids,
"attachments_operations": attachments_operations,
}
# Since inputs are required, we will check against them
examples_len = len(example_ids)
@@ -4163,8 +4219,9 @@
"dataset_id": dataset_id_,
"metadata": metadata_,
"split": split_,
"attachments_operations": attachments_operations_,
}
for id_, in_, out_, metadata_, split_, dataset_id_ in zip(
for id_, in_, out_, metadata_, split_, dataset_id_, attachments_operations_ in zip(
example_ids,
inputs or [None] * len(example_ids),
outputs or [None] * len(example_ids),
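The `zip` in the hunk above relies on padding every optional per-example sequence with `None` so all fields line up positionally. The pattern in isolation, with made-up data:

```python
example_ids = ["e1", "e2", "e3"]
metadata = [{"v": 1}, None, {"v": 3}]
splits = None  # caller did not pass splits

rows = [
    {"id": id_, "metadata": md_, "split": sp_}
    for id_, md_, sp_ in zip(
        example_ids,
        metadata or [None] * len(example_ids),
        splits or [None] * len(example_ids),  # None falls back to all-None padding
    )
]
```

A missing sequence thus contributes `None` for every example instead of truncating the zip.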
17 changes: 17 additions & 0 deletions python/langsmith/schemas.py
@@ -184,12 +184,24 @@ class ExampleSearch(ExampleBase):
id: UUID


class AttachmentsOperations(BaseModel):
"""Operations to perform on attachments."""

rename: Dict[str, str] = Field(
default_factory=dict, description="Mapping of old attachment names to new names"
)
retain: List[str] = Field(
default_factory=list, description="List of attachment names to keep"
)
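The diff does not show how the server applies `rename` and `retain`. One plausible reading (an assumption for illustration, not confirmed by this PR) is that renamed attachments keep their data under the new name, retained ones are kept as-is, and unlisted ones are dropped:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class AttachmentsOps:  # stdlib stand-in for the pydantic AttachmentsOperations above
    rename: Dict[str, str] = field(default_factory=dict)
    retain: List[str] = field(default_factory=list)

def apply_ops(attachments: Dict[str, bytes], ops: AttachmentsOps) -> Dict[str, bytes]:
    """Assumed semantics: rename wins over retain; anything unlisted is dropped."""
    result: Dict[str, bytes] = {}
    for name, data in attachments.items():
        if name in ops.rename:
            result[ops.rename[name]] = data  # kept under the new name
        elif name in ops.retain:
            result[name] = data              # kept unchanged
    return result
```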


class ExampleUpdate(BaseModel):
"""Update class for Example."""

dataset_id: Optional[UUID] = None
inputs: Optional[Dict[str, Any]] = None
outputs: Optional[Dict[str, Any]] = None
attachments_operations: Optional[AttachmentsOperations] = None
metadata: Optional[Dict[str, Any]] = None
split: Optional[Union[str, List[str]]] = None

@@ -203,7 +215,12 @@ class ExampleUpdateWithAttachments(ExampleUpdate):
"""Example update with attachments."""

id: UUID
inputs: Dict[str, Any] = Field(default_factory=dict)
outputs: Optional[Dict[str, Any]] = Field(default=None)
metadata: Optional[Dict[str, Any]] = Field(default=None)
split: Optional[Union[str, List[str]]] = None
attachments: Optional[Attachments] = None
attachments_operations: Optional[AttachmentsOperations] = None


class DataType(str, Enum):