Release candidate langsmith-pyo3 Python package #1253

Closed
wants to merge 106 commits into from
106 commits
3a03725
add minimal test
agola11 Oct 16, 2024
0be9ec1
add struct definitions and tests
agola11 Oct 17, 2024
b491b09
lint and format
agola11 Oct 17, 2024
dac80dd
compilation
agola11 Oct 17, 2024
f212ff8
all tests passing
agola11 Oct 18, 2024
0a7d49e
use shutdown method instead of janky sleep
agola11 Oct 18, 2024
471e654
comments
agola11 Oct 18, 2024
b7eefed
cleanup
agola11 Oct 18, 2024
f5d0ade
move more stuff to common
agola11 Oct 19, 2024
77f3dc5
make the vec<u8> optional
agola11 Oct 19, 2024
8065b3d
something kind of working with streaming
agola11 Oct 19, 2024
b7c5508
inspect multipart form
agola11 Oct 19, 2024
f7e3e2f
fix some issues
agola11 Oct 21, 2024
85315cd
use arc::mutex to capture request body
agola11 Oct 21, 2024
a9d71ad
use bytes
agola11 Oct 21, 2024
5ba4b10
finish writing assertions
agola11 Oct 21, 2024
5f5f768
avoid excessive clones
agola11 Oct 21, 2024
d8e020d
remove excessive clear
agola11 Oct 21, 2024
2cc93f3
fix tests
agola11 Oct 22, 2024
8ad8dbf
add length header
agola11 Oct 22, 2024
d24a87b
add a directory for bindings
agola11 Oct 22, 2024
4cc7d92
several fixes
agola11 Oct 23, 2024
d8e98d2
add benchmarks
agola11 Oct 23, 2024
01b6305
improve benchmarks
agola11 Oct 23, 2024
69398a2
use custom_iter
agola11 Oct 24, 2024
4774da2
use custom_iter
agola11 Oct 24, 2024
14f5817
baseline
agola11 Oct 24, 2024
795c0f2
match baseline concurrency
agola11 Oct 24, 2024
f85811a
checkpoint with semaphore
agola11 Oct 24, 2024
c8ee98c
comment
agola11 Oct 24, 2024
34a632b
offload serialization but keep attachment async
agola11 Oct 24, 2024
3a21a41
benchmark for run_bytes
agola11 Oct 25, 2024
29aa6a1
add json serialization benchmarks
agola11 Oct 25, 2024
6fe0c2b
use rayon with print statements
agola11 Oct 26, 2024
d92f19d
more perf stuff
agola11 Oct 26, 2024
df21de0
more prints
agola11 Oct 26, 2024
645977a
change benchmark numbers
agola11 Oct 26, 2024
580dfbd
change benchmark
agola11 Oct 27, 2024
4b8c0a4
change benchmark
agola11 Oct 27, 2024
ba1e013
merge
agola11 Oct 27, 2024
c1ea2c0
update benches
agola11 Oct 27, 2024
1125bd4
use sonic-rs
agola11 Oct 27, 2024
4a0be83
get rid of excess import in error
agola11 Oct 27, 2024
0a30e74
use sonic-rs
agola11 Oct 27, 2024
4baf240
slight styling
agola11 Oct 27, 2024
094aca0
add sync client and processor
agola11 Oct 28, 2024
a819163
isolate bottleneck in sending request
agola11 Oct 28, 2024
01f78cf
benchmark 5000 and 300
agola11 Oct 28, 2024
3459434
Ankush/pyo3 (#1139)
agola11 Oct 31, 2024
391c5d7
update benchmarks
agola11 Oct 31, 2024
605b6e5
Merge branch 'ankush/rust' into ankush/increase-throughput
agola11 Oct 31, 2024
e809558
Add workspace at `rust/` level with shared dependencies and config.
obi1kenobi Nov 13, 2024
486f61f
Update PyO3 and replace abandoned pyo3-asyncio with maintained fork.
obi1kenobi Nov 13, 2024
1dec74d
Split out pyo3 crate and add necessary pyo3 features.
obi1kenobi Nov 13, 2024
283e118
Remove lint and reformat.
obi1kenobi Nov 13, 2024
aed43fa
Split impls into blocking and async-enabled modules.
obi1kenobi Nov 13, 2024
40c44d7
Create new maturin project for `langsmith_pyo3`.
obi1kenobi Nov 13, 2024
8eafdd3
Extract Rust types from Python Run dict.
obi1kenobi Nov 13, 2024
72ce2ee
Add BlockingTracingClient and remaining plumbing to PyO3 library.
obi1kenobi Nov 13, 2024
055bfb9
Update README getting started instructions.
obi1kenobi Nov 13, 2024
6dc7dbf
Add `drain()` method to client. Add Rust benchmark.
obi1kenobi Nov 13, 2024
d9414c7
Fix `KeyError` on conversions.
obi1kenobi Nov 13, 2024
6ed49aa
Make draining empty the buffer and put the worker thread to sleep.
obi1kenobi Nov 13, 2024
5b521ad
Final tweaks to get benchmarks working.
obi1kenobi Nov 13, 2024
d70e2d3
Update README with benchmarking instructions.
obi1kenobi Nov 13, 2024
6032205
Use batch size and timeout settings consistent with other benchmarks.
obi1kenobi Nov 14, 2024
c51ecc8
Black-box inputs and outputs, and ensure clear names for benchmarks.
obi1kenobi Nov 14, 2024
3a4d1cf
Update langsmith_pyo3 for sonic 0.3.14 and drop GIL when not needed.
obi1kenobi Nov 14, 2024
bc117e2
Ignore profiler output files.
obi1kenobi Nov 14, 2024
3ea2855
Switch `inputs/outputs` to bytes and use orjson to serialize them.
obi1kenobi Nov 14, 2024
ebe4381
Vendor orjson and pyo3.
obi1kenobi Nov 14, 2024
1f88d91
Use vendored orjson to serialize Python objects to `Vec<u8>` directly.
obi1kenobi Nov 14, 2024
c746f00
Add `maturin generate-ci github` script for building wheels.
obi1kenobi Nov 21, 2024
13d864d
Update wheel-building script for langsmith-sdk monorepo layout.
obi1kenobi Nov 21, 2024
10d4287
Enable workflow on this branch temporarily.
obi1kenobi Nov 21, 2024
5bd20c7
Merge branch 'main' into pg/vendored-orjson
obi1kenobi Nov 21, 2024
642a97c
Unset abi3 features since pyo3 doesn't support them.
obi1kenobi Nov 21, 2024
b1cf369
Vendor OpenSSL to work around cross-compilation problems.
obi1kenobi Nov 21, 2024
efcdbc4
Use rustls instead of OpenSSL.
obi1kenobi Nov 21, 2024
db8ca50
Use newer manylinux version.
obi1kenobi Nov 21, 2024
6daf24e
Switch to '2_28' manylinux. The '2_24' manylinux image is broken for …
obi1kenobi Nov 21, 2024
bdc8a2a
Work around missing pip in `manylinux_2_28` docker image.
obi1kenobi Nov 21, 2024
90a39f8
langsmith_pyo3 currently requires nightly Rust.
obi1kenobi Nov 21, 2024
a041625
Remove support for 32-bit platforms.
obi1kenobi Nov 21, 2024
cbfac8f
Use preferred spelling of `default-features`.
obi1kenobi Nov 21, 2024
24a167c
Explicitly specify Python versions to build wheels for.
obi1kenobi Nov 21, 2024
fe7b481
Use only officially-supported platforms for building wheels.
obi1kenobi Nov 21, 2024
196ce38
Add `.cargo/config.toml` to langsmith_pyo3 to fix macOS build.
obi1kenobi Nov 21, 2024
a4ee96a
Enable portion of release job to show which wheels would be uploaded.
obi1kenobi Nov 21, 2024
18a2d75
Extract common Actions config values into top-level `env` values.
obi1kenobi Nov 21, 2024
52b4cb3
Use stable Rust instead of nightly.
obi1kenobi Nov 21, 2024
ad3a475
Specify explicit Rust 1.82 use, since `stable` is an old tag.
obi1kenobi Nov 21, 2024
8533d78
Switch from `maturin upload` to trusted publishing action.
obi1kenobi Nov 21, 2024
b0bbcf0
Switch Rust code to dashes instead of underscores.
obi1kenobi Nov 21, 2024
377aae8
Run only on `main`, PRs, and tags of the pyo3 package.
obi1kenobi Nov 21, 2024
a2df3b0
Add langsmith client option to use the PyO3 bindings.
obi1kenobi Nov 26, 2024
fbc4736
Only send batch traces to Rust code if batch-specific keys are set.
obi1kenobi Nov 26, 2024
ff9c0f2
Use string literals for logger template messages.
obi1kenobi Nov 26, 2024
db39c92
Merge branch 'main' into pg/full-pyo3-package
obi1kenobi Nov 26, 2024
2c09f4b
Add auth capability to the blocking client, and an end-to-end benchmark.
obi1kenobi Nov 26, 2024
12cf6eb
Standardize benchmark JSON shape and size.
obi1kenobi Nov 26, 2024
2c68622
Intern all Python dict key strings in the Rust bindings. Clean up err…
obi1kenobi Nov 26, 2024
a2c384a
Arc the shared Rust client config, to avoid copying it across threads.
obi1kenobi Nov 26, 2024
05d6953
Mark `langsmith-pyo3` as a release candidate.
obi1kenobi Nov 26, 2024
645bc5b
Enable publishing PyO3 package under `langsmith-pyo3==<version>` tags.
obi1kenobi Nov 26, 2024
287711d
Add safeguard for publishing versions that match the tag.
obi1kenobi Nov 26, 2024
234 changes: 228 additions & 6 deletions .github/workflows/build_langsmith_pyo3_wheels.yml
@@ -1,16 +1,238 @@
name: Build langsmith_pyo3 wheels
# This file is based on a starter autogenerated by maturin v1.7.4,
# which can be obtained by running `maturin generate-ci github`.
#
# The autogenerated file makes some assumptions that don't work for this repo.
# For example, it assumes that the PyO3 project is the only piece of code in the repo.
# This is why the file has been hand-edited after generation.
#
# To find the changes applied on top of the autogenerated contents, diff this file
# versus commit `c746f00fadb0c84d769be117643683d60eb07ba3` which has the raw autogenerated output.
name: Build langsmith-pyo3 wheels

# Our wheels build depends on both the `langsmith-pyo3` crate itself and all its dependencies:
# - The Rust workspace-level dependency specification in `rust/Cargo.toml` and `rust/Cargo.lock`.
# - Other local Rust crates, like `langsmith-tracing-client`.
# - The vendored `orjson` and `pyo3` workspaces.
on:
  push:
    branches:
      - main
    tags:
      - 'langsmith-pyo3==*'
    paths:
      - "rust/**"
      - "vendor/orjson/**"
      - "vendor/pyo3/**"
  pull_request:
    branches:
      - main
    paths:
      - "rust/**"
      - "vendor/orjson/**"
      - "vendor/pyo3/**"
  workflow_dispatch:

permissions:
  contents: read

env:
  RUST_VERSION: '1.82'  # Be careful, "stable" gets you "whatever GitHub ships", which is quite old.
  SUPPORTED_PYTHON_VERSIONS: 'python3.8 python3.9 python3.10 python3.11 python3.12'
  WORKING_DIRECTORY: rust/crates/langsmith-pyo3

jobs:
  hello-world:
    runs-on: ubuntu-20.04
  linux:
    runs-on: ${{ matrix.platform.runner }}
    strategy:
      matrix:
        platform:
          - runner: ubuntu-latest
            target: x86_64
          - runner: ubuntu-latest
            target: aarch64
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: 3.x
      - name: Build wheels
        uses: PyO3/maturin-action@v1
        with:
          working-directory: ${{ env.WORKING_DIRECTORY }}
          rust-toolchain: ${{ env.RUST_VERSION }}
          target: ${{ matrix.platform.target }}
          args: '--release --out dist --interpreter ${{ env.SUPPORTED_PYTHON_VERSIONS }}'
          sccache: 'true'
          manylinux: '2_28'  # The default is 'auto' AKA '2014', which is too old for us.
          # Workaround for missing `pip` in manylinux_2_28:
          # https://github.com/PyO3/maturin-action/issues/249
          before-script-linux: '(python3 -m pip --version || python3 -m ensurepip)'
      - name: Upload wheels
        uses: actions/upload-artifact@v4
        with:
          name: wheels-linux-${{ matrix.platform.target }}
          path: ${{ env.WORKING_DIRECTORY }}/dist
          if-no-files-found: error

  musllinux:
    runs-on: ${{ matrix.platform.runner }}
    strategy:
      matrix:
        platform:
          - runner: ubuntu-latest
            target: x86_64
          - runner: ubuntu-latest
            target: aarch64
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: 3.x
      - name: Build wheels
        uses: PyO3/maturin-action@v1
        with:
          working-directory: ${{ env.WORKING_DIRECTORY }}
          rust-toolchain: ${{ env.RUST_VERSION }}
          target: ${{ matrix.platform.target }}
          args: '--release --out dist --interpreter ${{ env.SUPPORTED_PYTHON_VERSIONS }}'
          sccache: 'true'
          manylinux: musllinux_1_2
      - name: Upload wheels
        uses: actions/upload-artifact@v4
        with:
          name: wheels-musllinux-${{ matrix.platform.target }}
          path: ${{ env.WORKING_DIRECTORY }}/dist
          if-no-files-found: error

  windows:
    runs-on: ${{ matrix.platform.runner }}
    strategy:
      matrix:
        platform:
          - runner: windows-latest
            target: x64
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: 3.x
          architecture: ${{ matrix.platform.target }}
      - name: Build wheels
        uses: PyO3/maturin-action@v1
        with:
          working-directory: ${{ env.WORKING_DIRECTORY }}
          rust-toolchain: ${{ env.RUST_VERSION }}
          target: ${{ matrix.platform.target }}
          args: '--release --out dist --interpreter ${{ env.SUPPORTED_PYTHON_VERSIONS }}'
          sccache: 'true'
      - name: Upload wheels
        uses: actions/upload-artifact@v4
        with:
          name: wheels-windows-${{ matrix.platform.target }}
          path: ${{ env.WORKING_DIRECTORY }}/dist
          if-no-files-found: error

  macos:
    runs-on: ${{ matrix.platform.runner }}
    strategy:
      matrix:
        platform:
          - runner: macos-14
            target: aarch64
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: 3.x
      - name: Build wheels
        uses: PyO3/maturin-action@v1
        with:
          working-directory: ${{ env.WORKING_DIRECTORY }}
          rust-toolchain: ${{ env.RUST_VERSION }}
          target: ${{ matrix.platform.target }}
          args: '--release --out dist --interpreter ${{ env.SUPPORTED_PYTHON_VERSIONS }}'
          sccache: 'true'
      - name: Upload wheels
        uses: actions/upload-artifact@v4
        with:
          name: wheels-macos-${{ matrix.platform.target }}
          path: ${{ env.WORKING_DIRECTORY }}/dist
          if-no-files-found: error

  sdist:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build sdist
        uses: PyO3/maturin-action@v1
        with:
          working-directory: ${{ env.WORKING_DIRECTORY }}
          rust-toolchain: ${{ env.RUST_VERSION }}
          command: sdist
          args: --out dist
      - name: Upload sdist
        uses: actions/upload-artifact@v4
        with:
          name: wheels-sdist
          path: ${{ env.WORKING_DIRECTORY }}/dist
          if-no-files-found: error

  release:
    name: Release
    runs-on: ubuntu-latest
    if: ${{ startsWith(github.ref, 'refs/tags/langsmith-pyo3==') || github.event_name == 'workflow_dispatch' }}
    needs: [linux, musllinux, windows, macos, sdist]
    permissions:
      # Used to sign the release artifacts
      id-token: write
      # Used to upload release artifacts
      contents: write
      # Used to generate artifact attestation
      attestations: write
    steps:
      - run: echo 'hello world'
      - uses: actions/download-artifact@v4
        with:
          path: ${{ env.WORKING_DIRECTORY }}
      - name: Move wheels to 'dist' dir
        run: |
          set -euxo pipefail

          # Show what wheels got built.
          cd "$WORKING_DIRECTORY"
          ls -alh wheels-*

          # Wipe the `dist` directory if it already existed, so we don't accidentally publish
          # something we didn't intend to. Then, move the wheels into a fresh `dist` directory.
          rm -rf dist
          mkdir dist
          mv wheels-*/* dist/
          ls -alh dist/
      - name: Ensure the tagged version matches the wheel versions
        if: startsWith(github.ref, 'refs/tags/langsmith-pyo3==')
        env:
          REF_NAME: ${{ github.ref }}
        run: |
          cd "$WORKING_DIRECTORY"

          # - Look up the first wheel file
          # - Select just the filename,
          #   for example `langsmith_pyo3-0.1.0rc1-cp313-cp313-manylinux_2_34_x86_64.whl`
          # - Split on dashes, and extract the version field (the second field).
          EXPECTED_VERSION="$(find dist/ -name 'langsmith_pyo3-*.whl' -printf '%f\n' | head -1 | cut -d- -f2)"
          EXPECTED_REF="refs/tags/langsmith-pyo3==${EXPECTED_VERSION}"

          if [[ "$REF_NAME" != "$EXPECTED_REF" ]]; then
            echo 'Current tag does not match the expected tag for the wheel versions being published!'
            echo "Expected ref name: ${EXPECTED_REF}"
            echo "Actual ref name: ${REF_NAME}"
            echo ''
            echo 'Something is wrong, so refusing to publish.'
            exit 1
          fi
      - name: Generate artifact attestation
        uses: actions/attest-build-provenance@v1
        with:
          subject-path: ${{ env.WORKING_DIRECTORY }}/dist/*
      - name: Publish package distributions to PyPI
        if: startsWith(github.ref, 'refs/tags/langsmith-pyo3==')
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          packages-dir: ${{ env.WORKING_DIRECTORY }}/dist
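Note on the version safeguard in the release job: it depends on the wheel filename layout, where the version is the second dash-separated field (the distribution name itself uses underscores, so it contains no dashes). The same check can be sketched in Python for illustration. The helper names `wheel_version`, `expected_ref`, and `tag_matches_wheel` are hypothetical and not part of this PR; the workflow does this in bash with `find`, `head`, and `cut`.

```python
from pathlib import PurePosixPath

# Tag prefix used by the workflow's release trigger.
TAG_PREFIX = "refs/tags/langsmith-pyo3=="


def wheel_version(wheel_path: str) -> str:
    """Return the version field of a wheel filename (second dash-separated field)."""
    filename = PurePosixPath(wheel_path).name  # drop any leading directory such as dist/
    return filename.split("-")[1]


def expected_ref(wheel_path: str) -> str:
    """Build the git ref that a release of this wheel is expected to be tagged with."""
    return f"{TAG_PREFIX}{wheel_version(wheel_path)}"


def tag_matches_wheel(ref: str, wheel_path: str) -> bool:
    """True when the pushed ref matches the version embedded in the wheel filename."""
    return ref == expected_ref(wheel_path)


if __name__ == "__main__":
    wheel = "dist/langsmith_pyo3-0.1.0rc1-cp313-cp313-manylinux_2_34_x86_64.whl"
    print(expected_ref(wheel))
```

Like the bash step, this sketch inspects a single wheel and assumes every wheel in `dist/` carries the same version.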
2 changes: 2 additions & 0 deletions python/bench/.gitignore
@@ -0,0 +1,2 @@
# Profiler output files.
*.json
132 changes: 132 additions & 0 deletions python/bench/json_serialization.py
@@ -0,0 +1,132 @@
import time
import statistics
from concurrent.futures import ThreadPoolExecutor
import threading
import orjson
import zlib


def create_json_with_large_array(length):
    """Create a large JSON object for benchmarking purposes."""
    large_array = [
        {
            "index": i,
            "data": f"This is element number {i}",
            "nested": {
                "id": i,
                "value": f"Nested value for element {i}"
            }
        }
        for i in range(length)
    ]

    return {
        "name": "Huge JSON",
        "description": "This is a very large JSON object for benchmarking purposes.",
        "array": large_array,
        "metadata": {
            "created_at": "2024-10-22T19:00:00Z",
            "author": "Python Program",
            "version": 1.0
        }
    }


def create_json_with_large_strings(length: int) -> dict:
    large_string = "a" * length  # Create a large string of repeated 'a' characters

    return {
        "name": "Huge JSON",
        "description": "This is a very large JSON object for benchmarking purposes.",
        "key1": large_string,
        "key2": large_string,
        "key3": large_string,
        "metadata": {
            "created_at": "2024-10-22T19:00:00Z",
            "author": "Python Program",
            "version": 1.0
        }
    }



def serialize_sequential(data):
    """Serialize data sequentially."""
    return [orjson.dumps(json_obj) for json_obj in data]


def serialize_parallel(data):
    """Serialize data in parallel using ThreadPoolExecutor."""
    with ThreadPoolExecutor() as executor:
        return list(executor.map(orjson.dumps, data))


def serialize_sequential_gz(data):
    """Serialize data sequentially and compress each item with zlib at compression level 1."""
    compressed_data = []
    for json_obj in data:
        serialized = orjson.dumps(json_obj)
        compressed = zlib.compress(serialized, level=1)
        compressed_data.append(compressed)
    return compressed_data

def serialize_parallel_gz(data):
    """Serialize data in parallel using ThreadPoolExecutor and compress each item with zlib at level 1."""

    def compress_item(json_obj):
        serialized = orjson.dumps(json_obj)
        return zlib.compress(serialized, level=1)

    with ThreadPoolExecutor() as executor:
        compressed_data = list(executor.map(compress_item, data))
    return compressed_data

def gzip_parallel(serialized_data):
    """Compress serialized data in parallel using ThreadPoolExecutor and zlib (default level)."""
    with ThreadPoolExecutor() as executor:
        return list(executor.map(zlib.compress, serialized_data))

def gzip_sequential(serialized_data):
    """Compress serialized data sequentially using zlib (default level)."""
    return [zlib.compress(serialized) for serialized in serialized_data]


def benchmark_serialization(data, func, samples=10):
    """Benchmark a serialization function with multiple samples."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        func(data)
        elapsed = time.perf_counter() - start
        timings.append(elapsed)

    return {
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
        "stdev": statistics.stdev(timings) if len(timings) > 1 else 0,
        "min": min(timings),
        "max": max(timings),
    }


def main():
    num_json_objects = 2000
    json_length = 5000

    data = [create_json_with_large_array(json_length) for _ in range(num_json_objects)]
    serialized_data = serialize_sequential(data)

    for func in [serialize_sequential, serialize_parallel, serialize_sequential_gz, serialize_parallel_gz, gzip_sequential, gzip_parallel]:
        # data = [create_json_with_large_strings(json_length) for _ in range(num_json_objects)]

        print(f"\nBenchmarking {func.__name__} with {num_json_objects} JSON objects of length {json_length}...")
        results_seq = benchmark_serialization(data, func) if not func.__name__.startswith("gzip") else benchmark_serialization(serialized_data, func)
        print(f"Mean time: {results_seq['mean']:.4f} seconds")
        print(f"Median time: {results_seq['median']:.4f} seconds")
        print(f"Std Dev: {results_seq['stdev']:.4f} seconds")
        print(f"Min time: {results_seq['min']:.4f} seconds")
        print(f"Max time: {results_seq['max']:.4f} seconds")


if __name__ == "__main__":
    main()
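Note on the `*_gz` and `gzip_*` helpers in this benchmark: they call `zlib.compress`, which produces a zlib-framed stream, not a gzip file. In addition, `gzip_sequential` and `gzip_parallel` use zlib's default level while the `serialize_*_gz` variants pass `level=1`, so their timings may not be directly comparable. The framing difference can be checked with a small sketch. It uses the stdlib `json` module as a stand-in for `orjson` so that it runs without extra dependencies, and the names `zlib_frame`, `gzip_frame`, and `sample_payload` are hypothetical.

```python
import gzip
import json
import zlib

# First byte of a zlib stream with the default 32K window, and the gzip magic number.
ZLIB_FIRST_BYTE = b"\x78"
GZIP_MAGIC = b"\x1f\x8b"


def sample_payload() -> bytes:
    """Serialize a small object shaped like one element of the benchmark array."""
    obj = {
        "index": 0,
        "data": "This is element number 0",
        "nested": {"id": 0, "value": "Nested value for element 0"},
    }
    return json.dumps(obj).encode("utf-8")


def zlib_frame(payload: bytes, level: int = 1) -> bytes:
    """Compress with zlib framing, as the benchmark helpers do."""
    return zlib.compress(payload, level)


def gzip_frame(payload: bytes, level: int = 1) -> bytes:
    """Compress with gzip framing, for comparison."""
    return gzip.compress(payload, compresslevel=level)


if __name__ == "__main__":
    payload = sample_payload()
    print(zlib_frame(payload)[:2].hex(), gzip_frame(payload)[:2].hex())
```

Both framings wrap the same DEFLATE algorithm, so the compression cost should be similar, but a receiver expecting `Content-Encoding: gzip` would need the gzip variant.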