tmp #1181

Closed · wants to merge 6 commits

26 changes: 14 additions & 12 deletions .github/actions/python-integration-tests/action.yml
@@ -1,4 +1,4 @@
name: "Python Integration Tests"

GitHub Actions / benchmark annotations (check notices on line 1 in .github/actions/python-integration-tests/action.yml):

Benchmark results

    create_5_000_run_trees:                         Mean +- std dev: 611 ms +- 43 ms
    create_10_000_run_trees:                        Mean +- std dev: 1.17 sec +- 0.05 sec
    create_20_000_run_trees:                        Mean +- std dev: 1.17 sec +- 0.05 sec
    dumps_class_nested_py_branch_and_leaf_200x400:  Mean +- std dev: 703 us +- 12 us
    dumps_class_nested_py_leaf_50x100:              Mean +- std dev: 25.0 ms +- 0.2 ms
    dumps_class_nested_py_leaf_100x200:             Mean +- std dev: 104 ms +- 2 ms
    dumps_dataclass_nested_50x100:                  Mean +- std dev: 25.4 ms +- 0.8 ms
    dumps_pydantic_nested_50x100:                   Mean +- std dev: 65.8 ms +- 15.6 ms
        WARNING: result may be unstable; the standard deviation (15.6 ms) is 24% of the mean (65.8 ms)
    dumps_pydanticv1_nested_50x100:                 Mean +- std dev: 220 ms +- 30 ms
        WARNING: result may be unstable; the standard deviation (30.4 ms) is 14% of the mean (220 ms)

    (pyperf suggests rerunning with more runs, values, and/or loops, running 'python -m pyperf
    system tune' to reduce system jitter, and using pyperf stats, dump, and hist to analyze
    results; the --quiet option hides these warnings.)

Comparison against main

    +-----------------------------------------------+----------+------------------------+
    | Benchmark                                      | main     | changes                |
    +================================================+==========+========================+
    | create_20_000_run_trees                        | 1.19 sec | 1.17 sec: 1.02x faster |
    | dumps_class_nested_py_leaf_100x200             | 105 ms   | 104 ms: 1.01x faster   |
    | dumps_class_nested_py_branch_and_leaf_200x400  | 709 us   | 703 us: 1.01x faster   |
    | dumps_dataclass_nested_50x100                  | 25.6 ms  | 25.4 ms: 1.01x faster  |
    | dumps_class_nested_py_leaf_50x100              | 25.2 ms  | 25.0 ms: 1.01x faster  |
    | Geometric mean                                 | (ref)    | 1.01x faster           |
    +-----------------------------------------------+----------+------------------------+

    Benchmarks hidden because not significant (4): dumps_pydantic_nested_50x100,
    dumps_pydanticv1_nested_50x100, create_5_000_run_trees, create_10_000_run_trees
description: "Run integration tests"
inputs:
python-version:
@@ -37,30 +37,32 @@
run: |
poetry install --with dev
poetry run pip install -U langchain langchain_anthropic tiktoken rapidfuzz vcrpy numpy
poetry run pip install "urllib3>=2"
shell: bash
working-directory: python

- name: Run integration tests
env:
LANGCHAIN_TRACING_V2: "true"
LANGCHAIN_ENDPOINT: https://beta.api.smith.langchain.com
LANGCHAIN_API_KEY: ${{ inputs.langchain-api-key-beta }}
OPENAI_API_KEY: ${{ inputs.openai-api-key }}
run: make integration_tests_fast
shell: bash
working-directory: python

# - name: Run integration tests
# env:
# LANGCHAIN_TRACING_V2: "true"
# LANGCHAIN_ENDPOINT: https://beta.api.smith.langchain.com
# LANGCHAIN_API_KEY: ${{ inputs.langchain-api-key-beta }}
# OPENAI_API_KEY: ${{ inputs.openai-api-key }}
# LANGSMITH_TEST_CACHE: tests/cassettes
# run: make integration_tests
# shell: bash
# working-directory: python

- name: Run doctest
env:
LANGCHAIN_TRACING_V2: "true"
LANGCHAIN_ENDPOINT: https://api.smith.langchain.com
LANGCHAIN_API_KEY: ${{ inputs.langchain-api-key-prod }}
OPENAI_API_KEY: ${{ inputs.openai-api-key }}
ANTHROPIC_API_KEY: ${{ inputs.anthropic-api-key }}
LANGSMITH_TEST_CACHE: tests/cassettes
run: make doctest
shell: bash
working-directory: python


- name: Run Evaluation
env:
@@ -69,7 +71,7 @@
LANGCHAIN_API_KEY: ${{ inputs.langchain-api-key-beta }}
OPENAI_API_KEY: ${{ inputs.openai-api-key }}
ANTHROPIC_API_KEY: ${{ inputs.anthropic-api-key }}
LANGSMITH_TEST_CACHE: tests/cassettes
run: make evals
shell: bash
working-directory: python

5 changes: 4 additions & 1 deletion python/Makefile
@@ -39,9 +39,12 @@ integration_tests:
poetry run python -m pytest -v --durations=10 --cov=langsmith --cov-report=term-missing --cov-report=html --cov-config=.coveragerc tests/integration_tests

integration_tests_fast:
poetry run python -m pytest -n auto --durations=10 -v --cov=langsmith --cov-report=term-missing --cov-report=html --cov-config=.coveragerc tests/integration_tests
poetry run python -m pytest -n auto --durations=10 -v --cov=langsmith --cov-report=term-missing --cov-report=html --cov-config=.coveragerc --capture=no tests/integration_tests

doctest:
poetry run python -m pytest -vv --capture=no --durations=10 --doctest-modules langsmith/evaluation

doctest_fast:
poetry run python -m pytest -n auto --durations=10 --doctest-modules langsmith

evals:
88 changes: 88 additions & 0 deletions python/langsmith/_internal/_cache_utils.py
@@ -0,0 +1,88 @@
"""Provide utilities for managing caching in LangSmith.

Includes a CacheManager class that handles the lifecycle of caching
operations, allowing for easy setup and teardown of caching contexts.
"""

import pathlib
from typing import Optional, Sequence, Union

from langsmith import utils


class CacheManager:
"""Manage caching operations for LangSmith.

Provides methods to start and stop caching, and can be used
as a context manager for automatic cache management.

Attributes:
path (Optional[Union[str, pathlib.Path]]): The path to the cache file.
ignore_hosts (Optional[Sequence[str]]): A list of hosts to ignore in caching.
context_manager: The context manager for caching operations.
context: The context object for the current caching session.
"""

def __init__(
self,
path: Optional[Union[str, pathlib.Path]],
ignore_hosts: Optional[Sequence[str]] = None,
):
"""Initialize the CacheManager.

Args:
path: The path to the cache file.
ignore_hosts: A list of hosts to ignore in caching.
"""
self.path = path
self.ignore_hosts = ignore_hosts
self.context_manager = None
self.context = None

def start_caching(self):
"""Start the caching session.

Returns:
self: The CacheManager instance.

Raises:
RuntimeError: If caching is already started.
"""
if self.context is not None:
raise RuntimeError("Caching is already started")

self.context_manager = utils.with_optional_cache(self.path, self.ignore_hosts)
if self.context_manager:
self.context = self.context_manager.__enter__()
return self

def close(self):
"""Close the current caching session.

Raises:
RuntimeError: If caching is not started.
"""
if self.context_manager is not None:
self.context_manager.__exit__(None, None, None)
if self.context is None:
raise RuntimeError("Caching is not started")
self.context = None
self.context_manager = None

def __enter__(self):
"""Enter the context manager, starting the caching session.

Returns:
self: The CacheManager instance with caching started.
"""
return self.start_caching()

def __exit__(self, exc_type, exc_value, traceback):
"""Exit the context manager, closing the caching session.

Args:
exc_type: The type of the exception that caused the context to be exited.
exc_value: The instance of the exception.
traceback: A traceback object encoding the stack trace.
"""
self.close()
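
For context, a minimal usage sketch of the new CacheManager (not part of the diff; it assumes the module path langsmith._internal._cache_utils shown in the header above and mirrors how _LangSmithTestSuite drives the class below):

    import pathlib

    from langsmith._internal._cache_utils import CacheManager

    # Hypothetical cassette location; the test suite writes one YAML file per dataset.
    cache_file = pathlib.Path("tests/cassettes") / "my-dataset-id.yaml"

    # As a context manager: caching starts on enter and is torn down on exit.
    with CacheManager(cache_file, ignore_hosts=["https://api.smith.langchain.com"]):
        ...  # requests made here go through the optional cache

    # Or explicitly, which is what _LangSmithTestSuite does so teardown can be
    # deferred to the _end_tests atexit hook:
    manager = CacheManager(cache_file, ignore_hosts=["https://api.smith.langchain.com"])
    manager.start_caching()
    try:
        ...  # traced test code
    finally:
        if manager.context is not None:
            manager.close()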
41 changes: 28 additions & 13 deletions python/langsmith/_testing.py
@@ -21,6 +21,7 @@
from langsmith import run_trees as rt
from langsmith import schemas as ls_schemas
from langsmith import utils as ls_utils
from langsmith._internal import _cache_utils as cache_utils

try:
import pytest # type: ignore
@@ -365,6 +366,8 @@ def _end_tests(
},
)
test_suite.wait()
if test_suite.cache_manager.context is not None:
test_suite.cache_manager.close()


VT = TypeVar("VT", bound=Optional[dict])
@@ -386,12 +389,18 @@ def __init__(
client: Optional[ls_client.Client],
experiment: ls_schemas.TracerSession,
dataset: ls_schemas.Dataset,
cache: Optional[str] = None,
):
self.client = client or rt.get_cached_client()
self._experiment = experiment
self._dataset = dataset
self._version: Optional[datetime.datetime] = None
self._executor = ls_utils.ContextThreadPoolExecutor(max_workers=1)
cache_path = Path(cache) / f"{self._dataset.id}.yaml" if cache else None
self.cache_manager = cache_utils.CacheManager(
path=cache_path, ignore_hosts=[self.client.api_url]
)
self.cache_manager.start_caching()
atexit.register(_end_tests, self)

@property
@@ -412,6 +421,7 @@ def from_test(
client: Optional[ls_client.Client],
func: Callable,
test_suite_name: Optional[str] = None,
cache: Optional[str] = None,
) -> _LangSmithTestSuite:
client = client or rt.get_cached_client()
test_suite_name = test_suite_name or _get_test_suite_name(func)
@@ -421,7 +431,9 @@
if test_suite_name not in cls._instances:
test_suite = _get_test_suite(client, test_suite_name)
experiment = _start_experiment(client, test_suite)
cls._instances[test_suite_name] = cls(client, experiment, test_suite)
cls._instances[test_suite_name] = cls(
client, experiment, test_suite, cache=cache
)
return cls._instances[test_suite_name]

@property
@@ -535,7 +547,10 @@ def _ensure_example(
for k in output_keys:
outputs[k] = inputs.pop(k, None)
test_suite = _LangSmithTestSuite.from_test(
client, func, langtest_extra.get("test_suite_name")
client,
func,
langtest_extra.get("test_suite_name"),
cache=langtest_extra.get("cache"),
)
example_id, example_name = _get_id(func, inputs, test_suite.id)
example_id = langtest_extra["id"] or example_id
@@ -591,11 +606,6 @@ def _test():
except BaseException as e:
logger.warning(f"Failed to create feedback for run_id {run_id}: {e}")

cache_path = (
Path(langtest_extra["cache"]) / f"{test_suite.id}.yaml"
if langtest_extra["cache"]
else None
)
current_context = rh.get_tracing_context()
metadata = {
**(current_context["metadata"] or {}),
@@ -604,11 +614,7 @@
"reference_example_id": str(example_id),
},
}
with rh.tracing_context(
**{**current_context, "metadata": metadata}
), ls_utils.with_optional_cache(
cache_path, ignore_hosts=[test_suite.client.api_url]
):
with rh.tracing_context(**{**current_context, "metadata": metadata}):
_test()


@@ -671,7 +677,16 @@ async def _test():
with rh.tracing_context(
**{**current_context, "metadata": metadata}
), ls_utils.with_optional_cache(
cache_path, ignore_hosts=[test_suite.client.api_url]
cache_path,
ignore_hosts=list(
set(
(
test_suite.client.api_url,
"https://beta.api.smith.langchain.com",
"https://api.smith.langchain.com",
)
)
),
):
await _test()

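
For orientation, a hedged sketch of a cached test run under these changes (the LANGSMITH_TEST_CACHE=tests/cassettes value is taken from the action.yml above; the decorator import and my_function are illustrative and may differ by SDK version):

    import os

    from langsmith import unit  # public test decorator; name may differ across SDK versions

    # Normally set in CI (see LANGSMITH_TEST_CACHE in action.yml); shown here for illustration.
    os.environ["LANGSMITH_TEST_CACHE"] = "tests/cassettes"

    def my_function(x: int) -> int:
        # Placeholder for the code under test.
        return x * 2

    @unit
    def test_doubles_input():
        # With caching enabled, the suite-level CacheManager records/replays the
        # HTTP traffic for this test into tests/cassettes/<dataset-id>.yaml.
        assert my_function(2) == 4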
7 changes: 6 additions & 1 deletion python/langsmith/client.py
@@ -708,7 +708,9 @@ def request_with_retries(
for idx in range(stop_after_attempt):
try:
try:
with ls_utils.filter_logs(_urllib3_logger, logging_filters):
with ls_utils.filter_logs(
_urllib3_logger, logging_filters
), ls_utils._ensure_orig():
response = self.session.request(
method,
(
@@ -721,6 +723,8 @@
)
ls_utils.raise_for_status_with_text(response)
return response
# except TypeError:
# continue
except requests.exceptions.ReadTimeout as e:
logger.debug("Passing on exception %s", e)
if idx + 1 == stop_after_attempt:
@@ -826,6 +830,7 @@ def request_with_retries(
raise ls_utils.LangSmithError(
f"Failed to {method} {pathname} in LangSmith API. {emsg}"
f"{_context}"
f"{ls_utils._CACHE_HANDLES}"
) from e
except to_ignore_ as e:
if response is not None:
19 changes: 13 additions & 6 deletions python/langsmith/evaluation/_arunner.py
@@ -170,14 +170,10 @@ async def aevaluate(

Evaluating over only a subset of the examples using an async generator:

>>> async def example_generator():
... examples = client.list_examples(dataset_name=dataset_name, limit=5)
... for example in examples:
... yield example
>>> results = asyncio.run(
... aevaluate(
... apredict,
... data=example_generator(),
... data=client.list_examples(dataset_name=dataset_name, limit=5),
... evaluators=[accuracy],
... summary_evaluators=[precision],
... experiment_prefix="My Subset Experiment",
@@ -405,7 +401,18 @@ async def _aevaluate(
cache_path = pathlib.Path(cache_dir) / f"{dsid}.yaml"
else:
cache_path = None
with ls_utils.with_optional_cache(cache_path, ignore_hosts=[client.api_url]):
with ls_utils.with_optional_cache(
cache_path,
ignore_hosts=list(
set(
(
client.api_url,
"https://beta.api.smith.langchain.com",
"https://api.smith.langchain.com",
)
)
),
):
if is_async_target:
manager = await manager.awith_predictions(
cast(ATARGET_T, target), max_concurrency=max_concurrency
13 changes: 12 additions & 1 deletion python/langsmith/evaluation/_runner.py
@@ -921,7 +921,18 @@ def _evaluate(
cache_path = (
pathlib.Path(cache_dir) / f"{manager.dataset_id}.yaml" if cache_dir else None
)
with ls_utils.with_optional_cache(cache_path, ignore_hosts=[client.api_url]):
with ls_utils.with_optional_cache(
cache_path,
ignore_hosts=list(
set(
(
client.api_url,
"https://beta.api.smith.langchain.com",
"https://api.smith.langchain.com",
)
)
),
):
if _is_callable(target):
# Add predictions to the experiment.
manager = manager.with_predictions(
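
The list(set((...))) host list above is now duplicated across _testing.py, _arunner.py, and _runner.py. Purely as a sketch of a possible follow-up (not part of this diff; names are hypothetical), the repetition could be factored into a tiny helper:

    # Hypothetical helper, not in this PR; shown only to illustrate the shared pattern.
    _SMITH_API_HOSTS = (
        "https://beta.api.smith.langchain.com",
        "https://api.smith.langchain.com",
    )

    def _cache_ignore_hosts(api_url: str) -> list:
        """Hosts the optional request cache should always bypass, deduplicated."""
        return list({api_url, *_SMITH_API_HOSTS})

    # Usage would then read:
    #     with ls_utils.with_optional_cache(
    #         cache_path, ignore_hosts=_cache_ignore_hosts(client.api_url)
    #     ):
    #         ...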