python[minor]: pytest integration #1362

Open · wants to merge 49 commits into base: main

Conversation

@baskaryan (Contributor) commented Dec 30, 2024:

Log pytest tests to LangSmith. Useful for:

  • Evaluations where the eval logic is different for each datapoint, making it difficult to use generic evaluators on a whole dataset
  • Unit tests where you want both the local pytest experience and the ability to trace and share results

Install

pip install "langsmith[pytest]==0.2.11rc7"

Simple usage

# tests/test_simple.py
import pytest
from langsmith import testing as t

@pytest.mark.langsmith
def test_addition_single():
    x = 3
    y = 4
    # directly log example inputs if you don't want to use fixtures
    t.log_inputs({"x": x, "y": y})

    expected = 7
    # directly log example outputs if you don't want to use fixtures
    t.log_reference_outputs({"sum": expected})

    actual = x + y
    # directly log run outputs
    t.log_outputs({"sum": actual})
    
    # test pass/fail automatically logged to langsmith
    assert actual == expected

Run

pytest --outputs='ls' tests/test_simple.py

Results

[Screenshot: test results in the LangSmith UI, 2025-01-08]

Advanced usage

# tests/test_advanced.py
import openai
import pytest

from langsmith import wrappers
from langsmith import testing as t

oai_client = wrappers.wrap_openai(openai.Client())

@pytest.mark.langsmith
def test_openai_says_hello():
    # Traced code will be included in the test case
    text = "Say hello!"
    response = oai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": text},
        ],
    )
    t.log_inputs({"text": text})
    t.log_outputs({"response": response.choices[0].message.content})
    t.log_reference_outputs({"response": "hello!"})

    # Use this context manager to trace any steps used for generating evaluation 
    # feedback separately from the main application logic
    with t.trace_feedback():
        grade = oai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": "Return 1 if 'hello' is in the user message and 0 otherwise.",
                },
                {"role": "user", "content": response.choices[0].message.content},
            ],
        )
        t.log_feedback(
            key="llm_judge", score=float(grade.choices[0].message.content)
        )

    assert "hello" in response.choices[0].message.content.lower()


@pytest.mark.langsmith(output_keys=["expected"])
@pytest.mark.parametrize(
    "a, b, expected",
    [(1, 2, 3), (3, 4, 7)],
)
def test_addition_parametrized(a: int, b: int, expected: int):
    t.log_outputs({"sum": a + b})
    assert a + b == expected

Run using pytest-xdist to parallelize (pip install pytest-xdist first)

LANGSMITH_TEST_SUITE="Test suite name" LANGSMITH_EXPERIMENT="Experiment name" pytest --outputs='ls' tests
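
A minimal sketch of the same invocation with pytest-xdist's own -n flag added to actually fan tests out across workers (-n auto comes from pytest-xdist itself, not from this plugin, and picks one worker per CPU):

LANGSMITH_TEST_SUITE="Test suite name" LANGSMITH_EXPERIMENT="Experiment name" pytest -n auto --outputs='ls' tests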

Results: https://dev.smith.langchain.com/public/cea0e7fd-2d27-47d1-8ada-141069acdf0d/d

[Screenshot: parallelized experiment results in the LangSmith UI, 2025-01-08]

@hinthornw (Collaborator) commented:

I like the functionality. I kinda like using a single module/object to access the methods, like

@unit
def test_example(captest):
    x = 0
    y = 1
    captest.log_inputs({"x": x, "y": y})
    captest.log_reference({"product": 0})
    captest.log_outputs({"product": x * y})
    assert x * y == 0

or something

@baskaryan (comment marked as outdated)

baskaryan marked this pull request as ready for review on January 3, 2025.
baskaryan changed the title from "rfc: manually set test case inputs/outputs" to "python[minor]: manually set test case inputs/outputs" on Jan 3, 2025.
@hinthornw (Collaborator) commented:

I like

@baskaryan (comment marked as outdated)

@jxnl commented Jan 8, 2025:

why is it "reference" vs. target, expected, etc.?

@baskaryan (Contributor, Author) replied:

> why is it "reference" vs. target, expected, etc.?

"reference outputs" is what we call this when defining evaluators for evaluate() and how we display this info in experiments UI, want to be consistent with existing flows

historically this decision was made bc target/expected seemed a bit too narrow/prescriptive when this can be any information relevant for evaluating outputs, not necessarily the actual expected outputs. eg could just be agent trajectories not actual agent outputs. but see the argument that target/expected are more common terms and it wasn't worth us introducing a new term for such a small nuance
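
For illustration (not part of this PR), here is a rough sketch of where that term already appears when defining evaluators for evaluate(); the keyword-style evaluator signature and the dataset name are assumptions based on recent SDK versions:

from langsmith import evaluate

def correct(outputs: dict, reference_outputs: dict) -> dict:
    # compare the run's outputs against the example's reference outputs
    return {"key": "correct", "score": outputs["sum"] == reference_outputs["sum"]}

evaluate(
    lambda inputs: {"sum": inputs["x"] + inputs["y"]},  # target under test
    data="my-addition-dataset",  # hypothetical dataset name
    evaluators=[correct],
)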

baskaryan changed the title from "python[minor]: manually set test case inputs/outputs" to "python[minor]: pytest integration" on Jan 13, 2025.
pytest_nodeid=pytest_nodeid,
)

def _end_run(
baskaryan (Contributor, Author) commented on this code:

needs review, very open to less hacky ideas

Contributor:

can't think of any great solutions right now without async. Would add a detailed comment about this and address later as a TODO


@warn_beta
@contextlib.contextmanager
def trace_feedback(
baskaryan (Contributor, Author) commented on this code:

worth review

)


def _run_test(
baskaryan (Contributor, Author) commented on this code:

worth review

Contributor:

looks reasonable to me

@agola11 (Contributor) left a comment:

just one main comment

def end_run(
    self, run_tree, example_id, outputs, pytest_plugin=None, pytest_nodeid=None
) -> Future:
    return self._executor.submit(
Contributor:

why do we need to submit this in an executor? run_tree.patch calls update_run which is async already
