python[minor]: pytest integration #1362

Open · wants to merge 49 commits into base: main

Conversation

@baskaryan (Contributor) commented Dec 30, 2024:

Log pytest tests to LangSmith. Useful for:

  • Evaluations where the eval logic is different for each datapoint, making it difficult to use generic evaluators on a whole dataset
  • Unit tests where you want both the local pytest experience and the ability to trace and share results

Install

pip install "langsmith[pytest]==0.2.11rc7"

Simple usage

# tests/test_simple.py
import pytest
from langsmith import testing as t

@pytest.mark.langsmith
def test_addition_single():
    x = 3
    y = 4
    # directly log example inputs if you don't want to use fixtures
    t.log_inputs({"x": x, "y": y})

    expected = 7
    # directly log example outputs if you don't want to use fixtures
    t.log_reference_outputs({"sum": expected})

    actual = x + y
    # directly log run outputs
    t.log_outputs({"sum": actual})
    
    # test pass/fail automatically logged to langsmith
    assert actual == expected

Run

pytest --outputs='ls' tests/test_simple.py

Results

[Screenshot: test results in the LangSmith UI, 2025-01-08]

Advanced usage

# tests/test_advanced.py
import openai
import pytest

from langsmith import wrappers
from langsmith import testing as t

oai_client = wrappers.wrap_openai(openai.Client())

@pytest.mark.langsmith
def test_openai_says_hello():
    # Traced code will be included in the test case
    text = "Say hello!"
    response = oai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": text},
        ],
    )
    t.log_inputs({"text": text})
    t.log_outputs({"response": response.choices[0].message.content})
    t.log_reference_outputs({"response": "hello!"})

    # Use this context manager to trace any steps used for generating evaluation 
    # feedback separately from the main application logic
    with t.trace_feedback():
        grade = oai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": "Return 1 if 'hello' is in the user message and 0 otherwise.",
                },
                {"role": "user", "content": response.choices[0].message.content},
            ],
        )
        t.log_feedback(
            key="llm_judge", score=float(grade.choices[0].message.content)
        )

    assert "hello" in response.choices[0].message.content.lower()


@pytest.mark.langsmith(output_keys=["expected"])
@pytest.mark.parametrize(
    "a, b, expected",
    [(1, 2, 3), (3, 4, 7)],
)
def test_addition_parametrized(a: int, b: int, expected: int):
    t.log_outputs({"sum": a + b})
    assert a + b == expected

Run using pytest-xdist to parallelize (pip install pytest-xdist first)

LANGSMITH_TEST_SUITE="Test suite name" LANGSMITH_EXPERIMENT="Experiment name" pytest --outputs='ls' tests
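
A minimal sketch of the same invocation with pytest-xdist's own -n flag added to actually fan tests out across workers (-n auto comes from pytest-xdist itself, not from this plugin, and picks one worker per CPU):

LANGSMITH_TEST_SUITE="Test suite name" LANGSMITH_EXPERIMENT="Experiment name" pytest -n auto --outputs='ls' tests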

Results: https://dev.smith.langchain.com/public/cea0e7fd-2d27-47d1-8ada-141069acdf0d/d

[Screenshot: parallelized experiment results in the LangSmith UI, 2025-01-08]

@hinthornw (Collaborator) commented:

I like the functionality. I kinda like using a single module/object to access the methods, like

@unit
def test_example(captest):
    x = 0
    y = 1
    captest.log_inputs({"x": x, "y": y})
    captest.log_reference({"product": 0})
    captest.log_outputs({"product": x * y})
    assert x * y == 0

or something

@baskaryan (comment marked as outdated)

baskaryan marked this pull request as ready for review on January 3, 2025.
baskaryan changed the title from "rfc: manually set test case inputs/outputs" to "python[minor]: manually set test case inputs/outputs" on Jan 3, 2025.
@hinthornw (Collaborator) commented:

I like

@baskaryan (comment marked as outdated)

@jxnl commented Jan 8, 2025:

why is it "reference" vs. target, expected, etc.?

@baskaryan (Contributor, Author) replied:

> why is it "reference" vs. target, expected, etc.?

"reference outputs" is what we call this when defining evaluators for evaluate() and how we display this info in experiments UI, want to be consistent with existing flows

historically this decision was made bc target/expected seemed a bit too narrow/prescriptive when this can be any information relevant for evaluating outputs, not necessarily the actual expected outputs. eg could just be agent trajectories not actual agent outputs. but see the argument that target/expected are more common terms and it wasn't worth us introducing a new term for such a small nuance
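
For illustration (not part of this PR), here is a rough sketch of where that term already appears when defining evaluators for evaluate(); the keyword-style evaluator signature and the dataset name are assumptions based on recent SDK versions:

from langsmith import evaluate

def correct(outputs: dict, reference_outputs: dict) -> dict:
    # compare the run's outputs against the example's reference outputs
    return {"key": "correct", "score": outputs["sum"] == reference_outputs["sum"]}

evaluate(
    lambda inputs: {"sum": inputs["x"] + inputs["y"]},  # target under test
    data="my-addition-dataset",  # hypothetical dataset name
    evaluators=[correct],
)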

baskaryan changed the title from "python[minor]: manually set test case inputs/outputs" to "python[minor]: pytest integration" on Jan 13, 2025.
pytest_nodeid=pytest_nodeid,
)

def _end_run(
baskaryan (Contributor, Author) commented on this code:

needs review, very open to less hacky ideas

Contributor:

can't think of any great solutions right now without async. Would add a detailed comment about this and address later as a TODO


@warn_beta
@contextlib.contextmanager
def trace_feedback(
baskaryan (Contributor, Author) commented on this code:

worth review

)


def _run_test(
baskaryan (Contributor, Author) commented on this code:

worth review

Contributor:

looks reasonable to me

@agola11 (Contributor) left a comment:

just one main comment

def end_run(
    self, run_tree, example_id, outputs, pytest_plugin=None, pytest_nodeid=None
) -> Future:
    return self._executor.submit(
Contributor:

why do we need to submit this in an executor? run_tree.patch calls update_run which is async already
