
python[patch]: evaluate local mode #1224

Merged
merged 29 commits into main from bagatur/local_mode on Nov 22, 2024

Conversation

baskaryan (Contributor):

Adds an upload_results flag that avoids tracing any runs (target or evaluator) or creating an experiment in LangSmith:

from langsmith import evaluate

results = evaluate(
    lambda x: x,  # target: identity function over the example inputs
    data="Sample Dataset 3",
    evaluators=[lambda inputs: {"score": 1, "key": "correct"}],
    upload_results=False,  # run locally; send nothing to LangSmith
)

baskaryan requested a review from hinthornw, November 18, 2024 20:58
@@ -1201,14 +1201,14 @@ def my_run(foo: str):

my_run(foo="bar", langsmith_extra={"parent": headers, "client": mock_client})
mock_calls = _get_calls(mock_client)
assert len(mock_calls) == 1
baskaryan (Contributor Author):

@hinthornw not sure i understand this behavior, do we need/want to maintain it?

hinthornw (Collaborator):

Ya. At one point what was happening was we'd create a RunTree from headers without the overriding client, and then use that, meaning the user configuration was broken
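
For context, a rough sketch of the fixed behavior; build_parent is a made-up helper name, not the actual SDK internals:

from typing import Mapping, Optional

from langsmith import Client
from langsmith.run_trees import RunTree

def build_parent(headers: Mapping, client: Optional[Client] = None) -> Optional[RunTree]:
    # Reconstruct the parent run from distributed-tracing headers...
    run_tree = RunTree.from_headers(headers)
    # ...but keep honoring a user-supplied client override instead of
    # silently falling back to the default client (the bug described above).
    if run_tree is not None and client is not None:
        run_tree.client = client
    return run_tree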

baskaryan marked this pull request as ready for review, November 19, 2024 00:25
baskaryan changed the title from "rfc: evaluate local mode" to "python[patch]: evaluate local mode", Nov 19, 2024
@@ -675,7 +685,7 @@ async def _arun_evaluators(
             **current_context,
             "project_name": "evaluators",
             "metadata": metadata,
-            "enabled": True,
+            "enabled": "local" if not self._upload_results else True,
Reviewer (Contributor):

What does local mean here?

baskaryan (Contributor Author):

means we're tracing (i.e. tracking intermediate steps) but not sending anything to LangSmith
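
Roughly, assuming tracing_context accepts the "local" value this PR threads through, a sketch of the difference:

from langsmith.run_helpers import tracing_context

# "local" keeps the tracer active, so nested/intermediate runs are still
# captured in memory, but nothing is exported to the LangSmith backend.
with tracing_context(enabled="local", project_name="evaluators"):
    result = my_evaluator(inputs)  # my_evaluator and inputs are placeholders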

Reviewer (Contributor):

In that case, where do they get stored?

baskaryan (Contributor Author):

we're keeping all the runs in memory currently (on the ExperimentManager)
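
So the results are still consumable from the returned object; a small usage sketch, assuming the usual row shape (run, example, evaluation_results):

from langsmith import evaluate

results = evaluate(
    lambda x: x,
    data="Sample Dataset 3",
    evaluators=[lambda inputs: {"score": 1, "key": "correct"}],
    upload_results=False,
)

# Nothing was sent to LangSmith, but each row is still held in memory:
for row in results:
    for res in row["evaluation_results"]["results"]:
        print(res.key, res.score)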

(Resolved review threads on python/langsmith/evaluation/_runner.py and python/langsmith/run_helpers.py not shown.)
@@ -83,6 +83,7 @@ async def aevaluate(
     client: Optional[langsmith.Client] = None,
     blocking: bool = True,
     experiment: Optional[Union[schemas.TracerSession, str, uuid.UUID]] = None,
+    upload_results: bool = True,
Reviewer (Contributor):

flyby comment: the fact that we have to update so many call sites feels pretty problematic

hinthornw (Collaborator):

wdym call sites?

Our data model for evals is indeed too complicated lol: tracer sessions, feedback, runs in multiple sessions, ay caramba

python/langsmith/evaluation/_runner.py
Comment on lines +786 to +796
if self._upload_results:
for result in flattened_results:
feedback = result.dict(exclude={"target_run_id"})
evaluator_info = feedback.pop("evaluator_info", None)
await aitertools.aio_to_thread(
self.client.create_feedback,
**feedback,
run_id=None,
project_id=project_id,
source_info=evaluator_info,
)
Reviewer (Contributor):

A couple of comments here:

  1. What's the rationale for awaiting in the for loop rather than gathering the results with some max concurrency?
  2. In most cases now (when multipart is enabled) we enqueue the feedback to the TracingQueue for asynchronous background processing, so does it really make sense to keep running this in an executor? cc @hinthornw for this too

baskaryan (Contributor Author):

fwiw the only change here was the condition on L786 (the rest is just indentation)

hinthornw (Collaborator):

Ya we can asyncio.gather()
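
Something along these lines, as a sketch; upload_feedback and max_concurrency are hypothetical names, and the create_feedback call mirrors the snippet above:

import asyncio

async def upload_feedback(client, flattened_results, project_id, max_concurrency=5):
    semaphore = asyncio.Semaphore(max_concurrency)

    async def create_one(result):
        feedback = result.dict(exclude={"target_run_id"})
        evaluator_info = feedback.pop("evaluator_info", None)
        async with semaphore:
            # create_feedback is synchronous, so run it in a worker thread
            await asyncio.to_thread(
                client.create_feedback,
                **feedback,
                run_id=None,
                project_id=project_id,
                source_info=evaluator_info,
            )

    # fan out with bounded concurrency instead of awaiting one at a time
    await asyncio.gather(*(create_one(r) for r in flattened_results))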

hinthornw (Collaborator):

re: background processing, it doesn't get put in the background queue because we lack a trace_id in this case

project_id=project_id,
source_info=evaluator_info,
)
if self._upload_results:
Reviewer (Contributor):

Same comment as above here

baskaryan enabled auto-merge (squash), November 22, 2024 18:06
baskaryan disabled auto-merge, November 22, 2024 18:16
hinthornw (Collaborator) left a review:

Think this looks good. Today I lean more toward upload_results=False not enforcing local mode, but rather just not creating a project and then using the user's environment setup to decide whether or not to trace

baskaryan merged commit 6cf7c9b into main, Nov 22, 2024
10 checks passed
baskaryan deleted the bagatur/local_mode branch, November 22, 2024 18:51