python[patch]: evaluate local mode #1224
Conversation
@@ -1201,14 +1201,14 @@ def my_run(foo: str):

    my_run(foo="bar", langsmith_extra={"parent": headers, "client": mock_client})
    mock_calls = _get_calls(mock_client)
    assert len(mock_calls) == 1
@hinthornw not sure I understand this behavior, do we need/want to maintain it?
Ya. At one point what was happening was we'd create a RunTree from the headers without the overriding client and then use that, meaning the user's configuration was broken.
@@ -675,7 +685,7 @@ async def _arun_evaluators(
         **current_context,
         "project_name": "evaluators",
         "metadata": metadata,
-        "enabled": True,
+        "enabled": "local" if not self._upload_results else True,
What does `local` mean here?
It means we're tracing (i.e. tracking intermediate steps) but not sending anything to LangSmith.
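As an illustration only (this helper is hypothetical, not part of the SDK), the three `enabled` states implied by the diff above could be read like this:

```python
# Hypothetical sketch of how a tracer could interpret the "enabled" value
# set in the diff above; the SDK's actual handling may differ.
def should_collect_runs(enabled) -> bool:
    # Both True and "local" track runs and their intermediate steps.
    return enabled is True or enabled == "local"


def should_upload_runs(enabled) -> bool:
    # Only True actually sends the collected runs to LangSmith.
    return enabled is True
```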
In that case, where do they get stored?
We're currently keeping all the runs in memory (on the ExperimentManager).
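A rough sketch of that idea, with made-up names (the real ExperimentManager internals are more involved): runs are simply accumulated on the manager instead of being sent to LangSmith.

```python
# Illustrative only; class, attribute, and method names are hypothetical.
class InMemoryExperimentManager:
    def __init__(self, upload_results: bool = True):
        self._upload_results = upload_results
        self._runs: list = []  # runs stay here when upload_results=False

    def record_run(self, run) -> None:
        self._runs.append(run)
        if self._upload_results:
            pass  # in upload mode the run would also be sent to LangSmith

    def get_runs(self) -> list:
        return list(self._runs)
```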
@@ -83,6 +83,7 @@ async def aevaluate(
     client: Optional[langsmith.Client] = None,
     blocking: bool = True,
     experiment: Optional[Union[schemas.TracerSession, str, uuid.UUID]] = None,
+    upload_results: bool = True,
flyby comment: the fact that we have to update so many call sites feels pretty problematic
wdym call sites?
Our data model for evals is indeed too complicated lol - tracer sessions, feedback, runs in multiple sessions, ay caramba.
if self._upload_results:
    for result in flattened_results:
        feedback = result.dict(exclude={"target_run_id"})
        evaluator_info = feedback.pop("evaluator_info", None)
        await aitertools.aio_to_thread(
            self.client.create_feedback,
            **feedback,
            run_id=None,
            project_id=project_id,
            source_info=evaluator_info,
        )
A couple of comments here:
- What's the rationale for `await`ing in the for loop rather than `gather`ing the results with some max concurrency?
- Second, in most cases now (when multipart is enabled) we enqueue the feedback on the TracingQueue for asynchronous background processing, so does it really make sense to keep the logic that runs this in an executor? cc @hinthornw for this too
fwiw the only change here was the condition on L786 (the rest is just indentation).
Ya we can asyncio.gather()
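A sketch of what that could look like, assuming the same payload shape as the snippet above and a semaphore to cap concurrency (the function and parameter names here are illustrative, not the PR's code):

```python
import asyncio


async def _acreate_feedback_bounded(client, flattened_results, project_id, max_concurrency: int = 8):
    # Bound concurrency with a semaphore, then gather instead of awaiting serially.
    sem = asyncio.Semaphore(max_concurrency)

    async def _one(result):
        feedback = result.dict(exclude={"target_run_id"})
        evaluator_info = feedback.pop("evaluator_info", None)
        async with sem:
            # create_feedback is a blocking HTTP call, so push it to a worker thread.
            await asyncio.to_thread(
                client.create_feedback,
                **feedback,
                run_id=None,
                project_id=project_id,
                source_info=evaluator_info,
            )

    await asyncio.gather(*(_one(r) for r in flattened_results))
```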
re: background processing, it doesn't get put in the background queue because we lack a trace_id in this case
        project_id=project_id,
        source_info=evaluator_info,
    )
if self._upload_results:
Same comment as above here
Think this looks good. Today, though, I feel more that upload_results=False shouldn't enforce local mode, but rather just shouldn't create a project, and should then use the user's environment setup to decide whether or not to trace.
Adds an `upload_results` flag to avoid tracing any runs (target or evaluator) or creating an experiment in LangSmith:
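A minimal usage sketch (the dataset name, target, and evaluator below are placeholders, not from the PR, and the import path assumes a recent SDK that exports `aevaluate` at the top level):

```python
from langsmith import aevaluate


async def target(inputs: dict) -> dict:
    # Placeholder system under test.
    return {"answer": inputs["question"].upper()}


def exact_match(run, example):
    # Placeholder evaluator.
    return {"key": "exact_match", "score": int(run.outputs == example.outputs)}


results = await aevaluate(
    target,
    data="my-dataset",         # hypothetical dataset name
    evaluators=[exact_match],
    upload_results=False,      # no experiment created, no runs or feedback uploaded
)
```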