python[patch]: accept simple evaluators #1200

baskaryan · 2024-11-09T01:39:39Z

With this change, you can write evaluators like this:

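```python
from langsmith import evaluate

# A "simple" evaluator: it receives plain dicts rather than Run/Example objects.
def simp(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    # Return several feedback entries at once under the "results" key.
    return {
        "results": [
            {"score": inputs == outputs, "key": "identity"},
            {"score": outputs == reference_outputs, "key": "correct"},
        ]
    }

evaluate(
    lambda x: x,  # identity target: echoes each example's inputs as its outputs
    data="Sample Dataset 3",
    evaluators=[simp],
)
```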
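Because `simp` is a plain function, its return shape can be checked by calling it directly with dicts, without a LangSmith client. The sample values below are made up for illustration, and the function is repeated so the snippet runs on its own.

```python
def simp(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    # Return several feedback entries at once under the "results" key.
    return {
        "results": [
            {"score": inputs == outputs, "key": "identity"},
            {"score": outputs == reference_outputs, "key": "correct"},
        ]
    }


# Hypothetical values; with an identity target, outputs equal inputs.
inputs = {"question": "2 + 2?"}
outputs = dict(inputs)
reference_outputs = {"answer": "4"}

result = simp(inputs, outputs, reference_outputs)
scores = {r["key"]: r["score"] for r in result["results"]}
print(scores)  # {'identity': True, 'correct': False}
```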

Example experiment: left-tray-86 https://dev.smith.langchain.com/public/e7782ea0-3de5-4352-8cd4-7b2cdbb03e4c/d

hinthornw

Nice, I like the start. Will re-review this morning.

python/langsmith/evaluation/evaluator.py

jakerachleff · 2024-11-12T18:15:37Z

python/langsmith/evaluation/evaluator.py

@@ -632,3 +636,70 @@ def comparison_evaluator(
 ) -> DynamicComparisonRunEvaluator:
    """Create a comaprison evaluator from a function."""
    return DynamicComparisonRunEvaluator(func)
+
+
+def _normalize_evaluator_func(


Might be nice to add a couple of unit tests for this to make it obvious it's working.

jakerachleff

I think this makes sense, but I would add some extra tests to confirm it works properly.
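A sketch of the kind of unit tests suggested here, written against a hypothetical `normalize_evaluator` adapter that follows the approach visible in the diff (look up each positional parameter name in an `arg_map` and call the function with those values). `SimpleNamespace` objects stand in for `Run` and `Example`; none of these names come from the SDK.

```python
import inspect
from types import SimpleNamespace

SUPPORTED_ARGS = ("run", "example", "inputs", "outputs", "reference_outputs")


def normalize_evaluator(func):
    """Hypothetical adapter: turn a simple evaluator into a (run, example) callable."""
    names = [
        name
        for name, p in inspect.signature(func).parameters.items()
        if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]
    if not all(name in SUPPORTED_ARGS for name in names):
        # Unrecognized names: treat it as a legacy (run, example) evaluator.
        return func

    def wrapper(run, example):
        arg_map = {
            "run": run,
            "example": example,
            "inputs": example.inputs,
            "outputs": run.outputs or {},
            "reference_outputs": example.outputs or {},
        }
        return func(*(arg_map[name] for name in names))

    return wrapper


def test_simple_evaluator_receives_dicts():
    def correct(outputs, reference_outputs):
        return {"key": "correct", "score": outputs == reference_outputs}

    run = SimpleNamespace(outputs={"answer": "4"})
    example = SimpleNamespace(inputs={"q": "2 + 2?"}, outputs={"answer": "4"})
    assert normalize_evaluator(correct)(run, example) == {"key": "correct", "score": True}


def test_legacy_evaluator_is_returned_unchanged():
    def legacy(r, e):
        return {"key": "legacy", "score": 1}

    assert normalize_evaluator(legacy) is legacy


def test_missing_outputs_default_to_empty_dict():
    def echo(outputs, reference_outputs):
        return (outputs, reference_outputs)

    run = SimpleNamespace(outputs=None)
    example = SimpleNamespace(inputs={}, outputs=None)
    assert normalize_evaluator(echo)(run, example) == ({}, {})
```

Running `pytest` on a file containing this would collect the three `test_*` functions; each covers one branch of the adapter.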

python/langsmith/evaluation/evaluator.py

hinthornw · 2024-11-12T19:00:01Z

Do we want it to be like pytest, where it's all by name? E.g. run, example, inputs, predictions, reference

baskaryan · 2024-11-12T19:16:38Z

Do we want it to be like pytest where it's all by name? run, example, inputs, predictions, reference

Yeah, I like that. For backwards compatibility we can't enforce the names run/example, but we can enforce the others.
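A minimal sketch of the pytest-style, by-name approach discussed here: arguments are matched by parameter name rather than position, a two-argument function with unrecognized names falls back to the legacy `(run, example)` convention, and any other unrecognized signature is rejected. `inject_by_name` and the `SimpleNamespace` stand-ins for `Run`/`Example` are hypothetical, not the SDK's implementation.

```python
import inspect
from types import SimpleNamespace

SUPPORTED_ARGS = ("run", "example", "inputs", "outputs", "reference_outputs")


def inject_by_name(func, run, example):
    """Hypothetical helper: call func with arguments looked up by parameter name."""
    names = list(inspect.signature(func).parameters)
    if not all(name in SUPPORTED_ARGS for name in names):
        if len(names) == 2:
            # Backwards compatibility: arbitrary names, assumed to be (run, example).
            return func(run, example)
        raise ValueError(
            f"Unsupported evaluator args {names}; expected names from {SUPPORTED_ARGS}"
        )
    available = {
        "run": run,
        "example": example,
        "inputs": example.inputs,
        "outputs": run.outputs or {},
        "reference_outputs": example.outputs or {},
    }
    # Pass by keyword, so parameter order does not matter.
    return func(**{name: available[name] for name in names})


run = SimpleNamespace(outputs={"answer": "4"})
example = SimpleNamespace(inputs={"q": "2 + 2?"}, outputs={"answer": "4"})


def reversed_order(reference_outputs, outputs):
    return {"key": "correct", "score": outputs == reference_outputs}


def legacy(r, e):
    return {"key": "has_output", "score": bool(r.outputs)}


print(inject_by_name(reversed_order, run, example))  # {'key': 'correct', 'score': True}
print(inject_by_name(legacy, run, example))  # {'key': 'has_output', 'score': True}
```

Passing by keyword makes order irrelevant, but it would fail for positional-only parameters (declared with `/`); passing the looked-up values positionally, as the later diff snippets do, avoids that.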

agola11 · 2024-11-13T19:11:36Z

python/langsmith/evaluation/evaluator.py

+]:
+    # for backwards compatibility, if args are untyped we assume they correspond to
+    # Run and Example:
+    if not (type_hints := get_type_hints(func)):


Shouldn't we check the number of args here? Traditional evaluators take run and example, whereas simple evaluators take 3 args.

agola11 · 2024-11-13T19:12:24Z

python/langsmith/evaluation/evaluator.py

+        if not (
+            num_positional in (2, 3) or (num_positional <= 3 and has_positional_var)
+        ):
+            msg = (
+                "Invalid evaluator function. Expected to take either 2 or 3 positional "
+                "arguments. Please see "
+                "https://docs.smith.langchain.com/evaluation/how_to_guides/evaluation/evaluate_llm_application#use-custom-evaluators"  # noqa: E501
+            )
+            raise ValueError(msg)


Seems like this check on arg length should be moved up.
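Following the suggestion to move the arity check up, a sketch that validates the positional-argument count before any type-hint or name inspection, so a malformed evaluator fails fast with one clear error. The 2-or-3 rule and the error text are copied from the diff above; a later snippet in this thread ("Must have at least one positional argument") suggests the rule was relaxed afterwards, so treat this as a sketch of the intermediate version. `validate_arity` is a hypothetical name.

```python
import inspect

DOCS_URL = (
    "https://docs.smith.langchain.com/evaluation/how_to_guides/evaluation/"
    "evaluate_llm_application#use-custom-evaluators"
)


def validate_arity(func):
    """Hypothetical up-front check using the 2-or-3 positional-arg rule from the diff."""
    params = list(inspect.signature(func).parameters.values())
    num_positional = sum(
        p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD) for p in params
    )
    has_positional_var = any(p.kind == p.VAR_POSITIONAL for p in params)
    if not (
        num_positional in (2, 3) or (num_positional <= 3 and has_positional_var)
    ):
        raise ValueError(
            "Invalid evaluator function. Expected to take either 2 or 3 positional "
            f"arguments. Please see {DOCS_URL}"
        )
    return num_positional


def legacy(run, example):
    return {"key": "legacy", "score": 1}


def simple(inputs, outputs, reference_outputs):
    return {"key": "correct", "score": outputs == reference_outputs}


def too_many(a, b, c, d):
    return {}


print(validate_arity(legacy), validate_arity(simple))  # 2 3
```

Running this before any `get_type_hints` or name checks means a function like `too_many` fails with the arity message rather than with a less specific error further down.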

hinthornw · 2024-11-14T21:48:02Z

python/langsmith/evaluation/evaluator.py

+        msg = (
+            f"Invalid evaluator function. Must have at least one positional "
+            f"argument. Supported positional arguments are {supported_args}. Please "
+            f"see https://docs.smith.langchain.com/evaluation/how_to_guides/evaluation/evaluate_llm_application#use-custom-evaluators"


This link feels like it's not gonna have a long shelf-life

Yeah, I'm updating it as we speak, but there will be redirects.

hinthornw · 2024-11-14T22:27:27Z

python/langsmith/evaluation/evaluator.py

+        if p.kind in (p.POSITIONAL_OR_KEYWORD, p.POSITIONAL_ONLY)
+        and p.default is p.empty
+    ]
+    if not positional_no_default or (


Out of curiosity, why do we require at least one positional argument?

We only pass the supported args in as positional args, so this is equivalent to enforcing that there's at least one supported arg.
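To illustrate the reasoning above: only supported arguments are passed, and only positionally, so a function with no positional parameters (one that takes only `**kwargs` or keyword-only arguments) would never receive anything. A sketch of that check with a hypothetical `check_positional` helper; the supported names are the ones used in this thread.

```python
import inspect

SUPPORTED_ARGS = ("run", "example", "inputs", "outputs", "reference_outputs")


def check_positional(func):
    """Hypothetical check: require at least one positional parameter."""
    positional = [
        name
        for name, p in inspect.signature(func).parameters.items()
        if p.kind in (p.POSITIONAL_OR_KEYWORD, p.POSITIONAL_ONLY)
    ]
    if not positional:
        raise ValueError(
            "Invalid evaluator function. Must have at least one positional "
            f"argument. Supported positional arguments are {SUPPORTED_ARGS}."
        )
    return positional


def kwargs_only(**kwargs):
    # Nothing positional: a positional-only caller could never fill this.
    return {"key": "noop", "score": 0}


def keyword_only(*, outputs):
    # Keyword-only parameters are also skipped by positional passing.
    return {"key": "noop", "score": 0}


def one_arg(outputs):
    return {"key": "non_empty", "score": bool(outputs)}


print(check_positional(one_arg))  # ['outputs']
```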

hinthornw · 2024-11-14T22:29:23Z

python/langsmith/evaluation/evaluator.py

+                    "outputs": run.outputs or {},
+                    "reference_outputs": example.outputs or {},
+                }
+                args = (arg_map[arg] for arg in positional_no_default)


If I give an arg a default value, this silently never provides the matching value. We'd want to either validate ahead of time that no default is provided (preferred) or pass the value in anyway (I think not preferred).

I think I'm just going to remove the check for whether args have defaults; it doesn't seem necessary.
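The pitfall flagged here, sketched with a hypothetical `positional_params` helper that can run either way: selecting only parameters without defaults drops `reference_outputs=None` from the call, so the evaluator silently sees its default, while selecting every positional parameter (the fix described in the reply) passes the real value. The names and values are illustrative, not SDK code.

```python
import inspect


def positional_params(func, *, skip_defaults):
    """Return positional parameter names, optionally skipping ones with defaults."""
    return [
        name
        for name, p in inspect.signature(func).parameters.items()
        if p.kind in (p.POSITIONAL_OR_KEYWORD, p.POSITIONAL_ONLY)
        and (not skip_defaults or p.default is p.empty)
    ]


def correct(outputs, reference_outputs=None):
    # If reference_outputs is never passed, this always scores False.
    matched = reference_outputs is not None and outputs == reference_outputs
    return {"key": "correct", "score": matched}


arg_map = {"outputs": {"answer": "4"}, "reference_outputs": {"answer": "4"}}

buggy = correct(*(arg_map[n] for n in positional_params(correct, skip_defaults=True)))
fixed = correct(*(arg_map[n] for n in positional_params(correct, skip_defaults=False)))
print(buggy["score"], fixed["score"])  # False True
```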

hinthornw · 2024-11-14T22:30:28Z

python/langsmith/evaluation/_runner.py

@@ -87,6 +87,7 @@
        [schemas.Run, Optional[schemas.Example]],
        Union[EvaluationResult, EvaluationResults],
    ],
+    Callable[..., Union[dict, EvaluationResults, EvaluationResult]],


Could we update the docstring for evaluate() and aevaluate() to have examples or link to a docs page that shows the valid arguments?

Want to update the docs and API reference all at once; will do as a fast follow.

python/langsmith/evaluation/evaluator.py

rfc: accept simple evaluators

7e901ad

baskaryan requested a review from hinthornw November 9, 2024 01:39

hinthornw reviewed Nov 11, 2024

python/langsmith/evaluation/evaluator.py (outdated)

python/langsmith/evaluation/evaluator.py (outdated)

baskaryan and others added 4 commits November 11, 2024 15:08

Merge branch 'main' into bagatur/rfc_simple_evaluator

3877b36

Update python/langsmith/evaluation/evaluator.py

9b26f6b

Co-authored-by: William FH <[email protected]>

Merge branch 'bagatur/rfc_simple_evaluator' of github.com:langchain-ai/langsmith-sdk into bagatur/rfc_simple_evaluator

4a24f9f

fmt

b3b841f

baskaryan marked this pull request as ready for review November 12, 2024 01:52

baskaryan changed the title from "rfc: accept simple evaluators" to "python[patch]: accept simple evaluators" Nov 12, 2024

baskaryan requested a review from hinthornw November 12, 2024 14:55

jakerachleff reviewed Nov 12, 2024

jakerachleff approved these changes Nov 12, 2024

jakerachleff reviewed Nov 12, 2024

python/langsmith/evaluation/evaluator.py (outdated)

agola11 reviewed Nov 13, 2024

baskaryan added 4 commits November 14, 2024 09:54

Merge branch 'main' into bagatur/rfc_simple_evaluator

8604117

cr

03506c5

merge

a4d2b03

fmt

448bbf3

hinthornw reviewed Nov 14, 2024

python/langsmith/evaluation/evaluator.py

baskaryan and others added 4 commits November 18, 2024 07:51

Merge branch 'main' into bagatur/rfc_simple_evaluator

2f88f07

fmt

0eb3cf1

fmt

8b90979

fmt

459402d

baskaryan merged commit 9336fce into main Nov 18, 2024
9 checks passed

baskaryan deleted the bagatur/rfc_simple_evaluator branch November 18, 2024 17:39

baskaryan mentioned this pull request Nov 27, 2024

js[patch]: simple evaluator args #1264

Merged

baskaryan added a commit that referenced this pull request Dec 3, 2024

js[patch]: simple evaluator args (#1264)

94eabad

js equivalent of #1200
