chore(llmobs): update ragas trace ml app #11952

lievan · 2025-01-15T14:58:33Z

Update the ml application of any ragas evaluation span to be the same as the span being evaluated.

Refresher:

Datadog generates spans for RAGAS operations.
These spans need to be identified as RAGAS-specific to avoid an infinite evaluation loop, and they need to be tagged with 'runner.integration:ragas' for backend purposes.
Some of these spans are auto-instrumented through our langchain integration, meaning they can't be manually tagged.

Previous Implementation:

Added a dd-ragas- prefix to the ML application name for RAGAS spans.
This prefix was used during span processing to identify which spans came from RAGAS.

Now:

Still add a _dd_ragas prefix to the ragas span's ml app for identification purposes, but only temporarily.
Just before sending these spans to the backend, remove the prefix so the reported ml app matches the span being evaluated.
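
A minimal sketch of the temporary-prefix idea described above. The prefix value and both helper names are assumptions made for illustration; they are not taken from the dd-trace-py code. The marker is removed by slicing rather than by `str.replace`, so only a leading occurrence is dropped.

```python
from typing import Tuple

# Assumed value for illustration; the PR only says a "_dd_ragas" prefix is used.
TEMP_RAGAS_ML_APP_PREFIX = "_dd_ragas-"


def mark_as_ragas(ml_app: str) -> str:
    """Prepend the temporary marker while the span is still in-process."""
    return TEMP_RAGAS_ML_APP_PREFIX + ml_app


def finalize_ml_app(ml_app: str) -> Tuple[str, bool]:
    """Return (ml_app without the marker, whether the marker was present).

    Intended to run just before the span event is sent to the backend.
    """
    if ml_app.startswith(TEMP_RAGAS_ML_APP_PREFIX):
        # Slice off only the leading marker; later occurrences are untouched.
        return ml_app[len(TEMP_RAGAS_ML_APP_PREFIX):], True
    return ml_app, False
```

With these hypothetical helpers, a ragas span created for the application "my-app" would carry the marked name internally and report plain "my-app" once finalized.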

An alternative solution is to utilize annotation_contexts to set a tag on auto-instrumented ragas spans. This is a larger refactor that I would like to explore when we clean up the ragas evaluators (to more cleanly separate tracing setup from actual eval logic).

Checklist

PR author has checked that all the criteria below are met
The PR description includes an overview of the change
The PR description articulates the motivation for the change
The change includes tests OR the PR description describes a testing strategy
The PR description notes risks associated with the change, if any
Newly-added code is easy to change
The change follows the library release note guidelines
The change includes or references documentation updates if necessary
Backport labels are set (if applicable)

Reviewer Checklist

Reviewer has checked that all the criteria below are met
Title is accurate
All changes are related to the pull request's stated goal
Avoids breaking API changes
Testing strategy adequately addresses listed risks
Newly-added code is easy to change
Release note makes sense to a user of the library
If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
Backport labels are set in a manner that is consistent with the release branch maintenance policy

github-actions · 2025-01-15T14:59:05Z

CODEOWNERS have been resolved as:

ddtrace/llmobs/_constants.py                                            @DataDog/ml-observability
ddtrace/llmobs/_evaluators/ragas/answer_relevancy.py                    @DataDog/ml-observability
ddtrace/llmobs/_evaluators/ragas/base.py                                @DataDog/ml-observability
ddtrace/llmobs/_evaluators/ragas/context_precision.py                   @DataDog/ml-observability
ddtrace/llmobs/_evaluators/ragas/faithfulness.py                        @DataDog/ml-observability
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/llmobs/_utils.py                                                @DataDog/ml-observability
tests/llmobs/_utils.py                                                  @DataDog/ml-observability
tests/llmobs/test_llmobs_service.py                                     @DataDog/ml-observability


pr-commenter · 2025-01-16T00:24:40Z

Benchmarks

Benchmark execution time: 2025-01-17 20:28:11

Comparing candidate commit 38e1f62 in PR branch evan.li/ragas-ml-app with baseline commit ef4c997 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 394 metrics, 2 unstable metrics.

lievan · 2025-01-16T16:04:58Z

ddtrace/llmobs/_llmobs.py

@@ -290,6 +286,10 @@ def _stop_service(self) -> None:
        except ServiceStatusError:
            log.debug("Error stopping LLMObs writers")

+        # Remove listener hooks for span events


This makes sure hooks aren't removed prematurely while ragas is still running (as triggered by self._evaluator_runner.stop()).

Was this triggering any bugs?

ddtrace/llmobs/_constants.py

Yun-Kim · 2025-01-17T15:52:30Z

ddtrace/llmobs/_llmobs.py

+        span._set_ctx_item(ML_APP, ml_app)

        is_ragas_integration_span = False
-
-        if ml_app.startswith(constants.RAGAS_ML_APP_PREFIX):
+        if ml_app.startswith(constants.TEMP_RAGAS_ML_APP_PREFIX):
            is_ragas_integration_span = True
+            ml_app = ml_app.replace(constants.TEMP_RAGAS_ML_APP_PREFIX, "")


I'm not a fan of this temporary set/replace/remove behavior. What is the ultimate goal of using the ragas temp prefix? Can we not set a ragas identifier onto the span's internal store object instead of relying on a temporary ml app name?

Thanks for the suggestion, I just added logic to use the span's store instead.
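
A rough sketch of the store-based approach suggested in this thread, assuming a span exposes a private key/value store. `FakeSpan`, the `IS_RAGAS_SPAN` key, and the `ml_app:` tag format are hypothetical; only the `_set_ctx_item`/`_get_ctx_item` method names and the 'runner.integration:ragas' tag appear elsewhere in this PR.

```python
from typing import Any, Dict, List

IS_RAGAS_SPAN = "_is_ragas_span"  # hypothetical key, for illustration only


class FakeSpan:
    """Minimal stand-in for a tracer span with a private key/value store."""

    def __init__(self, ml_app: str) -> None:
        self._store: Dict[str, Any] = {"ml_app": ml_app}

    def _set_ctx_item(self, key: str, value: Any) -> None:
        self._store[key] = value

    def _get_ctx_item(self, key: str) -> Any:
        return self._store.get(key)


def llmobs_tags(span: FakeSpan) -> List[str]:
    """Build tags by reading the flag from the span instead of its ml app name."""
    tags = ["ml_app:" + span._get_ctx_item("ml_app")]
    if span._get_ctx_item(IS_RAGAS_SPAN):
        tags.append("runner.integration:ragas")
    return tags
```

Because the flag lives in the store, the ml app name never has to be modified and later restored.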


Yun-Kim · 2025-01-17T21:43:40Z

ddtrace/llmobs/_constants.py

-RAGAS_ML_APP_PREFIX = "dd-ragas"
+# All ragas traces have this context item set so we can differentiate
+# spans generated from the ragas integration vs user application spans.
+IS_EVALUATION_SPAN = "_is_evaluation_span"


Suggested change

-IS_EVALUATION_SPAN = "_is_evaluation_span"
+IS_EVALUATION_SPAN = "_ml_obs.evaluation_span"

Let's use the _ml_obs prefix to make it clear this is coming from us (for future reviewers/users)

Yun-Kim · 2025-01-17T21:45:27Z

ddtrace/llmobs/_llmobs.py

+        is_evaluation_span = _is_evaluation_span(span)
+        span._set_ctx_item(IS_EVALUATION_SPAN, is_evaluation_span)


Do we need this check in this function scope?

Yun-Kim · 2025-01-17T21:46:49Z

ddtrace/llmobs/_llmobs.py

@@ -210,9 +210,9 @@ def _llmobs_span_event(cls, span: Span) -> Tuple[Dict[str, Any], bool]:
            llmobs_span_event["session_id"] = session_id

        llmobs_span_event["tags"] = cls._llmobs_tags(
-            span, ml_app, session_id, is_ragas_integration_span=is_ragas_integration_span
+            span, ml_app, session_id, is_ragas_integration_span=is_evaluation_span


Seems like we don't necessarily need to pass this information in to _llmobs_tags() or even return it from this function. Thoughts on just doing the checks individually when needed instead of passing it around?

Yun-Kim · 2025-01-17T21:47:28Z

ddtrace/llmobs/_llmobs.py

@@ -290,6 +286,10 @@ def _stop_service(self) -> None:
        except ServiceStatusError:
            log.debug("Error stopping LLMObs writers")

+        # Remove listener hooks for span events


Was this triggering any bugs?

Yun-Kim · 2025-01-17T21:47:52Z

ddtrace/llmobs/_utils.py

+    while llmobs_parent:
+        is_evaluation_span = llmobs_parent._get_ctx_item(IS_EVALUATION_SPAN)
+        if is_evaluation_span is not None:
+            return is_evaluation_span


Let's do a bool check to be defensive here
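
One possible reading of the "bool check" is to trust the stored value only when it is an actual `bool`. The sketch below takes that reading; `FakeSpan` and its `_parent` attribute are hypothetical stand-ins, and unlike the diff above the walk starts at the span itself rather than at its parent. The key value is the one proposed in the review suggestion.

```python
from typing import Any, Dict, Optional

IS_EVALUATION_SPAN = "_ml_obs.evaluation_span"  # value proposed in review


class FakeSpan:
    """Hypothetical span with a parent pointer and a private store."""

    def __init__(self, parent: Optional["FakeSpan"] = None) -> None:
        self._parent = parent
        self._store: Dict[str, Any] = {}

    def _set_ctx_item(self, key: str, value: Any) -> None:
        self._store[key] = value

    def _get_ctx_item(self, key: str) -> Any:
        return self._store.get(key)


def is_evaluation_span(span: FakeSpan) -> bool:
    """Walk up the ancestors and return the first stored boolean flag."""
    current: Optional[FakeSpan] = span
    while current is not None:
        value = current._get_ctx_item(IS_EVALUATION_SPAN)
        # Defensive: ignore anything that is not a real bool.
        if isinstance(value, bool):
            return value
        current = current._parent
    return False
```

Under this reading, a non-boolean value stored under the key is skipped and the walk continues to the next ancestor, ending in `False` if no boolean flag is found.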

lievan added 3 commits January 14, 2025 15:03

small changes

f90caa4

ragas version parse

5bd3840

ragas ml app updates

0d8f9a6

lievan added 3 commits January 15, 2025 13:15

Merge branch 'main' of github.com:DataDog/dd-trace-py into evan.li/ragas-ml-app

5f35100

Merge branch 'main' of github.com:DataDog/dd-trace-py into evan.li/ragas-ml-app

a2580e8

clarify the ml app is temp

ea038a9

lievan added 2 commits January 16, 2025 09:34

make sure listeners are removed after all spans are flushed

d7f4630

revert acc change

e52f564

lievan commented Jan 16, 2025


lievan added the changelog/no-changelog A changelog entry is not required for this PR. label Jan 16, 2025

lievan marked this pull request as ready for review January 16, 2025 16:06

lievan requested a review from a team as a code owner January 16, 2025 16:06

lievan changed the title ~~chore(llmobs): update ragas tracing experience~~ chore(llmobs): update ragas trace ml app Jan 16, 2025

Merge branch 'main' into evan.li/ragas-ml-app

74f9b92

lievan commented Jan 17, 2025


Yun-Kim reviewed Jan 17, 2025


lievan and others added 5 commits January 17, 2025 12:30

change how we detect eval spans

3d2a446

Merge branch 'evan.li/ragas-ml-app' of github.com:DataDog/dd-trace-py into evan.li/ragas-ml-app

2b11d86

Merge branch 'main' of github.com:DataDog/dd-trace-py into evan.li/ragas-ml-app

fb4bb8c

remove unneeded change

195cb4a

Merge branch 'main' into evan.li/ragas-ml-app

38e1f62

Yun-Kim reviewed Jan 17, 2025
