Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(llmobs): update ragas trace ml app #11952

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
Open

Conversation

lievan
Copy link
Contributor

@lievan lievan commented Jan 15, 2025

Update the ml application of any ragas evaluation span to be the same as the span being evaluated.

Refresher:

  • Spans are generated by datadog for RAGAS operations
  • These spans need to be identified as RAGAS-specific to avoid an infinite eval loop and tagged with 'runner.integration:ragas' for backend purposes
  • Some of these spans are auto-instrumented through our langchain integration, meaning they can't be manually tagged

Previous Implementation:

  • Added a dd-ragas- prefix to the ML application name for RAGAS spans
  • This prefix was used to identify which spans came from RAGAS when spans are processed

Now:

  • add _dd_ragas prefix on ragas span's ml app for identification purposes, except only temporarily.
  • Just before sending these spans to the backend, remove the prefix

An alternative solution is to utilize annotation_contexts to set a tag on auto-instrumented ragas spans. This is a larger refactor that i would like to explore when we clean up the ragas evaluators (more cleanly separate tracing setup vs actual eval logic).

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

Copy link
Contributor

github-actions bot commented Jan 15, 2025

CODEOWNERS have been resolved as:

ddtrace/llmobs/_constants.py                                            @DataDog/ml-observability
ddtrace/llmobs/_evaluators/ragas/answer_relevancy.py                    @DataDog/ml-observability
ddtrace/llmobs/_evaluators/ragas/base.py                                @DataDog/ml-observability
ddtrace/llmobs/_evaluators/ragas/context_precision.py                   @DataDog/ml-observability
ddtrace/llmobs/_evaluators/ragas/faithfulness.py                        @DataDog/ml-observability
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/llmobs/_utils.py                                                @DataDog/ml-observability
tests/llmobs/_utils.py                                                  @DataDog/ml-observability
tests/llmobs/test_llmobs_service.py                                     @DataDog/ml-observability

@pr-commenter
Copy link

pr-commenter bot commented Jan 16, 2025

Benchmarks

Benchmark execution time: 2025-01-17 20:28:11

Comparing candidate commit 38e1f62 in PR branch evan.li/ragas-ml-app with baseline commit ef4c997 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 394 metrics, 2 unstable metrics.

@@ -290,6 +286,10 @@ def _stop_service(self) -> None:
except ServiceStatusError:
log.debug("Error stopping LLMObs writers")

# Remove listener hooks for span events
Copy link
Contributor Author

@lievan lievan Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sure hooks aren't removed prematurely while ragas is still running as triggered by self._evaluator_runner.stop()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this triggering any bugs?

@lievan lievan added the changelog/no-changelog A changelog entry is not required for this PR. label Jan 16, 2025
@lievan lievan marked this pull request as ready for review January 16, 2025 16:06
@lievan lievan requested a review from a team as a code owner January 16, 2025 16:06
@lievan lievan changed the title chore(llmobs): update ragas tracing experience chore(llmobs): update ragas trace ml app Jan 16, 2025
ddtrace/llmobs/_constants.py Outdated Show resolved Hide resolved
Comment on lines 187 to 192
span._set_ctx_item(ML_APP, ml_app)

is_ragas_integration_span = False

if ml_app.startswith(constants.RAGAS_ML_APP_PREFIX):
if ml_app.startswith(constants.TEMP_RAGAS_ML_APP_PREFIX):
is_ragas_integration_span = True
ml_app = ml_app.replace(constants.TEMP_RAGAS_ML_APP_PREFIX, "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this temporary set/replace/remove behavior. What is the ultimate goal of using the ragas temp prefix? Can we not set a ragas identifier onto the span's internal store object instead of relying on a temporary ml app name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the suggestion, just added in logic to use the span's store instead

RAGAS_ML_APP_PREFIX = "dd-ragas"
# All ragas traces have this context item set so we can differentiate
# spans generated from the ragas integration vs user application spans.
IS_EVALUATION_SPAN = "_is_evaluation_span"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
IS_EVALUATION_SPAN = "_is_evaluation_span"
IS_EVALUATION_SPAN = "_ml_obs.evaluation_span"

Let's use the _ml_obs prefix to make it clear this is coming from us (for future reviewers/users)

Comment on lines +190 to +191
is_evaluation_span = _is_evaluation_span(span)
span._set_ctx_item(IS_EVALUATION_SPAN, is_evaluation_span)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this check in this function scope?

@@ -210,9 +210,9 @@ def _llmobs_span_event(cls, span: Span) -> Tuple[Dict[str, Any], bool]:
llmobs_span_event["session_id"] = session_id

llmobs_span_event["tags"] = cls._llmobs_tags(
span, ml_app, session_id, is_ragas_integration_span=is_ragas_integration_span
span, ml_app, session_id, is_ragas_integration_span=is_evaluation_span
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we don't necessarily need to pass this information in to _llmobs_tags() or even return it from this function. Thoughts on just doing the checks individually when needed instead of passing it around?

@@ -290,6 +286,10 @@ def _stop_service(self) -> None:
except ServiceStatusError:
log.debug("Error stopping LLMObs writers")

# Remove listener hooks for span events
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this triggering any bugs?

while llmobs_parent:
is_evaluation_span = llmobs_parent._get_ctx_item(IS_EVALUATION_SPAN)
if is_evaluation_span is not None:
return is_evaluation_span
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do a bool check to be defensive here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog/no-changelog A changelog entry is not required for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants