
[Observability AI Assistant] [Root Cause Analysis] Create an evaluation tool for LLM root cause analysis results #200064

Open
dominiqueclarke opened this issue Nov 13, 2024 · 3 comments
Assignees
Labels
Team:Obs AI Assistant Observability AI Assistant Team:obs-ux-management Observability Management User Experience Team

Comments

@dominiqueclarke
Contributor

dominiqueclarke commented Nov 13, 2024

As we begin to evaluate LLM-assisted root cause analysis, we need a way to evaluate the validity and usefulness of the results.

Historically, our process for evaluating these results has been quite manual. We've

  1. Created a failure scenario where the root cause is known
  2. Launched the LLM analysis while manually collecting all trace logs from the inference plugin
  3. Manually scanned the output for any mention of our known root cause

As we begin to test more scenarios against LLM-assisted root cause analysis, we'd like a more automated and sensible way to evaluate the results.

We'd like to evaluate both controlled (known architecture, known root cause) and uncontrolled (unknown architecture, unknown root cause) scenarios. Here's some of what we envision.

  1. Ability to generate a diagnostic file containing LLM output and available plugin logs.
    • Ideally, this diagnostic file could be downloaded via the UI, perhaps as part of the existing observability inspect framework, or something else. This would enable our partners (SREs, customers) to gather information we can use to help improve the product without hassle.
  2. Ability to specify what we believe the root cause to be. This could be provided by us (for controlled tests) or by our partners (customers, SREs, for uncontrolled tests).
    • Ideally, this could be specified at the time the diagnostic file is downloaded from the UI. That way, we could download a full diagnostic file that includes both the actual LLM responses and the specified expected root cause.
  3. Ability to evaluate how well we believe the output matches the expected root cause.
    • We are aware of this framework for evaluating various scenarios with the LLM integration, but it's unclear whether it can be adapted to fit our use case. We are intrigued by the idea of using AI to help us evaluate the output of the investigation, for example by providing the LLM with the final report and asking how well it reflects the known root cause. We'd be interested in a system that takes a given diagnostic file and reports how well it believes the LLM performed (a rough sketch of what that could look like follows below).
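
To make this concrete, here is a minimal sketch in TypeScript of what the diagnostic file and the LLM-as-judge step could look like. All names here (`RootCauseDiagnosticFile`, `evaluateDiagnosticFile`, `callJudgeModel`) are hypothetical and do not exist in Kibana today; they only illustrate the shape of the data and the judging step described above.

```ts
// Hypothetical shape of the diagnostic file and the LLM-as-judge evaluator.
// None of these names exist in Kibana; they only illustrate the idea.

interface RootCauseDiagnosticFile {
  investigationId: string;
  // Expected root cause, supplied by us (controlled tests) or by partners
  // such as SREs and customers (uncontrolled tests) when the file is downloaded.
  expectedRootCause: string;
  // Final report produced by the LLM-assisted root cause analysis.
  llmFinalReport: string;
  // Trace logs captured from the inference plugin during the analysis.
  inferencePluginLogs: string[];
}

interface RootCauseEvaluation {
  matchesExpectedRootCause: boolean;
  score: number; // 0–1: how well the report reflects the known root cause
  reasoning: string;
}

// `callJudgeModel` stands in for whatever inference client is available; the
// evaluator hands the judge the final report and the expected root cause and
// asks for a structured verdict.
async function evaluateDiagnosticFile(
  file: RootCauseDiagnosticFile,
  callJudgeModel: (prompt: string) => Promise<string>
): Promise<RootCauseEvaluation> {
  const prompt = [
    'You are grading an automated root cause analysis.',
    `Expected root cause: ${file.expectedRootCause}`,
    `Final report produced by the analysis: ${file.llmFinalReport}`,
    'Respond with JSON only, in the shape:',
    '{ "matchesExpectedRootCause": boolean, "score": number, "reasoning": string }',
  ].join('\n\n');

  const response = await callJudgeModel(prompt);
  return JSON.parse(response) as RootCauseEvaluation;
}
```

A small CLI wrapper over something like this could take a downloaded diagnostic file as input and print the verdict, which would cover both controlled and uncontrolled scenarios.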
@dominiqueclarke dominiqueclarke added Team:Obs AI Assistant Observability AI Assistant Team:obs-ux-management Observability Management User Experience Team labels Nov 13, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ai-assistant (Team:Obs AI Assistant)

@elasticmachine
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@emma-raffenne
Contributor

hi @dominiqueclarke

I'd suggest having a look at the evaluation framework implemented by @almudenasanz that the Obs AI Assistant team is using: https://github.com/elastic/kibana/tree/main/x-pack/plugins/observability_solution/observability_ai_assistant_app/scripts/evaluation

It may make sense to use it as a base and extend it for your needs. Let us know what you think.
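
For illustration, a root-cause scenario written in that framework's style might look roughly like the sketch below. This is loosely modeled on the criteria-based pattern used there; `evaluate` and `chatClient` are placeholders, and the actual scenario API in the linked directory may differ, so check it before copying anything.

```ts
// Placeholder sketch only: `evaluate` and `chatClient` stand in for the
// scenario runner and chat client of the linked evaluation framework, whose
// actual API may differ. The service and root cause are invented test data.
import expect from '@kbn/expect';

evaluate.describe('root cause analysis', () => {
  evaluate('identifies the known root cause', async () => {
    // Drive the investigation the same way a user would in a controlled test.
    const conversation = await chatClient.complete(
      'Investigate the elevated error rate on the checkout service.'
    );

    // Ask an LLM judge to grade the result against explicit criteria,
    // including the known root cause for this controlled scenario.
    const result = await chatClient.evaluate(conversation, [
      'Identifies the expired TLS certificate on the payment gateway as the root cause',
      'Cites supporting evidence from logs or traces rather than speculating',
    ]);

    expect(result.passed).to.be(true);
  });
});
```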
