
[Investigate App] [AI RCA] [META] Evaluating AI performance for RCA #205670

Open · 3 tasks · Tracked by #205670
dominiqueclarke opened this issue Jan 7, 2025 · 1 comment

@dominiqueclarke (Contributor) commented Jan 7, 2025

Relates to #200064

This issue tracks work related to evaluating the performance of LLM-based implementations for root cause analysis within the Investigate app.

Tasks (0 of 3 complete; task titles not captured in this export)

1. Team:obs-ux-management — dominiqueclarke
2. Team:obs-ux-management — dominiqueclarke, mgiota
3. Team:obs-ux-management
@dominiqueclarke added the Team:obs-ux-management (Observability Management User Experience Team) label on Jan 7, 2025
@elasticmachine (Contributor) commented:
Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@dominiqueclarke self-assigned this on Jan 7, 2025
dominiqueclarke added a commit that referenced this issue Jan 17, 2025
…ysis integration (#204634)

## Summary

Extends the Observability AI Assistant's evaluation framework to create
the first set of tests aimed at evaluating the performance of the
Investigation App's AI root cause analysis integration.

To execute tests, please consult the
[README](https://github.com/elastic/kibana/pull/204634/files#diff-4823a154e593051126d3d5822c88d72e89d07f41b8c07a5a69d18281c50b09adR1).
Note the prerequisites and the Kibana & Elasticsearch configuration.

Further evolution
--
This PR is the first MVP of the evaluation framework. A (somewhat light)
[meta issue](#205670) tracks our continued work on this project and will
be expanded over time.

Test data and fixture architecture
--
Logs, metrics, and traces are indexed to
[edge-rca](https://studious-disco-k66oojq.pages.github.io/edge-rca/).
Observability engineers can [create an oblt-cli
cluster](https://studious-disco-k66oojq.pages.github.io/user-guide/cluster-create-ccs/)
configured for cross cluster search against edge-rca as the remote
cluster.
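
For illustration, registering a remote cluster for cross-cluster search can be done with the cluster settings API. The sketch below uses the Elasticsearch JS client with a placeholder node URL and seed address (neither comes from this issue); in practice oblt-cli generates this configuration for you.

```ts
// Sketch: wiring a local cluster for cross-cluster search against edge-rca.
// The node URL and seed address are placeholders; oblt-cli normally
// generates this configuration.
import { Client } from '@elastic/elasticsearch';

async function configureRemoteCluster() {
  const client = new Client({ node: process.env.ES_URL ?? 'http://localhost:9200' });

  // Register edge-rca as a remote cluster under the alias `edge-rca`.
  await client.cluster.putSettings({
    persistent: {
      'cluster.remote.edge-rca.seeds': ['edge-rca-transport.example.com:9300'],
    },
  });

  // Remote data is then addressable with the `<cluster>:<index>` CCS syntax.
  const res = await client.search({ index: 'edge-rca:logs-*', size: 1 });
  console.log('remote docs visible:', res.hits.hits.length > 0);
}
```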

When creating new testing fixtures, engineers use their oblt-cli cluster
to create rules against the remote cluster data. Once alerts are
triggered in a failure scenario, the engineer can archive the alert data
for use as a test fixture.
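
As a rough sketch of that workflow, a rule can be created over the CCS data through Kibana's alerting API. The rule type, consumer, and params below are illustrative assumptions, not the team's actual rules.

```ts
// Sketch: creating a rule over the remote edge-rca data so alerts fire during
// a failure scenario. rule_type_id, consumer, and params are assumptions.
async function createFailureScenarioRule(kibanaUrl: string, apiKey: string) {
  const res = await fetch(`${kibanaUrl}/api/alerting/rule`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'kbn-xsrf': 'true', // required by Kibana's HTTP API
      Authorization: `ApiKey ${apiKey}`,
    },
    body: JSON.stringify({
      name: 'edge-rca failure scenario',
      rule_type_id: 'observability.rules.custom_threshold', // assumed rule type
      consumer: 'logs',
      schedule: { interval: '1m' },
      params: {
        // Threshold criteria targeting `edge-rca:*` remote indices go here.
      },
    }),
  });
  if (!res.ok) throw new Error(`Rule creation failed: ${res.status}`);
  return res.json();
}
```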

Test fixtures are added to the `investigate_app/scripts/load/fixtures`
directory for use in tests.

When executing tests, the fixtures are loaded into the engineer's oblt-cli
cluster, which is configured for cross-cluster search against edge-rca. The
local alert fixture and the remote demo data are used together to replay
root cause analysis and run the test evaluations.

Implementation
--

Creates a new `scripts` directory to house the scripts related to setting
up and running these tests. Here's what each directory does:
## scripts/evaluate
1. Extends the evaluation script from
`observability_ai_assistant_app/scripts/evaluation` by creating a
[custom Kibana
client](https://github.com/elastic/kibana/pull/204634/files#diff-ae05b2a20168ea08f452297fc1bd59310c69ac3ea4651da1f65cd9fa93bb8fe9R1)
with RCA-specific methods. The custom client is [passed to the
Observability AI Assistant's
`runEvaluations`](https://github.com/elastic/kibana/pull/204634/files#diff-0f2d3662c01df8fbe7d1f19704fa071cbd6232fb5f732b313e8ba99012925d0bR14)
script and [invoked instead of the default Kibana
client](https://github.com/elastic/kibana/pull/204634/files#diff-98509a357e86ea5c5931b1b46abc72f76e5304439430358eee845f9ad57f63f1R54).
2. Defines a single MVP test in `index.spec.ts`. This test finds the
specific alert fixture designated for it, creates an investigation for
that alert with a specified time range, and calls the root cause analysis
API. Once the report is received back from the API, a prompt is created
for the evaluation framework with details of the report. The evaluation
framework then judges how well the root cause analysis API performed
against the specified criteria (a sketch of this flow follows the list).
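
A minimal sketch of that flow, assuming a hypothetical client interface; every method name on `RcaKibanaClient` below is an assumption, and the custom Kibana client linked in item 1 defines the real API.

```ts
// Sketch of the MVP test flow. The RcaKibanaClient interface and its method
// names are hypothetical; see the custom client linked above for the real API.
interface RcaKibanaClient {
  findAlertFixture(opts: { tag: string }): Promise<{ alertId: string; start: string; end: string }>;
  createInvestigation(opts: { alertId: string; start: string; end: string }): Promise<{ id: string }>;
  runRootCauseAnalysis(opts: { investigationId: string }): Promise<{ report: string }>;
}

export async function evaluateRcaReport(
  client: RcaKibanaClient,
  judge: (prompt: string) => Promise<string>
) {
  // 1. Find the alert fixture designated for this test.
  const alert = await client.findAlertFixture({ tag: 'rca-mvp' });

  // 2. Create an investigation for that alert with the specified time range.
  const investigation = await client.createInvestigation(alert);

  // 3. Call the root cause analysis API and wait for the report.
  const { report } = await client.runRootCauseAnalysis({ investigationId: investigation.id });

  // 4. Build a prompt containing the report so the evaluation framework's
  //    LLM judge can score it against the test's criteria.
  const prompt = `Grade this root cause analysis report against the criteria:\n\n${report}`;
  return judge(prompt);
}
```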
## scripts/archive
1. Used when creating new test fixtures, this script archives
observability alert data for use as a fixture in a feature test (a sketch
follows below).
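
A hedged sketch of that archiving step; the index pattern and output path are assumptions, not the script's actual values.

```ts
// Sketch: dump triggered alert documents to a JSON fixture file.
// The index pattern and output path are illustrative assumptions.
import { Client } from '@elastic/elasticsearch';
import { writeFile } from 'node:fs/promises';

async function archiveAlerts(alertIds: string[]) {
  const client = new Client({ node: process.env.ES_URL ?? 'http://localhost:9200' });

  // Pull the triggered alerts from the alerts-as-data indices.
  const res = await client.search({
    index: '.alerts-observability*',
    size: alertIds.length,
    query: { ids: { values: alertIds } },
  });

  // Keep index + source so the load script can restore each document verbatim.
  const docs = res.hits.hits.map((hit) => ({ _index: hit._index, _source: hit._source }));
  await writeFile('alert_fixture.json', JSON.stringify(docs, null, 2));
}
```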
## scripts/load
1. Loads the created test fixtures before running the tests (a sketch
follows below).
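
A matching sketch of the load step, assuming the fixture format from the archive sketch above.

```ts
// Sketch: bulk-index an archived alert fixture into the local cluster.
// The fixture format mirrors the archive sketch above (an assumption).
import { Client } from '@elastic/elasticsearch';
import { readFile } from 'node:fs/promises';

async function loadFixture(path: string) {
  const client = new Client({ node: process.env.ES_URL ?? 'http://localhost:9200' });
  const docs: Array<{ _index: string; _source: Record<string, unknown> }> = JSON.parse(
    await readFile(path, 'utf8')
  );

  // Interleave action and document lines as the bulk API expects.
  const operations = docs.flatMap((doc) => [{ index: { _index: doc._index } }, doc._source]);
  await client.bulk({ operations, refresh: true });
}
```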

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Dario Gieselaar <[email protected]>
cqliu1 pushed a commit to cqliu1/kibana that referenced this issue Jan 21, 2025
…ysis integration (elastic#204634)
viduni94 pushed a commit to viduni94/kibana that referenced this issue Jan 23, 2025
…ysis integration (elastic#204634)