
[Investigate App] [AI RCA] [META] Evaluating AI performance for RCA #205670

Open · 3 tasks · Tracked by #205670
dominiqueclarke opened this issue Jan 7, 2025 · 1 comment

@dominiqueclarke (Contributor) commented Jan 7, 2025

Relates to #200064

This issue tracks work related to evaluating the performance of LLM-based implementations for root cause analysis within the Investigate app.

Tasks (0 of 3 complete; task titles not captured in this export)

1. Team:obs-ux-management — dominiqueclarke
2. Team:obs-ux-management — dominiqueclarke, mgiota
3. Team:obs-ux-management
@dominiqueclarke added the Team:obs-ux-management (Observability Management User Experience Team) label on Jan 7, 2025
@elasticmachine (Contributor) commented:
Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@dominiqueclarke self-assigned this on Jan 7, 2025
dominiqueclarke added a commit that referenced this issue Jan 17, 2025
…ysis integration (#204634)

## Summary

Extends the Observability AI Assistant's evaluation framework to create
the first set of tests aimed at evaluating the performance of the
Investigation App's AI root cause analysis integration.

To execute tests, please consult the
[README](https://github.com/elastic/kibana/pull/204634/files#diff-4823a154e593051126d3d5822c88d72e89d07f41b8c07a5a69d18281c50b09adR1).
Note the prerequisites and the Kibana & Elasticsearch configuration.

Further evolution
--
This PR is the first MVP of the evaluation framework. A (somewhat light)
[meta issue](#205670) tracks our continued work on this project and will
be expanded over time.

Test data and fixture architecture
--
Logs, metrics, and traces are indexed to
[edge-rca](https://studious-disco-k66oojq.pages.github.io/edge-rca/).
Observability engineers can [create an oblt-cli
cluster](https://studious-disco-k66oojq.pages.github.io/user-guide/cluster-create-ccs/)
configured for cross cluster search against edge-rca as the remote
cluster.
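
For illustration, registering a remote cluster for cross-cluster search can be done with the cluster settings API. The sketch below uses the Elasticsearch JS client with a placeholder node URL and seed address (neither comes from this issue); in practice oblt-cli generates this configuration for you.

```ts
// Sketch: wiring a local cluster for cross-cluster search against edge-rca.
// The node URL and seed address are placeholders; oblt-cli normally
// generates this configuration.
import { Client } from '@elastic/elasticsearch';

async function configureRemoteCluster() {
  const client = new Client({ node: process.env.ES_URL ?? 'http://localhost:9200' });

  // Register edge-rca as a remote cluster under the alias `edge-rca`.
  await client.cluster.putSettings({
    persistent: {
      'cluster.remote.edge-rca.seeds': ['edge-rca-transport.example.com:9300'],
    },
  });

  // Remote data is then addressable with the `<cluster>:<index>` CCS syntax.
  const res = await client.search({ index: 'edge-rca:logs-*', size: 1 });
  console.log('remote docs visible:', res.hits.hits.length > 0);
}
```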

When creating new testing fixtures, engineers use their oblt-cli cluster
to create rules against the remote cluster data. Once alerts are
triggered in a failure scenario, the engineer can archive the alert data
for use as a test fixture.
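
As a rough sketch of that workflow, a rule can be created over the CCS data through Kibana's alerting API. The rule type, consumer, and params below are illustrative assumptions, not the team's actual rules.

```ts
// Sketch: creating a rule over the remote edge-rca data so alerts fire during
// a failure scenario. rule_type_id, consumer, and params are assumptions.
async function createFailureScenarioRule(kibanaUrl: string, apiKey: string) {
  const res = await fetch(`${kibanaUrl}/api/alerting/rule`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'kbn-xsrf': 'true', // required by Kibana's HTTP API
      Authorization: `ApiKey ${apiKey}`,
    },
    body: JSON.stringify({
      name: 'edge-rca failure scenario',
      rule_type_id: 'observability.rules.custom_threshold', // assumed rule type
      consumer: 'logs',
      schedule: { interval: '1m' },
      params: {
        // Threshold criteria targeting `edge-rca:*` remote indices go here.
      },
    }),
  });
  if (!res.ok) throw new Error(`Rule creation failed: ${res.status}`);
  return res.json();
}
```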

Test fixtures are added to the `investigate_app/scripts/load/fixtures`
directory for use in tests.

When executing tests, the fixtures are loaded into the engineer's oblt-cli
cluster, which is configured for cross-cluster search against edge-rca. The
local alert fixture and the remote demo data are used together to replay
root cause analysis and run the test evaluations.

Implementation
--

Creates a new `scripts` directory to house the scripts related to setting
up and running these tests. Here's what each directory does:
## scripts/evaluate
1. Extends the evaluation script from
`observability_ai_assistant_app/scripts/evaluation` by creating a
[custom Kibana
client](https://github.com/elastic/kibana/pull/204634/files#diff-ae05b2a20168ea08f452297fc1bd59310c69ac3ea4651da1f65cd9fa93bb8fe9R1)
with RCA-specific methods. The custom client is [passed to the
Observability AI Assistant's
`runEvaluations`](https://github.com/elastic/kibana/pull/204634/files#diff-0f2d3662c01df8fbe7d1f19704fa071cbd6232fb5f732b313e8ba99012925d0bR14)
script and [invoked instead of the default Kibana
client](https://github.com/elastic/kibana/pull/204634/files#diff-98509a357e86ea5c5931b1b46abc72f76e5304439430358eee845f9ad57f63f1R54).
2. Defines a single MVP test in `index.spec.ts`. This test finds the
specific alert fixture designated for it, creates an investigation for
that alert with a specified time range, and calls the root cause analysis
API. Once the report is received back from the API, a prompt is created
for the evaluation framework with details of the report. The evaluation
framework then judges how well the root cause analysis API performed
against the specified criteria (a sketch of this flow follows the list).
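
A minimal sketch of that flow, assuming a hypothetical client interface; every method name on `RcaKibanaClient` below is an assumption, and the custom Kibana client linked in item 1 defines the real API.

```ts
// Sketch of the MVP test flow. The RcaKibanaClient interface and its method
// names are hypothetical; see the custom client linked above for the real API.
interface RcaKibanaClient {
  findAlertFixture(opts: { tag: string }): Promise<{ alertId: string; start: string; end: string }>;
  createInvestigation(opts: { alertId: string; start: string; end: string }): Promise<{ id: string }>;
  runRootCauseAnalysis(opts: { investigationId: string }): Promise<{ report: string }>;
}

export async function evaluateRcaReport(
  client: RcaKibanaClient,
  judge: (prompt: string) => Promise<string>
) {
  // 1. Find the alert fixture designated for this test.
  const alert = await client.findAlertFixture({ tag: 'rca-mvp' });

  // 2. Create an investigation for that alert with the specified time range.
  const investigation = await client.createInvestigation(alert);

  // 3. Call the root cause analysis API and wait for the report.
  const { report } = await client.runRootCauseAnalysis({ investigationId: investigation.id });

  // 4. Build a prompt containing the report so the evaluation framework's
  //    LLM judge can score it against the test's criteria.
  const prompt = `Grade this root cause analysis report against the criteria:\n\n${report}`;
  return judge(prompt);
}
```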
## scripts/archive
1. Used when creating new test fixtures, this script archives
observability alert data for use as a fixture in a feature test (a sketch
follows below).
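
A hedged sketch of that archiving step; the index pattern and output path are assumptions, not the script's actual values.

```ts
// Sketch: dump triggered alert documents to a JSON fixture file.
// The index pattern and output path are illustrative assumptions.
import { Client } from '@elastic/elasticsearch';
import { writeFile } from 'node:fs/promises';

async function archiveAlerts(alertIds: string[]) {
  const client = new Client({ node: process.env.ES_URL ?? 'http://localhost:9200' });

  // Pull the triggered alerts from the alerts-as-data indices.
  const res = await client.search({
    index: '.alerts-observability*',
    size: alertIds.length,
    query: { ids: { values: alertIds } },
  });

  // Keep index + source so the load script can restore each document verbatim.
  const docs = res.hits.hits.map((hit) => ({ _index: hit._index, _source: hit._source }));
  await writeFile('alert_fixture.json', JSON.stringify(docs, null, 2));
}
```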
## scripts/load
1. Loads the created test fixtures before running the tests (a sketch
follows below).
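
A matching sketch of the load step, assuming the fixture format from the archive sketch above.

```ts
// Sketch: bulk-index an archived alert fixture into the local cluster.
// The fixture format mirrors the archive sketch above (an assumption).
import { Client } from '@elastic/elasticsearch';
import { readFile } from 'node:fs/promises';

async function loadFixture(path: string) {
  const client = new Client({ node: process.env.ES_URL ?? 'http://localhost:9200' });
  const docs: Array<{ _index: string; _source: Record<string, unknown> }> = JSON.parse(
    await readFile(path, 'utf8')
  );

  // Interleave action and document lines as the bulk API expects.
  const operations = docs.flatMap((doc) => [{ index: { _index: doc._index } }, doc._source]);
  await client.bulk({ operations, refresh: true });
}
```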

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Dario Gieselaar <[email protected]>
cqliu1 pushed a commit to cqliu1/kibana that referenced this issue Jan 21, 2025
…ysis integration (elastic#204634)
viduni94 pushed a commit to viduni94/kibana that referenced this issue Jan 23, 2025
…ysis integration (elastic#204634)