[Investigate App] [AI RCA] [META] Evaluating AI performance for RCA #205670
Open · 3 tasks · Tracked by #205670

Labels: Team:obs-ux-management (Observability Management User Experience Team)

Relates to #200064

This issue tracks work related to evaluating the performance of LLM-based implementations for root cause analysis within the Investigate app.

Comments
dominiqueclarke added the Team:obs-ux-management (Observability Management User Experience Team) label on Jan 7, 2025
Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)
dominiqueclarke added a commit that referenced this issue on Jan 17, 2025:
…ysis integration (#204634)

## Summary

Extends the Observability AI Assistant's evaluation framework to create the first set of tests aimed at evaluating the performance of the Investigation App's AI root cause analysis integration.

To execute tests, please consult the [README](https://github.com/elastic/kibana/pull/204634/files#diff-4823a154e593051126d3d5822c88d72e89d07f41b8c07a5a69d18281c50b09adR1). Note the prerequisites and the Kibana & Elasticsearch configuration.

## Further evolution

This PR is the first MVP of the evaluation framework. A (somewhat light) [meta issue](#205670) exists for our continued work on this project and will be added to over time.

## Test data and fixture architecture

Logs, metrics, and traces are indexed to [edge-rca](https://studious-disco-k66oojq.pages.github.io/edge-rca/). Observability engineers can [create an oblt-cli cluster](https://studious-disco-k66oojq.pages.github.io/user-guide/cluster-create-ccs/) configured for cross-cluster search against edge-rca as the remote cluster (sketched below, after this summary).

When creating new testing fixtures, engineers use their oblt-cli cluster to create rules against the remote cluster data. Once alerts are triggered in a failure scenario, the engineer can choose to archive the alert data for use as a test fixture. Test fixtures are added to the `investigate_app/scripts/load/fixtures` directory for use in tests.

When executing tests, the fixtures are loaded into the engineer's oblt-cli cluster, configured for cross-cluster search against edge-rca. The local alert fixture and the remote demo data are used together to replay root cause analysis and execute the test evaluations.

## Implementation

Creates a new directory, `scripts`, to house scripts related to setting up and running these tests. Here's what each directory does:

## scripts/evaluate

1. Extends the evaluation script from `observability_ai_assistant_app/scripts/evaluation` by creating a [custom Kibana client](https://github.com/elastic/kibana/pull/204634/files#diff-ae05b2a20168ea08f452297fc1bd59310c69ac3ea4651da1f65cd9fa93bb8fe9R1) with RCA-specific methods. The custom client is [passed to the Observability AI Assistant's `runEvaluations` script](https://github.com/elastic/kibana/pull/204634/files#diff-0f2d3662c01df8fbe7d1f19704fa071cbd6232fb5f732b313e8ba99012925d0bR14) and [invoked instead of the default Kibana client](https://github.com/elastic/kibana/pull/204634/files#diff-98509a357e86ea5c5931b1b46abc72f76e5304439430358eee845f9ad57f63f1R54) (see the client sketch below).
2. Defines a single MVP test in `index.spec.ts`. This test finds a specific alert fixture designated for that test, creates an investigation for that alert with a specified time range, and calls the root cause analysis API. Once the report is received back from the API, a prompt is created for the evaluation framework with details of the report. The evaluation framework then judges how well the root cause analysis API performed against specified criteria (see the test sketch below).

## scripts/archive

1. Used when creating new test fixtures; this script archives observability alerts data for use as a fixture in a feature test.

## scripts/load

1. Loads created testing fixtures before running the test.

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Dario Gieselaar <[email protected]>
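For readers unfamiliar with the cross-cluster search wiring described above, here is a minimal sketch of registering edge-rca as a remote cluster. This is not taken from the PR: oblt-cli provisions this configuration automatically, and the node URL, API key, and seed address below are placeholders.

```ts
// Minimal sketch only: oblt-cli normally provisions cross-cluster search for you.
// The node URL, API key, and seed address are placeholders, not real endpoints.
import { Client } from '@elastic/elasticsearch';

const client = new Client({
  node: 'https://localhost:9200',
  auth: { apiKey: '<local-cluster-api-key>' },
});

async function registerRemoteCluster() {
  // Register the remote cluster holding the edge-rca demo logs, metrics, and traces.
  await client.cluster.putSettings({
    persistent: {
      cluster: {
        remote: {
          'edge-rca': {
            seeds: ['<edge-rca-host>:9300'], // placeholder seed node
          },
        },
      },
    },
  });
}

registerRemoteCluster().catch(console.error);

// Once registered, remote data is addressable as '<cluster>:<index>',
// e.g. a search against 'edge-rca:logs-*'.
```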
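To make the custom-client pattern in `scripts/evaluate` concrete, here is a hypothetical sketch of a Kibana client wrapper with RCA-specific methods. The class name, method names, and API routes are illustrative assumptions, not the actual code or endpoints from #204634.

```ts
// Hypothetical sketch of an RCA-specific Kibana client; the class name, method
// names, and API routes are illustrative, not the actual code from #204634.
import axios, { AxiosInstance } from 'axios';

interface InvestigationParams {
  alertId: string;
  from: string; // ISO start of the investigated time range
  to: string; // ISO end of the investigated time range
}

export class RcaEvaluationClient {
  private readonly http: AxiosInstance;

  constructor(kibanaUrl: string, apiKey: string) {
    this.http = axios.create({
      baseURL: kibanaUrl,
      // Kibana HTTP APIs require the kbn-xsrf header on write requests.
      headers: { Authorization: `ApiKey ${apiKey}`, 'kbn-xsrf': 'true' },
    });
  }

  // Create an investigation scoped to one alert and time range (route is a placeholder).
  async createInvestigation(params: InvestigationParams): Promise<{ id: string }> {
    const { data } = await this.http.post('/api/observability/investigations', {
      alertId: params.alertId,
      params: { timeRange: { from: params.from, to: params.to } },
    });
    return data;
  }

  // Request the root cause analysis report for an investigation (route is a placeholder).
  async getRootCauseAnalysisReport(investigationId: string): Promise<string> {
    const { data } = await this.http.get(
      `/api/observability/investigations/${investigationId}/root_cause_analysis`
    );
    return data.report;
  }
}
```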
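And here is a hypothetical shape of the single MVP test described for `index.spec.ts`: find the designated alert fixture, create an investigation, call the RCA API, and hand the report to the LLM-as-judge evaluation step. It assumes a Jest/Mocha-style runner; every identifier and criterion string below is an illustrative assumption, not the real spec.

```ts
// Hypothetical sketch of the MVP evaluation flow; fixture ids, client methods,
// and criteria are illustrative, not the actual contents of index.spec.ts.
import { RcaEvaluationClient } from './rca_evaluation_client'; // hypothetical module

declare const rcaClient: RcaEvaluationClient; // provided by the harness (assumed)
declare const evaluationClient: {
  evaluate(args: { input: string; criteria: string[] }): Promise<{ passed: boolean }>;
}; // LLM-as-judge entry point (assumed shape)

describe('root cause analysis evaluation', () => {
  it('diagnoses the archived failure scenario', async () => {
    // 1. The alert fixture designated for this test, loaded by scripts/load.
    const alertId = '<fixture-alert-id>'; // placeholder
    const timeRange = { from: '2025-01-01T00:00:00Z', to: '2025-01-01T01:00:00Z' }; // placeholder

    // 2. Create an investigation for that alert over the specified time range.
    const investigation = await rcaClient.createInvestigation({ alertId, ...timeRange });

    // 3. Call the root cause analysis API and collect the generated report.
    const report = await rcaClient.getRootCauseAnalysisReport(investigation.id);

    // 4. Build a prompt from the report and let the evaluation framework judge it.
    const result = await evaluationClient.evaluate({
      input: report,
      criteria: [
        'identifies the failing service in the archived scenario',
        'cites the triggering alert and supporting evidence',
        'proposes a plausible root cause rather than restating symptoms',
      ],
    });

    expect(result.passed).toBe(true);
  });
});
```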
cqliu1 pushed a commit to cqliu1/kibana that referenced this issue on Jan 21, 2025 (elastic#204634, same description as above).
viduni94 pushed a commit to viduni94/kibana that referenced this issue on Jan 23, 2025 (elastic#204634, same description as above).