[Investigate App] add MVP evaluation framework for AI root cause analysis integration (#204634)

## Summary

Extends the Observability AI Assistant's evaluation framework to create
the first set of tests aimed at evaluating the performance of the
Investigation App's AI root cause analysis integration.

To execute tests, please consult the
[README](https://github.com/elastic/kibana/pull/204634/files#diff-4823a154e593051126d3d5822c88d72e89d07f41b8c07a5a69d18281c50b09adR1).
Note the prerequisites and the Kibana & Elasticsearch configuration.

Further evolution
--
This PR is the first MVP of the evaluation framework. A (somewhat light)
[meta issue](#205670) tracks our continued work on this project and will
be expanded over time.

Test data and fixture architecture
--
Logs, metrics, and traces are indexed to
[edge-rca](https://studious-disco-k66oojq.pages.github.io/edge-rca/).
Observability engineers can [create an oblt-cli
cluster](https://studious-disco-k66oojq.pages.github.io/user-guide/cluster-create-ccs/)
configured for cross cluster search against edge-rca as the remote
cluster.

When creating new testing fixtures, engineers use their oblt-cli cluster
to create rules against the remote cluster data. Once alerts trigger in a
failure scenario, the engineer can archive the alert data for use as a
test fixture.

Test fixtures are added to the `investigate_app/scripts/load/fixtures`
directory for use in tests.

When executing tests, the fixtures are loaded into the engineer's oblt-cli
cluster, which is configured for cross cluster search against edge-rca. The
local alert fixture and the remote demo data are used together to replay the
root cause analysis and run the test evaluations.

Implementation
--

Creates a new `scripts` directory to house the scripts for setting up and
running these tests. Here's what each directory does:
## scripts/evaluate
1. Extends the evaluation script from
`observability_ai_assistant_app/scripts/evaluation` by creating a
[custom Kibana
client](https://github.com/elastic/kibana/pull/204634/files#diff-ae05b2a20168ea08f452297fc1bd59310c69ac3ea4651da1f65cd9fa93bb8fe9R1)
with RCA-specific methods. The custom client is [passed to the
Observability AI Assistant's `runEvaluations`
script](https://github.com/elastic/kibana/pull/204634/files#diff-0f2d3662c01df8fbe7d1f19704fa071cbd6232fb5f732b313e8ba99012925d0bR14)
and [invoked instead of the default Kibana
client](https://github.com/elastic/kibana/pull/204634/files#diff-98509a357e86ea5c5931b1b46abc72f76e5304439430358eee845f9ad57f63f1R54).
2. Defines a single MVP test in `index.spec.ts`. This test finds a
specific alert fixture designated for that test, creates an
investigation for that alert with a specified time range, and calls the
root cause analysis API. Once the report is received back from the API,
a prompt is created for the evaluation framework with details of the
report. The evaluation framework then judges how well the root cause
analysis API performed against specified criteria.
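The flow of that test can be sketched as follows; the client methods (`findAlertFixture`, `createInvestigation`, `rootCauseAnalysis`) are illustrative stand-ins for the custom Kibana client's RCA methods, not its actual API:

```typescript
// Hypothetical sketch of the MVP test flow; the client and its methods
// are stubbed stand-ins so the sketch is self-contained and runnable.
interface Alert {
  id: string;
  serviceName: string;
}

interface RcaReport {
  summary: string;
}

const rcaClient = {
  async findAlertFixture(testId: string): Promise<Alert> {
    return { id: `alert-for-${testId}`, serviceName: 'checkout' };
  },
  async createInvestigation(alert: Alert, rangeFrom: string, rangeTo: string): Promise<string> {
    return `investigation-${alert.id}-${rangeFrom}-${rangeTo}`;
  },
  async rootCauseAnalysis(investigationId: string): Promise<RcaReport> {
    return { summary: `report for ${investigationId}` };
  },
};

// Mirrors the steps described above: find the designated alert fixture,
// create an investigation for it, call the RCA API, then build a prompt
// for the LLM judge from the returned report.
async function runMvpScenario(testId: string): Promise<string> {
  const alert = await rcaClient.findAlertFixture(testId);
  const investigationId = await rcaClient.createInvestigation(alert, 'now-15m', 'now');
  const report = await rcaClient.rootCauseAnalysis(investigationId);
  return `Evaluate this root cause analysis report against the criteria:\n${report.summary}`;
}
```

The real test delegates the final judging step to the Observability AI Assistant's evaluation framework; only the prompt construction is shown here.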
## scripts/archive
1. Used when creating new test fixtures; this script makes it easy to
archive observability alerts data for use as a fixture in a feature test.
## scripts/load
1. Loads created testing fixtures before running the test.

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Dario Gieselaar <[email protected]>
3 people authored Jan 17, 2025
1 parent 61c2d18 commit 5ab8a52
Showing 24 changed files with 33,910 additions and 41 deletions.
1 change: 1 addition & 0 deletions .gitignore
```diff
@@ -137,6 +137,7 @@ src/platform/packages/**/package-map.json
 /packages/kbn-synthetic-package-map/
 **/.synthetics/
 **/.journeys/
+**/.rca/
 x-pack/test/security_api_integration/plugins/audit_log/audit.log
 
 # ignore FTR temp directory
```
`getDataStreamsForEntity` (server-side data stream resolution):

```diff
@@ -54,7 +54,20 @@ export async function getDataStreamsForEntity({
   });
 
   const dataStreams = uniq(
-    compact(await resolveIndexResponse.indices.flatMap((idx) => idx.data_stream))
+    compact([
+      /* Check both data streams and indices.
+       * The response body shape differs depending on the request. Example:
+       * GET _resolve/index/logs-*-default* will return data in the `data_streams` key.
+       * GET _resolve/index/.ds-logs-*-default* will return data in the `indices` key */
+      ...resolveIndexResponse.indices.flatMap((idx) => {
+        const remoteCluster = idx.name.includes(':') ? idx.name.split(':')[0] : null;
+        if (remoteCluster) {
+          return `${remoteCluster}:${idx.data_stream}`;
+        }
+        return idx.data_stream;
+      }),
+      ...resolveIndexResponse.data_streams.map((ds) => ds.name),
+    ])
   );
 
   return {
```
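The remote-cluster handling in this change can be sketched in isolation; the `ResolvedIndex` shape below is a simplified stand-in for the real `_resolve/index` response:

```typescript
// Simplified sketch of the change above: index names resolved from a
// remote cluster (name contains ':') keep their cluster prefix on the
// data stream name, so cross-cluster searches still target the remote.
interface ResolvedIndex {
  name: string;
  data_stream?: string;
}

function dataStreamNames(indices: ResolvedIndex[]): string[] {
  return indices
    .flatMap((idx) => {
      const remoteCluster = idx.name.includes(':') ? idx.name.split(':')[0] : null;
      if (remoteCluster) {
        return `${remoteCluster}:${idx.data_stream}`;
      }
      return idx.data_stream;
    })
    // stands in for lodash compact(): drop indices without a backing data stream
    .filter((ds): ds is string => Boolean(ds));
}
```

For example, `dataStreamNames([{ name: 'edge-rca:.ds-logs-default-2025.01.17', data_stream: 'logs-default' }, { name: 'local-only-index' }])` yields `['edge-rca:logs-default']`.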
New file: `common/rca/llm_context.ts`:

```ts
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

import { EcsFieldsResponse } from '@kbn/rule-registry-plugin/common';
import {
  ALERT_FLAPPING_HISTORY,
  ALERT_RULE_EXECUTION_TIMESTAMP,
  ALERT_RULE_EXECUTION_UUID,
  EVENT_ACTION,
  EVENT_KIND,
} from '@kbn/rule-registry-plugin/common/technical_rule_data_field_names';
import { omit } from 'lodash';

export function sanitizeAlert(alert: EcsFieldsResponse) {
  return omit(
    alert,
    ALERT_RULE_EXECUTION_TIMESTAMP,
    '_index',
    ALERT_FLAPPING_HISTORY,
    EVENT_ACTION,
    EVENT_KIND,
    ALERT_RULE_EXECUTION_UUID,
    '@timestamp'
  );
}

export function getRCAContext(alert: EcsFieldsResponse, serviceName: string) {
  return `The user is investigating an alert for the ${serviceName} service,
  and wants to find the root cause. Here is the alert:
  ${JSON.stringify(sanitizeAlert(alert))}`;
}
```
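Stripping these volatile, per-execution fields keeps fixtures and LLM prompts deterministic across runs. A dependency-free sketch of the same idea (the field-name strings are assumptions standing in for the `@kbn/rule-registry-plugin` constants, and `omit` is inlined):

```typescript
// Self-contained illustration of sanitizeAlert: drop fields that change
// on every rule execution so archived alerts stay stable as fixtures.
const VOLATILE_FIELDS = [
  'kibana.alert.rule.execution.timestamp', // assumed value of ALERT_RULE_EXECUTION_TIMESTAMP
  '_index',
  'kibana.alert.flapping_history', // assumed value of ALERT_FLAPPING_HISTORY
  'event.action',
  'event.kind',
  'kibana.alert.rule.execution.uuid', // assumed value of ALERT_RULE_EXECUTION_UUID
  '@timestamp',
];

function sanitizeAlertSketch(alert: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(alert)) {
    if (!VOLATILE_FIELDS.includes(key)) {
      out[key] = value;
    }
  }
  return out;
}
```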
Changes to `AssistantHypothesis`: the inline context string and local `sanitizeAlert` are replaced by the shared `getRCAContext` helper:

```diff
@@ -8,19 +8,12 @@
 import { i18n } from '@kbn/i18n';
 import type { RootCauseAnalysisEvent } from '@kbn/observability-ai-server/root_cause_analysis';
 import { EcsFieldsResponse } from '@kbn/rule-registry-plugin/common';
-import {
-  ALERT_FLAPPING_HISTORY,
-  ALERT_RULE_EXECUTION_TIMESTAMP,
-  ALERT_RULE_EXECUTION_UUID,
-  EVENT_ACTION,
-  EVENT_KIND,
-} from '@kbn/rule-registry-plugin/common/technical_rule_data_field_names';
 import { isRequestAbortedError } from '@kbn/server-route-repository-client';
-import { omit } from 'lodash';
 import React, { useEffect, useRef, useState } from 'react';
 import { useKibana } from '../../../../hooks/use_kibana';
 import { useUpdateInvestigation } from '../../../../hooks/use_update_investigation';
 import { useInvestigation } from '../../contexts/investigation_context';
+import { getRCAContext } from '../../../../../common/rca/llm_context';
 
 export interface InvestigationContextualInsight {
   key: string;
@@ -90,10 +83,7 @@ export function AssistantHypothesis() {
           body: {
             investigationId: investigation!.id,
             connectorId,
-            context: `The user is investigating an alert for the ${serviceName} service,
-  and wants to find the root cause. Here is the alert:
-  ${JSON.stringify(sanitizeAlert(nonNullishAlert))}`,
+            context: getRCAContext(nonNullishAlert, nonNullishServiceName),
             rangeFrom,
             rangeTo,
             serviceName: nonNullishServiceName,
@@ -190,16 +180,3 @@ export function AssistantHypothesis() {
   />
 );
 }
-
-function sanitizeAlert(alert: EcsFieldsResponse) {
-  return omit(
-    alert,
-    ALERT_RULE_EXECUTION_TIMESTAMP,
-    '_index',
-    ALERT_FLAPPING_HISTORY,
-    EVENT_ACTION,
-    EVENT_KIND,
-    ALERT_RULE_EXECUTION_UUID,
-    '@timestamp'
-  );
-}
```
New file: archive `README.md`:
# Investigation RCA Evaluation Framework

## Overview

This tool is developed for our team working on the Elastic Observability platform, specifically focusing on evaluating the Investigation RCA AI integration. It simplifies archiving the data critical for evaluating the Investigation UI and its integration with large language models (LLMs).

## Setup requirements

- An Elasticsearch instance

You'll need an instance configured with cross cluster search for the [edge-rca](https://studious-disco-k66oojq.pages.github.io/edge-rca/) cluster. To create one, utilize [oblt-cli](https://studious-disco-k66oojq.pages.github.io/user-guide/cluster-create-ccs/) and select `edge-rca` as the remote cluster.

## Running archive

Run the tool using:

`$ node x-pack/solutions/observability/plugins/investigate_app/scripts/archive/index.js --kibana http://admin:[YOUR_CLUSTER_PASSWORD]@localhost:5601`

This will archive the observability alerts index to use as fixtures within the tests.

Archived data will automatically be saved at the root of the Kibana repository in the `.rca/archives` folder.

## Creating a test fixture

To create a test fixture, create a new folder in `x-pack/solutions/observability/plugins/investigate_app/scripts/load/fixtures` containing the `data.json.gz` file and the `mappings.json` file. The fixture will then be loaded when running `$ node x-pack/solutions/observability/plugins/investigate_app/scripts/load/index.js`.

### Configuration

#### Kibana and Elasticsearch

By default, the tool will look for a Kibana instance running locally (at `http://localhost:5601`, the default address for running Kibana in development mode). It will also attempt to read the Kibana config file for the Elasticsearch address and credentials. To override these settings, use `--kibana` and `--es`. Only basic auth is supported, e.g. `--kibana http://username:password@localhost:5601`. To use a specific space, use `--spaceId`.

#### filePath

Use `--filePath` to specify a custom file path to store your archived data. By default, data is stored at `.rca/archives`.
New file: `scripts/archive/archive.ts` (the entry point below requires `./archive`):

```ts
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

import { spawnSync } from 'child_process';
import { run } from '@kbn/dev-cli-runner';
import yargs from 'yargs';
import { getServiceUrls } from '@kbn/observability-ai-assistant-app-plugin/scripts/evaluation/get_service_urls';
import { options } from './cli';

async function archiveAllRelevantData({ filePath, esUrl }: { filePath: string; esUrl: string }) {
  spawnSync(
    'node',
    ['scripts/es_archiver', 'save', `${filePath}/alerts`, '.internal.alerts-*', '--es-url', esUrl],
    {
      stdio: 'inherit',
    }
  );
}

function archiveData() {
  yargs(process.argv.slice(2))
    .command('*', 'Archive RCA data', async () => {
      const argv = await options(yargs);
      run(
        async ({ log }) => {
          const serviceUrls = await getServiceUrls({
            log,
            elasticsearch: argv.elasticsearch,
            kibana: argv.kibana,
          });
          await archiveAllRelevantData({
            esUrl: serviceUrls.esUrl,
            filePath: argv.filePath,
          });
        },
        {
          log: {
            defaultLevel: argv.logLevel as any,
          },
          flags: {
            allowUnexpected: true,
          },
        }
      );
    })
    .parse();
}

archiveData();
```
New file: `scripts/archive/cli.ts`:

```ts
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */
import * as inquirer from 'inquirer';
import * as fs from 'fs';
import { Argv } from 'yargs';
import {
  elasticsearchOption,
  kibanaOption,
} from '@kbn/observability-ai-assistant-app-plugin/scripts/evaluation/cli';

function getISOStringWithoutMicroseconds(): string {
  const now = new Date();
  const isoString = now.toISOString();
  return isoString.split('.')[0] + 'Z';
}

export async function options(y: Argv) {
  const argv = y
    .option('filePath', {
      string: true as const,
      describe: 'file path to store the archived data',
      default: `./.rca/archives/${getISOStringWithoutMicroseconds()}`,
    })
    .option('kibana', kibanaOption)
    .option('elasticsearch', elasticsearchOption)
    .option('logLevel', {
      describe: 'Log level',
      default: 'info',
    }).argv;

  if (
    fs.existsSync(`${argv.filePath}/data.json.gz`) ||
    fs.existsSync(`${argv.filePath}/mappings.json`)
  ) {
    const { confirmOverwrite } = await inquirer.prompt([
      {
        type: 'confirm',
        name: 'confirmOverwrite',
        message: `Archived data already exists at path: ${argv.filePath}. Do you want to overwrite it?`,
        default: false,
      },
    ]);

    if (!confirmOverwrite) {
      process.exit(1);
    }
  }

  return argv;
}
```
New file: `scripts/archive/index.js`:

```js
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

require('@kbn/babel-register').install();

require('./archive');
```
New file: ESLint overrides for the `**/*.spec.ts` evaluation scenarios:

```json
{
  "overrides": [
    {
      "files": ["**/*.spec.ts"],
      "rules": {
        "@kbn/imports/require_import": ["error", "@kbn/ambient-ftr-types"],
        "@typescript-eslint/triple-slash-reference": "off",
        "spaced-comment": "off"
      }
    }
  ]
}
```
New file: evaluate `README.md`:
# Investigation RCA Evaluation Framework

## Overview

This tool is developed for our team working on the Elastic Observability platform, specifically focusing on evaluating the Investigation RCA AI Integration. It simplifies scripting and evaluating various scenarios with the Large Language Model (LLM) integration.

## Setup requirements

- An Elasticsearch instance configured with cross cluster search pointing to the edge-rca cluster
- A Kibana instance
- At least one .gen-ai connector set up

## Running evaluations

### Prerequisites

#### Elasticsearch instance

You'll need an instance configured with cross cluster search for the [edge-rca](https://studious-disco-k66oojq.pages.github.io/edge-rca/) cluster. To create one, utilize [oblt-cli](https://studious-disco-k66oojq.pages.github.io/user-guide/cluster-create-ccs/) and select `edge-rca` as the remote cluster.

Once your cluster is created, paste the YAML config provided into your `kibana.dev.yml` file.

#### Fixture data

To load the fixtures needed for the tests, first run:

`$ node x-pack/solutions/observability/plugins/investigate_app/scripts/load/index.js --kibana http://admin:[YOUR_CLUSTER_PASSWORD]@localhost:5601`

### Executing tests

Run the tool using:

`$ node x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/index.js --files=x-pack/solutions/observability/plugins/investigate_app/scripts/evaluate/scenarios/rca/index.spec.ts --kibana http://admin:[YOUR_CLUSTER_PASSWORD]@localhost:5601`

This will evaluate all existing scenarios, and write the evaluation results to the terminal.

### Configuration

#### Kibana and Elasticsearch

By default, the tool will look for a Kibana instance running locally (at `http://localhost:5601`, the default address for running Kibana in development mode). It will also attempt to read the Kibana config file for the Elasticsearch address and credentials. To override these settings, use `--kibana` and `--es`. Only basic auth is supported, e.g. `--kibana http://username:password@localhost:5601`. To use a specific space, use `--spaceId`.

#### Connector

Use `--connectorId` to specify a `.gen-ai` or `.bedrock` connector to use. If none are given, it will prompt you to select a connector based on the ones that are available. If only a single supported connector is found, it will be used without prompting.
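That selection behavior can be sketched as follows; this is an illustrative stand-in for the framework's logic, and the `Connector` shape and function name are assumptions:

```typescript
// Illustrative connector selection: an explicit --connectorId wins,
// a single supported connector is used without prompting, and multiple
// candidates fall back to an interactive prompt.
interface Connector {
  id: string;
  actionTypeId: string;
}

function selectConnector(connectors: Connector[], connectorId?: string): Connector | 'prompt' {
  const supported = connectors.filter((c) => ['.gen-ai', '.bedrock'].includes(c.actionTypeId));
  if (connectorId) {
    const match = supported.find((c) => c.id === connectorId);
    if (!match) throw new Error(`Connector ${connectorId} not found`);
    return match;
  }
  if (supported.length === 1) {
    return supported[0]; // only one candidate: use it without prompting
  }
  return 'prompt'; // ask the user to pick one
}
```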

#### Persisting conversations

By default, completed conversations are not persisted. If you do want to persist them, for instance for reviewing purposes, set the `--persist` flag to store them. This will also generate a clickable link in the output of the evaluation that takes you to the conversation.

If you want to clear conversations on startup, use the `--clear` flag. This only works when `--persist` is enabled. If `--spaceId` is set, only conversations for the current space will be cleared.

When storing conversations, the name of the scenario is used as a title. Set the `--autoTitle` flag to have the LLM generate a title for you.