[Investigate App] add MVP evaluation framework for AI root cause analysis integration (#204634)

## Summary

Extends the Observability AI Assistant's evaluation framework to create
the first set of tests aimed at evaluating the performance of the
Investigation App's AI root cause analysis integration.

To execute tests, please consult the
[README](https://github.com/elastic/kibana/pull/204634/files#diff-4823a154e593051126d3d5822c88d72e89d07f41b8c07a5a69d18281c50b09adR1).
Note the prerequisites and the Kibana & Elasticsearch configuration.

Further evolution
--
This PR is the first MVP of the evaluation framework. A (somewhat light)
[meta issue](#205670) tracks our continued work on this project and will
be expanded over time.

Test data and fixture architecture
--
Logs, metrics, and traces are indexed to
[edge-rca](https://studious-disco-k66oojq.pages.github.io/edge-rca/).
Observability engineers can [create an oblt-cli
cluster](https://studious-disco-k66oojq.pages.github.io/user-guide/cluster-create-ccs/)
configured for cross cluster search against edge-rca as the remote
cluster.

When creating new testing fixtures, engineers use their oblt-cli cluster
to create rules against the remote cluster data. Once alerts trigger in a
failure scenario, the engineer can archive the alert data for use as a
test fixture.

Test fixtures are added to the `investigate_app/scripts/load/fixtures`
directory for use in tests.

When executing tests, the fixtures are loaded into the engineer's oblt-cli
cluster, which is configured for cross cluster search against edge-rca. The
local alert fixture and the remote demo data are used together to replay the
root cause analysis and run the test evaluations.

Implementation
--

Creates a new `scripts` directory to house the scripts for setting up and
running these tests. Here's what each directory does:
## scripts/evaluate
1. Extends the evaluation script from
`observability_ai_assistant_app/scripts/evaluation` by creating a
[custom Kibana
client](https://github.com/elastic/kibana/pull/204634/files#diff-ae05b2a20168ea08f452297fc1bd59310c69ac3ea4651da1f65cd9fa93bb8fe9R1)
with RCA-specific methods. The custom client is [passed to the
Observability AI Assistant's `runEvaluations`
script](https://github.com/elastic/kibana/pull/204634/files#diff-0f2d3662c01df8fbe7d1f19704fa071cbd6232fb5f732b313e8ba99012925d0bR14)
and [invoked instead of the default Kibana
client](https://github.com/elastic/kibana/pull/204634/files#diff-98509a357e86ea5c5931b1b46abc72f76e5304439430358eee845f9ad57f63f1R54).
2. Defines a single MVP test in `index.spec.ts`. This test finds a
specific alert fixture designated for that test, creates an
investigation for that alert with a specified time range, and calls the
root cause analysis API. Once the report is received back from the API,
a prompt is created for the evaluation framework with details of the
report. The evaluation framework then judges how well the root cause
analysis API performed against specified criteria.
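The flow of that test can be sketched as follows; the client methods (`findAlertFixture`, `createInvestigation`, `rootCauseAnalysis`) are illustrative stand-ins for the custom Kibana client's RCA methods, not its actual API:

```typescript
// Hypothetical sketch of the MVP test flow; the client and its methods
// are stubbed stand-ins so the sketch is self-contained and runnable.
interface Alert {
  id: string;
  serviceName: string;
}

interface RcaReport {
  summary: string;
}

const rcaClient = {
  async findAlertFixture(testId: string): Promise<Alert> {
    return { id: `alert-for-${testId}`, serviceName: 'checkout' };
  },
  async createInvestigation(alert: Alert, rangeFrom: string, rangeTo: string): Promise<string> {
    return `investigation-${alert.id}-${rangeFrom}-${rangeTo}`;
  },
  async rootCauseAnalysis(investigationId: string): Promise<RcaReport> {
    return { summary: `report for ${investigationId}` };
  },
};

// Mirrors the steps described above: find the designated alert fixture,
// create an investigation for it, call the RCA API, then build a prompt
// for the LLM judge from the returned report.
async function runMvpScenario(testId: string): Promise<string> {
  const alert = await rcaClient.findAlertFixture(testId);
  const investigationId = await rcaClient.createInvestigation(alert, 'now-15m', 'now');
  const report = await rcaClient.rootCauseAnalysis(investigationId);
  return `Evaluate this root cause analysis report against the criteria:\n${report.summary}`;
}
```

The real test delegates the final judging step to the Observability AI Assistant's evaluation framework; only the prompt construction is shown here.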
## scripts/archive
1. Used when creating new test fixtures; this script makes it easy to
archive observability alerts data for use as a fixture in a feature test.
## scripts/load
1. Loads created testing fixtures before running the test.

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Dario Gieselaar <[email protected]>
3 people authored Jan 17, 2025
1 parent 61c2d18 commit 5ab8a52
Showing 24 changed files with 33,910 additions and 41 deletions.
1 change: 1 addition & 0 deletions .gitignore
```diff
@@ -137,6 +137,7 @@ src/platform/packages/**/package-map.json
 /packages/kbn-synthetic-package-map/
 **/.synthetics/
 **/.journeys/
+**/.rca/
 x-pack/test/security_api_integration/plugins/audit_log/audit.log
 
 # ignore FTR temp directory
```
`getDataStreamsForEntity` (server-side data stream resolution):

```diff
@@ -54,7 +54,20 @@ export async function getDataStreamsForEntity({
   });
 
   const dataStreams = uniq(
-    compact(await resolveIndexResponse.indices.flatMap((idx) => idx.data_stream))
+    compact([
+      /* Check both data streams and indices.
+       * The response body shape differs depending on the request. Example:
+       * GET _resolve/index/logs-*-default* will return data in the `data_streams` key.
+       * GET _resolve/index/.ds-logs-*-default* will return data in the `indices` key */
+      ...resolveIndexResponse.indices.flatMap((idx) => {
+        const remoteCluster = idx.name.includes(':') ? idx.name.split(':')[0] : null;
+        if (remoteCluster) {
+          return `${remoteCluster}:${idx.data_stream}`;
+        }
+        return idx.data_stream;
+      }),
+      ...resolveIndexResponse.data_streams.map((ds) => ds.name),
+    ])
   );
 
   return {
```
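The remote-cluster handling in this change can be sketched in isolation; the `ResolvedIndex` shape below is a simplified stand-in for the real `_resolve/index` response:

```typescript
// Simplified sketch of the change above: index names resolved from a
// remote cluster (name contains ':') keep their cluster prefix on the
// data stream name, so cross-cluster searches still target the remote.
interface ResolvedIndex {
  name: string;
  data_stream?: string;
}

function dataStreamNames(indices: ResolvedIndex[]): string[] {
  return indices
    .flatMap((idx) => {
      const remoteCluster = idx.name.includes(':') ? idx.name.split(':')[0] : null;
      if (remoteCluster) {
        return `${remoteCluster}:${idx.data_stream}`;
      }
      return idx.data_stream;
    })
    // stands in for lodash compact(): drop indices without a backing data stream
    .filter((ds): ds is string => Boolean(ds));
}
```

For example, `dataStreamNames([{ name: 'edge-rca:.ds-logs-default-2025.01.17', data_stream: 'logs-default' }, { name: 'local-only-index' }])` yields `['edge-rca:logs-default']`.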
New file: `common/rca/llm_context.ts`:

```ts
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

import { EcsFieldsResponse } from '@kbn/rule-registry-plugin/common';
import {
  ALERT_FLAPPING_HISTORY,
  ALERT_RULE_EXECUTION_TIMESTAMP,
  ALERT_RULE_EXECUTION_UUID,
  EVENT_ACTION,
  EVENT_KIND,
} from '@kbn/rule-registry-plugin/common/technical_rule_data_field_names';
import { omit } from 'lodash';

export function sanitizeAlert(alert: EcsFieldsResponse) {
  return omit(
    alert,
    ALERT_RULE_EXECUTION_TIMESTAMP,
    '_index',
    ALERT_FLAPPING_HISTORY,
    EVENT_ACTION,
    EVENT_KIND,
    ALERT_RULE_EXECUTION_UUID,
    '@timestamp'
  );
}

export function getRCAContext(alert: EcsFieldsResponse, serviceName: string) {
  return `The user is investigating an alert for the ${serviceName} service,
  and wants to find the root cause. Here is the alert:
  ${JSON.stringify(sanitizeAlert(alert))}`;
}
```
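Stripping these volatile, per-execution fields keeps fixtures and LLM prompts deterministic across runs. A dependency-free sketch of the same idea (the field-name strings are assumptions standing in for the `@kbn/rule-registry-plugin` constants, and `omit` is inlined):

```typescript
// Self-contained illustration of sanitizeAlert: drop fields that change
// on every rule execution so archived alerts stay stable as fixtures.
const VOLATILE_FIELDS = [
  'kibana.alert.rule.execution.timestamp', // assumed value of ALERT_RULE_EXECUTION_TIMESTAMP
  '_index',
  'kibana.alert.flapping_history', // assumed value of ALERT_FLAPPING_HISTORY
  'event.action',
  'event.kind',
  'kibana.alert.rule.execution.uuid', // assumed value of ALERT_RULE_EXECUTION_UUID
  '@timestamp',
];

function sanitizeAlertSketch(alert: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(alert)) {
    if (!VOLATILE_FIELDS.includes(key)) {
      out[key] = value;
    }
  }
  return out;
}
```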
Changes to `AssistantHypothesis`: the inline context string and local `sanitizeAlert` are replaced by the shared `getRCAContext` helper:

```diff
@@ -8,19 +8,12 @@
 import { i18n } from '@kbn/i18n';
 import type { RootCauseAnalysisEvent } from '@kbn/observability-ai-server/root_cause_analysis';
 import { EcsFieldsResponse } from '@kbn/rule-registry-plugin/common';
-import {
-  ALERT_FLAPPING_HISTORY,
-  ALERT_RULE_EXECUTION_TIMESTAMP,
-  ALERT_RULE_EXECUTION_UUID,
-  EVENT_ACTION,
-  EVENT_KIND,
-} from '@kbn/rule-registry-plugin/common/technical_rule_data_field_names';
 import { isRequestAbortedError } from '@kbn/server-route-repository-client';
-import { omit } from 'lodash';
 import React, { useEffect, useRef, useState } from 'react';
 import { useKibana } from '../../../../hooks/use_kibana';
 import { useUpdateInvestigation } from '../../../../hooks/use_update_investigation';
 import { useInvestigation } from '../../contexts/investigation_context';
+import { getRCAContext } from '../../../../../common/rca/llm_context';
 
 export interface InvestigationContextualInsight {
   key: string;
@@ -90,10 +83,7 @@ export function AssistantHypothesis() {
           body: {
             investigationId: investigation!.id,
             connectorId,
-            context: `The user is investigating an alert for the ${serviceName} service,
-  and wants to find the root cause. Here is the alert:
-  ${JSON.stringify(sanitizeAlert(nonNullishAlert))}`,
+            context: getRCAContext(nonNullishAlert, nonNullishServiceName),
             rangeFrom,
             rangeTo,
             serviceName: nonNullishServiceName,
@@ -190,16 +180,3 @@ export function AssistantHypothesis() {
   />
 );
 }
-
-function sanitizeAlert(alert: EcsFieldsResponse) {
-  return omit(
-    alert,
-    ALERT_RULE_EXECUTION_TIMESTAMP,
-    '_index',
-    ALERT_FLAPPING_HISTORY,
-    EVENT_ACTION,
-    EVENT_KIND,
-    ALERT_RULE_EXECUTION_UUID,
-    '@timestamp'
-  );
-}
```
New file: archive `README.md`:
# Investigation RCA Evaluation Framework

## Overview

This tool is developed for our team working on the Elastic Observability platform, specifically focusing on evaluating the Investigation RCA AI integration. It simplifies archiving the data critical for evaluating the Investigation UI and its integration with large language models (LLMs).

## Setup requirements

- An Elasticsearch instance

You'll need an instance configured with cross cluster search for the [edge-rca](https://studious-disco-k66oojq.pages.github.io/edge-rca/) cluster. To create one, utilize [oblt-cli](https://studious-disco-k66oojq.pages.github.io/user-guide/cluster-create-ccs/) and select `edge-rca` as the remote cluster.

## Running archive

Run the tool using:

`$ node x-pack/solutions/observability/plugins/investigate_app/scripts/archive/index.js --kibana http://admin:[YOUR_CLUSTER_PASSWORD]@localhost:5601`

This will archive the observability alerts index to use as fixtures within the tests.

Archived data will automatically be saved at the root of the Kibana repository in the `.rca/archives` folder.

## Creating a test fixture

To create a test fixture, create a new folder in `x-pack/solutions/observability/plugins/investigate_app/scripts/load/fixtures` containing the `data.json.gz` file and the `mappings.json` file. The fixture will then be loaded when running `$ node x-pack/solutions/observability/plugins/investigate_app/scripts/load/index.js`.

### Configuration

#### Kibana and Elasticsearch

By default, the tool will look for a Kibana instance running locally (at `http://localhost:5601`, the default address for running Kibana in development mode). It will also attempt to read the Kibana config file for the Elasticsearch address and credentials. To override these settings, use `--kibana` and `--es`. Only basic auth is supported, e.g. `--kibana http://username:password@localhost:5601`. To use a specific space, use `--spaceId`.

#### filePath

Use `--filePath` to specify a custom file path to store your archived data. By default, data is stored at `.rca/archives`.
New file: `scripts/archive/archive.ts` (the entry point below requires `./archive`):

```ts
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

import { spawnSync } from 'child_process';
import { run } from '@kbn/dev-cli-runner';
import yargs from 'yargs';
import { getServiceUrls } from '@kbn/observability-ai-assistant-app-plugin/scripts/evaluation/get_service_urls';
import { options } from './cli';

async function archiveAllRelevantData({ filePath, esUrl }: { filePath: string; esUrl: string }) {
  spawnSync(
    'node',
    ['scripts/es_archiver', 'save', `${filePath}/alerts`, '.internal.alerts-*', '--es-url', esUrl],
    {
      stdio: 'inherit',
    }
  );
}

function archiveData() {
  yargs(process.argv.slice(2))
    .command('*', 'Archive RCA data', async () => {
      const argv = await options(yargs);
      run(
        async ({ log }) => {
          const serviceUrls = await getServiceUrls({
            log,
            elasticsearch: argv.elasticsearch,
            kibana: argv.kibana,
          });
          await archiveAllRelevantData({
            esUrl: serviceUrls.esUrl,
            filePath: argv.filePath,
          });
        },
        {
          log: {
            defaultLevel: argv.logLevel as any,
          },
          flags: {
            allowUnexpected: true,
          },
        }
      );
    })
    .parse();
}

archiveData();
```
New file: `scripts/archive/cli.ts`:

```ts
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */
import * as inquirer from 'inquirer';
import * as fs from 'fs';
import { Argv } from 'yargs';
import {
  elasticsearchOption,
  kibanaOption,
} from '@kbn/observability-ai-assistant-app-plugin/scripts/evaluation/cli';

function getISOStringWithoutMicroseconds(): string {
  const now = new Date();
  const isoString = now.toISOString();
  return isoString.split('.')[0] + 'Z';
}

export async function options(y: Argv) {
  const argv = y
    .option('filePath', {
      string: true as const,
      describe: 'file path to store the archived data',
      default: `./.rca/archives/${getISOStringWithoutMicroseconds()}`,
    })
    .option('kibana', kibanaOption)
    .option('elasticsearch', elasticsearchOption)
    .option('logLevel', {
      describe: 'Log level',
      default: 'info',
    }).argv;

  if (
    fs.existsSync(`${argv.filePath}/data.json.gz`) ||
    fs.existsSync(`${argv.filePath}/mappings.json`)
  ) {
    const { confirmOverwrite } = await inquirer.prompt([
      {
        type: 'confirm',
        name: 'confirmOverwrite',
        message: `Archived data already exists at path: ${argv.filePath}. Do you want to overwrite it?`,
        default: false,
      },
    ]);

    if (!confirmOverwrite) {
      process.exit(1);
    }
  }

  return argv;
}
```
New file: `scripts/archive/index.js`:

```js
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

require('@kbn/babel-register').install();

require('./archive');
```
New file: ESLint overrides for the `**/*.spec.ts` evaluation scenarios:

```json
{
  "overrides": [
    {
      "files": ["**/*.spec.ts"],
      "rules": {
        "@kbn/imports/require_import": ["error", "@kbn/ambient-ftr-types"],
        "@typescript-eslint/triple-slash-reference": "off",
        "spaced-comment": "off"
      }
    }
  ]
}
```
New file: evaluate `README.md`:
# Investigation RCA Evaluation Framework

## Overview

This tool is developed for our team working on the Elastic Observability platform, specifically focusing on evaluating the Investigation RCA AI Integration. It simplifies scripting and evaluating various scenarios with the Large Language Model (LLM) integration.

## Setup requirements

- An Elasticsearch instance configured with cross cluster search pointing to the edge-rca cluster
- A Kibana instance
- At least one .gen-ai connector set up

## Running evaluations

### Prerequisites

#### Elasticsearch instance

You'll need an instance configured with cross cluster search for the [edge-rca](https://studious-disco-k66oojq.pages.github.io/edge-rca/) cluster. To create one, utilize [oblt-cli](https://studious-disco-k66oojq.pages.github.io/user-guide/cluster-create-ccs/) and select `edge-rca` as the remote cluster.

Once your cluster is created, paste the YAML config provided into your `kibana.dev.yml` file.

#### Fixture data

To load the fixtures needed for the tests, first run:

`$ node x-pack/solutions/observability/plugins/investigate_app/scripts/load/index.js --kibana http://admin:[YOUR_CLUSTER_PASSWORD]@localhost:5601`

### Executing tests

Run the tool using:

`$ node x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/index.js --files=x-pack/solutions/observability/plugins/investigate_app/scripts/evaluate/scenarios/rca/index.spec.ts --kibana http://admin:[YOUR_CLUSTER_PASSWORD]@localhost:5601`

This will evaluate all existing scenarios, and write the evaluation results to the terminal.

### Configuration

#### Kibana and Elasticsearch

By default, the tool will look for a Kibana instance running locally (at `http://localhost:5601`, the default address for running Kibana in development mode). It will also attempt to read the Kibana config file for the Elasticsearch address and credentials. To override these settings, use `--kibana` and `--es`. Only basic auth is supported, e.g. `--kibana http://username:password@localhost:5601`. To use a specific space, use `--spaceId`.

#### Connector

Use `--connectorId` to specify a `.gen-ai` or `.bedrock` connector to use. If none are given, it will prompt you to select a connector based on the ones that are available. If only a single supported connector is found, it will be used without prompting.
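That selection behavior can be sketched as follows; this is an illustrative stand-in for the framework's logic, and the `Connector` shape and function name are assumptions:

```typescript
// Illustrative connector selection: an explicit --connectorId wins,
// a single supported connector is used without prompting, and multiple
// candidates fall back to an interactive prompt.
interface Connector {
  id: string;
  actionTypeId: string;
}

function selectConnector(connectors: Connector[], connectorId?: string): Connector | 'prompt' {
  const supported = connectors.filter((c) => ['.gen-ai', '.bedrock'].includes(c.actionTypeId));
  if (connectorId) {
    const match = supported.find((c) => c.id === connectorId);
    if (!match) throw new Error(`Connector ${connectorId} not found`);
    return match;
  }
  if (supported.length === 1) {
    return supported[0]; // only one candidate: use it without prompting
  }
  return 'prompt'; // ask the user to pick one
}
```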

#### Persisting conversations

By default, completed conversations are not persisted. If you do want to persist them, for instance for reviewing purposes, set the `--persist` flag to store them. This will also generate a clickable link in the output of the evaluation that takes you to the conversation.

If you want to clear conversations on startup, use the `--clear` flag. This only works when `--persist` is enabled. If `--spaceId` is set, only conversations for the current space will be cleared.

When storing conversations, the name of the scenario is used as a title. Set the `--autoTitle` flag to have the LLM generate a title for you.