[SIEM migrations] Implement ES|QL lookups and other fixes (elastic#20…
…4960)

## Summary

Adds support for translating Splunk lookups into the native ES|QL `LOOKUP JOIN` operator.

- Lookups import changes:
  - Stores the lookup files as indices using the `lookup_<lookup_name>`
    pattern (queries fail if the name contains `-`).
  - Indexes the lookup content data without duplicates (supports csv and
    json/ndjson).
  - Stores the lookup index name as the resource content that is passed to
    the translation agent.
  - Fixes a bug with the `_lookup` suffix in the names coming from Splunk
    (queries use the `_lookup` suffix, but files in the lookup editor don't
    have it).

- Lookups translation changes:
  - Prompt for the `inline_query` node updated to support lookups,
    replacing the Splunk lookup name with the new Elastic lookup index name.
    Placeholders for missing macros/lookups are now added in this node
    instead of the `translate_query` node.
  - Prompt for ES|QL translation updated to convert LOOKUP syntax and
    ignore macro/lookup placeholders.
  
- Other improvements on the agent graph:
  - All rule migration nodes in the graph now generate a "summary"
    explaining the reasoning behind each decision of the LLM. The summaries
    are displayed in the comments section of each rule translation.
  - The inline query node was moved inside the translation sub-graph since
    it's only needed there.
  - Validation is now executed without placeholders, preventing it from
    running all the iterations without being able to fix the query.
  - A deterministic node was added at the end to set the translation
    result and ensure minimum defaults are met.
  - Avoids `inline_query` LLM calls when a prebuilt rule matched or when the
    Splunk query is unsupported.
  - Avoids `prebuilt_rule` matching LLM calls when no prebuilt rule is
    retrieved from the semantic search.
  - Avoids integration matching LLM calls when no integration is retrieved
    from the semantic search.

- Other fixes:
  - Fixes a bug that set the translation result to `FULL` when we missed the
    integration and index pattern (`logs-*`). It is now set to `PARTIAL`.
  - Fixes a bug where the description was missing for custom translated
    rules. We now fall back to the Splunk rule title if the description is
    missing.
  - Added summary comment for prebuilt rule matching.
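
The deterministic parts of the graph described in the lists above (skipping LLM calls, setting the final translation result, and applying defaults) could be sketched roughly as follows. This is only an illustration: the state shape, the function names, and the exact conditions are assumptions and are not taken from the commit.

```typescript
// Illustrative sketch only. Every name in this block is hypothetical.

type TranslationResult = 'FULL' | 'PARTIAL';

interface MigrationState {
  originalTitle: string;
  description?: string;
  retrievedPrebuiltRules: string[]; // candidates returned by the semantic search
  retrievedIntegrations: string[]; // candidates returned by the semantic search
  matchedPrebuiltRuleId?: string;
  matchedIntegrationId?: string;
  queryUnsupported: boolean;
}

// With no retrieved candidates there is nothing for the LLM to choose from.
function shouldCallPrebuiltRuleMatcher(state: MigrationState): boolean {
  return state.retrievedPrebuiltRules.length > 0;
}

function shouldCallIntegrationMatcher(state: MigrationState): boolean {
  return state.retrievedIntegrations.length > 0;
}

// Inlining macros/lookups is only useful when the query will actually be translated.
function shouldCallInlineQuery(state: MigrationState): boolean {
  return state.matchedPrebuiltRuleId === undefined && !state.queryUnsupported;
}

// Assumed rule: without a matched integration the translation is only PARTIAL.
function resolveTranslationResult(state: MigrationState): TranslationResult {
  return state.matchedIntegrationId === undefined ? 'PARTIAL' : 'FULL';
}

// Fall back to the original rule title when the description is missing or blank.
function resolveDescription(state: MigrationState): string {
  const description = state.description?.trim();
  return description ? description : state.originalTitle;
}
```

Keeping these checks outside the LLM nodes makes them cheap to run and easy to unit test, which seems to be the motivation behind the deterministic node mentioned in the summary.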

### Screenshots

#### New summary comments:

##### Prebuilt rule matching:

- matching
![prebuilt
matching](https://github.com/user-attachments/assets/63c86cd9-f06d-4664-89db-2fa36bdff838)

- not matching
![prebuilt not
matching](https://github.com/user-attachments/assets/3bd6bf7b-0564-416b-9b16-700b346dd95e)

##### Query inlining summary:

![Inlining
summary](https://github.com/user-attachments/assets/6bf88e61-e269-4d4b-a01f-1a009c622982)

##### Integration matching:

- matching:
![integration
matching](https://github.com/user-attachments/assets/a77e01d9-3a2e-4629-a575-905b6995d55d)

- not matching
![integration no
match](https://github.com/user-attachments/assets/ce21b0e4-e3a3-4e2c-b6d2-2114f8a7f146)

##### ES|QL translation


![translation](https://github.com/user-attachments/assets/d0dd0879-c9ce-44f3-aa44-e3b724cd5898)

Needs manual translation reason:

![unsupported](https://github.com/user-attachments/assets/45fd73b2-5fc0-4504-99bd-e263c01c3a11)


#### Lookups UI:

![UI](https://github.com/user-attachments/assets/c7271e47-b0a5-4b31-b5cf-d99285e108bf)

Lookup index example:
![lookup
index](https://github.com/user-attachments/assets/88c275b8-96dd-4770-804b-164b3e3d4f8f)

Translation
![lookup
translation](https://github.com/user-attachments/assets/647a6003-e930-407b-aaf2-02bc1ea95de6)

#### Test data


[rules.json](https://github.com/user-attachments/files/18208912/rules.json)

[all_macros.json](https://github.com/user-attachments/files/18208914/all_macros.json)

[lookups.zip](https://github.com/user-attachments/files/18208904/lookups.zip)
(uncompress before uploading)
semd authored and CAWilson94 committed Jan 10, 2025
1 parent c3a9e25 commit eedc917
Showing 62 changed files with 939 additions and 581 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -61,11 +61,14 @@ export type FieldMap<T extends string = string> = Record<
 // This utility type flattens all the keys of a schema object and its nested objects as a union type.
 // Its purpose is to ensure that the FieldMap keys are always in sync with the schema object.
 // It assumes all optional fields of the schema are required in the field map, they can always be omitted from the resulting type.
-export type SchemaFieldMapKeys<
-  T extends Record<string, unknown>,
-  Key = keyof T
-> = Key extends string
-  ? NonNullable<T[Key]> extends Record<string, unknown>
+// We need to use any to avoid TS errors since interfaces do not satisfy Record<string, unknown>, but they do satisfy Record<string, any>.
+/* eslint-disable @typescript-eslint/no-explicit-any */
+export type SchemaFieldMapKeys<T extends Record<string, any>, Key = keyof T> = Key extends string
+  ? NonNullable<T[Key]> extends any[]
+    ? NonNullable<T[Key]> extends Array<Record<string, any>>
+      ? `${Key}` | `${Key}.${SchemaFieldMapKeys<NonNullable<T[Key]>[number]>}`
+      : `${Key}`
+    : NonNullable<T[Key]> extends Record<string, any>
     ? `${Key}` | `${Key}.${SchemaFieldMapKeys<NonNullable<T[Key]>>}`
     : `${Key}`
   : never;
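
To make the effect of the new array branch concrete, here is a small usage sketch. The type is copied from the new side of the hunk above; the `ExampleDoc` interface and `exampleFieldMap` object are made up for illustration and do not exist in the commit.

```typescript
/* eslint-disable @typescript-eslint/no-explicit-any */
// Copied from the new version of the utility type in the diff.
type SchemaFieldMapKeys<T extends Record<string, any>, Key = keyof T> = Key extends string
  ? NonNullable<T[Key]> extends any[]
    ? NonNullable<T[Key]> extends Array<Record<string, any>>
      ? `${Key}` | `${Key}.${SchemaFieldMapKeys<NonNullable<T[Key]>[number]>}`
      : `${Key}`
    : NonNullable<T[Key]> extends Record<string, any>
    ? `${Key}` | `${Key}.${SchemaFieldMapKeys<NonNullable<T[Key]>>}`
    : `${Key}`
  : never;

// Hypothetical schema, declared as an interface on purpose:
// interfaces satisfy Record<string, any> but not Record<string, unknown>.
interface ExampleDoc {
  id: string;
  resource?: { name: string; tags: string[] };
  comments?: Array<{ message: string; created_by?: string }>;
}

type ExampleKeys = SchemaFieldMapKeys<ExampleDoc>;

// The compiler requires exactly these keys: arrays of objects are flattened
// through their element type, while arrays of primitives stay as a single key.
const exampleFieldMap: Record<ExampleKeys, { type: string }> = {
  id: { type: 'keyword' },
  resource: { type: 'object' },
  'resource.name': { type: 'keyword' },
  'resource.tags': { type: 'keyword' },
  comments: { type: 'object' },
  'comments.message': { type: 'keyword' },
  'comments.created_by': { type: 'keyword' },
};
```

Removing any entry from `exampleFieldMap`, or adding one that is not part of the schema, should fail type checking, which is the "keys always in sync" guarantee the comment in the diff refers to.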
Original file line number Diff line number Diff line change
Expand Up @@ -62,5 +62,3 @@ export const DEFAULT_TRANSLATION_FIELDS = {
to: 'now',
interval: '5m',
} as const;

export const EMPTY_RESOURCE_PLACEHOLDER = '<empty>';
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ import {
RuleMigrationResourceData,
RuleMigrationResourceType,
RuleMigrationResource,
RuleMigrationResourceBase,
} from '../../rule_migration.gen';
import { RelatedIntegration } from '../../../../api/detection_engine/model/rule_schema/common_attributes.gen';
import { NonEmptyString } from '../../../../api/model/primitives.gen';
Expand Down Expand Up @@ -147,7 +148,7 @@ export type GetRuleMigrationResourcesMissingRequestParamsInput = z.input<
export type GetRuleMigrationResourcesMissingResponse = z.infer<
typeof GetRuleMigrationResourcesMissingResponse
>;
-export const GetRuleMigrationResourcesMissingResponse = z.array(RuleMigrationResourceData);
+export const GetRuleMigrationResourcesMissingResponse = z.array(RuleMigrationResourceBase);

export type GetRuleMigrationStatsRequestParams = z.infer<typeof GetRuleMigrationStatsRequestParams>;
export const GetRuleMigrationStatsRequestParams = z.object({
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -583,4 +583,4 @@ paths:
type: array
description: The identified resources missing
items:
-$ref: '../../rule_migration.schema.yaml#/components/schemas/RuleMigrationResourceData'
+$ref: '../../rule_migration.schema.yaml#/components/schemas/RuleMigrationResourceBase'
Original file line number Diff line number Diff line change
Expand Up @@ -356,35 +356,49 @@ export const UpdateRuleMigrationData = z.object({
* The type of the rule migration resource.
*/
export type RuleMigrationResourceType = z.infer<typeof RuleMigrationResourceType>;
-export const RuleMigrationResourceType = z.enum(['macro', 'list']);
+export const RuleMigrationResourceType = z.enum(['macro', 'lookup']);
export type RuleMigrationResourceTypeEnum = typeof RuleMigrationResourceType.enum;
export const RuleMigrationResourceTypeEnum = RuleMigrationResourceType.enum;

/**
* The rule migration resource data provided by the vendor.
* The rule migration resource basic information.
*/
export type RuleMigrationResourceData = z.infer<typeof RuleMigrationResourceData>;
export const RuleMigrationResourceData = z.object({
export type RuleMigrationResourceBase = z.infer<typeof RuleMigrationResourceBase>;
export const RuleMigrationResourceBase = z.object({
type: RuleMigrationResourceType,
/**
* The resource name identifier.
*/
name: z.string(),
});

export type RuleMigrationResourceContent = z.infer<typeof RuleMigrationResourceContent>;
export const RuleMigrationResourceContent = z.object({
/**
* The resource content value.
* The resource content value. Can be an empty string.
*/
content: z.string().optional(),
content: z.string(),
/**
* The resource arbitrary metadata.
*/
metadata: z.object({}).optional(),
});

/**
* The rule migration resource data.
*/
export type RuleMigrationResourceData = z.infer<typeof RuleMigrationResourceData>;
export const RuleMigrationResourceData = RuleMigrationResourceBase.merge(
RuleMigrationResourceContent
);

/**
* The rule migration resource document object.
*/
export type RuleMigrationResource = z.infer<typeof RuleMigrationResource>;
export const RuleMigrationResource = RuleMigrationResourceData.merge(
export const RuleMigrationResource = RuleMigrationResourceBase.merge(
RuleMigrationResourceContent.partial()
).merge(
z.object({
/**
* The rule resource migration id
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -327,11 +327,11 @@ components:
description: The type of the rule migration resource.
enum:
- macro # Reusable part of a query that can be customized and called from multiple rules
- list # A list of values that can be used inside queries reused in different rules
- lookup # A list of values that can be used inside queries as data enrichment or data source

RuleMigrationResourceData:
RuleMigrationResourceBase:
type: object
description: The rule migration resource data provided by the vendor.
description: The rule migration resource basic information.
required:
- type
- name
Expand All @@ -341,17 +341,31 @@ components:
name:
type: string
description: The resource name identifier.

RuleMigrationResourceContent:
type: object
required:
- content
properties:
content:
type: string
description: The resource content value.
description: The resource content value. Can be an empty string.
metadata:
type: object
description: The resource arbitrary metadata.

RuleMigrationResourceData:
description: The rule migration resource data.
allOf:
- $ref: '#/components/schemas/RuleMigrationResourceBase'
- $ref: '#/components/schemas/RuleMigrationResourceContent'

RuleMigrationResource:
description: The rule migration resource document object.
allOf:
- $ref: '#/components/schemas/RuleMigrationResourceData'
- $ref: '#/components/schemas/RuleMigrationResourceBase'
- $ref: '#/components/schemas/RuleMigrationResourceContent'
x-modify: partial
- type: object
required:
- id
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,9 @@ import type {
OriginalRule,
OriginalRuleVendor,
RuleMigrationResourceData,
RuleMigrationResourceBase,
} from '../../model/rule_migration.gen';
import type { ResourceIdentifiers, RuleResource } from './types';
import type { ResourceIdentifiers } from './types';
import { splResourceIdentifiers } from './splunk';

const ruleResourceIdentifiers: Record<OriginalRuleVendor, ResourceIdentifiers> = {
Expand All @@ -29,48 +30,48 @@ export class ResourceIdentifier {
this.identifiers = ruleResourceIdentifiers[vendor];
}

public fromOriginalRule(originalRule: OriginalRule): RuleResource[] {
public fromOriginalRule(originalRule: OriginalRule): RuleMigrationResourceBase[] {
return this.identifiers.fromOriginalRule(originalRule);
}

public fromResource(resource: RuleMigrationResourceData): RuleResource[] {
public fromResource(resource: RuleMigrationResourceData): RuleMigrationResourceBase[] {
return this.identifiers.fromResource(resource);
}

public fromOriginalRules(originalRules: OriginalRule[]): RuleResource[] {
const lists = new Set<string>();
public fromOriginalRules(originalRules: OriginalRule[]): RuleMigrationResourceBase[] {
const lookups = new Set<string>();
const macros = new Set<string>();
originalRules.forEach((rule) => {
const resources = this.identifiers.fromOriginalRule(rule);
resources.forEach((resource) => {
if (resource.type === 'macro') {
macros.add(resource.name);
} else if (resource.type === 'list') {
lists.add(resource.name);
} else if (resource.type === 'lookup') {
lookups.add(resource.name);
}
});
});
return [
...Array.from(macros).map<RuleResource>((name) => ({ type: 'macro', name })),
...Array.from(lists).map<RuleResource>((name) => ({ type: 'list', name })),
...Array.from(macros).map<RuleMigrationResourceBase>((name) => ({ type: 'macro', name })),
...Array.from(lookups).map<RuleMigrationResourceBase>((name) => ({ type: 'lookup', name })),
];
}

public fromResources(resources: RuleMigrationResourceData[]): RuleResource[] {
const lists = new Set<string>();
public fromResources(resources: RuleMigrationResourceData[]): RuleMigrationResourceBase[] {
const lookups = new Set<string>();
const macros = new Set<string>();
resources.forEach((resource) => {
this.identifiers.fromResource(resource).forEach((identifiedResource) => {
if (identifiedResource.type === 'macro') {
macros.add(identifiedResource.name);
} else if (identifiedResource.type === 'list') {
lists.add(identifiedResource.name);
} else if (identifiedResource.type === 'lookup') {
lookups.add(identifiedResource.name);
}
});
});
return [
...Array.from(macros).map<RuleResource>((name) => ({ type: 'macro', name })),
...Array.from(lists).map<RuleResource>((name) => ({ type: 'list', name })),
...Array.from(macros).map<RuleMigrationResourceBase>((name) => ({ type: 'macro', name })),
...Array.from(lookups).map<RuleMigrationResourceBase>((name) => ({ type: 'lookup', name })),
];
}
}
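
Both `fromOriginalRules` and `fromResources` above follow the same pattern: collect names into one `Set` per resource type, then emit macros first and lookups second. A minimal standalone sketch of that pattern, using local stand-in types (`ResourceBase` and `dedupeResources` are illustrative names, not part of the commit):

```typescript
// Local stand-ins for the generated RuleMigrationResourceBase type.
type ResourceType = 'macro' | 'lookup';
interface ResourceBase {
  type: ResourceType;
  name: string;
}

// Deduplicates resources by name within each type, returning macros before lookups.
function dedupeResources(resources: ResourceBase[]): ResourceBase[] {
  const macros = new Set<string>();
  const lookups = new Set<string>();
  resources.forEach(({ type, name }) => {
    if (type === 'macro') {
      macros.add(name);
    } else if (type === 'lookup') {
      lookups.add(name);
    }
  });
  return [
    ...Array.from(macros).map<ResourceBase>((name) => ({ type: 'macro', name })),
    ...Array.from(lookups).map<ResourceBase>((name) => ({ type: 'lookup', name })),
  ];
}

const deduped = dedupeResources([
  { type: 'lookup', name: 'users' },
  { type: 'macro', name: 'get_hosts' },
  { type: 'lookup', name: 'users' },
  { type: 'macro', name: 'get_hosts' },
  { type: 'lookup', name: 'assets' },
]);
// deduped contains one macro (get_hosts) followed by two lookups (users, assets)
```

Because a `Set` preserves insertion order, the output order within each type matches the order in which names were first seen.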
Original file line number Diff line number Diff line change
Expand Up @@ -47,9 +47,9 @@ describe('splResourceIdentifier', () => {

const result = splResourceIdentifier(query);
expect(result).toEqual([
{ type: 'list', name: 'my_lookup_table' },
{ type: 'list', name: 'other_lookup_list' },
{ type: 'list', name: 'third_lookup' },
{ type: 'lookup', name: 'my_lookup_table' },
{ type: 'lookup', name: 'other_lookup_list' },
{ type: 'lookup', name: 'third' },
]);
});

Expand All @@ -60,9 +60,9 @@ describe('splResourceIdentifier', () => {
const result = splResourceIdentifier(query);
expect(result).toEqual([
{ type: 'macro', name: 'macro_one' },
{ type: 'list', name: 'my_lookup_table' },
{ type: 'list', name: 'other_lookup_list' },
{ type: 'list', name: 'third_lookup' },
{ type: 'lookup', name: 'my_lookup_table' },
{ type: 'lookup', name: 'other_lookup_list' },
{ type: 'lookup', name: 'third' },
]);
});

Expand All @@ -72,11 +72,11 @@ describe('splResourceIdentifier', () => {

const result = splResourceIdentifier(query);
expect(result).toEqual([
{ type: 'list', name: 'my_lookup_1' },
{ type: 'list', name: 'my_lookup_2' },
{ type: 'list', name: 'my_lookup_3' },
{ type: 'list', name: 'my_lookup_4' },
{ type: 'list', name: 'my_lookup_5' },
{ type: 'lookup', name: 'my_lookup_1' },
{ type: 'lookup', name: 'my_lookup_2' },
{ type: 'lookup', name: 'my_lookup_3' },
{ type: 'lookup', name: 'my_lookup_4' },
{ type: 'lookup', name: 'my_lookup_5' },
]);
});

Expand All @@ -96,7 +96,7 @@ describe('splResourceIdentifier', () => {
{ type: 'macro', name: 'macro_one' },
{ type: 'macro', name: 'my_lookup_table' },
{ type: 'macro', name: 'third_macro' },
{ type: 'list', name: 'real_lookup_list' },
{ type: 'lookup', name: 'real_lookup_list' },
]);
});

Expand All @@ -107,7 +107,7 @@ describe('splResourceIdentifier', () => {
const result = splResourceIdentifier(query);
expect(result).toEqual([
{ type: 'macro', name: 'macro_one' },
{ type: 'list', name: 'my_lookup_table' },
{ type: 'lookup', name: 'my_lookup_table' },
]);
});

Expand All @@ -118,7 +118,7 @@ describe('splResourceIdentifier', () => {
const result = splResourceIdentifier(query);
expect(result).toEqual([
{ type: 'macro', name: 'macro_one' },
{ type: 'list', name: 'my_lookup_table' },
{ type: 'lookup', name: 'my_lookup_table' },
]);
});

Expand All @@ -129,7 +129,7 @@ describe('splResourceIdentifier', () => {
const result = splResourceIdentifier(query);
expect(result).toEqual([
{ type: 'macro', name: 'macro_one' },
{ type: 'list', name: 'my_lookup_table' },
{ type: 'lookup', name: 'my_lookup_table' },
]);
});
});
Original file line number Diff line number Diff line change
Expand Up @@ -11,17 +11,17 @@
* Please make sure to test all regular expressions before using them.
* At the time of writing, this tool can be used to test it: https://devina.io/redos-checker
*/
import type { RuleMigrationResourceBase } from '../../../model/rule_migration.gen';
import type { ResourceIdentifier } from '../types';

import type { ResourceIdentifier, RuleResource } from '../types';

const listRegex = /\b(?:lookup)\s+([\w-]+)\b/g; // Captures only the lookup name
const lookupRegex = /\b(?:lookup)\s+([\w-]+)\b/g; // Captures only the lookup name
const macrosRegex = /`([\w-]+)(?:\(([^`]*?)\))?`/g; // Captures only the macro name and arguments

export const splResourceIdentifier: ResourceIdentifier = (input) => {
// sanitize the query to avoid mismatching macro and list names inside comments or literal strings
// sanitize the query to avoid mismatching macro and lookup names inside comments or literal strings
const sanitizedInput = sanitizeInput(input);

const resources: RuleResource[] = [];
const resources: RuleMigrationResourceBase[] = [];
let macroMatch;
while ((macroMatch = macrosRegex.exec(sanitizedInput)) !== null) {
const macroName = macroMatch[1] as string;
Expand All @@ -31,17 +31,17 @@ export const splResourceIdentifier: ResourceIdentifier = (input) => {
resources.push({ type: 'macro', name: macroWithArgs });
}

let listMatch;
while ((listMatch = listRegex.exec(sanitizedInput)) !== null) {
resources.push({ type: 'list', name: listMatch[1] });
let lookupMatch;
while ((lookupMatch = lookupRegex.exec(sanitizedInput)) !== null) {
resources.push({ type: 'lookup', name: lookupMatch[1].replace(/_lookup$/, '') });
}

return resources;
};
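
The `_lookup` suffix handling can be tried in isolation with a simplified version of the lookup detection above. This sketch reuses the `lookupRegex` from the diff but omits the sanitization step, and `findLookupNames` and `toLookupIndexName` are hypothetical helper names; the `lookup_<lookup_name>` index pattern comes from the commit summary.

```typescript
// Same expression as in the diff: captures only the lookup name.
const lookupRegex = /\b(?:lookup)\s+([\w-]+)\b/g;

// Simplified detection: no comment/string sanitization, lookups only.
function findLookupNames(query: string): string[] {
  const names: string[] = [];
  lookupRegex.lastIndex = 0; // global regexes keep state between exec() calls
  let match: RegExpExecArray | null;
  while ((match = lookupRegex.exec(query)) !== null) {
    // Splunk queries may use "<name>_lookup" while the lookup file is "<name>"
    names.push(match[1].replace(/_lookup$/, ''));
  }
  return names;
}

// Hypothetical helper: builds the index name following the `lookup_<lookup_name>` pattern.
function toLookupIndexName(lookupName: string): string {
  return `lookup_${lookupName}`;
}

const splQuery =
  'index=main | lookup third_lookup host OUTPUT owner | lookup my_lookup_table user';
const lookupIndexNames = findLookupNames(splQuery).map(toLookupIndexName);
// lookupIndexNames: ['lookup_third', 'lookup_my_lookup_table']
```

Only a trailing `_lookup` is stripped, so a name such as `my_lookup_table` is left untouched, which matches the updated expectations in the test file (`third_lookup` becomes `third`).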

// Comments should be removed before processing the query to avoid matching macro and list names inside them
// Comments should be removed before processing the query to avoid matching macro and lookup names inside them
const commentRegex = /```.*?```/g;
// Literal strings should be replaced with a placeholder to avoid matching macro and list names inside them
// Literal strings should be replaced with a placeholder to avoid matching macro and lookup names inside them
const doubleQuoteStrRegex = /".*?"/g;
const singleQuoteStrRegex = /'.*?'/g;
// lookup operator can have modifiers like local=true or update=false before the lookup name, we need to remove them
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,17 +7,13 @@

import type {
OriginalRule,
RuleMigrationResourceBase,
RuleMigrationResourceData,
RuleMigrationResourceType,
} from '../../model/rule_migration.gen';

export interface RuleResource {
type: RuleMigrationResourceType;
name: string;
}
export type ResourceIdentifier = (input: string) => RuleResource[];
export type ResourceIdentifier = (input: string) => RuleMigrationResourceBase[];

export interface ResourceIdentifiers {
fromOriginalRule: (originalRule: OriginalRule) => RuleResource[];
fromResource: (resource: RuleMigrationResourceData) => RuleResource[];
fromOriginalRule: (originalRule: OriginalRule) => RuleMigrationResourceBase[];
fromResource: (resource: RuleMigrationResourceData) => RuleMigrationResourceBase[];
}
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ import {
} from '@elastic/eui';
import { FormattedMessage } from '@kbn/i18n-react';
import type {
RuleMigrationResourceData,
RuleMigrationResourceBase,
RuleMigrationTaskStats,
} from '../../../../../common/siem_migrations/model/rule_migration.gen';
import { RulesDataInput } from './steps/rules/rules_data_input';
Expand Down Expand Up @@ -60,12 +60,12 @@ export const MigrationDataInputFlyout = React.memo<MigrationDataInputFlyoutProps
}, []);

const onMissingResourcesFetched = useCallback(
(missingResources: RuleMigrationResourceData[]) => {
(missingResources: RuleMigrationResourceBase[]) => {
const newMissingResourcesIndexed = missingResources.reduce<MissingResourcesIndexed>(
(acc, { type, name }) => {
if (type === 'macro') {
acc.macros.push(name);
} else if (type === 'list') {
} else if (type === 'lookup') {
acc.lookups.push(name);
}
return acc;
Expand Down