add support for multiple output connectors #306
Conversation
force-pushed from bc4698c to e9cc825
Codecov Report
Base: 91.25% // Head: 91.41% // Increases project coverage by +0.15%.

@@           Coverage Diff           @@
##             main     #306      +/-   ##
==========================================
+ Coverage   91.25%   91.41%   +0.15%
==========================================
  Files         120      120
  Lines        8235     8293      +58
==========================================
+ Hits         7515     7581      +66
+ Misses        720      712       -8
This is a very nice feature, but would it be possible to have multiple outputs for extra data as well?
Furthermore, it is not verified whether the output name for custom data is valid. Logprep will therefore run and then throw a KeyError as soon as custom data is written.
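The reviewer's point can be sketched in plain Python. This is a minimal illustration of the failure mode, not Logprep's real API: all names here (`kafka_output`, `write_custom_data`, `validate_rule_outputs`) are hypothetical, and the point is simply that an unvalidated output name fails late with a KeyError, while a startup check fails early with a clear config error.

```python
# Illustrative only: outputs configured for the pipeline, keyed by name.
outputs = {"kafka_output": lambda event: None}

def write_custom_data(output_name, event):
    # Without prior validation, this raises KeyError at write time,
    # i.e. only once custom data is actually produced.
    outputs[output_name](event)

def validate_rule_outputs(rule_output_names):
    # Startup-time check that turns the late KeyError
    # into an immediate configuration error.
    unknown = set(rule_output_names) - set(outputs)
    if unknown:
        raise ValueError(f"rules reference undefined outputs: {sorted(unknown)}")
```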
Sure... I will give it a try.👍😁
Thank you!
For the selective_extractor, storing extra_output to multiple outputs should already be possible by defining different rules for different outputs or topics. We could do the same in the pseudonymizer and the pre_detector, so that you are able to overwrite the output and topic defined in the processor configuration at rule level. This would be the most useful solution in my opinion.
If the goal is to write extra data for the same event into multiple outputs, then having multiple rules with different outputs would have the problem that events need to be processed by each rule variant. This would multiply the rule count by the number of outputs, which would multiply the processing time of that processor and make the rules hard to maintain.
force-pushed from 5fb9b87 to a8d00d7
force-pushed from a8d00d7 to 25d9d9a
Ok... good point... then I have to dig a lot deeper, fingers crossed that I don't end up with a really ugly solution :-D
Thank you very much!
force-pushed from 25d9d9a to 76777f6
I tested locally with individual selective_extractor configs on my host and the FileInputConnector.

Incompatibility with console_output
The output type console_output doesn't work with the new implementation at all, as it will raise the following error:
2023-02-16 09:11:08,030 Pipeline ERROR : A critical error occurred for processor SelectiveExtractor (selective_extractor) when processing an event, processing was aborted: (AttributeError: module 'sys' has no attribute 'first_topic') (59 in ~29.0 sek)
Apparently this output connector doesn't like the topic attribute, but omitting target_topic isn't possible, which is also confusing when using file outputs.
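The AttributeError in the log suggests the console connector resolves its target stream via an attribute lookup on the `sys` module, which works for `stdout`/`stderr` but not for a Kafka-style topic name. This is a guess at the mechanism from the error message alone, not Logprep's actual code; `console_write` is a hypothetical reconstruction:

```python
import sys

def console_write(document, target="stdout"):
    # Hypothetical reconstruction of the failure: if the connector
    # resolves its target via getattr(sys, target), then "stdout" and
    # "stderr" work, but a topic name like "first_topic" raises exactly
    # the AttributeError seen in the pipeline log above.
    stream = getattr(sys, target)
    print(document, file=stream)
```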
Confusion about multi-output selective extraction
As stated in my review comments, I don't understand the actual intended test behaviour, as I would assume that with individual selective_extractor configs the affected outputs should be different.

Suggestion for optimized handling of multiple outputs through consecutive pipeline processors
I guess this somehow touches @ppcad's comments: I don't see the possibility to create individual streams when using multi-outputs, which leads to the described behaviour that for each additional output all rules will be executed again.
An easy but not perfect way to deal with this, in my opinion, would be to introduce (just like in Logstash) meta-fields that can be filtered. When addressing multiple or even new outputs in a multi-output-aware processor, a meta-field such as {"output":"output_stream_123"} could be added, which can then be filtered on in the following processors in the pipeline. This would still touch the filter of each following processor, but wouldn't execute the full processor behaviour.
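The meta-field idea above can be sketched in a few lines. This is a hedged illustration of the routing concept, not Logprep's implementation: the `_metadata`/`output` field names and the helper functions are assumptions chosen for the example.

```python
# Sketch of Logstash-style meta-field routing: a multi-output-aware
# processor tags each event with a routing field, and every following
# processor evaluates only this cheap filter before deciding whether
# to run its full rule set. Field names are illustrative.
def tag_for_output(event, output_name):
    event.setdefault("_metadata", {})["output"] = output_name
    return event

def filter_matches(event, wanted_output):
    # Cheap pre-check: only the filter runs, not the whole processor.
    return event.get("_metadata", {}).get("output") == wanted_output
```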
Yes, I have also noticed that. It works if
Extra data doesn't get passed to the next processor, so it needs to be generated again for each output if the goal is to send the same extra data to different outputs. To generate it, the whole event needs to be processed again.
It is a draft... I am not ready
@ppcad @herrfeder ... I'm ready now... please have a look
As we have gracefully finished the review with at least six eyes, I can conclude this review with minor requested changes.
Thank you for the changes!
Thank you for your approval. I will merge this once we are ready, because it will break our configuration.
Co-authored-by: David Lassig <[email protected]>
force-pushed from 9537f30 to fbb55d0
`selective_extractor`, `pseudonymizer`, `pre_detector`: support multiple outputs

TODO:
- validate `selective_extractor`, `pseudonymizer`, `pre_detector` against defined outputs

ToDo but NotNow:
- `pseudonymizer` and `pre_detector` -> add optional overwrite of outputs in pseudonymizer and pre_detector #321
- `selective_extractor` -> add default outputs to `selective_extractor` #322
- `pseudonymizer`, `pre_detector` and `selective_extractor` -> refactor code of `pseudonymizer`, `pre_detector` and `selective_extractor` #323