add support for multiple output connectors #306
Conversation
force-pushed from bc4698c to e9cc825
Codecov Report
Base: 91.25% // Head: 91.41% // Increases project coverage by +0.15%.

@@           Coverage Diff           @@
##             main     #306      +/-   ##
==========================================
+ Coverage   91.25%   91.41%   +0.15%
==========================================
  Files         120      120
  Lines        8235     8293      +58
==========================================
+ Hits         7515     7581      +66
+ Misses        720      712       -8
This is a very nice feature, but would it be possible to have multiple outputs for extra data as well?
Furthermore, it is not verified whether the output name for custom data is valid. Logprep will therefore run and then throw a KeyError as soon as custom data is written.
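The reviewer's point can be sketched in plain Python. This is a minimal illustration of the failure mode, not Logprep's real API: all names here (`kafka_output`, `write_custom_data`, `validate_rule_outputs`) are hypothetical, and the point is simply that an unvalidated output name fails late with a KeyError, while a startup check fails early with a clear config error.

```python
# Illustrative only: outputs configured for the pipeline, keyed by name.
outputs = {"kafka_output": lambda event: None}

def write_custom_data(output_name, event):
    # Without prior validation, this raises KeyError at write time,
    # i.e. only once custom data is actually produced.
    outputs[output_name](event)

def validate_rule_outputs(rule_output_names):
    # Startup-time check that turns the late KeyError
    # into an immediate configuration error.
    unknown = set(rule_output_names) - set(outputs)
    if unknown:
        raise ValueError(f"rules reference undefined outputs: {sorted(unknown)}")
```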
Sure... I will give it a try.👍😁
Thank you!
For the selective_extractor, storing extra_output to multiple outputs should already be possible by defining different rules for different outputs or topics. We could do the same in the pseudonymizer and the pre_detector, so that you are able to overwrite the output and topic defined in the processor configuration at rule level. This would be the most useful solution in my opinion.
If the goal is to write extra data for the same event into multiple outputs, then having multiple rules with different outputs would have the problem that events need to be processed by each rule variant. This would multiply the rule count by the number of outputs, which would multiply the processing time of that processor and make the rules hard to maintain.
force-pushed from 5fb9b87 to a8d00d7
force-pushed from a8d00d7 to 25d9d9a
Ok... good point... then I have to dig a lot deeper, fingers crossed that I don't end up with a really ugly solution :-D
Thank you very much!
force-pushed from 25d9d9a to 76777f6
I tested locally with individual selective_extractor configs on my host and the FileInputConnector.

Incompatibility with console_output
The output type console_output doesn't work with the new implementation at all, as it will raise the following error:
2023-02-16 09:11:08,030 Pipeline ERROR : A critical error occurred for processor SelectiveExtractor (selective_extractor) when processing an event, processing was aborted: (AttributeError: module 'sys' has no attribute 'first_topic') (59 in ~29.0 sek)
Apparently this output connector doesn't like the topic attribute, but omitting target_topic isn't possible, which is also confusing when using file outputs.
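The AttributeError in the log suggests the console connector resolves its target stream via an attribute lookup on the `sys` module, which works for `stdout`/`stderr` but not for a Kafka-style topic name. This is a guess at the mechanism from the error message alone, not Logprep's actual code; `console_write` is a hypothetical reconstruction:

```python
import sys

def console_write(document, target="stdout"):
    # Hypothetical reconstruction of the failure: if the connector
    # resolves its target via getattr(sys, target), then "stdout" and
    # "stderr" work, but a topic name like "first_topic" raises exactly
    # the AttributeError seen in the pipeline log above.
    stream = getattr(sys, target)
    print(document, file=stream)
```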
Confusion about multi-output selective extraction
As stated in my review comments, I don't understand the actual intended test behaviour, as I would assume that with individual selective_extractor configs the affected outputs should be different.

Suggestion for optimized handling of multiple outputs through consecutive pipeline processors
I guess this somehow touches @ppcad's comments: I don't see the possibility to create individual streams when using multi-outputs, which leads to the described behaviour that for each additional output all rules will be executed again.
An easy but not perfect way to deal with this, in my opinion, would be to introduce (just like in Logstash) meta-fields that can be filtered. When addressing multiple or even new outputs in a multi-output-aware processor, a meta-field such as {"output":"output_stream_123"} could be added, which can then be filtered on in the following processors in the pipeline. This would still touch the filter of each following processor, but wouldn't execute the full processor behaviour.
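The meta-field idea above can be sketched in a few lines. This is a hedged illustration of the routing concept, not Logprep's implementation: the `_metadata`/`output` field names and the helper functions are assumptions chosen for the example.

```python
# Sketch of Logstash-style meta-field routing: a multi-output-aware
# processor tags each event with a routing field, and every following
# processor evaluates only this cheap filter before deciding whether
# to run its full rule set. Field names are illustrative.
def tag_for_output(event, output_name):
    event.setdefault("_metadata", {})["output"] = output_name
    return event

def filter_matches(event, wanted_output):
    # Cheap pre-check: only the filter runs, not the whole processor.
    return event.get("_metadata", {}).get("output") == wanted_output
```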
Yes, I have also noticed that. It works if
Extra data doesn't get passed to the next processor, so it needs to be generated again for each output if the goal is to send the same extra data to different outputs. To generate it, the whole event needs to be processed again.
It is a draft... I am not ready
@ppcad @herrfeder ... I'm ready now... please have a look
As we have gracefully finished the review with at least six eyes, I can conclude this review with minor requested changes.
Thank you for the changes!
Thank you for your approval. I will merge this once we are ready, because it will break our configuration.
Co-authored-by: David Lassig <[email protected]>
force-pushed from 9537f30 to fbb55d0
`selective_extractor`, `pseudonymizer`, `pre_detector`: support multiple outputs

TODO:
- validate `selective_extractor`, `pseudonymizer`, `pre_detector` against defined outputs

ToDo but NotNow:
- `pseudonymizer` and `pre_detector` -> add optional overwrite of outputs in pseudonymizer and pre_detector #321
- `selective_extractor` -> add default outputs to `selective_extractor` #322
- `pseudonymizer`, `pre_detector` and `selective_extractor` -> refactor code of `pseudonymizer`, `pre_detector` and `selective_extractor` #323