This issue/proposal extends #33347 and aims to address the problem at a broader scale. Currently, setting event.original is left entirely to the module or integration. While this provides the greatest flexibility, it also means that behavior is not consistent across integrations/modules, especially when errors occur. It also means that whenever a new integration is developed, the developer needs to implement event.original handling and ensure it gets saved during errors, etc. Furthermore, depending on the processors used, event.original may have been set on the Beats side and not in the ingest pipeline.
This proposal suggests having the beat/input set event.original during input processing. Per ECS docs, event.original is meant to represent the raw text message of the event:
Raw text message of entire event. Used to demonstrate log integrity or where the full log message (before splitting it up in multiple parts) may be required, e.g. for reindex.
Depending on the input, event.original might have to be set before or after input processing. It may have to be set after processing for inputs that use parsers (e.g., multiline), and before processing for inputs that do things like splitting the raw message into separate fields. I was originally hoping that Beats could do it "automatically" after the input runs, but I don't think that would be possible. It would be up to each individual input to implement this behavior.
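To illustrate the before/after distinction, here is a minimal sketch in Go. It does not use the libbeat API: the `Event` type and the `splitInput` and `multilineInput` functions are hypothetical stand-ins, and the field names other than event.original are chosen only for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a hypothetical stand-in for a Beats event (not the libbeat type).
type Event map[string]string

// splitInput models an input that splits the raw message into fields.
// event.original is captured BEFORE the split so the full text survives.
func splitInput(raw string) Event {
	ev := Event{"event.original": raw}
	parts := strings.SplitN(raw, " ", 2)
	ev["host.name"] = parts[0]
	if len(parts) == 2 {
		ev["message"] = parts[1]
	}
	return ev
}

// multilineInput models an input with a multiline parser.
// event.original is set AFTER the parser has joined the lines,
// because no single physical line represents the whole event.
func multilineInput(lines []string) Event {
	joined := strings.Join(lines, "\n")
	return Event{"event.original": joined, "message": joined}
}

func main() {
	a := splitInput("web01 connection refused")
	b := multilineInput([]string{"panic: boom", "  at main.go:10"})
	fmt.Printf("%q\n%q\n", a["event.original"], b["event.original"])
}
```

In both cases the input itself decides where the raw text is captured, which is why a single generic hook after the input runs would likely not be enough.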
Benefits
event.original is set for "free"; integration developers will no longer be responsible for setting it
Unless an input fails, event.original will be set and available at all stages of processing, which improves troubleshooting
Saving the raw text message will also provide a level of integrity, showing the original message before we've done major processing (for example, certain integrations currently save event.original after syslog headers have been stripped)
Potential Issues
Doubling the amount of data sent from the edge. In addition to the data in existing fields, essentially a copy of it (or more) will be sent with the document. This could be worked around by respecting the keep_original_event setting present in most integrations, and not saving event.original if it is false. We'd lose that field for troubleshooting, though.
Potential change in behavior in existing pipelines. We would have to inspect each pipeline to ensure that we are not changing any expected behaviors (e.g., rename processors may break if event.original is already set)
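Both issues above can be sketched together. The Go snippet below is only an illustration under stated assumptions: `Doc`, `setOriginal`, and `rename` are hypothetical names, the `keepOriginal` flag merely mirrors the keep_original_event setting mentioned above, and `rename` models a pipeline step that refuses to overwrite an existing target field.

```go
package main

import (
	"errors"
	"fmt"
)

// Doc is a hypothetical flat document, used only for illustration.
type Doc map[string]string

// setOriginal stores the raw text only when keepOriginal is true
// (a hypothetical mirror of the keep_original_event setting).
func setOriginal(doc Doc, raw string, keepOriginal bool) {
	if keepOriginal {
		doc["event.original"] = raw
	}
}

var errTargetExists = errors.New("rename: target field already exists")

// rename models a rename step that fails if the target is already set.
func rename(doc Doc, from, to string) error {
	if _, ok := doc[to]; ok {
		return errTargetExists
	}
	v, ok := doc[from]
	if !ok {
		return fmt.Errorf("rename: field %q not found", from)
	}
	doc[to] = v
	delete(doc, from)
	return nil
}

func main() {
	for _, keep := range []bool{true, false} {
		doc := Doc{"message": "raw line"}
		setOriginal(doc, "raw line", keep)
		// An existing pipeline that renames message to event.original
		// now collides when the input has already set the field.
		err := rename(doc, "message", "event.original")
		fmt.Printf("keep=%v err=%v\n", keep, err)
	}
}
```

With the flag enabled, the existing rename collides with the field the input already set; with it disabled, the pipeline behaves as before but the raw text is not available earlier in processing.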
What's Next
Feel free to comment below with feedback, especially if there are any issues with this approach. My intention is to solve this problem without having to implement the same behavior across nearly 200 integrations, and certainly more so as we scale beyond that number. Also, while I'm somewhat familiar with how inputs work, it would be nice to have someone who's more knowledgeable about this chime in, especially how it relates to the v2 architecture. Additionally, I'm writing this from the perspective of the SEI team, but if anyone in Observability (or anywhere else) has thoughts, feel free to chime in as well!
Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!
We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!