[Proposal] Set event.original during input processing #33421

taylor-swanson · 2022-10-20T15:30:06Z

Summary

This issue/proposal is an extension of #33347 and is looking to address the problem at a broader scale. Currently, setting event.original is solely up to the module or integration. While this provides the greatest flexibility, it also means that behavior is not consistent across all integrations/modules, especially when errors occur. It also means that whenever a new integration is developed, the developer will need to implement event.original handling and ensure it gets saved during errors, etc. Furthermore, depending on the processors used, event.original may have been set on the beats side and not in the ingest pipeline.

This proposal suggests having the beat/input set event.original during input processing. Per ECS docs, event.original is meant to represent the raw text message of the event:

Raw text message of entire event. Used to demonstrate log integrity or where the full log message (before splitting it up in multiple parts) may be required, e.g. for reindex.

Depending on the input, event.original might have to be set before or after input processing. Cases where it may have to be after would be inputs that use parsers (i.e., multiline), while ones that need it before would be doing things like splitting the raw message into separate fields. I was originally hoping that beats could do it "automatically" after the input runs, but I don't think it would be possible. It would be up to each individual input to implement this behavior.
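
To illustrate the before/after distinction, here is a minimal sketch in Python (not actual Beats code). The names `with_original`, `split_input`, and `multiline_input` are hypothetical, and the "continuation lines start with whitespace" rule is just an assumption standing in for a real multiline parser.

```python
from typing import Dict, List

Event = Dict[str, object]


def with_original(raw: str, fields: Event) -> Event:
    """Return a copy of fields with event.original set to raw (if not already set)."""
    event = dict(fields)
    event_ns = dict(event.get("event", {}))
    event_ns.setdefault("original", raw)
    event["event"] = event_ns
    return event


def split_input(raw: str) -> Event:
    """Hypothetical input that splits 'key=value' pairs.

    The raw text is captured BEFORE splitting, since it cannot be
    recovered reliably from the separate fields afterwards.
    """
    fields: Event = dict(pair.split("=", 1) for pair in raw.split())
    return with_original(raw, fields)


def multiline_input(lines: List[str]) -> List[Event]:
    """Hypothetical multiline input.

    The raw text is captured AFTER the parser has joined continuation
    lines (assumed here to start with whitespace) into one message.
    """
    events: List[Event] = []
    current: List[str] = []
    for line in lines:
        if current and line.startswith(" "):
            current.append(line)  # continuation of the current message
        else:
            if current:
                joined = "\n".join(current)
                events.append(with_original(joined, {"message": joined}))
            current = [line]
    if current:
        joined = "\n".join(current)
        events.append(with_original(joined, {"message": joined}))
    return events
```

In both cases the input itself decides when the raw text is complete, which is why this is hard to do generically after the input has run.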

Benefits

event.original is set for "free", integration developers will no longer be responsible for setting it
Unless an input fails, event.original will be set and available at all stages of processing, improving troubleshooting
Saving the raw text message will also provide a level of integrity, showing the original message before we've done major processing (for example, certain integrations currently save event.original after syslog headers have been stripped)

Potential Issues

Doubling the amount of data sent from the edge. In addition to the data in existing fields, essentially a copy of it (or more) will be sent with the document. This could be worked around by respecting keep_original_event set in most integrations, and not saving event.original if that is false. We'd lose that field for troubleshooting, though.
Potential change in behavior in existing pipelines. We would have to inspect each pipeline to ensure that we are not changing any expected behaviors (i.e., rename processors may break if event.original is already set)
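
Both issues can be sketched roughly as follows, in Python rather than real Beats or ingest pipeline code. The function names `finalize` and `move_message_to_original` are hypothetical, and treating `keep_original_event` as a plain boolean argument is an assumption made for illustration.

```python
from typing import Dict

Event = Dict[str, object]


def finalize(event: Event, raw: str, keep_original_event: bool = True) -> Event:
    """Attach event.original only when keep_original_event is true.

    When it is false, the raw copy is dropped so the document is not
    roughly doubled in size, at the cost of losing it for troubleshooting.
    """
    out = dict(event)
    event_ns = dict(out.get("event", {}))
    if keep_original_event:
        event_ns.setdefault("original", raw)
    else:
        event_ns.pop("original", None)
    out["event"] = event_ns
    return out


def move_message_to_original(doc: Event) -> Event:
    """Model of a pipeline step that moves message into event.original.

    If event.original was already set at the edge, the step is skipped
    instead of failing on the existing target field.
    """
    out = dict(doc)
    event_ns = dict(out.get("event", {}))
    if "original" not in event_ns and "message" in out:
        event_ns["original"] = out.pop("message")
    out["event"] = event_ns
    return out
```

The guard in `move_message_to_original` is the kind of check each existing pipeline would need to be inspected for before event.original starts arriving from the edge.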

What's Next

Feel free to comment below with feedback, especially if there are any issues with this approach. My intention is to solve this problem without having to implement the same behavior across nearly 200 integrations, and certainly more so as we scale beyond that number. Also, while I'm somewhat familiar with how inputs work, it would be nice to have someone who's more knowledgeable about this chime in, especially how it relates to the v2 architecture. Additionally, I'm writing this from the perspective of the SEI team, but if anyone in Observability (or anywhere else) has thoughts, feel free to chime in as well!

botelastic · 2022-10-20T15:30:11Z

This issue doesn't have a Team:<team> label.

botelastic · 2023-10-20T16:04:14Z

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

taylor-swanson added the enhancement label Oct 20, 2022

botelastic bot added the needs_team label Oct 20, 2022

botelastic bot added the Stalled label Oct 20, 2023

botelastic bot closed this as completed Apr 17, 2024
