[Proposal] Set event.original during input processing #33421

Closed
taylor-swanson opened this issue Oct 20, 2022 · 2 comments
Labels
enhancement, needs_team, Stalled

Comments

@taylor-swanson
Contributor

Summary

This proposal extends #33347 and aims to address the problem at a broader scale. Currently, setting event.original is solely up to the module or integration. While this provides the greatest flexibility, it also means that behavior is not consistent across integrations/modules, especially when errors occur. It also means that whenever a new integration is developed, the developer needs to implement event.original handling and ensure the field is preserved when errors occur. Furthermore, depending on the processors used, event.original may have to be set on the Beats side rather than in the ingest pipeline.

This proposal suggests having the beat/input set event.original during input processing. Per ECS docs, event.original is meant to represent the raw text message of the event:

Raw text message of entire event. Used to demonstrate log integrity or where the full log message (before splitting it up in multiple parts) may be required, e.g. for reindex.

Depending on the input, event.original might have to be set before or after input processing. Inputs that use parsers (e.g., multiline) may need to set it afterward, while inputs that split the raw message into separate fields need to set it beforehand. I was originally hoping that Beats could do it "automatically" after the input runs, but I don't think that is possible. It would be up to each individual input to implement this behavior.
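To make the before/after distinction concrete, here is a minimal Python sketch. It is not Beats code (Beats is written in Go), and every function name below is hypothetical. It only illustrates the ordering: a field-splitting input captures the raw text before it splits, while a multiline-style input captures it after the lines have been joined. The rule that continuation lines start with whitespace is an assumption made for the example.

```python
def set_original(event, raw):
    """Store the raw text under event.original unless it is already set."""
    event.setdefault("event", {}).setdefault("original", raw)
    return event


def splitting_input(raw):
    """Hypothetical input that splits the raw message into fields.

    The raw text is captured BEFORE splitting, so nothing is lost.
    """
    event = set_original({}, raw)
    host, _, message = raw.partition(" ")
    event["host"] = {"name": host}
    event["message"] = message
    return event


def multiline_input(lines):
    """Hypothetical input with a multiline parser.

    The raw text is captured AFTER lines are joined into one event.
    Assumption: a continuation line starts with whitespace.
    """
    buffer = []
    for line in lines:
        # A non-indented line starts a new event, so flush the previous one.
        if buffer and not line[:1].isspace():
            joined = "\n".join(buffer)
            yield set_original({"message": joined}, joined)
            buffer = []
        buffer.append(line)
    if buffer:
        joined = "\n".join(buffer)
        yield set_original({"message": joined}, joined)
```

In both cases the decision of when to capture sits inside the input itself, which is why a single generic hook after the input runs would not be enough.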

Benefits

  • event.original is set for "free"; integration developers will no longer be responsible for setting it
  • Unless an input fails, event.original will be set and available at all stages of processing, improving troubleshooting
  • Saving the raw text message will also provide a level of integrity, showing the original message before we've done any major processing (for example, certain integrations currently save event.original only after syslog headers have been stripped)
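The troubleshooting benefit can be illustrated with a small Python sketch. The pipeline runner and the `parse_kv` processor are hypothetical stand-ins, not Beats or ingest pipeline APIs. The point is only that when event.original is attached by the input, a failure in a later processing step still leaves the raw message on the event.

```python
def run_pipeline(event, processors):
    """Apply processors in order; on failure, record the error and stop."""
    for proc in processors:
        try:
            event = proc(event)
        except Exception as exc:
            # The event keeps whatever the input already attached,
            # including event.original.
            event.setdefault("error", {})["message"] = f"{proc.__name__}: {exc}"
            break
    return event


def parse_kv(event):
    """Hypothetical processor: parse space-separated key=value pairs."""
    # dict() raises ValueError if any item lacks an "=" separator.
    event["parsed"] = dict(item.split("=", 1) for item in event["message"].split())
    return event


raw = "not a kv line"
failed = run_pipeline({"message": raw, "event": {"original": raw}}, [parse_kv])
print(failed["event"]["original"])  # raw message is still available
print("error" in failed)
```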

Potential Issues

  • Doubling the amount of data sent from the edge. In addition to the data in existing fields, essentially a copy of it (or more) will be sent with the document. This could be worked around by respecting the keep_original_event setting found in most integrations and not saving event.original when it is false. We'd lose that field for troubleshooting, though.
  • Potential change in behavior in existing pipelines. We would have to inspect each pipeline to ensure that we are not changing any expected behaviors (e.g., rename processors may break if event.original is already set)
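Both issues can be sketched in a few lines of Python. This is a toy model over flat, dotted keys, not the actual Beats or ingest pipeline implementation. `maybe_set_original`, `rename`, and the `skip_if_set` option are hypothetical names, and the assumption is that a strict rename refuses to overwrite an existing target field.

```python
def maybe_set_original(event, raw, keep_original_event):
    """Attach event.original only when keep_original_event is true."""
    if keep_original_event:
        event["event.original"] = raw
    return event


def rename(event, src, dst, skip_if_set=False):
    """Toy rename processor that refuses to overwrite an existing target."""
    if dst in event:
        if skip_if_set:
            return event  # leave both fields untouched
        raise KeyError(f"target field {dst!r} already exists")
    event[dst] = event.pop(src)
    return event


# Existing pipeline behavior: the rename succeeds because the input did not
# set event.original.
legacy = maybe_set_original({"message": "raw"}, "raw", keep_original_event=False)
rename(legacy, "message", "event.original")

# With the proposal enabled, the same rename now collides with the field
# that the input already set.
proposed = maybe_set_original({"message": "raw"}, "raw", keep_original_event=True)
try:
    rename(proposed, "message", "event.original")
except KeyError as exc:
    print("pipeline step failed:", exc)
```

A guard like `skip_if_set` is one way an existing pipeline could be made tolerant, but each pipeline would still need to be reviewed individually.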

What's Next

Feel free to comment below with feedback, especially if there are any issues with this approach. My intention is to solve this problem without having to implement the same behavior across nearly 200 integrations, and certainly more so as we scale beyond that number. Also, while I'm somewhat familiar with how inputs work, it would be nice to have someone who's more knowledgeable about this chime in, especially how it relates to the v2 architecture. Additionally, I'm writing this from the perspective of the SEI team, but if anyone in Observability (or anywhere else) has thoughts, feel free to chime in as well!

@botelastic botelastic bot added the needs_team label Oct 20, 2022
@botelastic

botelastic bot commented Oct 20, 2022

This issue doesn't have a Team:<team> label.

@botelastic

botelastic bot commented Oct 20, 2023

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Oct 20, 2023
@botelastic botelastic bot closed this as completed Apr 17, 2024