kubernetes_logs source permits arbitrarily large lines due to interaction of auto_partial_merge and max_line_bytes #22581
Labels
type: feature
A value-adding code addition that introduces new functionality.
A note for the community
Use Cases
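When Vector collects logs through a kubernetes_logs source downstream of a container runtime that splits long log lines (e.g. crio), setting auto_partial_merge to true results in max_line_bytes being effectively ignored: the limit is applied to each partial line before merging, so the merged line can be arbitrarily large.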
Attempted Solutions
Consider the following scenarios:
Scenario 1:
lines split by crio into 2.5 MiB chunks
auto_partial_merge = N/A (Vector stops reading an oversized line before it reaches the continuation character, so the setting has no effect)
result: with a 1 MiB max_line_bytes limit, lines greater than 1 MiB are always dropped, including all lines that were split by crio
Scenario 2:
lines split by crio into 2.5 MiB chunks
auto_partial_merge = true
result: no lines are ever dropped, because max_line_bytes is applied before merging; with a 3 MiB limit, every partial line is automatically below the limit since crio splits lines into 2.5 MiB chunks
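The two scenarios can be reproduced with a small model. This is only a sketch of the ordering problem, not Vector's actual implementation: the function names are hypothetical, sizes are scaled down from MiB to bytes, and it assumes that once one chunk of a line exceeds the limit, the rest of that line is discarded as well.

```python
from typing import Iterable, Iterator, List, Tuple

# (tag, payload): "P" marks a partial chunk, "F" the final chunk of a line,
# mirroring the partial/full tags of the CRI log format.
Chunk = Tuple[str, bytes]


def split_like_runtime(line: bytes, chunk_size: int) -> List[Chunk]:
    """Split one logical line into P/F-tagged chunks (hypothetical helper)."""
    pieces = [line[i:i + chunk_size] for i in range(0, len(line), chunk_size)] or [b""]
    return [("P", p) for p in pieces[:-1]] + [("F", pieces[-1])]


def limit_then_merge(chunks: Iterable[Chunk], max_line_bytes: int) -> Iterator[bytes]:
    """Check the limit per chunk first, then merge partial chunks.

    Assumption of this sketch: an oversized chunk causes the whole
    logical line to be discarded.
    """
    buffer = bytearray()
    discarding = False
    for tag, payload in chunks:
        if len(payload) > max_line_bytes:
            discarding = True  # the limit only ever sees a single chunk
        if not discarding:
            buffer += payload
        if tag == "F":
            if not discarding:
                yield bytes(buffer)  # merged size is never checked
            buffer.clear()
            discarding = False


if __name__ == "__main__":
    # A 10,000-byte line split into 2,500-byte chunks.
    chunks = split_like_runtime(b"x" * 10_000, 2_500)
    print([len(line) for line in limit_then_merge(chunks, 1_000)])
    print([len(line) for line in limit_then_merge(chunks, 3_000)])
```

With the limit below the chunk size, nothing is emitted. With the limit just above the chunk size, the full 10,000-byte line passes even though it is more than three times the limit.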
Proposal
It would be useful to have an additional configuration option that limits line size after merging, to protect the downstream pipeline and consumers from huge lines. That would make it possible to benefit from the auto_partial_merge feature without allowing arbitrarily large lines into the pipeline.
Another option would be to change the behavior of max_line_bytes when auto_partial_merge is set to true, but for backward-compatibility reasons it might be better to avoid changing the behavior of an existing config field.
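As a rough illustration of the first option, a post-merge limit could work along the following lines. The name max_merged_line_bytes is hypothetical and does not exist in Vector; the chunk representation (a tag of "P" for partial or "F" for final, plus a payload) is likewise an assumption made for this sketch.

```python
from typing import Iterable, Iterator, Tuple

Chunk = Tuple[str, bytes]  # (tag, payload): "P" = partial, "F" = final chunk


def merge_with_limit(
    chunks: Iterable[Chunk], max_merged_line_bytes: int
) -> Iterator[bytes]:
    """Merge partial chunks, dropping any line whose merged size exceeds the limit.

    max_merged_line_bytes is a hypothetical option name used for illustration.
    """
    buffer = bytearray()
    overflowed = False
    for tag, payload in chunks:
        if not overflowed:
            if len(buffer) + len(payload) > max_merged_line_bytes:
                # Stop buffering right away so memory use stays bounded.
                overflowed = True
                buffer.clear()
            else:
                buffer += payload
        if tag == "F":
            if not overflowed:
                yield bytes(buffer)
            # Reset state for the next logical line.
            buffer.clear()
            overflowed = False


if __name__ == "__main__":
    chunks = [
        ("P", b"a" * 2_500),
        ("P", b"a" * 2_500),
        ("F", b"a" * 100),
        ("F", b"short"),
    ]
    print([len(line) for line in merge_with_limit(chunks, 4_000)])
```

Here the 5,100-byte merged line is dropped while the short line that follows still passes. Clearing the buffer as soon as the limit is exceeded, rather than checking only at the final chunk, keeps memory use bounded even for very long lines.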
References
No response
Version
0.45.0