Exclude hidden files from logstream regex discovery #448
This resolves a problem where Singer attempts to initialize logstreams for watermark files when the regex in the logstream configuration isn't left-bounded. For example, given the regex:
`.*test.*`
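and a directory containing both a log file that matches it and that file's hidden watermark file, the regex also matches the watermark file's name. Singer will therefore try to initialize logstreams for both files, creating a third watermark file that tracks the first watermark file.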
In some cases, when filenames are long, the processor cannot persist progress to these unwanted watermark files because writing them fails with a "filename too long" exception.
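A minimal sketch of the idea behind this change, assuming that discovery compares each file name in the log directory against the logstream regex with a full match, and that watermark files are hidden (their names start with `.`). The class `LogStreamDiscoverySketch`, the method `discover`, and the file names `test.log` and `.test.log.watermark` are hypothetical illustrations, not Singer's actual code or naming scheme.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class LogStreamDiscoverySketch {

    /**
     * Hypothetical helper (not Singer's API): returns the file names that fully
     * match the logstream regex, skipping hidden files such as watermark files.
     */
    static List<String> discover(List<String> fileNames, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        for (String name : fileNames) {
            // Hidden files (e.g. watermarks) are never treated as logstreams,
            // even when a non-left-bounded regex like ".*test.*" matches them.
            if (name.startsWith(".")) {
                continue;
            }
            if (pattern.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical directory listing: a log file plus its hidden watermark.
        List<String> files = List.of("test.log", ".test.log.watermark");
        System.out.println(discover(files, ".*test.*")); // prints [test.log]
    }
}
```

Without the `startsWith(".")` check, `.*test.*` would match `.test.log.watermark` as well, because the leading `.*` also consumes the dot; that is the situation that produces a watermark for a watermark.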
Test Plan:
Added unit tests and tested in a dev environment, as well as in Kubernetes, where this issue is likely to occur more often.