In read mode, gzipped files are currently always read again in full, even when they have already been read completely.
This is caused by a call to sincedb_collection.clear_watched_file(key) after the zip read completes.
That call removes the file information from the sincedb entry, so Logstash cannot recognize the file as already read on a re-run.
The problem can be fixed by calling sincedb_collection.reading_completed(key) before the clear_watched_file call, as the plain text handler already does.
The reading_completed call sets the @path_in_sincedb attribute, which ensures that the entry is still serialized correctly after the watched file has been cleared.
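To illustrate why the call order matters, here is a minimal toy model in Ruby. ToyWatchedFile, ToySincedbValue, ToySincedbCollection and finish_gzip_read are hypothetical stand-ins, not the plugin's real classes. Only the method names reading_completed and clear_watched_file and the @path_in_sincedb attribute come from the description above, and the one-line serialization format is an assumption made for the sketch.

```ruby
# Toy model (hypothetical; NOT the logstash-input-file classes) showing why
# reading_completed must run before clear_watched_file.

# Stand-in for a watched file; only the path matters here.
ToyWatchedFile = Struct.new(:path)

# Stand-in for one sincedb entry.
class ToySincedbValue
  attr_reader :position

  def initialize(position, watched_file)
    @position = position
    @watched_file = watched_file
    @path_in_sincedb = nil
  end

  # Copy the path onto the entry itself so it survives clearing the file.
  def reading_completed
    @path_in_sincedb = @watched_file&.path
  end

  # Drop the reference to the watched file (and with it, its path).
  def clear_watched_file
    @watched_file = nil
  end

  # Path written on serialization: live file first, saved path as fallback.
  def path
    @watched_file ? @watched_file.path : @path_in_sincedb
  end
end

# Stand-in for the sincedb collection, keyed by an opaque file key.
class ToySincedbCollection
  def initialize
    @entries = {}
  end

  def set(key, value)
    @entries[key] = value
  end

  def reading_completed(key)
    @entries[key]&.reading_completed
  end

  def clear_watched_file(key)
    @entries[key]&.clear_watched_file
  end

  # Assumed toy format: "<key> <position> <path>", path omitted when unknown.
  def serialize
    @entries.map { |key, value| "#{key} #{value.position} #{value.path}".rstrip }
  end
end

# End of a gzip read; with_fix mirrors the proposed change.
def finish_gzip_read(collection, key, with_fix:)
  collection.reading_completed(key) if with_fix
  collection.clear_watched_file(key)
end

[false, true].each do |with_fix|
  db = ToySincedbCollection.new
  db.set("key-1", ToySincedbValue.new(1024, ToyWatchedFile.new("/tmp/test.log.gz")))
  finish_gzip_read(db, "key-1", with_fix: with_fix)
  puts "with_fix=#{with_fix}: #{db.serialize.inspect}"
end
```

In this model, running the loop prints `["key-1 1024"]` without the fix and `["key-1 1024 /tmp/test.log.gz"]` with it: the path saved by reading_completed is still available when the entry is serialized after clear_watched_file.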
This is still broken: using the file input plugin with .gz files breaks sincedb.
Did you find an elegant way to work around this issue? It seems the only way to keep sincedb functionality is to decompress the files outside of Logstash, but I would of course like to avoid that.
@jbwl I have not needed this in a long time, so I did not look for solutions other than the small modification I made in #287. That one worked for me, but I did not test it thoroughly.
We were hit by this issue while reading gzipped log files. Every time a pipeline restarted, Logstash re-ingested files that had previously been read completely. This caused duplicate documents in Elasticsearch, much like a fellow Logstash user described in this discuss.elastic.co discussion.
@max-frank's fix in #287 fixed our issue. Are there any reasons this PR was never fully merged?
Information
/tmp/test.log
/tmp/test.log
/tmp/test.log.gz
The sincedb entry for this file is missing the path.