This repository has been archived by the owner on Jul 19, 2023. It is now read-only.
Skips large blocks of events during import from CloudWatch? #74
Comments
@danielmcquillen take a look at this issue: #46
What I've seen is that in some scenarios, e.g. when you get two streams' worth of messages from the logs, the events might not be completely interleaved.
The result is that the high-water mark might be set to 12:06 before we query the second stream's data. I'm working on a PR for this, where the plugin tries to maintain the high-water mark at the level of the streams. The flow being:
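To make the failure mode concrete, here is a minimal Python sketch comparing a single global high-water mark with one mark per stream. It is only an illustration under stated assumptions: the plugin itself is not written in Python, the function names `ingest_global` and `ingest_per_stream` are hypothetical, and the integer timestamps (1200, 1206, ...) simply stand in for clock times such as 12:00 and 12:06.

```python
from typing import Dict, List, Tuple

# An event is (stream name, timestamp). Timestamps are plain integers that
# stand in for clock times (1206 ~ "12:06"); this is an assumption made
# for illustration only.
Event = Tuple[str, int]


def ingest_global(pages: List[List[Event]]) -> List[Event]:
    """Keep one high-water mark shared by every stream."""
    mark = -1
    kept: List[Event] = []
    for page in pages:
        # Anything at or below the shared mark is treated as already seen.
        fresh = [event for event in page if event[1] > mark]
        kept.extend(fresh)
        if fresh:
            mark = max(ts for _, ts in fresh)
    return kept


def ingest_per_stream(pages: List[List[Event]]) -> List[Event]:
    """Keep a separate high-water mark for each stream."""
    marks: Dict[str, int] = {}
    kept: List[Event] = []
    for page in pages:
        for stream, ts in page:
            # Only compare against the mark of the event's own stream.
            if ts > marks.get(stream, -1):
                kept.append((stream, ts))
                marks[stream] = ts
    return kept


# Two streams whose events arrive one stream at a time instead of interleaved.
pages = [
    [("stream-a", 1200), ("stream-a", 1203), ("stream-a", 1206)],
    [("stream-b", 1201), ("stream-b", 1204), ("stream-b", 1207)],
]

print(len(ingest_global(pages)))      # stream-b events at 1201 and 1204 are lost
print(len(ingest_per_stream(pages)))  # every event is kept
```

With the shared mark, the first page pushes the mark to 1206, so the older events from the second stream are silently dropped. Tracking the mark per stream avoids that in this toy setup.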
I spotted some issues with my last PR when I was in a really high-volume scenario, so I've rejected it and suggested a different approach: #96
I'm using this (excellent, thanks @lukewaite) plugin to move a filtered subset of events from one CloudWatch log stream with a year's worth of Open edX data into ES for analysis.
About 30k or so should make it through the filter and into ES. I'm using the integer seconds value for start_position to go back to March 1, 2018 (35102038). Every day in the log has at least ten or so events that should make it through the filter. Everything works fine up until what seems like a random point, where Logstash suddenly jumps to a future date, skipping a month or two of data.
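For anyone wanting to reproduce the setup, here is a small Python sketch of how such an integer could be derived. It assumes the integer form of start_position means "this many seconds before now", which is how it is used above; the helper name `seconds_since` is hypothetical and not part of the plugin.

```python
from datetime import datetime, timezone
from typing import Optional


def seconds_since(start: datetime, now: Optional[datetime] = None) -> int:
    """Return the whole number of seconds between `start` and `now`.

    `now` defaults to the current UTC time; it is a parameter so the
    calculation can be checked against a fixed reference point.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return int((now - start).total_seconds())


# Start of the import window mentioned in the post.
start = datetime(2018, 3, 1, tzinfo=timezone.utc)

# Fixed reference point one day later, just to sanity-check the arithmetic.
one_day_later = datetime(2018, 3, 2, tzinfo=timezone.utc)
print(seconds_since(start, one_day_later))  # 86400

# Value to try for start_position if the import were started right now.
print(seconds_since(start))
```

Because the value is relative to the current time, it has to be recomputed for each fresh run if the import should always begin at the same calendar date.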
I've tried deleting the index, deleting .since_db and re-running the import, but each time the plugin somehow skips a large block of time somewhere (not the same place) between the start position and current time.
I log out the time of each event that made it through the filter, so on my last run I saw something like:
Has anyone else experienced this issue? Thanks for any thoughts...