
Skips large blocks of events during import from CloudWatch? #74

Open
danielmcquillen opened this issue Apr 12, 2019 · 3 comments · May be fixed by #96

Comments

@danielmcquillen

I'm using this (excellent, thanks @lukewaite) plugin to move a filtered subset of events from one CloudWatch log stream with a year's worth of Open edX data into ES for analysis.

About 30k or so events should make it through the filter and into ES. I'm using an integer seconds value for start_position corresponding to last March 1, 2018 (35102038). Every day in the log has at least ten or so events that should make it through the filter.

Everything works fine up until what seems like a random point, where Logstash suddenly jumps to a future date, skipping a month or two of data.

I've tried deleting the index, deleting .since_db, and re-running the import, but each time the plugin somehow skips a large block of time somewhere (not the same place) between the start position and the current time.

I log the time of each event that makes it through the filter, so on my last run I saw something like:

(...lots of log messages ... )

Found PYSJ SP Event
Event time: 2019-02-14T19:21:40.749596+00:00
 
Found PYSJ SP Event
Event time: 2019-02-14T19:21:41.095490+00:00
 
Found PYSJ SP Event
Event time: 2019-04-11T04:59:57.224356+00:00
 
Found PYSJ SP Event
Event time: 2019-04-11T01:14:25.953122+00:00

(...lots of log messages ..)

Has anyone else experienced this issue? Thanks for any thoughts...

@Najdzionix

@danielmcquillen have a look at this issue: #46

@daniel-bray-sonalake

What I've seen is a scenario where the interleaved parameter on cloudwatch.filter_log_events can fail, causing the high-water mark set in sincedb at the group level to be wrong.

e.g. you may get two streams' worth of messages from the logs, and they might not be completely interleaved:

  • stream 1 from 26/08/2020 12:03 -> 12:06
  • stream 2 from 26/08/2020 12:05 -> 12:09

The result is that the high-water mark might be set to 12:06 before we query the second stream's data.
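
To make that concrete, here's a minimal, hypothetical simulation (plain Python, not plugin code) of how a single group-level high-water mark can end up past events that only show up in a later page of results:

```python
# Hypothetical, simplified bookkeeping: one high-water mark per log group,
# advanced as events are read page by page from filter_log_events.
pages = [
    # page 1: stream 1, 26/08/2020 12:03 -> 12:06
    [("stream-1", "2020-08-26T12:03"), ("stream-1", "2020-08-26T12:06")],
    # page 2: stream 2, 26/08/2020 12:05 -> 12:09 (not interleaved with page 1)
    [("stream-2", "2020-08-26T12:05"), ("stream-2", "2020-08-26T12:09")],
]

group_high_water_mark = None
for page in pages:
    for stream, ts in page:
        group_high_water_mark = ts  # already at 12:06 before page 2 is fetched

# If the poll stops (or the sincedb is persisted) between the two pages, the next
# run starts from 12:06 and stream 2's 12:05 event is skipped for good.
```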

I'm working on a PR for this, where the plugin tries to maintain the high-water mark at the level of individual streams.

The flow is as follows (a rough code sketch follows the flow below):

on startup, load the map
	there are 2 types of keys in here:
		group        -> last timestamp for the group
		group:stream -> last timestamp for the group/stream

every n seconds

	# no change here
	find all the log groups; for each group:

		# this is new
		group_query_start_time =
			check @sincedb for the per-stream high-water mark of each stream
			pick the _earliest_ of these (we will try to filter out duplicates later on)

			if there are no records, then choose the default start time (e.g. 0, now, now - x seconds)

		events = cloudwatch.filter_logs(group, group_query_start_time)
		for e in events:
			tss = from sincedb: get last time for event.group/stream (if not set then 0/now/now-x)
			if tss is before the event's ts, we process it as before
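
Here's a rough, non-authoritative sketch of that flow in Python/boto3 (the plugin itself is Ruby); the in-memory sincedb dict, the poll_group/process names, and discovering streams from existing group:stream keys are assumptions for the example, not plugin code:

```python
import time

import boto3

logs = boto3.client("logs")  # region/credentials taken from the environment

# sincedb as described in the flow above (timestamps are epoch milliseconds):
#   "group"        -> last timestamp seen for the group
#   "group:stream" -> last timestamp seen for that stream within the group
sincedb = {}

DEFAULT_START_MS = int((time.time() - 3600) * 1000)  # the "now - x seconds" default


def process(event):
    # Stand-in for handing the event to the Logstash pipeline.
    print(event["logStreamName"], event["timestamp"], event["message"])


def poll_group(group):
    # group_query_start_time: the *earliest* per-stream high-water mark, so no
    # stream's backlog is skipped; duplicates are filtered out per stream below.
    prefix = f"{group}:"
    stream_marks = [ts for key, ts in sincedb.items() if key.startswith(prefix)]
    start = min(stream_marks) if stream_marks else DEFAULT_START_MS

    paginator = logs.get_paginator("filter_log_events")
    for page in paginator.paginate(logGroupName=group, startTime=start):
        for event in page["events"]:
            key = f"{group}:{event['logStreamName']}"
            last_seen = sincedb.get(key, 0)         # the "0" default from the flow
            if event["timestamp"] > last_seen:      # drop already-processed events
                process(event)
                sincedb[key] = event["timestamp"]   # advance only this stream's mark
```

The key point is that the query still starts from the earliest per-stream mark for the group, but each event is accepted or dropped against its own stream's mark, so one stream running ahead can no longer hide another stream's backlog.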

@daniel-bray-sonalake

I spotted some issues with my last PR when I was in a really high-volume scenario, so I've rejected it and suggested a different approach: #96
