-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default timestamp_format to be compatible with zero padding and non zero padding #870
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #870 +/- ##
==========================================
+ Coverage 57.58% 62.44% +4.85%
==========================================
Files 370 339 -31
Lines 17548 17130 -418
==========================================
+ Hits 10105 10696 +591
+ Misses 6848 5882 -966
+ Partials 595 552 -43
☔ View full report in Codecov by Sentry. |
translator/translate/logs/logs_collected/files/collect_list/ruleTimestampFormat.go
Show resolved
Hide resolved
translator/translate/logs/logs_collected/files/collect_list/collect_list_test.go
Outdated
Show resolved
Hide resolved
translator/translate/logs/logs_collected/files/collect_list/collect_list_test.go
Outdated
Show resolved
Hide resolved
translator/translate/logs/logs_collected/files/collect_list/ruleTimestampFormat_test.go
Outdated
Show resolved
Hide resolved
translator/translate/logs/logs_collected/files/collect_list/ruleTimestampFormat_test.go
Outdated
Show resolved
Hide resolved
translator/translate/logs/logs_collected/files/collect_list/ruleTimestampFormat_test.go
Outdated
Show resolved
Hide resolved
translator/translate/logs/logs_collected/files/collect_list/ruleTimestampFormat_test.go
Outdated
Show resolved
Hide resolved
translator/tocwconfig/sampleConfig/logs_and_kubernetes_config.conf
Outdated
Show resolved
Hide resolved
translator/translate/logs/logs_collected/files/collect_list/collect_list_test.go
Outdated
Show resolved
Hide resolved
translator/translate/logs/logs_collected/files/collect_list/ruleTimestampFormat.go
Outdated
Show resolved
Hide resolved
translator/translate/logs/logs_collected/files/collect_list/ruleTimestampFormat.go
Outdated
Show resolved
Hide resolved
// Check when timestamp has an interval of 5 days. | ||
// 432000 seconds = 5 days | ||
// p.lastUpdateTime -> is set in miliseconds | ||
if ((time.Now().UnixNano() / 1000000000) - (p.lastUpdateTime / 1000)) > 432000 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100000000
is this to convert from Nano seconds to seconds? can we not just use Unix() instead of UnixNano(), is there any particular reason we are using UnixNano ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can change this to use Unix()
to make things more concise. Good callout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. The log timestamps that CWL accepts do not accept nanosecond precision. There's no need for us to use that level of precision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better yet, store the lastUpdateTime
as a time.Time
so you can do the check without all this conversion.
time.Since(p.lastUpdateTime) > [name_of_threshold_variable]
where [name_of_threshold_variable] is like 5 * 24 * time.Hour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this, to follow format as Jeffrey suggested
t = p.lastValidTime
if !p.lastUpdateTime.IsZero() {
// Check when timestamp has an interval of 5 days.
if time.Since(p.lastUpdateTime) > elaspedPeriod {
p.Log.Warnf("Unable to parse timestamp, using last valid timestamp %v: which is older than 5 days for log group %v: ", p.lastValidTime, p.Group)
}
}
// Check when timestamp has an interval of 5 days. | ||
// 432000 seconds = 5 days | ||
// p.lastUpdateTime -> is set in miliseconds | ||
if ((time.Now().UnixNano() / 1000000000) - (p.lastUpdateTime / 1000)) > 432000 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. The log timestamps that CWL accepts do not accept nanosecond precision. There's no need for us to use that level of precision.
TimezoneLoc: timezoneLoc} | ||
|
||
// make sure layout is compatible for "Sep 9", "Sep 9" , "Sep 09", "Sep 09" options | ||
logEntry := fmt.Sprintf("Sep 9 02:00:43 ip-10-4-213-132 sudo: vtayyare : TTY=pts/0 ; PWD=/home/vtayyare ; USER=root ; COMMAND=/bin/cat /var/log/secure\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could make this a set of testcases. Especially since they're all expecting the same hour/minute value. Also, probably can be a simpler log line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will shorten the log lines and address this. Keeping test case as is for consistency in other fileconfig_test.go
@@ -22,6 +22,7 @@ import ( | |||
const ( | |||
reqSizeLimit = 1024 * 1024 | |||
reqEventsLimit = 10000 | |||
elaspedPeriod = 5 * 24 * time.Hour |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: misspelled
elaspedPeriod = 5 * 24 * time.Hour | |
elapsedPeriod = 5 * 24 * time.Hour |
but also this isn't descriptive. What is this used for? Change the name to reflect that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5 days is pretty long to wait before emitting the log since we're using the update timestamp now. How did we decide on 5 days?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will modify this to 3 days, I based this initially of use case of cx who had issue with this regex mismatch. 5 days seemed large enough to ignore too much noise and also small enough to capture incorrect log events. I will modify to 3 days
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you all be open to a 1-day elapsed time for this log warning message? This is a hard one and I see Yared is trying to strike a balance between raising awareness and avoiding noise. IMO 1 day would be better since it is closer in time to when the impact started.
@@ -182,8 +236,8 @@ func applyRule1(t *testing.T, buf string) interface{} { | |||
return val | |||
} | |||
|
|||
// stdNumMonth // "1" //%-m | |||
// stdDay // "2" //%-d | |||
// stdNumMonth // "_1" //%-m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct must have forgot to remove it in the revision where I changed timestamp_layout
to be an array
reqEventsLimit = 10000 | ||
reqSizeLimit = 1024 * 1024 | ||
reqEventsLimit = 10000 | ||
logWarnInterval = 1 * 24 * time.Hour |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This should be more specific. logWarnInterval
is too generic. Is this interval for all log.Warn
messages? It's also not an interval. An interval determines how often something happens. This is more like a threshold.
if !p.lastUpdateTime.IsZero() { | ||
// Check when timestamp has an interval of 5 days. | ||
if time.Since(p.lastUpdateTime) > logWarnInterval { | ||
p.Log.Warnf("Unable to parse timestamp, using last valid timestamp %v: which is at least older by 1 day for log group %v: ", p.lastValidTime, p.Group) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.Log.Warnf("Unable to parse timestamp, using last valid timestamp %v: which is at least older by 1 day for log group %v: ", p.lastValidTime, p.Group) | |
p.Log.Warnf("Unable to parse timestamp, using last valid timestamp found in the logs %v: which is at least older than 1 day for log group %v: ", p.lastValidTime, p.Group) |
logWarnInterval = 1 * 24 * time.Hour | ||
reqSizeLimit = 1024 * 1024 | ||
reqEventsLimit = 10000 | ||
warnOldTimeStamp = 1 * 24 * time.Hour |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: Still think threshold should be somewhere in the name, but it isn't blocking.
Description of the issue
There is a regression between the old Python-based log agent and the new Go-based agent in how strict the timestamp format is. In the old Python logs agent, %d would match both zero and non-zero padded. There wasn't another option.
Although the documentation calls out %d as zero padded, the reality is that strptime will accept both.
This caused users to experience the following scenario:
When config is set with a
timestamp_format
that is inconsistent with the log's timestamp, the CloudWatch Agent would revert to last valid timestamp and push that as the log's events timestamp. This scenario involves where config could be set aswhile log file would have log timestamp formats
Apr 5 02:09:45
Description of changes
Add defaulting mechanism, so when config is set to zero padding timestamp regex and log timestamp has a non zero padded format the CloudWatch Agent will default and use a non zero padding time stamp. This was modified for the following timestamp format field options. See doc
Summary of changes
%d
and%-d
in jsontimestamp_format
both translate to"\\s{0,1}\\d{1,2}"
intimestamp_regex
for toml%m
and%-m
in jsontimestamp_format
both translate to"\\s{0,1}\\d{1,2}"
intimestamp_regex
for tomlSince the ParseInLocation function requires a
timestamp_layout
and go only supports_2
for %d and not %m,timestamp_layout
has been modified to be an array of layouts with both options%d
and%-d
both translate to _2. For exampletimestamp_format: "%b %-d %Y %H:%M:%S"
andtimestamp_format: "%b %d %Y %H:%M:%S"
both map to
%m
and%-m
in jsontimestamp_format
translate to the following arrays. Setting the month format in json config as the first option intimestamp_layout
array the alternative approach being the 2nd layout in arraytimestamp_format:
"%m %-d %H:%M:%S"
maps to the followingLicense
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
Unit tests
Manual test checks for timestamp translation
json
toml
Manual test for error Messaging Added
Requirements
Before commit the code, please do the following steps.
make fmt
andmake fmt-sh
make lint