Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default timestamp_format to be compatible with zero padding and non zero padding #870

Merged
merged 18 commits into from
Oct 16, 2023

Conversation

ymtaye
Copy link
Contributor

@ymtaye ymtaye commented Sep 28, 2023

Description of the issue

There is a regression between the old Python-based log agent and the new Go-based agent in how strict the timestamp format is. In the old Python logs agent, %d would match both zero and non-zero padded. There wasn't another option.
Although the documentation calls out %d as zero padded, the reality is that strptime will accept both.

This caused users to experience the following scenario:

When config is set with a timestamp_format that is inconsistent with the log's timestamp, the CloudWatch Agent would revert to last valid timestamp and push that as the log's events timestamp. This scenario involves where config could be set as

"timestamp_format": "%b %d %H:%M:%S"

while log file would have log timestamp formats
Apr 5 02:09:45

Description of changes

Add defaulting mechanism, so when config is set to zero padding timestamp regex and log timestamp has a non zero padded format the CloudWatch Agent will default and use a non zero padding time stamp. This was modified for the following timestamp format field options. See doc

%m
Month as a zero-padded decimal number

%-m
Month as a decimal number (not zero-padded)

%d
Day of the month as a zero-padded decimal number

%-d
Day of the month as a decimal number (not zero-padded)

Summary of changes

%d and %-d in json timestamp_format both translate to "\\s{0,1}\\d{1,2}" in timestamp_regex for toml
%m and %-m in json timestamp_format both translate to "\\s{0,1}\\d{1,2}" in timestamp_regex for toml

Since the ParseInLocation function requires a timestamp_layout and go only supports _2 for %d and not %m, timestamp_layout has been modified to be an array of layouts with both options

%d and %-d both translate to _2. For example
timestamp_format: "%b %-d %Y %H:%M:%S" and timestamp_format: "%b %d %Y %H:%M:%S"
both map to

timestamp_regex : "(\\w{3} \\s{0,1}\\d{1,2} \\d{4} \\d{2}:\\d{2}:\\d{2})"

%m and %-m in json timestamp_format translate to the following arrays. Setting the month format in json config as the first option in timestamp_layout array the alternative approach being the 2nd layout in array
timestamp_format: "%m %-d %H:%M:%S" maps to the following

timestamp_layout : ["1 _2 15:04:05", "01 _2 15:04:05"]

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

Unit tests

--- PASS: TestTimestampRegexRule (0.00s)
    --- PASS: TestTimestampRegexRule/WithInvalidTimestampFormat (0.00s)
    --- PASS: TestTimestampRegexRule/WithNonZeroPaddedOptions (0.00s)
    --- PASS: TestTimestampRegexRule/WithZeroPaddedOptions (0.00s)
    --- PASS: TestTimestampRegexRule/WithZeroPaddedMonthWord (0.00s)
    --- PASS: TestTimestampRegexRule/WithNonZeroPaddedMonthWord (0.00s)
    --- PASS: TestTimestampRegexRule/WithYearAsTwoDigits (0.00s)
    --- PASS: TestTimestampRegexRule/WithYearAsFourDigits (0.00s)
    --- PASS: TestTimestampRegexRule/WithNoTimestampFormat (0.00s)
=== RUN   TestTimestampLayoutxRule
=== RUN   TestTimestampLayoutxRule/WithYearAsFourDigits
=== RUN   TestTimestampLayoutxRule/WithNoTimestampFormat
=== RUN   TestTimestampLayoutxRule/WithInvalidTimestampFormat
=== RUN   TestTimestampLayoutxRule/WithNonZeroPaddedOptions
=== RUN   TestTimestampLayoutxRule/WithZeroPaddedOptions
=== RUN   TestTimestampLayoutxRule/WithZeroPaddedMonthWord
=== RUN   TestTimestampLayoutxRule/WithNonZeroPaddedMonthWord
=== RUN   TestTimestampLayoutxRule/WithYearAsTwoDigits
--- PASS: TestTimestampLayoutxRule (0.00s)
    --- PASS: TestTimestampLayoutxRule/WithYearAsFourDigits (0.00s)
    --- PASS: TestTimestampLayoutxRule/WithNoTimestampFormat (0.00s)
    --- PASS: TestTimestampLayoutxRule/WithInvalidTimestampFormat (0.00s)
    --- PASS: TestTimestampLayoutxRule/WithNonZeroPaddedOptions (0.00s)
    --- PASS: TestTimestampLayoutxRule/WithZeroPaddedOptions (0.00s)
    --- PASS: TestTimestampLayoutxRule/WithZeroPaddedMonthWord (0.00s)
    --- PASS: TestTimestampLayoutxRule/WithNonZeroPaddedMonthWord (0.00s)
    --- PASS: TestTimestampLayoutxRule/WithYearAsTwoDigits (0.00s)
PASS
ok  	github.com/aws/amazon-cloudwatch-agent/translator/translate/logs/logs_collected/files/collect_list	0.019s

Manual test checks for timestamp translation

json

 "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log",
                        "log_group_name": "amazon-cloudwatch-agent.log"
                    },
                    {
                        "file_path": "/opt/aws/amazon-cloudwatch-agent/logs/testing.log",
                        "log_group_name": "testing.log",
                        "log_stream_name": "testing.log",
                        "timestamp_format": "%m %d %H:%M:%S"
                    }
                ]
            }
        }
    }

toml

[inputs]

  [[inputs.cpu]]
    fieldpass = ["usage_idle", "usage_iowait", "usage_user", "usage_system"]
    interval = "60s"
    percpu = true
    totalcpu = false

  [[inputs.logfile]]
    destination = "cloudwatchlogs"
    file_state_folder = "/opt/aws/amazon-cloudwatch-agent/logs/state"

    [[inputs.logfile.file_config]]
      file_path = "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log"
      from_beginning = true
      log_group_name = "amazon-cloudwatch-agent.log"
      pipe = false
      retention_in_days = -1

    [[inputs.logfile.file_config]]
      file_path = "/opt/aws/amazon-cloudwatch-agent/logs/testing.log"
      from_beginning = true
      log_group_name = "testing_ip-172-31-7-172"
      log_stream_name = "testing2.log"
      pipe = false
      retention_in_days = -1
      timestamp_layout = ["01 _2 15:04:05", "1 _2 15:04:05"]
      timestamp_regex = "(\\d{1,2} \\s{0,1}\\d{1,2} \\d{2}:\\d{2}:\\d{2})"

Manual test for error Messaging Added

[ec2-user@ip-172-31-7-172 ~]$ cat /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log | grep -m 10 "Unable to parse valid timestamp testing.log"
2023-09-29T16:06:16Z D! [outputs.cloudwatchlogs] Unable to parse valid timestamp testing.log:
2023-09-29T16:06:16Z D! [outputs.cloudwatchlogs] Unable to parse valid timestamp testing.log:
2023-09-29T16:06:16Z D! [outputs.cloudwatchlogs] Unable to parse valid timestamp testing.log:
2023-09-29T16:06:16Z D! [outputs.cloudwatchlogs] Unable to parse valid timestamp testing.log:
2023-09-29T16:06:16Z D! [outputs.cloudwatchlogs] Unable to parse valid timestamp testing.log:
2023-09-29T16:06:16Z D! [outputs.cloudwatchlogs] Unable to parse valid timestamp testing.log:
2023-09-29T16:06:16Z D! [outputs.cloudwatchlogs] Unable to parse valid timestamp testing.log:
2023-09-29T16:06:16Z D! [outputs.cloudwatchlogs] Unable to parse valid timestamp testing.log:
2023-09-29T16:06:16Z D! [outputs.cloudwatchlogs] Unable to parse valid timestamp testing.log:
2023-09-29T16:06:16Z D! [outputs.cloudwatchlogs] Unable to parse valid timestamp testing.log:
[ec2-user@ip-172-31-7-172 ~]$ cat /opt/aws/amazon-cloudwatch-agent/logs/testing.log | grep "Oct 3 15:15:12" -A 5 -B 5
 ip-10-4-213-132 sudo: vtayyare : TTY=pts/0 ; PWD=/home/vtayyare ; USER=root ; COMMAND=/bin/cat /var/log/secure
10 3 15:15:11  instance-id: i-02fce21a425a2efb3: Testing for %-d and %-m
Oct 3 15:15:11  ip-10-4-213-132 sudo: vtayyare : TTY=pts/0 ; PWD=/home/vtayyare ; USER=root ; COMMAND=/bin/cat /var/log/secure
 ip-10-4-213-132 sudo: vtayyare : TTY=pts/0 ; PWD=/home/vtayyare ; USER=root ; COMMAND=/bin/cat /var/log/secure
10 3 15:15:12  instance-id: i-02fce21a425a2efb3: Testing for %-d and %-m
Oct 3 15:15:12  ip-10-4-213-132 sudo: vtayyare : TTY=pts/0 ; PWD=/home/vtayyare ; USER=root ; COMMAND=/bin/cat /var/log/secure
 ip-10-4-213-132 sudo: vtayyare : TTY=pts/0 ; PWD=/home/vtayyare ; USER=root ; COMMAND=/bin/cat /var/log/secure
10 3 15:15:14  instance-id: i-02fce21a425a2efb3: Testing for %-d and %-m
Oct 3 15:15:14  ip-10-4-213-132 sudo: vtayyare : TTY=pts/0 ; PWD=/home/vtayyare ; USER=root ; COMMAND=/bin/cat /var/log/secure
 ip-10-4-213-132 sudo: vtayyare : TTY=pts/0 ; PWD=/home/vtayyare ; USER=root ; COMMAND=/bin/cat /var/log/secure
10 3 15:15:15  instance-id: i-02fce21a425a2efb3: Testing for %-d and %-m
Screenshot 2023-10-03 at 11 13 34 AM

Requirements

Before commit the code, please do the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

@ymtaye ymtaye changed the title default to %d for both %d and %-d Default to timestamp_format to %d Sep 28, 2023
@codecov-commenter
Copy link

codecov-commenter commented Sep 29, 2023

Codecov Report

Attention: 37 lines in your changes are missing coverage. Please review.

Comparison is base (96d4763) 57.58% compared to head (96dae45) 62.44%.
Report is 425 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #870      +/-   ##
==========================================
+ Coverage   57.58%   62.44%   +4.85%     
==========================================
  Files         370      339      -31     
  Lines       17548    17130     -418     
==========================================
+ Hits        10105    10696     +591     
+ Misses       6848     5882     -966     
+ Partials      595      552      -43     
Files Coverage Δ
cfg/commonconfig/commonconfig.go 8.00% <ø> (ø)
...md/amazon-cloudwatch-agent-config-wizard/wizard.go 59.55% <ø> (-8.51%) ⬇️
...amazon-cloudwatch-agent/amazon-cloudwatch-agent.go 2.64% <ø> (ø)
...oudwatch-agent/register_event_logger_notwindows.go 0.00% <ø> (ø)
...-cloudwatch-agent/register_event_logger_windows.go 0.00% <ø> (ø)
cmd/config-translator/translator.go 0.00% <ø> (ø)
cmd/xray-migration/commands_unix.go 42.50% <ø> (ø)
cmd/xray-migration/commands_windows.go 42.50% <ø> (ø)
cmd/xray-migration/xray-migration.go 30.28% <ø> (ø)
handlers/agentinfo/info.go 84.94% <ø> (ø)
... and 22 more

... and 207 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ymtaye ymtaye changed the title Default to timestamp_format to %d Default timestamp_format to be compatible with zero padding and non zero padding Sep 29, 2023
@ymtaye ymtaye marked this pull request as ready for review September 29, 2023 03:20
@ymtaye ymtaye requested a review from a team as a code owner September 29, 2023 03:20
plugins/inputs/logfile/README.md Outdated Show resolved Hide resolved
plugins/outputs/cloudwatchlogs/pusher.go Outdated Show resolved Hide resolved
// Check when timestamp has an interval of 5 days.
// 432000 seconds = 5 days
// p.lastUpdateTime -> is set in miliseconds
if ((time.Now().UnixNano() / 1000000000) - (p.lastUpdateTime / 1000)) > 432000 {
Copy link
Contributor

@okankoAMZ okankoAMZ Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100000000

is this to convert from Nano seconds to seconds? can we not just use Unix() instead of UnixNano(), is there any particular reason we are using UnixNano ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can change this to use Unix() to make things more concise. Good callout

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. The log timestamps that CWL accepts do not accept nanosecond precision. There's no need for us to use that level of precision.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better yet, store the lastUpdateTime as a time.Time so you can do the check without all this conversion.

time.Since(p.lastUpdateTime) > [name_of_threshold_variable]

where [name_of_threshold_variable] is like 5 * 24 * time.Hour

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this, to follow format as Jeffrey suggested

			t = p.lastValidTime
			if !p.lastUpdateTime.IsZero() {
				// Check when timestamp has an interval of 5 days.
				if time.Since(p.lastUpdateTime) > elaspedPeriod {
					p.Log.Warnf("Unable to parse timestamp, using last valid timestamp %v: which is older than 5 days for log group %v: ", p.lastValidTime, p.Group)
				}
			}

// Check when timestamp has an interval of 5 days.
// 432000 seconds = 5 days
// p.lastUpdateTime -> is set in miliseconds
if ((time.Now().UnixNano() / 1000000000) - (p.lastUpdateTime / 1000)) > 432000 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. The log timestamps that CWL accepts do not accept nanosecond precision. There's no need for us to use that level of precision.

plugins/outputs/cloudwatchlogs/pusher.go Outdated Show resolved Hide resolved
TimezoneLoc: timezoneLoc}

// make sure layout is compatible for "Sep 9", "Sep 9" , "Sep 09", "Sep 09" options
logEntry := fmt.Sprintf("Sep 9 02:00:43 ip-10-4-213-132 sudo: vtayyare : TTY=pts/0 ; PWD=/home/vtayyare ; USER=root ; COMMAND=/bin/cat /var/log/secure\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could make this a set of testcases. Especially since they're all expecting the same hour/minute value. Also, probably can be a simpler log line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will shorten the log lines and address this. Keeping test case as is for consistency in other fileconfig_test.go

@@ -22,6 +22,7 @@ import (
const (
reqSizeLimit = 1024 * 1024
reqEventsLimit = 10000
elaspedPeriod = 5 * 24 * time.Hour
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: misspelled

Suggested change
elaspedPeriod = 5 * 24 * time.Hour
elapsedPeriod = 5 * 24 * time.Hour

but also this isn't descriptive. What is this used for? Change the name to reflect that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 days is pretty long to wait before emitting the log since we're using the update timestamp now. How did we decide on 5 days?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will modify this to 3 days, I based this initially of use case of cx who had issue with this regex mismatch. 5 days seemed large enough to ignore too much noise and also small enough to capture incorrect log events. I will modify to 3 days

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you all be open to a 1-day elapsed time for this log warning message? This is a hard one and I see Yared is trying to strike a balance between raising awareness and avoiding noise. IMO 1 day would be better since it is closer in time to when the impact started.

@@ -182,8 +236,8 @@ func applyRule1(t *testing.T, buf string) interface{} {
return val
}

// stdNumMonth // "1" //%-m
// stdDay // "2" //%-d
// stdNumMonth // "_1" //%-m
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be reverted.

Copy link
Contributor Author

@ymtaye ymtaye Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct must have forgot to remove it in the revision where I changed timestamp_layout to be an array

jefchien
jefchien previously approved these changes Oct 13, 2023
reqEventsLimit = 10000
reqSizeLimit = 1024 * 1024
reqEventsLimit = 10000
logWarnInterval = 1 * 24 * time.Hour
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This should be more specific. logWarnInterval is too generic. Is this interval for all log.Warn messages? It's also not an interval. An interval determines how often something happens. This is more like a threshold.

if !p.lastUpdateTime.IsZero() {
// Check when timestamp has an interval of 5 days.
if time.Since(p.lastUpdateTime) > logWarnInterval {
p.Log.Warnf("Unable to parse timestamp, using last valid timestamp %v: which is at least older by 1 day for log group %v: ", p.lastValidTime, p.Group)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
p.Log.Warnf("Unable to parse timestamp, using last valid timestamp %v: which is at least older by 1 day for log group %v: ", p.lastValidTime, p.Group)
p.Log.Warnf("Unable to parse timestamp, using last valid timestamp found in the logs %v: which is at least older than 1 day for log group %v: ", p.lastValidTime, p.Group)

logWarnInterval = 1 * 24 * time.Hour
reqSizeLimit = 1024 * 1024
reqEventsLimit = 10000
warnOldTimeStamp = 1 * 24 * time.Hour
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: Still think threshold should be somewhere in the name, but it isn't blocking.

@ymtaye ymtaye merged commit 5deb8c9 into aws:main Oct 16, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants