-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add EKS integration test for FluentBit log emission with Entity field #427
Conversation
terraform/eks/daemon/entity/main.tf
Outdated
role_arn = module.basic_components.role_arn | ||
version = var.k8s_version | ||
enabled_cluster_log_types = [ | ||
"api", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need these? Not sure what these do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried running the setup without these and still works. I think these provide control plane logs from EKS under /aws/eks
log group but we only care about application logs from Container Insights log group for these tests. Will remove in next commit.
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. | ||
// SPDX-License-Identifier: MIT | ||
|
||
module "common" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there no common terraform to create an eks cluster? seems like we have duplicate code here from our EKS integ tests...can we just reuse it and then apply the helm chart?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no common terraform to create EKS cluster. All terraform setups for each test under terraform/eks
folder have their own cluster setup. I think your point is valid though, having a common code to avoid potential duplication is better but I think it's out of scope for this PR.
entityServiceNameSource = "@entity.Attributes.AWS.ServiceNameSource" | ||
) | ||
|
||
type expectedEntity struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't this just be the cloudwatch.Entity
struct? Then we can have checks to ensure the KeyAttributes/Attributes are valid in the struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When querying the logs, the response doesn't return in cloudwatch.Entity
type, it's Results [][]*ResultField
which is basically arrays of strings. We could try to use cloudwatch.Entity
just to hold the expected output but it would require additional logics to deference the maps just to turn them into strings so thought using a custom test struct is cleaner.
test/entity/entity_test.go
Outdated
|
||
// ValidateEntity queries a given LogGroup/LogStream combination given the start and end times, and executes an | ||
// log query for entity attributes to ensure the entity values are correct | ||
func ValidateEntity(t *testing.T, logGroup, logStream string, end *time.Time, queryString string, expectedEntity expectedEntity, entityPlatformType string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func ValidateEntity(t *testing.T, logGroup, logStream string, end *time.Time, queryString string, expectedEntity expectedEntity, entityPlatformType string) { | |
func ValidateLogEntity(t *testing.T, logGroup, logStream string, end *time.Time, queryString string, expectedEntity expectedEntity, entityPlatformType string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix in next commit
test/entity/entity_test.go
Outdated
t.Fatalf("application log group used for entity validation doesn't exsit: %s", logGroup) | ||
} | ||
begin := end.Add(-sleepForFlush * 2) | ||
log.Printf("Start time is " + begin.String() + " and end time is " + end.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure these are accurate...how are we tracking the begin time? It should be when the terraform is initiated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Begin time is 2 time before the end point. This is the current flow:
- Terraform initialize all resources(cluster, helm chart, etc)
- Test starts running
- Sleep 3 minutes
- End time is set after 3 minutes sleep time
- Begin time is 6 minutes before end time since it's doing the
*2
multiplication. We could remove the multiplication but I did it just to ensure we don't somehow miss a query by barely missing the start time.
end := time.Now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but this assumes that the cluster was created within 6 minutes before the end time right? Wondering if there is a way to start the timer at terraform init. Not sure how we are doing it for our other tests
test/entity/entity_test.go
Outdated
if !assert.NotZero(t, len(result)) { | ||
return | ||
} | ||
if entityPlatformType == "EC2" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel like we should have a defined constant for required KeyAttributes and Attributes for each of the platforms.
Something like
ec2RequiredKeyAttributes := map[string]struct{}{} {
"Service": {},
"Environment":{},
...
}
And have some validator in a util func. Having a map[string]struct{}{} is just easier to do the checks...not sure if we have a List abstraction for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to create a validator struct that performs different logics based on platforms.
@@ -197,6 +197,47 @@ func GetLogQueryStats(logGroupName string, startTime, endTime int64, queryString | |||
} | |||
} | |||
|
|||
// GetLogQueryResults for the log group between start/end (in epoch seconds) for the | |||
// query string. | |||
func GetLogQueryResults(logGroupName string, startTime, endTime int64, queryString string) ([][]types.ResultField, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't have something like this already? Something that checks for the existence of logs in a log group within a certain time period?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have GetLogQueryStats function which does what you described but that returns the statistics of the query result which does not include log fields so we can't validate entity from that. GetLogQueryResults
function is a copy of that function except that it returns results.Results
agentConfigPath: filepath.Join("resources", "compass_default_log.json"), | ||
podName: "log-generator", | ||
expectedEntity: expectedEntity{ | ||
entityType: "Service", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we could have an enum for different entity types: Service, Resource
Same for platformType: AWS::EKS, AWS::K8s, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address in the next CR where the values get re-used
Description of the issue
FluentBit has a new feature to emit CloudWatch logs with entity field which helps users navigate between their telemetry in the CloudWatch console. We want to add an integration test which helps us validate this behavior.
This PR focuses on the test scenario where FluentBit falls back to using K8sWorkload as the service name and uses default environment name.
Description of changes
License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
Test run: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/11790939699/job/32939527206 (Ran after commit 861e08b)