Adding Statuscode Handler to Agent Health Extension #1423

Merged
merged 98 commits into from
Dec 12, 2024
98 commits
2348639
fixing issue
Paramadon Nov 12, 2024
b7fac5e
trying to fix logs
Paramadon Nov 12, 2024
45536b8
fixing issue
Paramadon Nov 12, 2024
75b024a
adding debug statements (swapping logger with log.print)
Paramadon Nov 13, 2024
b41de09
adding more log statments
Paramadon Nov 13, 2024
90a22a3
fixing tags
Paramadon Nov 13, 2024
8c6d037
fixing logs
Paramadon Nov 13, 2024
6ff99f8
fixing logs
Paramadon Nov 13, 2024
667f608
adding a log
Paramadon Nov 13, 2024
9de9448
fixing stats function for statuscodehandler
Paramadon Nov 13, 2024
1e690e3
fixing issue
Paramadon Nov 13, 2024
4c6adcf
making operation names shorter
Paramadon Nov 13, 2024
dc129c9
Last commit works to reduce name- this one is for filtering operation…
Paramadon Nov 13, 2024
07a06fa
adding status code only filter
Paramadon Nov 13, 2024
0f76726
removing status code from otel config
Paramadon Nov 13, 2024
965ccdb
had to add return statments to filter
Paramadon Nov 13, 2024
a8268fd
commenting out try configure for ec2tagger
Paramadon Nov 13, 2024
8ddb463
adding ecw client for ec2 tagger
Paramadon Nov 13, 2024
3c15e81
changing the middleware to metricID
Paramadon Nov 13, 2024
947b8f3
trying to get stat
Paramadon Nov 14, 2024
a76ef49
everything works just removing some debug statement and filter from h…
Paramadon Nov 15, 2024
abf484b
fixing filter issue
Paramadon Nov 15, 2024
c849f61
previous commit works, just cleaning up debug statemtns
Paramadon Nov 15, 2024
bd33dbc
adding unit tests
Paramadon Nov 15, 2024
822680f
fixing test
Paramadon Nov 19, 2024
d547e1c
fixing unit tests
Paramadon Nov 19, 2024
5dc40f4
running make fmt
Paramadon Nov 19, 2024
e5bf295
fixing unit tests
Paramadon Nov 19, 2024
e07463a
fixing tests
Paramadon Nov 19, 2024
a5ba52e
fixing yamls
Paramadon Nov 19, 2024
f3c8d46
fixing yamls
Paramadon Nov 19, 2024
f64b332
removing internal changes
Paramadon Nov 19, 2024
fa4c163
restoring files
Paramadon Nov 19, 2024
457e48c
removing unnecessary file changes and passing unit test
Paramadon Nov 19, 2024
2705e87
cleaning up code
Paramadon Nov 19, 2024
3083626
fixing up pointer issue and passing in operations
Paramadon Nov 20, 2024
9e40bd8
adding agent health to prometheus
Paramadon Nov 20, 2024
5745ee6
resolving comments
Paramadon Nov 22, 2024
6e0fdd3
resolving comments
Paramadon Nov 22, 2024
8aee0d9
adding tests
Paramadon Nov 22, 2024
f344518
resolving comments
Paramadon Nov 22, 2024
a7e0415
resolving comments
Paramadon Nov 22, 2024
d91706d
adding tests
Paramadon Nov 22, 2024
59f735f
fixing race condition
Paramadon Nov 25, 2024
ebd92bd
fixing formats
Paramadon Nov 25, 2024
44c4006
fixing tests
Paramadon Nov 25, 2024
ecb4be5
removing log statements
Paramadon Nov 25, 2024
dfa3d4b
fixed names
Paramadon Nov 25, 2024
39e3a55
Merge branch 'main' of github.com:aws/amazon-cloudwatch-agent into Co…
Paramadon Nov 26, 2024
b589993
temp save
Paramadon Dec 2, 2024
0e3b86b
This fixes issues with pr and makes provider singleton instead of han…
Paramadon Dec 4, 2024
b14a43c
fixing lint
Paramadon Dec 4, 2024
14528bd
fixing lint
Paramadon Dec 4, 2024
e9e062d
Merge branch 'main' of github.com:aws/amazon-cloudwatch-agent into Co…
Paramadon Dec 4, 2024
cd78e5e
fixing issue
Paramadon Dec 4, 2024
aff4bc5
fixing lint
Paramadon Dec 4, 2024
7d31af3
fixing make test
Paramadon Dec 4, 2024
aed19c9
restoring internal folder
Paramadon Dec 4, 2024
6ee61b5
adding Describe Tasts
Paramadon Dec 4, 2024
a8ee7e3
adding to api status codes
Paramadon Dec 5, 2024
1887230
fixing unit tests
Paramadon Dec 5, 2024
1e31fb5
restring files and fixing lint
Paramadon Dec 5, 2024
188955e
fixing unit tests and lint
Paramadon Dec 5, 2024
424b762
restoring test data
Paramadon Dec 5, 2024
25e5ffd
fixing issue
Paramadon Dec 5, 2024
6a48268
adding singleton unit test
Paramadon Dec 5, 2024
af57c05
moving map
Paramadon Dec 5, 2024
5d3b20d
fixing channels
Paramadon Dec 5, 2024
471842c
removing random files
Paramadon Dec 5, 2024
c275271
moving filter to handler
Paramadon Dec 5, 2024
344db54
Merge branch 'main' into CodeHandler
Paramadon Dec 5, 2024
c31465f
restoring files
Paramadon Dec 5, 2024
48e879f
merging CodeHandler
Paramadon Dec 5, 2024
ed0313d
removing logs
Paramadon Dec 5, 2024
7fb22ef
fixing op filter
Paramadon Dec 5, 2024
106a329
changing parameter name
Paramadon Dec 5, 2024
8db2db6
Merge branch 'CodeHandler' of github.com:aws/amazon-cloudwatch-agent …
Paramadon Dec 6, 2024
aff8062
adding assume role
Paramadon Dec 6, 2024
3017b84
resolving comments
Paramadon Dec 6, 2024
93ef867
fixing race
Paramadon Dec 10, 2024
504eabe
fixng tlx
Paramadon Dec 10, 2024
a900640
removing unnecessary logs
Paramadon Dec 10, 2024
91920ea
fixing lint
Paramadon Dec 10, 2024
5cacf23
shouldn't add to stats map if ShouldResetStats is false
Paramadon Dec 11, 2024
97e82e1
moving around functions
Paramadon Dec 11, 2024
c061e3d
moving around functions
Paramadon Dec 11, 2024
9023e49
resolving comments
Paramadon Dec 11, 2024
94a4bd0
pushing changes
Paramadon Dec 11, 2024
acfde6b
removing unecessar tls files
Paramadon Dec 11, 2024
d321723
fixing lint
Paramadon Dec 11, 2024
c0d46b7
Merge branch 'main' into CodeHandler
Paramadon Dec 11, 2024
b3ca2b9
reverting a json
Paramadon Dec 11, 2024
e2b75bf
Merge branch 'CodeHandler' of github.com:aws/amazon-cloudwatch-agent …
Paramadon Dec 11, 2024
381bd53
Improved performance
Paramadon Dec 11, 2024
402743b
Improving performance
Paramadon Dec 12, 2024
6dc1193
Adding race
Paramadon Dec 12, 2024
bb73b8f
fixing ling
Paramadon Dec 12, 2024
1eae32a
fixing lint
Paramadon Dec 12, 2024
2 changes: 1 addition & 1 deletion .golangci.yml
@@ -59,4 +59,4 @@ linters:
     - nonamedreturns

 issues:
-  new-from-rev: 3221f76
+  new-from-rev: 9af4477
5 changes: 3 additions & 2 deletions extension/agenthealth/config.go
@@ -10,8 +10,9 @@ import (
 )

 type Config struct {
-	IsUsageDataEnabled bool              `mapstructure:"is_usage_data_enabled"`
-	Stats              agent.StatsConfig `mapstructure:"stats"`
+	IsUsageDataEnabled  bool               `mapstructure:"is_usage_data_enabled"`
+	Stats               *agent.StatsConfig `mapstructure:"stats,omitempty"`
+	IsStatusCodeEnabled bool               `mapstructure:"is_status_code_enabled,omitempty"`
 }

var _ component.Config = (*Config)(nil)
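
Changing Stats from a value to a pointer lets the extension distinguish "no stats section configured" from "a stats section was configured". The following is a minimal sketch of that idea, assuming trimmed stand-in types; the struct names mirror the diff, but the helper agentStatsEnabled is hypothetical and not part of the PR.

```go
package main

import "fmt"

// StatsConfig is a trimmed stand-in for agent.StatsConfig.
type StatsConfig struct {
	Operations []string
}

// Config is a trimmed stand-in for the extension Config in the diff.
type Config struct {
	IsUsageDataEnabled  bool
	Stats               *StatsConfig
	IsStatusCodeEnabled bool
}

// agentStatsEnabled (hypothetical helper) reports whether a stats section
// was configured at all. With a pointer field, nil means "not configured".
func agentStatsEnabled(cfg Config) bool {
	return cfg.Stats != nil
}

func main() {
	unset := Config{IsUsageDataEnabled: true}
	set := Config{
		IsUsageDataEnabled: true,
		Stats:              &StatsConfig{Operations: []string{"ListBuckets"}},
	}
	fmt.Println(agentStatsEnabled(unset), agentStatsEnabled(set)) // false true
}
```

With the previous value-typed field, an unset section and an empty section were both the zero StatsConfig, so this check would not have been possible.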
4 changes: 2 additions & 2 deletions extension/agenthealth/config_test.go
@@ -26,11 +26,11 @@ func TestLoadConfig(t *testing.T) {
 		},
 		{
 			id:   component.NewIDWithName(TypeStr, "1"),
-			want: &Config{IsUsageDataEnabled: false, Stats: agent.StatsConfig{Operations: []string{agent.AllowAllOperations}}},
+			want: &Config{IsUsageDataEnabled: false, Stats: nil},
 		},
 		{
 			id:   component.NewIDWithName(TypeStr, "2"),
-			want: &Config{IsUsageDataEnabled: true, Stats: agent.StatsConfig{Operations: []string{"ListBuckets"}}},
+			want: &Config{IsUsageDataEnabled: true, Stats: &agent.StatsConfig{Operations: []string{"ListBuckets"}}},
 		},
 	}
 	for _, testCase := range testCases {
29 changes: 25 additions & 4 deletions extension/agenthealth/extension.go
@@ -9,6 +9,7 @@ import (
 	"go.uber.org/zap"

 	"github.com/aws/amazon-cloudwatch-agent/extension/agenthealth/handler/stats"
+	"github.com/aws/amazon-cloudwatch-agent/extension/agenthealth/handler/stats/agent"
 	"github.com/aws/amazon-cloudwatch-agent/extension/agenthealth/handler/useragent"
 )

@@ -24,11 +25,31 @@ var _ awsmiddleware.Extension = (*agentHealth)(nil)
func (ah *agentHealth) Handlers() ([]awsmiddleware.RequestHandler, []awsmiddleware.ResponseHandler) {
var responseHandlers []awsmiddleware.ResponseHandler
requestHandlers := []awsmiddleware.RequestHandler{useragent.NewHandler(ah.cfg.IsUsageDataEnabled)}
if ah.cfg.IsUsageDataEnabled {
req, res := stats.NewHandlers(ah.logger, ah.cfg.Stats)
requestHandlers = append(requestHandlers, req...)
responseHandlers = append(responseHandlers, res...)

if !ah.cfg.IsUsageDataEnabled {
ah.logger.Debug("Usage data is disabled, skipping stats handlers")
return requestHandlers, responseHandlers
}

statusCodeEnabled := ah.cfg.IsStatusCodeEnabled

var statsResponseHandlers []awsmiddleware.ResponseHandler
var statsRequestHandlers []awsmiddleware.RequestHandler
var statsConfig agent.StatsConfig
var agentStatsEnabled bool

if ah.cfg.Stats != nil {
statsConfig = *ah.cfg.Stats
agentStatsEnabled = true
} else {
agentStatsEnabled = false
}

statsRequestHandlers, statsResponseHandlers = stats.NewHandlers(ah.logger, statsConfig, statusCodeEnabled, agentStatsEnabled)
Contributor

nit: Could just pass in the stats pointer directly to determine if it's enabled.

Suggested change
-	statsRequestHandlers, statsResponseHandlers = stats.NewHandlers(ah.logger, statsConfig, statusCodeEnabled, agentStatsEnabled)
+	statsRequestHandlers, statsResponseHandlers = stats.NewHandlers(ah.logger, ah.cfg.Stats, statusCodeEnabled)

Comment on lines +36 to +48
Contributor

@musa-asad Dec 12, 2024

nit: don't need to define variables

Suggested change
-	var statsResponseHandlers []awsmiddleware.ResponseHandler
-	var statsRequestHandlers []awsmiddleware.RequestHandler
-	var statsConfig agent.StatsConfig
-	var agentStatsEnabled bool
-	if ah.cfg.Stats != nil {
-		statsConfig = *ah.cfg.Stats
-		agentStatsEnabled = true
-	} else {
-		agentStatsEnabled = false
-	}
-	statsRequestHandlers, statsResponseHandlers = stats.NewHandlers(ah.logger, statsConfig, statusCodeEnabled, agentStatsEnabled)
+	var statsConfig agent.StatsConfig
+	var agentStatsEnabled bool
+	if ah.cfg.Stats != nil {
+		statsConfig = *ah.cfg.Stats
+		agentStatsEnabled = true
+	} else {
+		agentStatsEnabled = false
+	}
+	statsRequestHandlers, statsResponseHandlers := stats.NewHandlers(ah.logger, statsConfig, statusCodeEnabled, agentStatsEnabled)

(and you can remove the conditional from @jefchien's nit)
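
Taken together, the two nits amount to passing the Stats pointer straight through and declaring the results with `:=`. The sketch below illustrates that shape under stated assumptions: newHandlers and handlers are hypothetical stand-ins that return handler names instead of real awsmiddleware handlers, and the handler counts are inferred from the lengths asserted in extension_test.go rather than from the actual stats package.

```go
package main

import "fmt"

// StatsConfig is a trimmed stand-in for agent.StatsConfig.
type StatsConfig struct {
	Operations []string
}

// newHandlers is a hypothetical stand-in for stats.NewHandlers. A nil cfg
// means agent stats are disabled, so no separate enabled flag is needed.
func newHandlers(cfg *StatsConfig, statusCodeEnabled bool) ([]string, []string) {
	var requestHandlers, responseHandlers []string
	if cfg != nil {
		requestHandlers = append(requestHandlers, "client stats", "stats")
		responseHandlers = append(responseHandlers, "client stats")
	}
	if statusCodeEnabled {
		responseHandlers = append(responseHandlers, "status code")
	}
	return requestHandlers, responseHandlers
}

// handlers mirrors the shape of agentHealth.Handlers with both nits applied.
func handlers(usageDataEnabled bool, cfg *StatsConfig, statusCodeEnabled bool) ([]string, []string) {
	requestHandlers := []string{"user agent"}
	var responseHandlers []string
	if !usageDataEnabled {
		return requestHandlers, responseHandlers
	}
	statsRequestHandlers, statsResponseHandlers := newHandlers(cfg, statusCodeEnabled)
	requestHandlers = append(requestHandlers, statsRequestHandlers...)
	responseHandlers = append(responseHandlers, statsResponseHandlers...)
	return requestHandlers, responseHandlers
}

func main() {
	req, res := handlers(true, &StatsConfig{Operations: []string{"ListBuckets"}}, true)
	fmt.Println(len(req), len(res)) // 3 2
	req, res = handlers(true, nil, true)
	fmt.Println(len(req), len(res)) // 1 1
	req, res = handlers(false, nil, true)
	fmt.Println(len(req), len(res)) // 1 0
}
```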


requestHandlers = append(requestHandlers, statsRequestHandlers...)
responseHandlers = append(responseHandlers, statsResponseHandlers...)

return requestHandlers, responseHandlers
}

23 changes: 22 additions & 1 deletion extension/agenthealth/extension_test.go
@@ -10,18 +10,39 @@ import (
"github.com/stretchr/testify/assert"
"go.opentelemetry.io/collector/component/componenttest"
"go.uber.org/zap"

"github.com/aws/amazon-cloudwatch-agent/extension/agenthealth/handler/stats/agent"
)

func TestExtension(t *testing.T) {
ctx := context.Background()
-	cfg := &Config{IsUsageDataEnabled: true}
+	cfg := &Config{IsUsageDataEnabled: true, IsStatusCodeEnabled: true, Stats: &agent.StatsConfig{Operations: []string{"ListBuckets"}}}
extension := NewAgentHealth(zap.NewNop(), cfg)
assert.NotNil(t, extension)
assert.NoError(t, extension.Start(ctx, componenttest.NewNopHost()))
requestHandlers, responseHandlers := extension.Handlers()
// user agent, client stats, stats
assert.Len(t, requestHandlers, 3)
// client stats
assert.Len(t, responseHandlers, 2)
cfg.IsUsageDataEnabled = false
requestHandlers, responseHandlers = extension.Handlers()
// user agent
assert.Len(t, requestHandlers, 1)
assert.Len(t, responseHandlers, 0)
assert.NoError(t, extension.Shutdown(ctx))
}

func TestExtensionStatusCodeOnly(t *testing.T) {
ctx := context.Background()
cfg := &Config{IsUsageDataEnabled: true, IsStatusCodeEnabled: true}
extension := NewAgentHealth(zap.NewNop(), cfg)
assert.NotNil(t, extension)
assert.NoError(t, extension.Start(ctx, componenttest.NewNopHost()))
requestHandlers, responseHandlers := extension.Handlers()
// user agent
assert.Len(t, requestHandlers, 1)
// status code
assert.Len(t, responseHandlers, 1)
cfg.IsUsageDataEnabled = false
requestHandlers, responseHandlers = extension.Handlers()
6 changes: 1 addition & 5 deletions extension/agenthealth/factory.go
@@ -8,8 +8,6 @@ import (

 	"go.opentelemetry.io/collector/component"
 	"go.opentelemetry.io/collector/extension"
-
-	"github.com/aws/amazon-cloudwatch-agent/extension/agenthealth/handler/stats/agent"
 )

 var (
@@ -28,9 +26,7 @@ func NewFactory() extension.Factory {
 func createDefaultConfig() component.Config {
 	return &Config{
 		IsUsageDataEnabled: true,
-		Stats: agent.StatsConfig{
-			Operations: []string{agent.AllowAllOperations},
-		},
+		Stats:              nil,
Contributor

@musa-asad Dec 12, 2024

nit:

Should IsStatusCodeEnabled be added for consistency?

Suggested change
-		Stats:              nil,
+		IsStatusCodeEnabled: false,
+		Stats:               nil,

Or remove Stats, since it should be set to nil regardless?

Suggested change
-		Stats:              nil,
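
Both suggestions rest on Go's zero values: a field omitted from a composite literal is false for a bool and nil for a pointer. A minimal sketch, assuming a trimmed stand-in Config (explicitDefault and implicitDefault are hypothetical names used only for illustration):

```go
package main

import "fmt"

// StatsConfig is a trimmed stand-in for agent.StatsConfig.
type StatsConfig struct {
	Operations []string
}

// Config is a trimmed stand-in for the extension Config in the diff.
type Config struct {
	IsUsageDataEnabled  bool
	Stats               *StatsConfig
	IsStatusCodeEnabled bool
}

// explicitDefault spells out every field, as in the first suggestion.
func explicitDefault() Config {
	return Config{IsUsageDataEnabled: true, IsStatusCodeEnabled: false, Stats: nil}
}

// implicitDefault relies on zero values, as in the second suggestion.
func implicitDefault() Config {
	return Config{IsUsageDataEnabled: true}
}

func main() {
	fmt.Println(explicitDefault() == implicitDefault()) // true
}
```

Either form produces the same default, so the choice is about readability rather than behavior.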

}
}

4 changes: 1 addition & 3 deletions extension/agenthealth/factory_test.go
@@ -10,13 +10,11 @@ import (
 	"github.com/stretchr/testify/assert"
 	"go.opentelemetry.io/collector/component/componenttest"
 	"go.opentelemetry.io/collector/extension/extensiontest"
-
-	"github.com/aws/amazon-cloudwatch-agent/extension/agenthealth/handler/stats/agent"
 )

 func TestCreateDefaultConfig(t *testing.T) {
 	cfg := NewFactory().CreateDefaultConfig()
-	assert.Equal(t, &Config{IsUsageDataEnabled: true, Stats: agent.StatsConfig{Operations: []string{agent.AllowAllOperations}}}, cfg)
+	assert.Equal(t, &Config{IsUsageDataEnabled: true, Stats: nil}, cfg)
 	assert.NoError(t, componenttest.CheckConfigStruct(cfg))
 }

120 changes: 98 additions & 22 deletions extension/agenthealth/handler/stats/agent/agent.go
@@ -15,28 +15,29 @@ const (
 )

 type Stats struct {
-	CpuPercent                *float64 `json:"cpu,omitempty"`
-	MemoryBytes               *uint64  `json:"mem,omitempty"`
-	FileDescriptorCount       *int32   `json:"fd,omitempty"`
-	ThreadCount               *int32   `json:"th,omitempty"`
-	LatencyMillis             *int64   `json:"lat,omitempty"`
-	PayloadBytes              *int     `json:"load,omitempty"`
-	StatusCode                *int     `json:"code,omitempty"`
-	SharedConfigFallback      *int     `json:"scfb,omitempty"`
-	ImdsFallbackSucceed       *int     `json:"ifs,omitempty"`
-	AppSignals                *int     `json:"as,omitempty"`
-	EnhancedContainerInsights *int     `json:"eci,omitempty"`
-	RunningInContainer        *int     `json:"ric,omitempty"`
-	RegionType                *string  `json:"rt,omitempty"`
-	Mode                      *string  `json:"m,omitempty"`
-	EntityRejected            *int     `json:"ent,omitempty"`
+	CPUPercent                *float64          `json:"cpu,omitempty"`
+	MemoryBytes               *uint64           `json:"mem,omitempty"`
+	FileDescriptorCount       *int32            `json:"fd,omitempty"`
+	ThreadCount               *int32            `json:"th,omitempty"`
+	LatencyMillis             *int64            `json:"lat,omitempty"`
+	PayloadBytes              *int              `json:"load,omitempty"`
+	StatusCode                *int              `json:"code,omitempty"`
+	SharedConfigFallback      *int              `json:"scfb,omitempty"`
+	ImdsFallbackSucceed       *int              `json:"ifs,omitempty"`
+	AppSignals                *int              `json:"as,omitempty"`
+	EnhancedContainerInsights *int              `json:"eci,omitempty"`
+	RunningInContainer        *int              `json:"ric,omitempty"`
+	RegionType                *string           `json:"rt,omitempty"`
+	Mode                      *string           `json:"m,omitempty"`
+	EntityRejected            *int              `json:"ent,omitempty"`
+	StatusCodes               map[string][5]int `json:"codes,omitempty"` // per-operation counts for status codes 200, 400, 408, 413, 429 (in that order)
 }

 // Merge the other Stats into the current. If the field is not nil,
 // then it'll overwrite the existing one.
 func (s *Stats) Merge(other Stats) {
-	if other.CpuPercent != nil {
-		s.CpuPercent = other.CpuPercent
+	if other.CPUPercent != nil {
+		s.CPUPercent = other.CPUPercent
 	}
 	if other.MemoryBytes != nil {
 		s.MemoryBytes = other.MemoryBytes
@@ -80,6 +81,26 @@ func (s *Stats) Merge(other Stats) {
 	if other.EntityRejected != nil {
 		s.EntityRejected = other.EntityRejected
 	}
+	if other.StatusCodes != nil {
+		if s.StatusCodes == nil {
+			s.StatusCodes = make(map[string][5]int)
+		}
+
+		for key, value := range other.StatusCodes {
+			if existing, ok := s.StatusCodes[key]; ok {
+				s.StatusCodes[key] = [5]int{
+					existing[0] + value[0], // 200
+					existing[1] + value[1], // 400
+					existing[2] + value[2], // 408
+					existing[3] + value[3], // 413
+					existing[4] + value[4], // 429
+				}
+			} else {
+				s.StatusCodes[key] = value
+			}
+		}
+	}
+
 }

 func (s *Stats) Marshal() (string, error) {
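
Unlike the pointer fields, which Merge overwrites, StatusCodes is accumulated element-wise per operation key. A standalone sketch of that accumulation, assuming a free function mergeStatusCodes (a hypothetical name; the PR implements this inline in Stats.Merge):

```go
package main

import "fmt"

// mergeStatusCodes adds src's per-operation counters into dst and returns
// the result. Index order follows the diff: 200, 400, 408, 413, 429.
func mergeStatusCodes(dst, src map[string][5]int) map[string][5]int {
	if src == nil {
		return dst
	}
	if dst == nil {
		dst = make(map[string][5]int, len(src))
	}
	for key, value := range src {
		// Arrays are values in Go: a missing key yields a zeroed [5]int,
		// and the updated copy has to be stored back into the map.
		counts := dst[key]
		for i := range counts {
			counts[i] += value[i]
		}
		dst[key] = counts
	}
	return dst
}

func main() {
	current := map[string][5]int{"di": {2, 1, 0, 0, 0}}
	incoming := map[string][5]int{
		"di": {1, 0, 0, 0, 1},
		"dt": {0, 0, 1, 0, 0},
	}
	fmt.Println(mergeStatusCodes(current, incoming))
	// map[di:[3 1 0 0 1] dt:[0 0 1 0 0]]
}
```

Reading the zero array for a missing key removes the need for the separate else branch, at the cost of being slightly less explicit than the version in the diff.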
@@ -104,6 +125,29 @@ func (of OperationsFilter) IsAllowed(operationName string) bool {
 	return of.allowAll || of.operations.Contains(operationName)
 }

+type StatsConfig struct {
+	// Operations are the allowed operation names to gather stats for.
+	Operations []string `mapstructure:"operations,omitempty"`
+	// UsageFlags are the usage flags to set on start up.
+	UsageFlags map[Flag]any `mapstructure:"usage_flags,omitempty"`
+}
+
+var StatusCodeOperations = []string{ // all the operations that are allowed
+	"PutRetentionPolicy",
+	"DescribeInstances",
+	"DescribeTags",
+	"DescribeVolumes",
+	"DescribeContainerInstances",
+	"DescribeServices",
+	"DescribeTaskDefinition",
+	"ListServices",
+	"ListTasks",
+	"DescribeTasks",
+	"CreateLogGroup",
+	"CreateLogStream",
+	"AssumeRole",
+}
+
 func NewOperationsFilter(operations ...string) OperationsFilter {
 	allowed := collections.NewSet[string](operations...)
 	return OperationsFilter{
@@ -112,9 +156,41 @@ func NewOperationsFilter(operations ...string) OperationsFilter {
 	}
 }

-type StatsConfig struct {
-	// Operations are the allowed operation names to gather stats for.
-	Operations []string `mapstructure:"operations,omitempty"`
-	// UsageFlags are the usage flags to set on start up.
-	UsageFlags map[Flag]any `mapstructure:"usage_flags,omitempty"`
+// NewStatusCodeOperationsFilter creates a new filter for allowed operations and status codes.
+func NewStatusCodeOperationsFilter() OperationsFilter {
+	return NewOperationsFilter(StatusCodeOperations...)
 }
+
+// GetShortOperationName maps long operation names to short ones.
+func GetShortOperationName(operation string) string {
+	switch operation {
+	case "PutRetentionPolicy":
+		return "prp"
+	case "DescribeInstances":
+		return "di"
+	case "DescribeTags":
+		return "dt"
+	case "DescribeTasks":
+		return "dts"
+	case "DescribeVolumes":
+		return "dv"
+	case "DescribeContainerInstances":
+		return "dci"
+	case "DescribeServices":
+		return "ds"
+	case "DescribeTaskDefinition":
+		return "dtd"
+	case "ListServices":
+		return "ls"
+	case "ListTasks":
+		return "lt"
+	case "CreateLogGroup":
+		return "clg"
+	case "CreateLogStream":
+		return "cls"
+	case "AssumeRole":
+		return "ar"
+	default:
+		return ""
+	}
+}