Fix the fallback for container metrics logic to query both container and pod metrics #789

nabil-dbz · 2024-11-05T15:51:52Z

The main idea here is to avoid failing when a query for pod-level metrics throws an error. For example, for the metric type kubernetes.io/container/accelerator/duty_cycle, this query:

metric.type = "kubernetes.io/container/accelerator/duty_cycle" AND resource.type = "k8s_pod"

Throws the following error:

The supplied filter does not specify a valid combination of metric and monitored resource descriptors. 
The query will not return any time series.

This makes the adapter return an error before it ever tries to query the metric using the k8s_container resource type.

The proposed solution is to add a new resource type, PodContainerType, whose filter uses the one_of operator to match either resource type and so handles the fallback logic in a single query. However, because the response may then contain both k8s_pod and k8s_container time series, we're adding a post-processing step that uses the k8s_container time series only when no k8s_pod time series are present.
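A minimal Go sketch of the kind of filter the description proposes. The string values of the constants and the helpers resourceTypeClause and buildFilter are hypothetical, not the adapter's actual code; only the one_of operator and the k8s_pod/k8s_container resource types come from this PR.

```go
package main

import "fmt"

// Hypothetical resource-type constants for illustration only; the adapter's
// real constants may use different names and values.
const (
	PodType          = "pod"
	ContainerType    = "container"
	PodContainerType = "pod_container"
)

// resourceTypeClause returns the resource.type part of a Cloud Monitoring
// filter for the given (hypothetical) resource type constant.
func resourceTypeClause(resourceType string) string {
	switch resourceType {
	case PodType:
		return `resource.type = "k8s_pod"`
	case ContainerType:
		return `resource.type = "k8s_container"`
	case PodContainerType:
		// One query matching either monitored resource type, intended to
		// replace the separate pod query that can fail for container metrics.
		return `resource.type = one_of("k8s_pod","k8s_container")`
	default:
		return ""
	}
}

// buildFilter combines the metric type with the resource type clause.
func buildFilter(metricType, resourceType string) string {
	return fmt.Sprintf(`metric.type = %q AND %s`, metricType, resourceTypeClause(resourceType))
}

func main() {
	fmt.Println(buildFilter("kubernetes.io/container/accelerator/duty_cycle", PodContainerType))
}
```

Because both resource types are matched by one request, the caller has to sort out which time series to keep afterwards, which is what the post-processing step is for.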

Also, to support resource label filters for custom metrics, we're adding resource.labels to the list of allowed prefixes for custom metrics.
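To illustrate how a prefix allowlist behaves, here is a minimal Go sketch. The variable names mirror allowedCustomMetricsLabelPrefixes and allowedCustomMetricsFullLabelNames from query_builder.go, but the isAllowedLabel helper and its "prefix plus dot" matching rule are assumptions, not the adapter's confirmed logic.

```go
package main

import (
	"fmt"
	"strings"
)

// Allowlists modeled on the adapter's custom-metrics variables, with
// "resource.labels" added to the prefixes as this description proposes.
var (
	allowedPrefixes  = []string{"metric.labels", "resource.labels"}
	allowedFullNames = []string{"reducer"}
)

// isAllowedLabel reports whether a selector label may be used for querying.
// Assumed rule: the label equals an allowed full name exactly, or it starts
// with an allowed prefix followed by ".".
func isAllowedLabel(label string) bool {
	for _, name := range allowedFullNames {
		if label == name {
			return true
		}
	}
	for _, prefix := range allowedPrefixes {
		if strings.HasPrefix(label, prefix+".") {
			return true
		}
	}
	return false
}

func main() {
	for _, l := range []string{"resource.labels.container_name", "reducer", "resource.type"} {
		fmt.Printf("%s allowed=%v\n", l, isAllowedLabel(l))
	}
}
```

Under a rule like this, every resource label (not just container_name) becomes queryable; a later commit in this PR narrows that to the single full name resource.labels.container_name.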

CatherineF-dev · 2024-11-05T16:01:49Z

qq: will this introduce any breaking changes?

Currently, the latest custom-metrics-stackdriver-adapter is deployed on all existing Kubernetes versions, because users apply the latest production YAML from this repo.


nabil-dbz · 2024-11-05T19:53:22Z

I don't expect this to break anything, unless I'm missing some details. It would probably be a good idea to have an expert review this in detail. I tested the changes locally with Workload Autoscaler and they appeared to work fine.

Let me know if you suspect my changes break anything.

raywainman · 2024-11-05T20:14:56Z

I will take a close look at this too, adding this to my list for this week. Thanks @nabil-dbz!

raywainman · 2024-11-11T20:50:48Z

custom-metrics-stackdriver-adapter/pkg/adapter/provider/provider.go

@@ -349,6 +311,27 @@ func (p *StackdriverProvider) ListAllExternalMetrics() []provider.ExternalMetric
 	}
 }

+func (p *StackdriverProvider) PostProcessPodContainerResp(response *stackdriver.ListTimeSeriesResponse, metricName string) (*stackdriver.ListTimeSeriesResponse, error) {


We should really document what this does, because modifying a response object here could be really confusing.

I was initially going to suggest not returning a ListTimeSeriesResponse here, but I see that this would cause a bunch of issues with the follow-up calls.
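Following up on the request to document the post-processing, here is a minimal Go sketch of what such a function could look like with a doc comment. The local types only mirror field names of the monitoring/v3 ListTimeSeriesResponse, TimeSeries, and MonitoredResource types; the name postProcessPodOrContainerResp and its signature (no metricName parameter, no error return) are hypothetical simplifications of PostProcessPodContainerResp. Returning a shallow copy instead of editing the response in place is one possible way to address the concern about modifying the response object.

```go
package main

import "fmt"

// Minimal local stand-ins for the monitoring/v3 types; only the fields this
// sketch needs are included.
type MonitoredResource struct {
	Type string
}

type TimeSeries struct {
	Resource *MonitoredResource
}

type ListTimeSeriesResponse struct {
	TimeSeries []*TimeSeries
}

// postProcessPodOrContainerResp returns a response that contains only the
// k8s_pod time series when at least one is present, and otherwise only the
// k8s_container time series. Series with other or missing resource types are
// dropped. resp itself is not modified: the result is a shallow copy with a
// new TimeSeries slice.
func postProcessPodOrContainerResp(resp *ListTimeSeriesResponse) *ListTimeSeriesResponse {
	var pods, containers []*TimeSeries
	for _, ts := range resp.TimeSeries {
		if ts == nil || ts.Resource == nil {
			continue
		}
		switch ts.Resource.Type {
		case "k8s_pod":
			pods = append(pods, ts)
		case "k8s_container":
			containers = append(containers, ts)
		}
	}
	out := *resp // shallow copy; only the TimeSeries field is replaced
	if len(pods) > 0 {
		out.TimeSeries = pods
	} else {
		out.TimeSeries = containers
	}
	return &out
}

func main() {
	resp := &ListTimeSeriesResponse{TimeSeries: []*TimeSeries{
		{Resource: &MonitoredResource{Type: "k8s_container"}},
		{Resource: &MonitoredResource{Type: "k8s_container"}},
	}}
	out := postProcessPodOrContainerResp(resp)
	fmt.Println(len(out.TimeSeries), out.TimeSeries[0].Resource.Type)
}
```

Keeping the ListTimeSeriesResponse return type means follow-up code that consumes the response does not need to change, which matches the point above about the follow-up calls.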

raywainman · 2024-11-11T20:51:46Z

custom-metrics-stackdriver-adapter/pkg/adapter/translator/query_builder.go

@@ -41,7 +41,7 @@ var (
 	allowedExternalMetricsFullLabelNames = []string{"resource.type", "reducer"}
 	// allowedCustomMetricsLabelPrefixes and allowedCustomMetricsFullLabelNames specify all metric labels allowed for querying
 	allowedCustomMetricsLabelPrefixes  = []string{"metric.labels"}
-	allowedCustomMetricsFullLabelNames = []string{"reducer"}
+	allowedCustomMetricsFullLabelNames = []string{"resource.labels.container_name", "reducer"}


I think you can omit resource.labels here, can't you, since it will be picked up by the prefix above?

raywainman · 2024-11-11T20:55:51Z

custom-metrics-stackdriver-adapter/pkg/adapter/translator/utils/filter_builder.go

@@ -107,6 +113,8 @@ func NewFilterBuilder(resourceType string) FilterBuilder {
 	switch resourceType {
 	case PodType:
 		schema = PodSchema
+	case PodContainerType:


What if we named this something like PodOrContainerType?

Otherwise this gets confusing :(

nabil-dbz added 3 commits November 5, 2024 14:55

Use one_of operator for the fallback for container metrics logic

fbfa0e7

Add resource.labels to the allowlist of prefixes for custom metrics

a6e7f58

Add ContainerType in addition to PodContainerType

d25ecd2

Allow the label resource.labels.container_name instead of all labels starting with resource.labels

f85193b

raywainman reviewed Nov 11, 2024
