Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the fallback for container metrics logic to query both container and pod metrics #789

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

nabil-dbz
Copy link

The main idea here is to avoid failure when querying pod level metrics throw an error. For example for the metric type is kubernetes.io/container/accelerator/duty_cycle, this query:

metric.type = "kubernetes.io/container/accelerator/duty_cycle" AND resource.type = "k8s_pod"

Throws the following error:

The supplied filter does not specify a valid combination of metric and monitored resource descriptors. 
The query will not return any time series.

Which makes the adapter return an error before trying to query the metrics using the k8s_container.

The proposed solution for this is to add a new resource type called PodContainerType for which we use the operator one_of to handle the fallback logic. However, given that the response might contain both k8s_pod and k8s_container metrics (time series), we're adding a post-processing step to consider k8s_container metrics if k8s_pod metrics are absent.

Also, to support resource label filters for custom metrics, we're adding resource.labels to the list of allowed prefixes for custom metrics.

@CatherineF-dev
Copy link
Contributor

CatherineF-dev commented Nov 5, 2024

qq: will this bring some breaking changes?

Currently, the latest customer-metrics-stackdriver-adapter is deployed on all existing kubernetes versions. Because users apply the latest production yaml inside this repo.

@nabil-dbz
Copy link
Author

nabil-dbz commented Nov 5, 2024

I don't expect this to be breaking anything unless I'm missing some details that I'm not aware of. It would probably be a better idea to have an expert review this in detail. I tested the changes locally with Workload Autoscaler and this looked to be working just fine.

Let me know if you suspect my changes to be breaking things

@raywainman
Copy link

I will take a close look at this too, adding this to my list for this week. Thanks @nabil-dbz!

@@ -349,6 +311,27 @@ func (p *StackdriverProvider) ListAllExternalMetrics() []provider.ExternalMetric
}
}

func (p *StackdriverProvider) PostProcessPodContainerResp(response *stackdriver.ListTimeSeriesResponse, metricName string) (*stackdriver.ListTimeSeriesResponse, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should really document what this does because modifying a response object here could be really confusing.

I was initially going to suggest not returning a ListTimeSeriesResponse here but I see that this would cause a bunch of issues with the follow-up calls.

@@ -41,7 +41,7 @@ var (
allowedExternalMetricsFullLabelNames = []string{"resource.type", "reducer"}
// allowedCustomMetricsLabelPrefixes and allowedCustomMetricsFullLabelNames specify all metric labels allowed for querying
allowedCustomMetricsLabelPrefixes = []string{"metric.labels"}
allowedCustomMetricsFullLabelNames = []string{"reducer"}
allowedCustomMetricsFullLabelNames = []string{"resource.labels.container_name", "reducer"}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can omit resource.labels here, can't you? since it will be picked up by the prefix above?

@@ -107,6 +113,8 @@ func NewFilterBuilder(resourceType string) FilterBuilder {
switch resourceType {
case PodType:
schema = PodSchema
case PodContainerType:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we named this something like PodOrContainerType?

Otherwise this gets confusing :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants