-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the fallback for container metrics logic to query both container and pod metrics #789
base: master
Are you sure you want to change the base?
Fix the fallback for container metrics logic to query both container and pod metrics #789
Conversation
qq: will this bring some breaking changes? Currently, the latest customer-metrics-stackdriver-adapter is deployed on all existing kubernetes versions. Because users apply the latest production yaml inside this repo. |
…starting with resource.labels
I don't expect this to be breaking anything unless I'm missing some details that I'm not aware of. It would probably be a better idea to have an expert review this in detail. I tested the changes locally with Workload Autoscaler and this looked to be working just fine. Let me know if you suspect my changes to be breaking things |
I will take a close look at this too, adding this to my list for this week. Thanks @nabil-dbz! |
@@ -349,6 +311,27 @@ func (p *StackdriverProvider) ListAllExternalMetrics() []provider.ExternalMetric | |||
} | |||
} | |||
|
|||
func (p *StackdriverProvider) PostProcessPodContainerResp(response *stackdriver.ListTimeSeriesResponse, metricName string) (*stackdriver.ListTimeSeriesResponse, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should really document what this does because modifying a response object here could be really confusing.
I was initially going to suggest not returning a ListTimeSeriesResponse here but I see that this would cause a bunch of issues with the follow-up calls.
@@ -41,7 +41,7 @@ var ( | |||
allowedExternalMetricsFullLabelNames = []string{"resource.type", "reducer"} | |||
// allowedCustomMetricsLabelPrefixes and allowedCustomMetricsFullLabelNames specify all metric labels allowed for querying | |||
allowedCustomMetricsLabelPrefixes = []string{"metric.labels"} | |||
allowedCustomMetricsFullLabelNames = []string{"reducer"} | |||
allowedCustomMetricsFullLabelNames = []string{"resource.labels.container_name", "reducer"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can omit resource.labels
here, can't you? since it will be picked up by the prefix above?
@@ -107,6 +113,8 @@ func NewFilterBuilder(resourceType string) FilterBuilder { | |||
switch resourceType { | |||
case PodType: | |||
schema = PodSchema | |||
case PodContainerType: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we named this something like PodOrContainerType
?
Otherwise this gets confusing :(
The main idea here is to avoid failure when querying pod level metrics throw an error. For example for the metric type is
kubernetes.io/container/accelerator/duty_cycle
, this query:Throws the following error:
Which makes the adapter return an error before trying to query the metrics using the
k8s_container
.The proposed solution for this is to add a new resource type called
PodContainerType
for which we use the operatorone_of
to handle the fallback logic. However, given that the response might contain bothk8s_pod
andk8s_container
metrics (time series), we're adding a post-processing step to considerk8s_container
metrics ifk8s_pod
metrics are absent.Also, to support resource label filters for custom metrics, we're adding
resource.labels
to the list of allowed prefixes for custom metrics.