This repository has been archived by the owner on Oct 16, 2024. It is now read-only.

add and emit pool owner metadata for alerting #327

Merged: merged 11 commits into master on Oct 13, 2023

Conversation

@gmdfalk (Contributor) commented on Jun 27, 2023

We'd like to have pool_owner metadata on each pool for both informational and alerting purposes.
The new attribute on the pool config, pool_owner, defaults to compute_infra.
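
As a minimal sketch of the intended behaviour (the function and config shape below are hypothetical, not clusterman's actual config API; only the attribute name and its default come from this PR):

    # hypothetical sketch: resolve a pool's owner from its pool config
    DEFAULT_POOL_OWNER = "compute_infra"

    def get_pool_owner(pool_config: dict) -> str:
        # pools that do not declare pool_owner are attributed to the compute_infra team
        return pool_config.get("pool_owner", DEFAULT_POOL_OWNER)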

Signed-off-by: Max Falk [email protected]

Description

Please fill out!

Testing Done

Please fill out! Generally speaking any new features should include
additional unit or integration tests to ensure the behaviour is
working correctly.

@nemacysts (Member) left a comment:

minor nits, but nothing blocking :)

@@ -181,7 +181,7 @@ def run(self, dry_run: bool = False, timestamp: Optional[arrow.Arrow] = None) ->
         self.target_capacity_gauge.set(new_target_capacity, {"dry_run": dry_run})
         self.max_capacity_gauge.set(
             self.pool_manager.max_capacity,
-            {"dry_run": dry_run, "alert_on_max_capacity": self.pool_manager.alert_on_max_capacity},
+            {"dry_run": dry_run, "alert_on_max_capacity": self.pool_manager.alert_on_max_capacity, "team": self.pool_manager.pool_owner},
Member

should we s/team/pool_owner to keep things consistent between config and metric labeling?

Contributor Author

I knowingly set it to team because that's what our alertmanager automatically resolves for ticketing/alerting.
If we don't set it to team here, we'll have to relabel in the prometheus query for the max_pool_capacity alerting.

Still want to keep it at pool_owner? @nemacysts @jfongatyelp

Member

oh, that's fine then - might be good to add that in a comment here tho since i'm not sure how many people are aware of that (or maybe i'm just the only one that didn't know this :p)
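
As an illustration of what such a comment could say (the wording below is a hypothetical sketch, not the comment that actually landed; the reasoning is taken from the discussion above):

    # NOTE: this metric label is intentionally "team" rather than "pool_owner": our
    # alertmanager automatically resolves the "team" label for ticketing/alerting, so
    # emitting "pool_owner" instead would require a relabel in the Prometheus query
    # behind the max capacity alert.
    {"dry_run": dry_run, "alert_on_max_capacity": self.pool_manager.alert_on_max_capacity, "team": self.pool_manager.pool_owner},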

gmdfalk and others added 3 commits June 29, 2023 10:29
            {
                "dry_run": dry_run,
                "alert_on_max_capacity": self.pool_manager.alert_on_max_capacity,
                "team": self.pool_manager.pool_owner,
Member

@gmdfalk actually, I was just talking to @EmanElsaban about another alert we wanted to use internally and we realized that having a team label on all of the clusterman metrics would be useful - thoughts on adding the label to everything else in this PR?

@gmdfalk (Contributor, PR author) commented on Jul 6, 2023

@nemacysts Yeah, probably makes sense!

In that case, i think we should be able to add the team label to autoscaler._emit_requested_resource_metrics and drainer._emit_draining_metrics to tag onto most if not all metrics emitted?
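
As a rough sketch of that centralization (the helper and call site below are hypothetical, not code from this PR; only the "team" dimension and the pool_owner attribute come from the discussion above):

    # hypothetical sketch: a single helper that metric emitters funnel their dimensions through
    def with_team_dimension(dimensions: dict, pool_owner: str) -> dict:
        # merge the owning team into whatever dimensions a gauge already emits
        return {**dimensions, "team": pool_owner}

    # illustrative call site, e.g. inside a requested-resource metric emitter:
    # gauge.set(value, with_team_dimension({"dry_run": dry_run}, pool_manager.pool_owner))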

Member

++ - if we have a centralized place that sounds even better!

I'm a little torn about whether or not this would be of use for the drainer metrics: is there any case where we're not at fault for the drainer not working and we'd want pool owners to deal with drainer pages?

Member

although, i suppose it would be useful for filtering regardless :)

* master:
  CLUSTERMAN-812: upgrade k8s client library (#334)
  Revert "Revert "Add support to use in-cluster service account (#338)"" (#340)
  Revert "Add support to use in-cluster service account (#338)"
  Add support to use in-cluster service account (#338)
  Updated PATH variable in supervisod.conf
  Grant home dir permission to user nobody
  Failing on signal errors
  pin cryptography for acceptance venv
  bump yelp-batch to 11.2.7
  using right targz
  Adding new targz with metrics fix
  Adding old acceptance test
  Revert "Revert "Upgrading clusterman image to Ubuntu Jammy""
Signed-off-by: Max Falk <[email protected]>
@gmdfalk merged commit 2595b5e into master on Oct 13, 2023
2 of 5 checks passed
nemacysts added a commit that referenced this pull request Oct 18, 2023
* Revert "add and emit pool owner metadata for alerting (#327)"

This reverts commit 2595b5e.

* Revert "CLUSTERMAN-812: upgrade k8s client library (#334)"

This reverts commit 6c4b8bb.