
Add a metric to expose non-ready machines or errors #1550

Open
renchap opened this issue Feb 2, 2023 · 4 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@renchap

renchap commented Feb 2, 2023

Use-case:

We are deploying machines on Hetzner, and sometimes it's not possible to create a machine because the account's resource limit has been reached:

 machine_controller.go:383] Failed to reconcile machine "xxx-m-1-68c6cd6957-6hk94": failed to create machine at cloudprovider, due to failed to create server, due to core limit exceeded (resource_limit_exceeded)    

It would be very useful to have a metric to monitor for this, and to be able to alert when machines have been scheduled but not successfully created.

@embik embik added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 2, 2023
@rajaSahil

@embik What needs to be done here? Do we need to add metrics based on the error reason and attach the provider status?

@embik
Member

embik commented Nov 18, 2024

@rajaSahil it's probably a bad idea to include reasons in metrics (as that can explode cardinality). The easiest way to solve this would probably be by exposing the machine counts in a MachineDeployment status (there should be different fields there with "ready" machines, created machines, etc) as metrics.

@rajaSahil

> @rajaSahil it's probably a bad idea to include reasons in metrics (as that can explode cardinality).

@embik I agree, we should not add reasons in metrics.

> The easiest way to solve this would probably be by exposing the machine counts in a MachineDeployment status (there should be different fields there with "ready" machines, created machines, etc) as metrics.

Sure, I will take a look at it. Thanks!

@rajaSahil

/assign
