Overview of grafana and prometheus related issues #2214

Closed
consideRatio opened this issue Feb 16, 2023 · 4 comments
Labels
Engineering:SRE Cloud infrastructure operations and development. tech:grafana tech:prometheus

Comments

@consideRatio
Contributor

consideRatio commented Feb 16, 2023

Several tickets related to grafana and prometheus seem worth tracking together, so this is a meta-issue giving an overview of them.

Grafana

jupyterhub/grafana-dashboards

Prometheus

consideRatio added the "Engineering:SRE" (Cloud infrastructure operations and development) label Feb 16, 2023
consideRatio changed the title from "Overview of grafana dashboarding issues" to "Overview of grafana and prometheus related issues" Feb 22, 2023
@pnasrat
Contributor

pnasrat commented Feb 22, 2023

Just adding a few useful references on monitoring

It might be useful to sketch out the system and think through the key metrics for end-to-end debugging.

@yuvipanda
Member

I tried to collect RED metrics from all user notebooks and found that it is a bit too much: Prometheus crashes quickly. This is tracked in the links in berkeley-dsep-infra/datahub#1993.

However, I think we already collect and store RED metrics from the hub and nginx-ingress. We should expose these.

@colliand
Contributor

I learned about RED metrics today.

RED stands for Rate, Errors, Duration.
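As a rough illustration of what those three numbers mean for a single service, here is a minimal Python sketch. The `Request` record and `red_summary` helper are hypothetical names made up for this example, not part of JupyterHub, nginx-ingress, or Prometheus, and counting only 5xx responses as errors is an assumption.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Request:
    """One observed request (hypothetical record, for illustration only)."""
    status: int      # HTTP status code
    duration: float  # time taken to serve the request, in seconds


def red_summary(requests, window_seconds):
    """Summarize Rate, Errors, Duration for requests seen in one time window."""
    total = len(requests)
    if total == 0:
        return {"rate": 0.0, "error_ratio": 0.0, "median_duration": None}
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "rate": total / window_seconds,        # requests per second
        "error_ratio": errors / total,         # fraction of failed requests
        "median_duration": median(r.duration for r in requests),
    }


sample = [
    Request(200, 0.1),
    Request(200, 0.3),
    Request(500, 0.2),
    Request(502, 0.4),
]
summary = red_summary(sample, window_seconds=60)
```

In a real deployment these values would normally be derived from counters and histograms that the service already exports rather than from raw per-request records; the sketch only shows how the three quantities relate.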

@consideRatio
Contributor Author

I think this may no longer be relevant enough to keep open; most of the issues are closed.

Status: Needs Shaping / Refinement

4 participants