metric: migrate all histograms to use prometheus-backed version #1

Closed

aadityasondhi
Owner

In a previous change, a new prometheus-backed histogram library was
introduced to help standardize histogram buckets across the codebase.
This change migrates all existing histograms to use the new library.

related: cockroachdb#85990

Release justification: low risk, high benefit

Release note (ops change): This change introduces a new histogram
implementation that will reduce the total number of buckets and
standardize them across all usage. This should help increase the
usability of histograms when exported to a UI (e.g. Grafana).

This change builds on the previous one and adds a function to export
quantiles from the Prometheus-based histogram. This functionality is
used to store histogram data in the internal timeseries database. The
hdr library came with a function to do this, while Prometheus does not
have a public API for exporting quantiles.

The function implemented here is very similar to the one found
internally in Prometheus, using linear interpolation to calculate values
at a given quantile.
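
As a rough illustration of the interpolation described above, here is a minimal sketch in Go. It is not the function from this change, nor the Prometheus internal one: the `bucket` type and `valueAtQuantile` name are hypothetical, buckets are assumed to carry cumulative counts sorted by upper bound, and the lower bound of the first bucket is assumed to be 0.

```go
package main

import (
	"fmt"
	"sort"
)

// bucket is a hypothetical cumulative histogram bucket: count is the number
// of observations less than or equal to upperBound.
type bucket struct {
	upperBound float64
	count      uint64
}

// valueAtQuantile estimates the value at quantile q (0 to 100) by linear
// interpolation inside the bucket that contains the target rank.
// Buckets must be sorted by upperBound with non-decreasing counts.
func valueAtQuantile(buckets []bucket, q float64) float64 {
	if len(buckets) == 0 {
		return 0
	}
	total := buckets[len(buckets)-1].count
	if total == 0 {
		return 0
	}
	if q < 0 {
		q = 0
	} else if q > 100 {
		q = 100
	}
	// rank is the number of observations at or below the requested quantile.
	rank := q / 100 * float64(total)

	// Find the first bucket whose cumulative count reaches the rank.
	i := sort.Search(len(buckets), func(i int) bool {
		return float64(buckets[i].count) >= rank
	})

	// Assumption: the first bucket starts at 0.
	lower, prevCount := 0.0, 0.0
	if i > 0 {
		lower = buckets[i-1].upperBound
		prevCount = float64(buckets[i-1].count)
	}
	inBucket := float64(buckets[i].count) - prevCount
	if inBucket == 0 {
		return lower
	}
	// Assume observations are spread evenly across the bucket.
	return lower + (buckets[i].upperBound-lower)*(rank-prevCount)/inBucket
}

func main() {
	bs := []bucket{{10, 2}, {20, 6}, {40, 10}}
	fmt.Println("p50:", valueAtQuantile(bs, 50))
	fmt.Println("p100:", valueAtQuantile(bs, 100))
}
```

The estimate is only as precise as the bucket boundaries allow, since observations are assumed to be evenly distributed within each bucket.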

This commit also includes some additional testing and general
refactoring of the metrics code.

Release note: None

Release justification:
In a previous change, a new prometheus-backed histogram library was
introduced to help standardize histogram buckets across the codebase.
This change migrates all existing histograms to use the new library.

related: cockroachdb#85990

Release justification: low risk, high benefit
Release note (ops change): This change introduces a new histogram
implementation that will reduce the total number of buckets and
standardize them across all usage. This should help increase the
usability of histograms when exported to a UI (e.g. Grafana).
aadityasondhi pushed a commit that referenced this pull request May 8, 2024
For some reason, `StopServiceForVirtualCluster` fails with this error on
drt clusters:

```
20:23:41 node_kill.go:51: operation status: killing node 1  with signal 15
20:23:41 cluster.go:2148: stoping virtual cluster
20:23:41 operation_impl.go:128: operation failure #1: no service for virtual cluster ""
```

The debug message has a bug: the virtual cluster is set to "system", but it
seems like the service discovery process isn't able to determine the cockroach
process based on the DNS settings in the drt project. This change makes the
node-kill operation more DNS-agnostic by looking for the cockroach process.
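
To make the idea concrete, here is a minimal sketch in Go of what a process-based kill could look like. It is an assumption for illustration, not the code in this change: the helper names `findCommand` and `killCommand` are hypothetical, and the sketch only builds the `pgrep`/`pkill` command lines that would be run on the node rather than executing them.

```go
package main

import "fmt"

// findCommand returns a shell command that lists the PIDs of processes
// whose name matches processName. Hypothetical helper.
func findCommand(processName string) string {
	return fmt.Sprintf("pgrep %s", processName)
}

// killCommand returns a shell command that sends the given signal number to
// every process whose name matches processName. Hypothetical helper.
func killCommand(processName string, signal int) string {
	return fmt.Sprintf("pkill -%d %s", signal, processName)
}

func main() {
	// Signal 15 (SIGTERM) matches the signal shown in the log excerpt above.
	fmt.Println(findCommand("cockroach"))
	fmt.Println(killCommand("cockroach", 15))
}
```

Because both commands match on the process name, they do not depend on service discovery or DNS records to locate the process.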

Epic: none

Release note: None
aadityasondhi pushed a commit that referenced this pull request May 8, 2024
123517: roachtest: move node-kill operation to pkill/pgrep-based kill approach r=renatolabs a=itsbilal

For some reason, `StopServiceForVirtualCluster` fails with this error on drt clusters:

```
20:23:41 node_kill.go:51: operation status: killing node 1  with signal 15
20:23:41 cluster.go:2148: stoping virtual cluster
20:23:41 operation_impl.go:128: operation failure #1: no service for virtual cluster ""
```

The debug message has a bug: the virtual cluster is set to "system", but it seems like the service discovery process isn't able to determine the cockroach process based on the DNS settings in the drt project. This change makes the node-kill operation more DNS-agnostic by looking for the cockroach process.

Epic: none

Release note: None

Co-authored-by: Bilal Akhtar <[email protected]>