Define debug metrics group #2779

Open
6 tasks
lambdanis opened this issue Aug 8, 2024 · 1 comment
Labels
area/metrics Related to prometheus metrics

Comments

lambdanis commented Aug 8, 2024

Separate the metrics that monitor Tetragon health (used by operators) from the metrics that expose details useful for debugging (used mainly by Tetragon developers, and potentially high-cardinality). The goal is to disable the latter by default, reducing the default metrics cardinality and performance overhead.

See the Tetragon metrics framework for more context.

  • Define a debug metrics group (unconstrained). See how the health metrics group is defined: https://github.com/cilium/tetragon/blob/main/pkg/metricsconfig/healthmetrics.go
  • Identify debug metrics within the health group and move them into the debug group. This would probably include:
    • metrics documented as "for internal use only"
    • metrics with unconstrained cardinality, e.g. those with a "kprobe" label
    • any other metrics intended for Tetragon developers rather than operators
  • Move debug metrics to a separate endpoint (breaking change)
  • Disable debug metrics by default (breaking change)
  • Adjust how metrics docs are generated
  • Remove the "For internal use only" annotation from the metrics help texts. Membership in the debug group already indicates that a metric is considered "internal".
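A minimal sketch of how the split could look, using only the Go standard library. The `metricGroup` type, the `/metrics` and `/metrics/debug` paths, the `enableDebug` switch and the assignment of metrics to groups are all hypothetical illustrations, not Tetragon's actual `metricsconfig` API (the real health group definition is in the linked `healthmetrics.go`). Each enabled group is served on its own path, and the debug group is left off unless explicitly enabled.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// metricGroup is a hypothetical stand-in for a Tetragon metrics group.
type metricGroup struct {
	name        string
	path        string // endpoint the group is served on (hypothetical paths)
	constrained bool   // true if every label has a bounded set of values
	metrics     []string
}

// Hypothetical group definitions: health metrics stay on the default
// endpoint, debug metrics move to their own endpoint.
var (
	healthGroup = metricGroup{name: "health", path: "/metrics", constrained: true,
		metrics: []string{"tetragon_events_total"}}
	debugGroup = metricGroup{name: "debug", path: "/metrics/debug", constrained: false,
		metrics: []string{"tetragon_bpf_missed_events_total"}}
)

// enabledGroups returns the groups to expose. The debug group is only
// included when enableDebug is set, i.e. it is disabled by default.
func enabledGroups(enableDebug bool) []metricGroup {
	groups := []metricGroup{healthGroup}
	if enableDebug {
		groups = append(groups, debugGroup)
	}
	return groups
}

// newMux registers one handler per enabled group. Each handler just lists
// the group's metric names; a real implementation would serve a Prometheus
// registry per group instead.
func newMux(groups []metricGroup) *http.ServeMux {
	mux := http.NewServeMux()
	for _, g := range groups {
		g := g // capture the loop variable (needed before Go 1.22)
		mux.HandleFunc(g.path, func(w http.ResponseWriter, _ *http.Request) {
			fmt.Fprintf(w, "# group=%s constrained=%t\n", g.name, g.constrained)
			for _, name := range g.metrics {
				fmt.Fprintln(w, name)
			}
		})
	}
	return mux
}

// status returns the HTTP status code of a GET request for path.
func status(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	def := newMux(enabledGroups(false))
	fmt.Println(status(def, "/metrics"), status(def, "/metrics/debug")) // 200 404
	dbg := newMux(enabledGroups(true))
	fmt.Println(status(dbg, "/metrics"), status(dbg, "/metrics/debug")) // 200 200
}
```

Serving the debug group on a separate path means scrapers configured for the default endpoint never see the high-cardinality series, which is why the endpoint move and the default-off switch are both listed as breaking changes.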

Once this is done, the health metrics group should be marked as constrained.
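To illustrate what marking a group as constrained could mean, here is a sketch assuming "constrained" means that every label on every metric in the group has a known, finite set of values (such as an operation type), as opposed to free-form values like the kprobe function names mentioned above. The `label`, `metric`, `isConstrained` and `maxSeries` names and the example metrics are hypothetical; the exact semantics in `pkg/metricsconfig` may differ.

```go
package main

import "fmt"

// label is a hypothetical description of a metric label. values lists the
// allowed values; nil means the values are not known up front (unbounded).
type label struct {
	name   string
	values []string
}

type metric struct {
	name   string
	labels []label
}

// isConstrained reports whether every label of every metric has a known,
// finite set of values.
func isConstrained(metrics []metric) bool {
	for _, m := range metrics {
		for _, l := range m.labels {
			if l.values == nil {
				return false
			}
		}
	}
	return true
}

// maxSeries returns the upper bound on the number of time series for one
// metric (the product of each label's value count). ok is false when any
// label is unbounded, since no bound exists then.
func maxSeries(m metric) (n int, ok bool) {
	n = 1
	for _, l := range m.labels {
		if l.values == nil {
			return 0, false
		}
		n *= len(l.values)
	}
	return n, true
}

func main() {
	// Hypothetical metrics, not real Tetragon metric definitions.
	health := []metric{{name: "example_policy_events_total", labels: []label{
		{name: "op", values: []string{"create", "delete"}},
		{name: "hook", values: []string{"kprobe", "tracepoint", "uprobe"}},
	}}}
	debug := []metric{{name: "example_kprobe_missed_total", labels: []label{
		{name: "kprobe"}, // values depend on loaded policies: unbounded
	}}}
	fmt.Println(isConstrained(health), isConstrained(debug)) // true false
	fmt.Println(maxSeries(health[0]))                        // 6 true
}
```

A check like this only works once unbounded metrics have been moved out of the health group, which is why marking it as constrained comes last.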

@lambdanis lambdanis added the area/metrics Related to prometheus metrics label Aug 8, 2024
@lambdanis (Contributor, Author)

Identified debug metrics

(not a complete list)

  • tetragon_bpf_missed_events_total
