Lifecycle of metrics that cannot be initialised by setting to 0 #20

Open
antje-s opened this issue Aug 19, 2024 · 1 comment

Comments

@antje-s
Collaborator

antje-s commented Aug 19, 2024

How do we want to keep the metric portfolio clean when the service runs for long periods?

Example:
The metric wmo_wis2_gc_dataserver_status_flag (labels: centre_id|dataserver|report_by) takes one of two values, each assigned to a state of the respective data server.
Scenarios:

  1. If a data server is replaced by a new one, the metric for the old server remains until the GC is restarted. If the last download from it failed, its status stays at error.
  2. If a WIS2 Node is no longer in operation, the metrics for its data server remain present until the next GC restart.
  3. When a WIS2 Node switches to inline data, the data server status metric is no longer updated (provided the inline content is ok and is used/preferred). The status is then only set when messages are received for products that do not contain inline data.

This could be prevented either by deleting the metric at regular intervals (e.g. every 24 hours or every week), or by adding a time label and deleting the metric once it has gone unchanged for a certain period (e.g. 24 hours or 1 week). In either case, metrics would not be consistently available for all data servers. However, the series are also interrupted by a restart of the Global Service.

@kaiwirt
Collaborator

kaiwirt commented Dec 12, 2024

I suggest clearing outdated metrics after 24 hours. One metric that falls into this category is wmo_wis2_gc_dataserver_status_flag, which should be removed if there has been no connection attempt to the given dataserver in the last 24 hours. Any connection attempt (failed or successful) keeps this metric alive for that dataserver.
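
To make the suggested rule concrete, here is a minimal sketch of how a GC could track the last connection attempt per label set and drop series that have been idle for more than 24 hours. It is only an illustration: the class name `StaleSeriesTracker`, its methods, and the `remove_series` callback are hypothetical and not taken from any existing GC implementation, and the thread does not say which metrics library is in use.

```python
import time
from typing import Callable, Dict, List, Tuple

# (centre_id, dataserver, report_by), the labels named in this issue
LabelValues = Tuple[str, str, str]


class StaleSeriesTracker:
    """Hypothetical helper: remembers the last connection attempt per
    label set and evicts series that have been idle for too long."""

    def __init__(
        self,
        remove_series: Callable[[LabelValues], None],
        max_age_seconds: float = 24 * 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # remove_series is an assumed callback that deletes the series
        # from whatever metrics registry the GC uses.
        self._remove_series = remove_series
        self._max_age = max_age_seconds
        self._clock = clock
        self._last_attempt: Dict[LabelValues, float] = {}

    def touch(self, labels: LabelValues) -> None:
        """Record a connection attempt (failed or successful)."""
        self._last_attempt[labels] = self._clock()

    def sweep(self) -> List[LabelValues]:
        """Remove every series whose last attempt is older than max_age.

        Returns the label sets that were removed."""
        now = self._clock()
        stale = [
            labels
            for labels, last in self._last_attempt.items()
            if now - last > self._max_age
        ]
        for labels in stale:
            self._remove_series(labels)
            del self._last_attempt[labels]
        return stale
```

In use, `touch()` would be called wherever the status flag is set, and `sweep()` would run periodically (for example once per hour). If the GC happens to use the Python `prometheus_client` library, which is an assumption, the callback could be `lambda labels: gauge.remove(*labels)`.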
