
[Azure] [container_registry] Incorrect azure.timegrain field value #7162

Closed

tetianakravchenko opened this issue Jul 27, 2023 · 6 comments
@tetianakravchenko (Contributor)

container_registry:

"azure": {
  "subscription_id": "0e073ec1-c22f-4488-adde-da35ed609ccd",
  "timegrain": "PT5M",
  "resource": {
    "name": "XXX",
    "id": "/subscriptions/XXXX/resourceGroups/XXX/providers/Microsoft.ContainerRegistry/registries/XXX",
    "type": "Microsoft.ContainerRegistry/registries",
    "tags": {
      "sometag": "somevalue"
    },
    "group": "XXX"
  },
  "namespace": "Microsoft.ContainerRegistry/registries",
  "container_registry": {
    "storage_used": {
      "avg": 0
    }
  }
}

The correct timegrain value for the storage_used metric should be PT1H (see https://github.com/elastic/integrations/blob/main/packages/azure_metrics/data_stream/container_registry/agent/stream/stream.yml.hbs#L47-L49), but for some reason it is PT5M.

This metric is actually reported, but in an unexpected way:
it is reported with a 1-hour delay, every 5 minutes:
[Screenshot: 2023-07-26 at 14:32:26]

It seems to be wrong

cc @tommyers-elastic @zmoog

@tommyers-elastic (Contributor)

Yeah, this looks like a bug somewhere, we'll look into it.

@zmoog (Contributor)

zmoog commented Aug 28, 2023

@tetianakravchenko, I am running some tests on this case to understand better what's going on.

First, the behavior of getting PT1H metrics every 5 minutes with a 1-hour delay is, unfortunately, a side effect of how the metricset works. It's an issue because it collects more data points than needed, but it is less problematic than an incorrect time grain.
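To illustrate the side effect, here is a minimal sketch (an assumption about the collection pattern, not the actual metricset code): if the collector runs on a 5-minute period but always requests the latest fully-closed PT1H bucket, every run within the same hour returns the same hourly value, so that value shows up repeatedly and roughly one hour late.

```python
from datetime import datetime, timedelta, timezone

# Toy model of the collection loop. Assumption: on every 5-minute period the
# metricset asks Azure Monitor for the most recent fully-closed PT1H bucket.
PERIOD = timedelta(minutes=5)
TIMEGRAIN = timedelta(hours=1)


def latest_closed_bucket(now: datetime) -> datetime:
    """Start time of the most recent fully-closed PT1H bucket at `now`."""
    return now.replace(minute=0, second=0, microsecond=0) - TIMEGRAIN


start = datetime(2023, 7, 26, 14, 0, tzinfo=timezone.utc)
for i in range(12):
    run = start + i * PERIOD
    print(f"collection at {run:%H:%M} -> reports bucket starting {latest_closed_bucket(run):%H:%M}")
# Every run from 14:00 to 14:55 reports the 13:00-14:00 bucket: the same data
# point is collected 12 times and is always about one hour behind.
```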

Second, the azure.container_registry.storage_used.avg field with a PT5M timegrain is wrong, and we should fix it as soon as possible. However, I tried to reproduce this issue with the Agent integration version 1.0.24 and stack 8.10-SNAPSHOT, but I am getting the metrics with the expected PT1H time grain:

[Screenshot: 2023-08-28 at 01:53:33]

Can you share more about your settings (integration options, Agent, and stack versions)?

@tetianakravchenko (Contributor, Author)

tetianakravchenko commented Sep 1, 2023

I actually can't reproduce it now 😕 I tested with stack versions 8.8.2 and 8.9.1:
[Screenshot: 2023-09-01 at 12:07:59]

Sorry, it could be my mistake somehow. I couldn't find a screenshot of the document itself to prove that it was PT5M.

First, the behavior of getting PT1H metrics every 5 minutes with a 1-hour delay is, unfortunately, a side effect of how the metricset works. It's an issue because it collects more data points than needed, but it is less problematic than an incorrect time grain.

Should I create a dedicated issue for this one, or is it fine to keep this issue for that?
Also, there might be another issue related to this (tested and reproduced on stack 8.9.1):

[Screenshot: 2023-09-01 at 16:19:59]

For example, notice Sep 1, 2023 @ 14:15:00.000: there are 2 documents with this timestamp (for the same azure.resource.name), but those 2 documents have different ingest times that are 1 hour apart, one at Sep 1, 2023 @ 15:15:34.000 and another at Sep 1, 2023 @ 16:15:30.000.
Note that this behavior occurs only after > 1h of running this integration. Have you seen the same when testing?
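In case it helps to check for the same duplication on another cluster, here is a rough sketch of a query that should surface it. The index name, registry name, and the event.ingested field are assumptions about my setup, not verified values:

```python
# Rough sketch: find @timestamp values that occur more than once for the same
# registry and show how far apart the ingest times of the duplicates are.
# Assumptions: index name, registry name, and the event.ingested field.
import requests

ES_URL = "http://localhost:9200"                    # assumption: local cluster
INDEX = "metrics-azure.container_registry-default"  # assumption: default namespace

query = {
    "size": 0,
    "query": {"term": {"azure.resource.name": "my-registry"}},  # hypothetical name
    "aggs": {
        "per_timestamp": {
            "terms": {"field": "@timestamp", "min_doc_count": 2, "size": 100},
            "aggs": {
                "first_ingested": {"min": {"field": "event.ingested"}},
                "last_ingested": {"max": {"field": "event.ingested"}},
            },
        }
    },
}

resp = requests.post(f"{ES_URL}/{INDEX}/_search", json=query, timeout=30)
resp.raise_for_status()
for bucket in resp.json()["aggregations"]["per_timestamp"]["buckets"]:
    print(
        bucket["key_as_string"],
        "docs:", bucket["doc_count"],
        "ingested between", bucket["first_ingested"]["value_as_string"],
        "and", bucket["last_ingested"]["value_as_string"],
    )
```

Any bucket with two documents and ingest times about one hour apart matches what I am seeing above.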

@zmoog (Contributor)

zmoog commented Sep 3, 2023

First, the behavior of getting PT1H metrics every 5 minutes with a 1-hour delay is, unfortunately, a side effect of how the metricset works. It's an issue because it collects more data points than needed, but it is less problematic than an incorrect time grain.

Should I create a dedicated issue for this one, or is it fine to keep this issue for that?

I would create a different issue for this problem for two reasons:

  • The two problems are not related;
  • The one-hour delay with more values than needed is wrong but less critical than reporting the incorrect time grain.

Also, there might be another issue related to this (tested and reproduced on stack 8.9.1):

This is a different angle on the same problem that's causing the one-hour delay on the metrics with the PT1H time grain.

What about creating an issue related to PT1H time grain problems? WDYT?

@zmoog (Contributor)

zmoog commented Sep 3, 2023

Sorry, it could be my mistake somehow. I couldn't find a screenshot of the document itself to prove that it was PT5M.

No worries, I trust your assessment even if you can't find a screenshot. I need your help to reproduce the issue.

@tetianakravchenko (Contributor, Author)

@zmoog I've created this issue as a follow-up for the PT1H time grain problems: #7646
