
[metricbeat] [gcp] group metrics by dimensions #36682

Merged Nov 14, 2023 · 12 commits

Conversation

@gpop63 (Contributor) commented Sep 26, 2023

Why

The existing metrics grouping logic does not play well with TSDB, and we need to adjust the grouping logic to avoid data loss when we enable TSDB.

What

Overview

  • Unify the grouping logic
  • Add a new event.batch_id field
  • Renaming

Unify the grouping logic

Unify the grouping logic for all metricsets to group metrics by @timestamp plus a selection of ECS and label fields.

Here's the complete list of the fields we are using for grouping:

  • @timestamp
  • cloud.account.id
  • cloud.availability_zone
  • cloud.instance.id
  • cloud.provider
  • cloud.region
  • All label fields

We can use these fields as dimensions in the TSDB configuration.

Add a new event.batch_id field

Each GCP metric has a variable ingest delay. For example, container memory usage is available immediately, with zero ingest delay, whereas container CPU usage only becomes available after a 2-minute ingest delay. So collecting memory and CPU usage for the same timestamp requires multiple collections (by default, the metricset collects metrics every 60 seconds).

Metrics like memory and CPU usage have identical dimension values for the same container. However, the metricset can't group them, since it collected them across different collections due to the ingest delay. If unhandled, this situation can cause data loss when TSDB is enabled. More details are available in a GitHub issue comment.

To address this problem, the metricset adds a new event.batch_id field that we can use as a dimension. The metricset generates a UUID as batch ID during each collection and stores it in the event.batch_id field.
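A minimal sketch of the batch ID generation, assuming a random v4 UUID built directly from crypto/rand (the actual implementation presumably uses a UUID library):

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newBatchID returns a random RFC 4122 version-4 UUID string. The metricset
// generates one such ID per collection and stores it in event.batch_id, so
// all events produced by the same collection share the same value. This is
// a sketch; the real code likely calls a UUID library instead.
func newBatchID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newBatchID()
	if err != nil {
		panic(err)
	}
	// Every event in this collection would carry the same event.batch_id.
	fmt.Println(id)
}
```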

Renaming

I changed the names of some structs, functions, and variables, intending to clarify their role and purpose, but I may be biased. If the new names don't improve clarity, I will pick different ones or revert to the originals.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

Related issues

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Sep 26, 2023
@mergify mergify bot assigned gpop63 Sep 26, 2023
@mergify bot commented Sep 26, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @gpop63? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine (Collaborator) commented Sep 26, 2023

💚 Build Succeeded


Build stats

  • Start Time: 2023-11-02T22:34:49.107+0000

  • Duration: 53 min 15 sec

Test stats 🧪

Test Results
Failed 0
Passed 1566
Skipped 96
Total 1662

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@zmoog zmoog force-pushed the gcp_group-metrics branch 3 times, most recently from a6937fe to c083a9d Compare October 4, 2023 11:33
@zmoog zmoog added the Team:Cloud-Monitoring Label for the Cloud Monitoring team label Oct 4, 2023
@zmoog zmoog requested a review from a team October 4, 2023 18:31
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 4, 2023
@zmoog zmoog marked this pull request as ready for review October 4, 2023 18:36
@zmoog zmoog requested a review from a team as a code owner October 4, 2023 18:36
@zmoog zmoog requested review from endorama and kaiyan-sheng October 4, 2023 18:36
@zmoog (Contributor) commented Oct 4, 2023

@gpop63, I can't add you as a reviewer since you're the original author of this PR, but please consider yourself a reviewer! 😇

@zmoog (Contributor) commented Oct 13, 2023

@kaiyan-sheng, I learned you are reviewing the PR under the radar! 😄

You suggest delaying the collection until the longest ingest delay has passed, to avoid adding the batch ID. It makes sense, and it's something we evaluated during development.

Since some metrics have a 5-minute delay, we opted for the batch ID, but we're open to taking a different trade-off.

Please add here your thoughts on this topic!

@kaiyan-sheng (Contributor) left a comment

Sorry for the delay in reviewing this PR! I was thinking of finding the largest ingest delay among all metrics and then using that delay to calculate startTime and endTime in the getTimeIntervalAligner function. This way we should get all the data points in a single collection and avoid using metric_names_fingerprint.

--- a/x-pack/metricbeat/module/gcp/metrics/metrics_requester.go
+++ b/x-pack/metricbeat/module/gcp/metrics/metrics_requester.go
@@ -79,6 +79,14 @@ func (r *metricsRequester) Metrics(ctx context.Context, serviceName string, alig
        var wg sync.WaitGroup
        results := make([]timeSeriesWithAligner, 0)
 
+       largestDelay := 0 * time.Second
+       for _, meta := range metricsToCollect {
+               if meta.ingestDelay > largestDelay {
+                       largestDelay = meta.ingestDelay
+               }
+       }
+
+
        for mt, meta := range metricsToCollect {
                wg.Add(1)
 
@@ -87,7 +95,7 @@ func (r *metricsRequester) Metrics(ctx context.Context, serviceName string, alig
                        defer wg.Done()
 
                        r.logger.Debugf("For metricType %s, metricMeta = %d,  aligner = %s", mt, metricMeta, aligner)
-                       interval, aligner := getTimeIntervalAligner(metricMeta.ingestDelay, metricMeta.samplePeriod, r.config.period, aligner)
+                       interval, aligner := getTimeIntervalAligner(largestDelay, metricMeta.samplePeriod, r.config.period, aligner)

--- a/x-pack/metricbeat/module/gcp/metrics/timeseries.go
+++ b/x-pack/metricbeat/module/gcp/metrics/timeseries.go
@@ -9,8 +9,6 @@ import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
-       "strings"
-
        "github.com/elastic/beats/v7/metricbeat/mb"
        "github.com/elastic/beats/v7/x-pack/metricbeat/module/gcp"
        "github.com/elastic/elastic-agent-libs/mapstr"
@@ -145,9 +143,9 @@ func createEventsFromGroups(service string, groups map[string][]KeyValuePoint) [
                // Hashes metric names string using SHA-256 to always have
                // a constant length value and avoid overflowing the
                // current TSDB dimension field limit (1024).
-               metricNamesHash := hash(strings.Join(metricNames, ","))
-
-               _, _ = event.RootFields.Put("event.metric_names_hash", metricNamesHash)
+               //metricNamesHash := hash(strings.Join(metricNames, ","))
+               //
+               //_, _ = event.ModuleFields.Put("metric_names_fingerprint", metricNamesHash)

@gpop63 tried out the largestDelay approach and it seems to work for TSDB (thank you so much for testing!):

All 39971 documents taken from index .ds-metrics-gcp.gke-default-2023.10.31-000001 were successfully placed to index tsdb-index-enabled.
All 1792 documents taken from index .ds-metrics-gcp.compute-default-2023.10.31-000001 were successfully placed to index tsdb-index-enabled.

Just trying to figure out a way of fixing this issue without introducing a new field 🙂

@kaiyan-sheng kaiyan-sheng self-requested a review November 1, 2023 02:04
@endorama (Member) left a comment

I think all the changes are sound and reasonable. As I'm not working on this actively and have little familiarity with TSDB, I'm just commenting with my 👍 rather than adding my approval.

@zmoog (Contributor) commented Nov 2, 2023

I was thinking of finding the largest ingest delay among all metrics and then using that largest delay to calculate startTime and endTime in getTimeIntervalAligner function. This way we should get all the data points in a single collection to avoid using metric_names_fingerprint.

Yep, I get it. As said, this was one of the options on the table 😇

The single aspect that made me opt for adding the metric names field is that Elasticsearch will use the same approach to handle this issue transparently for users; some other metricsets, like Prometheus, have the same problem of collecting metrics for the same data point over multiple collections.

However, GCP reliably gives us the metadata (delay and sampling timings), which provides an opportunity other metricsets do not have.

Since time intervals are calculated on a per-metric basis (using the ingest delay and sampling period), I expect the same results. And @gpop63's tests confirm this is the case.

If we all agree it's the best option, I will apply the change to the branch and rebuild a custom agent so we can run a final round of tests on all the metric sets to double-check. @gpop63, WDYT?

Just trying to figure out a way of fixing this issue without introducing a new field 🙂

@kaiyan-sheng I see, and I really appreciate it! Multiple perspectives and evaluation criteria constantly improve the final result!

gpop63 and others added 9 commits November 2, 2023 18:36
The dimensionsKey contains all the dimension field values we want to use
to group the time series.

We need to add the timestamp to the key, so we only group together time
series with the same timestamp.
# Update grouping key

The dimensionsKey contains all the dimension field values we want to use
to group the time series.

We need to add the timestamp to the key, so we only group time
series with the same timestamp.

# Add `event.created` field

We need to add an extra dimension to avoid data loss on TSDB
since GCP metrics with the same @timestamp become visible with
different "ingest delay".

For the full context, read elastic/integrations#6568 (comment)

# Drop ID() function

Remove the `ID()` function from the Metadata Collector.

Since we are unifying the metric grouping logic for all metric types, we
don't need to keep the `ID()` function anymore.

# Renaming

I also renamed some structs, functions, and variables to make their
role and purpose clearer.

We can remove this part if it does not improve clarity.
We cannot use the `event.created` field because TSDB does not allow
the `date` field type for dimensions.
The `event.batch_id` with its random values is a wrong choice as a
dimension field for a time series database. It would create a new
time series at each iteration, which is terrible.

The `event.metric_names` field keeps the value limited to a recurring
set of field names that all share the same ingest delay.
zmoog added 2 commits November 2, 2023 18:36
A single GCP metric name can be quite long, for example:

  subscription.streaming_pull_mod_ack_deadline_message_operation.count

we may collect enough metrics to overflow the current
length limit for dimension fields.

We hash the metric names value using SHA-256 to have a constant length
value.

I'm not 100% sure SHA-256 is the best option for this use case; we
don't have cryptographic solid needs, so we can probably use a
simpler algorithm to save computing cycles while getting a shorter hash
value.

We also drop `event.metric_names` because it is no longer needed.
Replace it with `gcp.metric_names_fingerprint` to align this metricset
with what we are doing on other metricsets (for example, the
prometheus metricset).
@zmoog zmoog force-pushed the gcp_group-metrics branch 3 times, most recently from ff3be23 to f655e50 Compare November 2, 2023 22:28
The metricset now collects metric values using the largest ingest
delay interval instead of each metric's individual ingest delay.

By using the largest ingest delay, the metricset gets all the metrics
values for each data point in a single collection.

Drop the `gcp.metric_names_fingerprint` because it's no longer needed.
@zmoog zmoog force-pushed the gcp_group-metrics branch from f655e50 to 0d984da Compare November 2, 2023 22:34
@gpop63 (Contributor, Author) commented Nov 6, 2023

All tests seem to pass for the metrics data streams we target.

Tests:

gke

Testing data stream metrics-gcp.gke-default.
Index being used for the documents is .ds-metrics-gcp.gke-default-2023.11.06-000001.
Index being used for the settings and mappings is .ds-metrics-gcp.gke-default-2023.11.06-000001.

The time series fields for the TSDB index are: 
        - dimension (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint
        - counter (9 fields):
                - gcp.gke.container.cpu.core_usage_time.sec
                - gcp.gke.container.memory.page_fault.count
                - gcp.gke.container.restart.count
                - gcp.gke.node.cpu.core_usage_time.sec
                - gcp.gke.node.network.received_bytes.count
                - gcp.gke.node.network.sent_bytes.count
                - gcp.gke.node_daemon.cpu.core_usage_time.sec
                - gcp.gke.pod.network.received.bytes
                - gcp.gke.pod.network.sent.bytes
        - gauge (31 fields):
                - gcp.gke.container.cpu.limit_cores.value
                - gcp.gke.container.cpu.limit_utilization.pct
                - gcp.gke.container.cpu.request_cores.value
                - gcp.gke.container.cpu.request_utilization.pct
                - gcp.gke.container.ephemeral_storage.limit.bytes
                - gcp.gke.container.ephemeral_storage.request.bytes
                - gcp.gke.container.ephemeral_storage.used.bytes
                - gcp.gke.container.memory.limit.bytes
                - gcp.gke.container.memory.limit_utilization.pct
                - gcp.gke.container.memory.request.bytes
                - gcp.gke.container.memory.request_utilization.pct
                - gcp.gke.container.memory.used.bytes
                - gcp.gke.container.uptime.sec
                - gcp.gke.node.cpu.allocatable_cores.value
                - gcp.gke.node.cpu.allocatable_utilization.pct
                - gcp.gke.node.cpu.total_cores.value
                - gcp.gke.node.ephemeral_storage.allocatable.bytes
                - gcp.gke.node.ephemeral_storage.inodes_free.value
                - gcp.gke.node.ephemeral_storage.inodes_total.value
                - gcp.gke.node.ephemeral_storage.total.bytes
                - gcp.gke.node.ephemeral_storage.used.bytes
                - gcp.gke.node.memory.allocatable.bytes
                - gcp.gke.node.memory.allocatable_utilization.pct
                - gcp.gke.node.memory.total.bytes
                - gcp.gke.node.memory.used.bytes
                - gcp.gke.node.pid_limit.value
                - gcp.gke.node.pid_used.value
                - gcp.gke.node_daemon.memory.used.bytes
                - gcp.gke.pod.volume.total.bytes
                - gcp.gke.pod.volume.used.bytes
                - gcp.gke.pod.volume.utilization.pct
        - routing_path (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint

Index tsdb-index-enabled successfully created.

Copying documents from .ds-metrics-gcp.gke-default-2023.11.06-000001 to tsdb-index-enabled...
All 22295 documents taken from index .ds-metrics-gcp.gke-default-2023.11.06-000001 were successfully placed to index tsdb-index-enabled.

compute

Testing data stream metrics-gcp.compute-default.
Index being used for the documents is .ds-metrics-gcp.compute-default-2023.11.06-000001.
Index being used for the settings and mappings is .ds-metrics-gcp.compute-default-2023.11.06-000001.

The time series fields for the TSDB index are: 
        - dimension (9 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - cloud.availability_zone
                - cloud.instance.id
                - cloud.instance.name
                - cloud.machine.type
                - cloud.region
                - gcp.labels_fingerprint
        - gauge (19 fields):
                - gcp.compute.firewall.dropped.bytes
                - gcp.compute.firewall.dropped_packets_count.value
                - gcp.compute.instance.cpu.reserved_cores.value
                - gcp.compute.instance.cpu.usage.pct
                - gcp.compute.instance.cpu.usage_time.sec
                - gcp.compute.instance.disk.read.bytes
                - gcp.compute.instance.disk.read_ops_count.value
                - gcp.compute.instance.disk.write.bytes
                - gcp.compute.instance.disk.write_ops_count.value
                - gcp.compute.instance.memory.balloon.ram_size.value
                - gcp.compute.instance.memory.balloon.ram_used.value
                - gcp.compute.instance.memory.balloon.swap_in.bytes
                - gcp.compute.instance.memory.balloon.swap_out.bytes
                - gcp.compute.instance.network.egress.bytes
                - gcp.compute.instance.network.egress.packets.count
                - gcp.compute.instance.network.ingress.bytes
                - gcp.compute.instance.network.ingress.packets.count
                - gcp.compute.instance.uptime.sec
                - gcp.compute.instance.uptime_total.sec
        - routing_path (9 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - cloud.availability_zone
                - cloud.instance.id
                - cloud.instance.name
                - cloud.machine.type
                - cloud.region
                - gcp.labels_fingerprint

Index tsdb-index-enabled successfully created.

Copying documents from .ds-metrics-gcp.compute-default-2023.11.06-000001 to tsdb-index-enabled...
All 6948 documents taken from index .ds-metrics-gcp.compute-default-2023.11.06-000001 were successfully placed to index tsdb-index-enabled.

redis

Testing data stream metrics-gcp.redis-default.
Index being used for the documents is .ds-metrics-gcp.redis-default-2023.11.06-000001.
Index being used for the settings and mappings is .ds-metrics-gcp.redis-default-2023.11.06-000001.

The time series fields for the TSDB index are: 
        - dimension (7 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - cloud.instance.id
                - cloud.instance.name
                - cloud.machine.type
                - gcp.labels_fingerprint
        - gauge (31 fields):
                - gcp.redis.clients.blocked.count
                - gcp.redis.clients.connected.count
                - gcp.redis.commands.calls.count
                - gcp.redis.commands.total_time.us
                - gcp.redis.commands.usec_per_call.sec
                - gcp.redis.keyspace.avg_ttl.sec
                - gcp.redis.keyspace.keys.count
                - gcp.redis.keyspace.keys_with_expiration.count
                - gcp.redis.persistence.rdb.bgsave_in_progress
                - gcp.redis.replication.master.slaves.lag.sec
                - gcp.redis.replication.master.slaves.offset.bytes
                - gcp.redis.replication.master_repl_offset.bytes
                - gcp.redis.replication.offset_diff.bytes
                - gcp.redis.replication.role
                - gcp.redis.server.uptime.sec
                - gcp.redis.stats.cache_hit_ratio
                - gcp.redis.stats.connections.total.count
                - gcp.redis.stats.cpu_utilization.sec
                - gcp.redis.stats.evicted_keys.count
                - gcp.redis.stats.expired_keys.count
                - gcp.redis.stats.keyspace_hits.count
                - gcp.redis.stats.keyspace_misses.count
                - gcp.redis.stats.memory.maxmemory.mb
                - gcp.redis.stats.memory.system_memory_overload_duration.us
                - gcp.redis.stats.memory.system_memory_usage_ratio
                - gcp.redis.stats.memory.usage.bytes
                - gcp.redis.stats.memory.usage_ratio
                - gcp.redis.stats.network_traffic.bytes
                - gcp.redis.stats.pubsub.channels.count
                - gcp.redis.stats.pubsub.patterns.count
                - gcp.redis.stats.reject_connections.count
        - routing_path (7 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - cloud.instance.id
                - cloud.instance.name
                - cloud.machine.type
                - gcp.labels_fingerprint

Index tsdb-index-enabled successfully created.

Copying documents from .ds-metrics-gcp.redis-default-2023.11.06-000001 to tsdb-index-enabled...
All 2821 documents taken from index .ds-metrics-gcp.redis-default-2023.11.06-000001 were successfully placed to index tsdb-index-enabled.

pubsub

Testing data stream metrics-gcp.pubsub-default.
Index being used for the documents is .ds-metrics-gcp.pubsub-default-2023.11.06-000001.
Index being used for the settings and mappings is .ds-metrics-gcp.pubsub-default-2023.11.06-000001.

The time series fields for the TSDB index are: 
        - dimension (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint
        - gauge (46 fields):
                - gcp.pubsub.snapshot.backlog.bytes
                - gcp.pubsub.snapshot.backlog_bytes_by_region.bytes
                - gcp.pubsub.snapshot.config_updates.count
                - gcp.pubsub.snapshot.num_messages.value
                - gcp.pubsub.snapshot.num_messages_by_region.value
                - gcp.pubsub.snapshot.oldest_message_age.sec
                - gcp.pubsub.snapshot.oldest_message_age_by_region.sec
                - gcp.pubsub.subscription.ack_message.count
                - gcp.pubsub.subscription.backlog.bytes
                - gcp.pubsub.subscription.byte_cost.bytes
                - gcp.pubsub.subscription.config_updates.count
                - gcp.pubsub.subscription.dead_letter_message.count
                - gcp.pubsub.subscription.mod_ack_deadline_message.count
                - gcp.pubsub.subscription.mod_ack_deadline_message_operation.count
                - gcp.pubsub.subscription.mod_ack_deadline_request.count
                - gcp.pubsub.subscription.num_outstanding_messages.value
                - gcp.pubsub.subscription.num_undelivered_messages.value
                - gcp.pubsub.subscription.oldest_retained_acked_message_age.sec
                - gcp.pubsub.subscription.oldest_retained_acked_message_age_by_region.value
                - gcp.pubsub.subscription.oldest_unacked_message_age.sec
                - gcp.pubsub.subscription.oldest_unacked_message_age_by_region.value
                - gcp.pubsub.subscription.pull_ack_message_operation.count
                - gcp.pubsub.subscription.pull_ack_request.count
                - gcp.pubsub.subscription.pull_message_operation.count
                - gcp.pubsub.subscription.pull_request.count
                - gcp.pubsub.subscription.push_request.count
                - gcp.pubsub.subscription.retained_acked.bytes
                - gcp.pubsub.subscription.retained_acked_bytes_by_region.bytes
                - gcp.pubsub.subscription.seek_request.count
                - gcp.pubsub.subscription.sent_message.count
                - gcp.pubsub.subscription.streaming_pull_ack_message_operation.count
                - gcp.pubsub.subscription.streaming_pull_ack_request.count
                - gcp.pubsub.subscription.streaming_pull_message_operation.count
                - gcp.pubsub.subscription.streaming_pull_mod_ack_deadline_message_operation.count
                - gcp.pubsub.subscription.streaming_pull_mod_ack_deadline_request.count
                - gcp.pubsub.subscription.streaming_pull_response.count
                - gcp.pubsub.subscription.unacked_bytes_by_region.bytes
                - gcp.pubsub.topic.byte_cost.bytes
                - gcp.pubsub.topic.config_updates.count
                - gcp.pubsub.topic.oldest_retained_acked_message_age_by_region.value
                - gcp.pubsub.topic.oldest_unacked_message_age_by_region.value
                - gcp.pubsub.topic.retained_acked_bytes_by_region.bytes
                - gcp.pubsub.topic.send_message_operation.count
                - gcp.pubsub.topic.send_request.count
                - gcp.pubsub.topic.streaming_pull_response.count
                - gcp.pubsub.topic.unacked_bytes_by_region.bytes
        - routing_path (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint

Index tsdb-index-enabled successfully created.

Copying documents from .ds-metrics-gcp.pubsub-default-2023.11.06-000001 to tsdb-index-enabled...
All 12254 documents taken from index .ds-metrics-gcp.pubsub-default-2023.11.06-000001 were successfully placed to index tsdb-index-enabled.

cloudrun

Testing data stream metrics-gcp.cloudrun_metrics-default.
Index being used for the documents is .ds-metrics-gcp.cloudrun_metrics-default-2023.11.06-000001.
Index being used for the settings and mappings is .ds-metrics-gcp.cloudrun_metrics-default-2023.11.06-000001.

The time series fields for the TSDB index are: 
        - dimension (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint
        - gauge (7 fields):
                - gcp.cloudrun_metrics.container.billable_instance_time
                - gcp.cloudrun_metrics.container.cpu.allocation_time.sec
                - gcp.cloudrun_metrics.container.instance.count
                - gcp.cloudrun_metrics.container.memory.allocation_time
                - gcp.cloudrun_metrics.container.network.received.bytes
                - gcp.cloudrun_metrics.container.network.sent.bytes
                - gcp.cloudrun_metrics.request.count
        - routing_path (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint

Index tsdb-index-enabled successfully created.

Copying documents from .ds-metrics-gcp.cloudrun_metrics-default-2023.11.06-000001 to tsdb-index-enabled...
All 819 documents taken from index .ds-metrics-gcp.cloudrun_metrics-default-2023.11.06-000001 were successfully placed to index tsdb-index-enabled.

dataproc

Testing data stream metrics-gcp.dataproc-default.
Index being used for the documents is .ds-metrics-gcp.dataproc-default-2023.11.06-000001.
Index being used for the settings and mappings is .ds-metrics-gcp.dataproc-default-2023.11.06-000001.

The time series fields for the TSDB index are: 
        - dimension (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint
        - gauge (18 fields):
                - gcp.dataproc.batch.spark.executors.count
                - gcp.dataproc.cluster.hdfs.datanodes.count
                - gcp.dataproc.cluster.hdfs.storage_capacity.value
                - gcp.dataproc.cluster.hdfs.storage_utilization.value
                - gcp.dataproc.cluster.hdfs.unhealthy_blocks.count
                - gcp.dataproc.cluster.job.failed.count
                - gcp.dataproc.cluster.job.running.count
                - gcp.dataproc.cluster.job.submitted.count
                - gcp.dataproc.cluster.operation.failed.count
                - gcp.dataproc.cluster.operation.running.count
                - gcp.dataproc.cluster.operation.submitted.count
                - gcp.dataproc.cluster.yarn.allocated_memory_percentage.value
                - gcp.dataproc.cluster.yarn.apps.count
                - gcp.dataproc.cluster.yarn.containers.count
                - gcp.dataproc.cluster.yarn.memory_size.value
                - gcp.dataproc.cluster.yarn.nodemanagers.count
                - gcp.dataproc.cluster.yarn.pending_memory_size.value
                - gcp.dataproc.cluster.yarn.virtual_cores.count
        - routing_path (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint

Index tsdb-index-enabled successfully created.

Copying documents from .ds-metrics-gcp.dataproc-default-2023.11.06-000001 to tsdb-index-enabled...
All 2268 documents taken from index .ds-metrics-gcp.dataproc-default-2023.11.06-000001 were successfully placed to index tsdb-index-enabled.

storage

Testing data stream metrics-gcp.storage-default.
Index being used for the documents is .ds-metrics-gcp.storage-default-2023.11.06-000001.
Index being used for the settings and mappings is .ds-metrics-gcp.storage-default-2023.11.06-000001.

The time series fields for the TSDB index are: 
        - dimension (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint
        - gauge (9 fields):
                - gcp.storage.api.request.count
                - gcp.storage.authz.acl_based_object_access.count
                - gcp.storage.authz.acl_operations.count
                - gcp.storage.authz.object_specific_acl_mutation.count
                - gcp.storage.network.received.bytes
                - gcp.storage.network.sent.bytes
                - gcp.storage.storage.object.count
                - gcp.storage.storage.total.bytes
                - gcp.storage.storage.total_byte_seconds.bytes
        - routing_path (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint

Index tsdb-index-enabled successfully created.

Copying documents from .ds-metrics-gcp.storage-default-2023.11.06-000001 to tsdb-index-enabled...
All 954 documents taken from index .ds-metrics-gcp.storage-default-2023.11.06-000001 were successfully placed to index tsdb-index-enabled.

loadbalancing

Testing data stream metrics-gcp.loadbalancing_metrics-default.
Index being used for the documents is .ds-metrics-gcp.loadbalancing_metrics-default-2023.11.06-000001.
Index being used for the settings and mappings is .ds-metrics-gcp.loadbalancing_metrics-default-2023.11.06-000001.

The time series fields for the TSDB index are: 
        - dimension (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint
        - gauge (19 fields):
                - gcp.loadbalancing_metrics.https.backend_request.bytes
                - gcp.loadbalancing_metrics.https.backend_request.count
                - gcp.loadbalancing_metrics.https.backend_response.bytes
                - gcp.loadbalancing_metrics.https.request.bytes
                - gcp.loadbalancing_metrics.https.request.count
                - gcp.loadbalancing_metrics.https.response.bytes
                - gcp.loadbalancing_metrics.l3.external.egress.bytes
                - gcp.loadbalancing_metrics.l3.external.egress_packets.count
                - gcp.loadbalancing_metrics.l3.external.ingress.bytes
                - gcp.loadbalancing_metrics.l3.external.ingress_packets.count
                - gcp.loadbalancing_metrics.l3.internal.egress.bytes
                - gcp.loadbalancing_metrics.l3.internal.egress_packets.count
                - gcp.loadbalancing_metrics.l3.internal.ingress.bytes
                - gcp.loadbalancing_metrics.l3.internal.ingress_packets.count
                - gcp.loadbalancing_metrics.tcp_ssl_proxy.closed_connections.value
                - gcp.loadbalancing_metrics.tcp_ssl_proxy.egress.bytes
                - gcp.loadbalancing_metrics.tcp_ssl_proxy.ingress.bytes
                - gcp.loadbalancing_metrics.tcp_ssl_proxy.new_connections.value
                - gcp.loadbalancing_metrics.tcp_ssl_proxy.open_connections.value
        - routing_path (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint

Index tsdb-index-enabled successfully created.

Copying documents from .ds-metrics-gcp.loadbalancing_metrics-default-2023.11.06-000001 to tsdb-index-enabled...
All 896 documents taken from index .ds-metrics-gcp.loadbalancing_metrics-default-2023.11.06-000001 were successfully placed to index tsdb-index-enabled.

cloudsql postgresql

Testing data stream metrics-gcp.cloudsql_postgresql-default.
Index being used for the documents is .ds-metrics-gcp.cloudsql_postgresql-default-2023.11.06-000001.
Index being used for the settings and mappings is .ds-metrics-gcp.cloudsql_postgresql-default-2023.11.06-000001.

The time series fields for the TSDB index are: 
        - dimension (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint
        - counter (16 fields):
                - gcp.cloudsql_postgresql.database.insights.aggregate.execution_time
                - gcp.cloudsql_postgresql.database.insights.aggregate.io_time
                - gcp.cloudsql_postgresql.database.insights.aggregate.latencies
                - gcp.cloudsql_postgresql.database.insights.aggregate.lock_time
                - gcp.cloudsql_postgresql.database.insights.aggregate.row.count
                - gcp.cloudsql_postgresql.database.insights.aggregate.shared_blk_access.count
                - gcp.cloudsql_postgresql.database.insights.perquery.execution_time
                - gcp.cloudsql_postgresql.database.insights.perquery.io_time
                - gcp.cloudsql_postgresql.database.insights.perquery.lock_time
                - gcp.cloudsql_postgresql.database.insights.perquery.row.count
                - gcp.cloudsql_postgresql.database.insights.perquery.shared_blk_access.count
                - gcp.cloudsql_postgresql.database.insights.pertag.execution_time
                - gcp.cloudsql_postgresql.database.insights.pertag.io_time
                - gcp.cloudsql_postgresql.database.insights.pertag.lock_time
                - gcp.cloudsql_postgresql.database.insights.pertag.row.count
                - gcp.cloudsql_postgresql.database.insights.pertag.shared_blk_access.count
        - gauge (27 fields):
                - gcp.cloudsql_postgresql.database.auto_failover_request.count
                - gcp.cloudsql_postgresql.database.available_for_failover
                - gcp.cloudsql_postgresql.database.cpu.reserved_cores.count
                - gcp.cloudsql_postgresql.database.cpu.usage_time.sec
                - gcp.cloudsql_postgresql.database.cpu.utilization.pct
                - gcp.cloudsql_postgresql.database.disk.bytes_used.bytes
                - gcp.cloudsql_postgresql.database.disk.quota.bytes
                - gcp.cloudsql_postgresql.database.disk.read_ops.count
                - gcp.cloudsql_postgresql.database.disk.utilization.pct
                - gcp.cloudsql_postgresql.database.disk.write_ops.count
                - gcp.cloudsql_postgresql.database.memory.quota.bytes
                - gcp.cloudsql_postgresql.database.memory.total_usage.bytes
                - gcp.cloudsql_postgresql.database.memory.usage.bytes
                - gcp.cloudsql_postgresql.database.memory.utilization.pct
                - gcp.cloudsql_postgresql.database.network.connections.count
                - gcp.cloudsql_postgresql.database.network.received_bytes.count
                - gcp.cloudsql_postgresql.database.network.sent_bytes.count
                - gcp.cloudsql_postgresql.database.num_backends.count
                - gcp.cloudsql_postgresql.database.replication.network_lag.sec
                - gcp.cloudsql_postgresql.database.replication.replica_byte_lag.bytes
                - gcp.cloudsql_postgresql.database.replication.replica_lag.sec
                - gcp.cloudsql_postgresql.database.transaction.count
                - gcp.cloudsql_postgresql.database.transaction_id.count
                - gcp.cloudsql_postgresql.database.transaction_id_utilization.pct
                - gcp.cloudsql_postgresql.database.up
                - gcp.cloudsql_postgresql.database.uptime.sec
                - gcp.cloudsql_postgresql.database.vacuum.oldest_transaction_age
        - routing_path (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint

Index tsdb-index-enabled successfully created.

Copying documents from .ds-metrics-gcp.cloudsql_postgresql-default-2023.11.06-000001 to tsdb-index-enabled...
All 760 documents taken from index .ds-metrics-gcp.cloudsql_postgresql-default-2023.11.06-000001 were successfully placed to index tsdb-index-enabled.

cloudsql mysql

Testing data stream metrics-gcp.cloudsql_mysql-default.
Index being used for the documents is .ds-metrics-gcp.cloudsql_mysql-default-2023.11.06-000001.
Index being used for the settings and mappings is .ds-metrics-gcp.cloudsql_mysql-default-2023.11.06-000001.

The time series fields for the TSDB index are: 
        - dimension (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint
        - gauge (35 fields):
                - gcp.cloudsql_mysql.database.auto_failover_request.count
                - gcp.cloudsql_mysql.database.available_for_failover
                - gcp.cloudsql_mysql.database.cpu.reserved_cores.count
                - gcp.cloudsql_mysql.database.cpu.usage_time.sec
                - gcp.cloudsql_mysql.database.cpu.utilization.pct
                - gcp.cloudsql_mysql.database.disk.bytes_used.bytes
                - gcp.cloudsql_mysql.database.disk.quota.bytes
                - gcp.cloudsql_mysql.database.disk.read_ops.count
                - gcp.cloudsql_mysql.database.disk.utilization.pct
                - gcp.cloudsql_mysql.database.disk.write_ops.count
                - gcp.cloudsql_mysql.database.innodb_buffer_pool_pages_dirty.count
                - gcp.cloudsql_mysql.database.innodb_buffer_pool_pages_free.count
                - gcp.cloudsql_mysql.database.innodb_buffer_pool_pages_total.count
                - gcp.cloudsql_mysql.database.innodb_data_fsyncs.count
                - gcp.cloudsql_mysql.database.innodb_os_log_fsyncs.count
                - gcp.cloudsql_mysql.database.innodb_pages_read.count
                - gcp.cloudsql_mysql.database.innodb_pages_written.count
                - gcp.cloudsql_mysql.database.memory.quota.bytes
                - gcp.cloudsql_mysql.database.memory.total_usage.bytes
                - gcp.cloudsql_mysql.database.memory.usage.bytes
                - gcp.cloudsql_mysql.database.memory.utilization.pct
                - gcp.cloudsql_mysql.database.network.connections.count
                - gcp.cloudsql_mysql.database.network.received_bytes.count
                - gcp.cloudsql_mysql.database.network.sent_bytes.count
                - gcp.cloudsql_mysql.database.queries.count
                - gcp.cloudsql_mysql.database.questions.count
                - gcp.cloudsql_mysql.database.received_bytes.count
                - gcp.cloudsql_mysql.database.replication.last_io_errno
                - gcp.cloudsql_mysql.database.replication.last_sql_errno
                - gcp.cloudsql_mysql.database.replication.network_lag.sec
                - gcp.cloudsql_mysql.database.replication.replica_lag.sec
                - gcp.cloudsql_mysql.database.replication.seconds_behind_master.sec
                - gcp.cloudsql_mysql.database.sent_bytes.count
                - gcp.cloudsql_mysql.database.up
                - gcp.cloudsql_mysql.database.uptime.sec
        - routing_path (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint

Index tsdb-index-enabled successfully created.

Copying documents from .ds-metrics-gcp.cloudsql_mysql-default-2023.11.06-000001 to tsdb-index-enabled...
All 344 documents taken from index .ds-metrics-gcp.cloudsql_mysql-default-2023.11.06-000001 were successfully placed to index tsdb-index-enabled.

cloudsql sqlserver

Testing data stream metrics-gcp.cloudsql_sqlserver-default.
Index being used for the documents is .ds-metrics-gcp.cloudsql_sqlserver-default-2023.11.06-000001.
Index being used for the settings and mappings is .ds-metrics-gcp.cloudsql_sqlserver-default-2023.11.06-000001.

The time series fields for the TSDB index are: 
        - dimension (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint
        - gauge (23 fields):
                - gcp.cloudsql_sqlserver.database.audits_size.bytes
                - gcp.cloudsql_sqlserver.database.audits_upload.count
                - gcp.cloudsql_sqlserver.database.auto_failover_request.count
                - gcp.cloudsql_sqlserver.database.available_for_failover
                - gcp.cloudsql_sqlserver.database.cpu.reserved_cores.count
                - gcp.cloudsql_sqlserver.database.cpu.usage_time.sec
                - gcp.cloudsql_sqlserver.database.cpu.utilization.pct
                - gcp.cloudsql_sqlserver.database.disk.bytes_used.bytes
                - gcp.cloudsql_sqlserver.database.disk.quota.bytes
                - gcp.cloudsql_sqlserver.database.disk.read_ops.count
                - gcp.cloudsql_sqlserver.database.disk.utilization.pct
                - gcp.cloudsql_sqlserver.database.disk.write_ops.count
                - gcp.cloudsql_sqlserver.database.memory.quota.bytes
                - gcp.cloudsql_sqlserver.database.memory.total_usage.bytes
                - gcp.cloudsql_sqlserver.database.memory.usage.bytes
                - gcp.cloudsql_sqlserver.database.memory.utilization.pct
                - gcp.cloudsql_sqlserver.database.network.connections.count
                - gcp.cloudsql_sqlserver.database.network.received_bytes.count
                - gcp.cloudsql_sqlserver.database.network.sent_bytes.count
                - gcp.cloudsql_sqlserver.database.replication.network_lag.sec
                - gcp.cloudsql_sqlserver.database.replication.replica_lag.sec
                - gcp.cloudsql_sqlserver.database.up
                - gcp.cloudsql_sqlserver.database.uptime.sec
        - routing_path (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint

Index tsdb-index-enabled successfully created.

Copying documents from .ds-metrics-gcp.cloudsql_sqlserver-default-2023.11.06-000001 to tsdb-index-enabled...
All 344 documents taken from index .ds-metrics-gcp.cloudsql_sqlserver-default-2023.11.06-000001 were successfully placed to index tsdb-index-enabled.

firestore

Testing data stream metrics-gcp.firestore-default.
Index being used for the documents is .ds-metrics-gcp.firestore-default-2023.11.06-000001.
Index being used for the settings and mappings is .ds-metrics-gcp.firestore-default-2023.11.06-000001.

The time series fields for the TSDB index are: 
        - dimension (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint
        - gauge (3 fields):
                - gcp.firestore.document.delete.count
                - gcp.firestore.document.read.count
                - gcp.firestore.document.write.count
        - routing_path (4 fields):
                - agent.id
                - cloud.account.id
                - cloud.account.name
                - gcp.labels_fingerprint

Index tsdb-index-enabled successfully created.

Copying documents from .ds-metrics-gcp.firestore-default-2023.11.06-000001 to tsdb-index-enabled...
All 171 documents taken from index .ds-metrics-gcp.firestore-default-2023.11.06-000001 were successfully placed to index tsdb-index-enabled.

@zmoog zmoog merged commit 66fd810 into elastic:main Nov 14, 2023
7 checks passed
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
* group metrics

* Add timestamp to the grouping key

The dimensionsKey contains all the dimension field values we want to use
to group the time series.

We need to add the timestamp to the key, so we only group together time
series with the same timestamp.

* Update grouping key, add event.created, drop ID()

# Update grouping key

The dimensionsKey contains all the dimension field values we want to use
to group the time series.

We need to add the timestamp to the key, so we only group time
series with the same timestamp.

# Drop ID() function

Remove the `ID()` function from the Metadata Collector.

Since we are unifying the metric grouping logic for all metric types, we
don't need to keep the `ID()` function anymore.

# Renaming

I also renamed some structs, functions, and variables to make their
roles and purposes clearer.

We can remove this part if it does not improve clarity.


---------

Co-authored-by: Maurizio Branca <[email protected]>