merge col-exporter-rewrite into main (#295)
* Skip all fixture tests (#239)

* Initial structure for new pdata metrics exporter (#238)

* [Metrics Rewrite] add outline with todos for fragmenting work (#240)

* [Metrics Rewrite] attribute to label mapping (#243)

[Metrics Rewrite] attribute to label mapping

* [Metrics Rewrite] support for pdata Sum points (#242)

* [Metrics Rewrite] support for pdata Sum points

* update breaking-changes.md

* use concatenation instead of sprintf

* [Metrics Rewrite] support for pdata Gauge points (#244)

* Add logic to translate metric descriptors and initial flow (#247)

* Fixes from merge.

* Fix tests.

* Clean up test cases, re-disable integration tests.

* Add summary descriptors and label descriptors.

* Fix lint issues.

* Some fixes from review.

* Remove metric import.

* Fixes from review.
- Update default config method
- Simplify some of my lack-of-go expertise.

* Add unit test for metric domains.

* Fixes from review.

* Add breaking changes.

* Fixes from review.

* Update context to be TODO.

* Add support for exponential histograms and exemplars. (#251)

* Add support for exponential histograms and exemplars.

* Fixes from review.

* Fixes from review.

* Fixes from discussion.

* [Metrics Rewrite] implement monitored resource mapping (#252)

* [Metrics Rewrite] implement monitored resource mapping

* review fixes

* [Metrics Rewrite] update breaking-changes.md for monitored resource (#255)

* Add summary mapping to exporter. (#249)

* Add config to call `CreateServiceTimeSeries` (#259)

* Initial implementation of create service time series.

* Add a test case for create service timeseries.

* Add logic to auto-detect project id if not configured.

* Fix from code review

* Fix resource to be one that has retention policy for integration tests.

* Add support for histogram to metrics exporter. (#258)

BUG=210164184

* Re-enable ops-agent self-metric integration test. (#260)

* [Metrics Rewrite] add ExponentialHistogram fixture (#257)

* [Metrics Rewrite] add ExponentialHistogram fixture

* make tests deterministic

* few last changes

* close channel instead of sending a message

* Enable ops agent host metric integration test. (#264)

- There is a bug in upstream agent-metric-processor that sets incorrect units on usage metrics (GoogleCloudPlatform/opentelemetry-operations-collector#72)
- We update the expectations for inclusion of units in CreateTimeSeries
- We disable metric descriptors (for now). Given the bug in agent-metric-processor, ops-agent will likely need an upstream fix for this first.

* add a feature gate, which defaults to false, for using the re-written exporter (#267)

* Enable Basic integration tests (#266)

* Enable basic counter test.

* Enable delta counter metrics.

- Note: Delta counters are NOW fake-delta (i.e. cumulatives with limited time windows)

* Enable non-monotonic-sum integration test.

* Re-enable summary integration test and fix design issues in summary translation.

- Summary exports percentiles, not quantiles
- Percentiles should include similar double precision in the string.

* Fix recordfixtures script to use featuregate (#270)

* Skip already seen attribute keys when creating LabelDescriptors (#272)

* Reenable GKE metrics agent fixtures (#271)

* Update breaking-changes.md for googlecloudmonitoring/point_count self observability (#277)

* Move logging to use zap-logger and set up self-observability to match collector expectations. (#275)

* Enable metric prefix integration tests. (#274)

* enable workloadapis prefix integration test.

* update unknown domain metrics expect.

* Add instrumentationLibraryToLabels method to metrics exporter. (#253)

* Add instrumentationLibraryToLabels method to metrics exporter.

BUG=https://b.corp.google.com/issues/210164355

* Remove custom_metrics_domains behaviour from metrics-exporter.

* Remove dependency on go.opentelemetry.io/collector (#279)

* remove dependency on go.opentelemetry.io/collector

* add ocgrpc metrics to exporters' self-obs metrics (#280)

* Use OC stackdriver exporter to capture self observability metrics as GCM protos (#282)

* Capture ocgrpc self observability metrics (#283)

* make integrationtest not internal (#285)

* Remove internal/ prefix for integrationtest (#288)

* Add batching support to metrics-exporter. (#286)

* Add batching support to metrics-exporter.

* Retry when we fail to write metric descriptors.

* Re-enable workload metrics integration tests (#278)

* update header year for new files (#296)

* Document new CreateMetricDescriptor behavior (#294)

* reenable disabled metrics test (#299)

Co-authored-by: Aaron Abbott
Co-authored-by: Josh Suereth
Co-authored-by: Thomas Barker
Co-authored-by: Punya Biswal
5 people authored Feb 2, 2022
1 parent 242ff20 commit 7338096
Showing 67 changed files with 22,825 additions and 2,076 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -16,7 +16,7 @@ ifeq ($(UNAME_S),Darwin)
 endif
 endif

-GOTEST_MIN = go test -v -timeout 60s
+GOTEST_MIN = go test -v -timeout 70s
 GOTEST = $(GOTEST_MIN) -race
 GOTEST_WITH_COVERAGE = $(GOTEST) -coverprofile=coverage.txt -covermode=atomic

1 change: 0 additions & 1 deletion exporter/collector/README.md
@@ -201,7 +201,6 @@ and [memory limiter](https://github.com/open-telemetry/opentelemetry-collector/t
optimal network usage and avoiding memory overruns. You may also want to run an additional
[sampler](https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/probabilisticsamplerprocessor), depending on your needs.
## Deprecatations
The previous trace configuration (v0.21.0) has been deprecated in favor of the common configuration options available in OpenTelemetry. These will cause a failure to start
88 changes: 88 additions & 0 deletions exporter/collector/breaking-changes.md
@@ -0,0 +1,88 @@
# Breaking changes vs old googlecloud exporter

The new pdata based exporter has some breaking changes from the original OpenCensus (OC)
stackdriver based `googlecloud` exporter:

## Metric Names and Descriptors

The previous collector exporter would default to sending metrics with the type:
`custom.googleapis.com/OpenCensus/{metric_name}`. This has been changed to
`workload.googleapis.com/{metric_name}`.

Additionally, the previous exporter had a hardcoded list of known metric domains
where this "prefix" would not be used. The new exporter allows full configuration
of this list via the `metric.known_domains` property.
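
The prefix rule above can be sketched in Go as follows. This is a minimal illustration, not the exporter's own code: `metricType` is a hypothetical helper, and the matching rule (the segment before the first `/` must end with a known domain) is an assumption. The domain list and default prefix are copied from `DefaultConfig` in this commit.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// knownDomains copies the default list from this commit's config.go.
var knownDomains = []string{"googleapis.com", "kubernetes.io", "istio.io", "knative.dev"}

// metricType is a hypothetical helper, not the exporter's own function.
// Assumption: a name belongs to a known domain when the segment before its
// first "/" ends with one of the configured domains.
func metricType(name, prefix string, domains []string) string {
	domain := name
	if i := strings.Index(name, "/"); i >= 0 {
		domain = name[:i]
	}
	for _, d := range domains {
		if strings.HasSuffix(domain, d) {
			return name // known domain: no prefix is added
		}
	}
	return path.Join(prefix, name)
}

func main() {
	fmt.Println(metricType("nginx/latency", "workload.googleapis.com", knownDomains))
	fmt.Println(metricType("kubernetes.io/container/cpu", "workload.googleapis.com", knownDomains))
}
```

Under these assumptions the first name is prefixed with `workload.googleapis.com` and the second is left unchanged.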

The previous exporter would, by default, only call CreateMetricDescriptor for metrics with
domain `custom.googleapis.com` or `external.googleapis.com`. The new exporter will try to call
CreateMetricDescriptor regardless of domain, unless `metric.skip_create_descriptor` or
`metric.create_service_timeseries` is true. The exporter now calls CreateMetricDescriptor on a
best-effort basis: it queues MetricDescriptors in a buffered channel and drops them when the
channel is full. A dropped descriptor is retried the next time that metric is seen.
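
A best-effort queue of this kind is commonly written in Go as a non-blocking send on a buffered channel. The sketch below shows only that pattern; `descriptor` and `tryEnqueue` are hypothetical names and the exporter's actual implementation may differ.

```go
package main

import "fmt"

// descriptor is a hypothetical stand-in for a MetricDescriptor.
type descriptor struct{ Type string }

// tryEnqueue attempts a non-blocking send and reports whether the
// descriptor was queued. When the buffer is full the descriptor is dropped.
func tryEnqueue(ch chan<- descriptor, d descriptor) bool {
	select {
	case ch <- d:
		return true
	default:
		return false // channel full: drop instead of blocking the export path
	}
}

func main() {
	// Buffer size 2 for illustration; the default in this commit is 10.
	ch := make(chan descriptor, 2)
	for i := 0; i < 3; i++ {
		d := descriptor{Type: fmt.Sprintf("workload.googleapis.com/m%d", i)}
		fmt.Println(d.Type, "queued:", tryEnqueue(ch, d))
	}
}
```

With nothing draining the channel, the third send is dropped. A caller that records a metric as handled only when `tryEnqueue` returns true would naturally retry the next time that metric is seen.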

Additionally, the DisplayName for a metric used to be exactly the
`{metric_name}`. Now, the display name is the full path after the
domain name of the metric type. E.g. if a metric called
`workload.googleapis.com/nginx/latency` is created, the display name will
be `nginx/latency` instead of `workload.googleapis.com/nginx/latency`.
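
As a rough sketch of the display-name rule, assuming the domain is everything before the first `/` of the metric type (`displayName` is a hypothetical helper, not a function from the exporter):

```go
package main

import (
	"fmt"
	"strings"
)

// displayName returns the path after the domain of a metric type.
// Assumption: the domain is the segment before the first "/".
func displayName(metricType string) string {
	if i := strings.Index(metricType, "/"); i >= 0 {
		return metricType[i+1:]
	}
	return metricType // no domain segment to strip
}

func main() {
	fmt.Println(displayName("workload.googleapis.com/nginx/latency"))
}
```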

## Monitored Resources

Mapping from OTel Resource to GCM monitored resource has been completely changed. The OC based
exporter worked by converting the OTel resource into an OC resource which the exporter
recognized. The `resource_mappings` config option allowed customizing this conversion so the OC
exporter would correctly convert to a GCM monitored resource.

The new pdata based exporter works by interpreting the [OTel Resource semantic
conventions](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/semantic_conventions/README.md)
as follows to determine the monitored resource type:

- Switch on the `cloud.platform` Resource attribute and if:
  - `gcp_compute_engine`, send a `gce_instance` monitored resource.
  - `gcp_kubernetes_engine`, send the most specific possible k8s monitored resource depending
    on which resource keys are present and non-empty. In order, try for `k8s_container`,
    `k8s_pod`, `k8s_node`, `k8s_cluster`.
  - `aws_ec2`, send an `aws_ec2_instance` monitored resource.
- Otherwise, fall back to:
  - `generic_task` if the `service.name` and `service.instance.id` resource attributes are
    present and non-empty.
  - `generic_node` otherwise.

Once the type is determined, the monitored resource labels are populated from the mappings
defined in [`monitoredresource.go`](monitoredresource.go#L51). The new behavior will never send the
`global` monitored resource.
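
The selection order described above can be sketched as a plain Go function over resource attributes. This is illustrative only: `monitoredResourceType` is a hypothetical name, and the attribute keys checked for Kubernetes (`k8s.container.name`, `k8s.pod.name`, `k8s.node.name`) and for the service instance (`service.instance.id`) are taken from the OTel semantic conventions as assumptions. The real mappings in `monitoredresource.go` may require more keys per resource type.

```go
package main

import "fmt"

// monitoredResourceType sketches the selection order only; it does not
// populate monitored resource labels.
func monitoredResourceType(attrs map[string]string) string {
	switch attrs["cloud.platform"] {
	case "gcp_compute_engine":
		return "gce_instance"
	case "gcp_kubernetes_engine":
		// Assumption: one representative key per type, most specific first.
		switch {
		case attrs["k8s.container.name"] != "":
			return "k8s_container"
		case attrs["k8s.pod.name"] != "":
			return "k8s_pod"
		case attrs["k8s.node.name"] != "":
			return "k8s_node"
		default:
			return "k8s_cluster"
		}
	case "aws_ec2":
		return "aws_ec2_instance"
	}
	// Fallback when cloud.platform is missing or unrecognized.
	if attrs["service.name"] != "" && attrs["service.instance.id"] != "" {
		return "generic_task"
	}
	return "generic_node"
}

func main() {
	fmt.Println(monitoredResourceType(map[string]string{
		"cloud.platform": "gcp_kubernetes_engine",
		"k8s.pod.name":   "web-0",
	}))
	fmt.Println(monitoredResourceType(map[string]string{"service.name": "checkout"}))
}
```

Note that no branch returns `global`, matching the statement that the new behavior never sends that resource type.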

For now, it is not possible to customize the mapping algorithm, beyond using the
[`resourceprocessor`](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourceprocessor)
in the collector pipeline before the exporter. If you have a use case for customizing the
behavior, please open an issue.

## Labels

Original label key mapping code is
[here](https://github.com/census-ecosystem/opencensus-go-exporter-stackdriver/blob/42e7e58efdb937e8477f827d3fba022212335dbc/sanitize.go#L26).
The new code does not:

- truncate label keys longer than 100 characters.
- prepend `key` when the first character is `_`.

## OTLP Sum

In the old exporter, delta sums were converted into GAUGE points ([see test
fixture](https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/blob/9bc1f49ebe000b0b3b1aa5b7f201e7996effdcd8/exporter/collector/testdata/fixtures/delta_counter_metrics_expect.json#L15)).
The new pdata exporter sends these as CUMULATIVE points with the same delta time window
(resetting at each point), also known as pseudo-cumulatives.

## OTLP Summary

The old exporter relied on upstream conversion of OTLP Summary into Gauge and
Cumulative points. The new exporter performs this conversion itself, which
means summary metric descriptors will include a label description for the `percentile`
label.

## Self Observability Metrics

For each OTLP Summary metric point, the old exporter would add 1 to the
`googlecloudmonitoring/point_count` self-observability counter. For a Summary point with N
percentile values, the new exporter will add `N + 2` (one for each percentile timeseries, one
for count, and one for sum) to the counter.
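
The arithmetic is simple enough to state as code. `summaryPointCount` is a hypothetical helper used only to illustrate the `N + 2` rule:

```go
package main

import "fmt"

// summaryPointCount returns how much the new exporter adds to the
// point_count counter for one Summary point with n percentile values:
// one per percentile time series, plus one for count and one for sum.
func summaryPointCount(n int) int {
	return n + 2
}

func main() {
	// A Summary point reporting three percentiles counts as 5 points.
	fmt.Println(summaryPointCount(3))
}
```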
50 changes: 39 additions & 11 deletions exporter/collector/config.go
@@ -15,25 +15,23 @@
 package collector

 import (
-	"go.opentelemetry.io/collector/config"
-	"go.opentelemetry.io/collector/exporter/exporterhelper"
+	"time"
+
 	"google.golang.org/api/option"
 )

+const (
+	DefaultTimeout = 12 * time.Second // Consistent with Cloud Monitoring's timeout
+)
+
 // Config defines configuration for Google Cloud exporter.
 type Config struct {
-	config.ExporterSettings `mapstructure:",squash"`
-	ProjectID string `mapstructure:"project"`
-	UserAgent string `mapstructure:"user_agent"`
-	Endpoint string `mapstructure:"endpoint"`
+	ProjectID string `mapstructure:"project"`
+	UserAgent string `mapstructure:"user_agent"`
+	Endpoint string `mapstructure:"endpoint"`
 	// Only has effect if Endpoint is not ""
 	UseInsecure bool `mapstructure:"use_insecure"`

-	// Timeout for all API calls. If not set, defaults to 12 seconds.
-	exporterhelper.TimeoutSettings `mapstructure:",squash"` // squash ensures fields are correctly decoded in embedded struct.
-	exporterhelper.QueueSettings `mapstructure:"sending_queue"`
-	exporterhelper.RetrySettings `mapstructure:"retry_on_failure"`
-
 	ResourceMappings []ResourceMapping `mapstructure:"resource_mappings"`
 	// GetClientOptions returns additional options to be passed
 	// to the underlying Google Cloud API client.
@@ -47,6 +45,20 @@ type Config struct {
 type MetricConfig struct {
 	Prefix string `mapstructure:"prefix"`
 	SkipCreateMetricDescriptor bool `mapstructure:"skip_create_descriptor"`
+	// If a metric belongs to one of these domains it does not get a prefix.
+	KnownDomains []string `mapstructure:"known_domains"`
+
+	// If true, set the instrumentation_source and instrumentation_version
+	// labels. Defaults to true.
+	InstrumentationLibraryLabels bool `mapstructure:"instrumentation_library_labels"`
+
+	// If true, this will send all timeseries using `CreateServiceTimeSeries`.
+	// Implicitly, this sets `SkipCreateMetricDescriptor` to true.
+	CreateServiceTimeSeries bool `mapstructure:"create_service_timeseries"`
+
+	// Buffer size for the channel which asynchronously calls CreateMetricDescriptor. Default
+	// is 10.
+	CreateMetricDescriptorBufferSize int `mapstructure:"create_metric_descriptor_buffer_size"`
 }

 // ResourceMapping defines mapping of resources from source (OpenCensus) to target (Google Cloud).
@@ -64,3 +76,19 @@ type LabelMapping struct {
 	// When required label is missing, we fallback to default resource mapping.
 	Optional bool `mapstructure:"optional"`
 }
+
+// Known metric domains. Note: This is now configurable for advanced usages.
+var domains = []string{"googleapis.com", "kubernetes.io", "istio.io", "knative.dev"}
+
+// DefaultConfig creates the default configuration for exporter.
+func DefaultConfig() Config {
+	return Config{
+		UserAgent: "opentelemetry-collector-contrib {{version}}",
+		MetricConfig: MetricConfig{
+			KnownDomains: domains,
+			Prefix: "workload.googleapis.com",
+			CreateMetricDescriptorBufferSize: 10,
+			InstrumentationLibraryLabels: true,
+		},
+	}
+}
95 changes: 0 additions & 95 deletions exporter/collector/config_test.go

This file was deleted.

73 changes: 0 additions & 73 deletions exporter/collector/factory.go

This file was deleted.
