OTel Collector Fails Health Check on AWS App Runner with Minimal Config #37081

Closed
garysassano opened this issue Jan 8, 2025 · 7 comments

@garysassano

garysassano commented Jan 8, 2025

Problem Summary

I spent half a day trying to get opentelemetry-collector-contrib working on AWS App Runner with its default health check configuration (path / on port 4318):

(screenshot of the App Runner health check settings)

Despite multiple attempts with minimal configurations, the health check consistently failed. Eventually, I was only able to make it work by reverse-engineering the opentelemetry-collector-config default confmap from the logs and tweaking it. However, this resulted in a bloated configuration containing many components I don’t need.


Investigation Process

  1. Extracting Default Configuration
    Since I couldn’t find the default configuration in this repository, I used the logs generated by the opentelemetry-collector-config default confmap and reverse-engineered it with Claude 3.5 Sonnet.
    (screenshot of the log output omitted)

  2. Reverse-Engineered Configuration
    The resulting configuration included multiple receivers, exporters, and extensions, even though I only required OTLP.
    Here’s the output I obtained (excluding Prometheus):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  zipkin:
    endpoint: 0.0.0.0:9411
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
  opencensus:
    endpoint: 0.0.0.0:55678

exporters:
  debug:
    verbosity: detailed

extensions:
  health_check:
    endpoint: localhost:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

service:
  telemetry:
    metrics:
      address: localhost:8888
      level: normal

  extensions: [health_check, pprof, zpages]

  pipelines:
    traces:
      receivers: [otlp, zipkin, jaeger, opencensus]
      exporters: [debug]
    metrics:
      receivers: [otlp, opencensus]
      exporters: [debug]
    logs:
      receivers: [otlp]
      exporters: [debug]

  3. Final Working Configuration
    I then replaced the debug exporter with my actual otlp exporter pointing to Honeycomb and finally got the setup working:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  zipkin:
    endpoint: 0.0.0.0:9411
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
  opencensus:
    endpoint: 0.0.0.0:55678

exporters:
  otlp:
    endpoint: api.honeycomb.io:443
    headers:
      x-honeycomb-team: XXX

extensions:
  health_check:
    endpoint: localhost:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

service:
  telemetry:
    metrics:
      address: localhost:8888
      level: normal

  extensions: [health_check, pprof, zpages]

  pipelines:
    traces:
      receivers: [otlp, zipkin, jaeger, opencensus]
      exporters: [otlp]
    metrics:
      receivers: [otlp, opencensus]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      exporters: [otlp]

  4. Minimal Config Attempt and Failure

I initially tried simplifying the configuration to only include the required OTLP components, as shown below:

receivers:
  otlp:
    protocols:
      http:
        endpoint: localhost:4318

exporters:
  otlp:
    endpoint: api.honeycomb.io:443
    headers:
      x-honeycomb-team: XXX

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]

extensions:
  health_check:

However, this configuration failed the AWS App Runner health check.


Open Question

Why does the simplified configuration fail the App Runner health check while the bloated configuration doesn't?


Collector version

0.112.0

@garysassano added the bug (Something isn't working) and needs triage (New item requiring triage) labels Jan 8, 2025
@bacherfl
Contributor

bacherfl commented Jan 8, 2025

Hi @garysassano! It looks like one difference between the minimal config and the working one is the address of the otlp receiver. In the minimal config it is set to localhost:4318, which means the receiver only accepts requests from the same machine. In the working configuration it is set to 0.0.0.0:4318, so it accepts connections from all sources.
I do not know much about the architecture of AWS App Runner, but this may explain why the health checks are not working with the minimal config.

Therefore, could you try changing the receivers.otlp.protocols.http.endpoint attribute in the minimal config to 0.0.0.0:4318 as well and see if that helps?
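
For reference, that single change applied to the minimal config would look like this (a sketch; everything else stays as in your example):

receivers:
  otlp:
    protocols:
      http:
        # bind to all interfaces so requests from outside the container are accepted
        endpoint: 0.0.0.0:4318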

@garysassano
Author

@bacherfl Your assumption was correct: changing receivers.otlp.protocols.http.endpoint from localhost:4318 to 0.0.0.0:4318 resolved the health check issue.

Does this imply that the health check mechanism used by AWS App Runner sends requests from outside the machine? And would it be a bad thing to leave the OTLP/HTTP receiver with 0.0.0.0:4318?

@bacherfl
Contributor

bacherfl commented Jan 8, 2025

Does this imply that the health check mechanism used by AWS App Runner sends requests from outside the machine? And would it be a bad thing to leave the OTLP/HTTP receiver with 0.0.0.0:4318?

Yes, it seems the health check requests are sent from outside the machine in this case. Leaving the endpoint at 0.0.0.0:4318 is a valid solution, especially since services sending data to the otlp receiver would most likely not be able to reach it otherwise.

One more thing though: if possible, I would recommend using the health_check extension running on port 13133 for the health checks rather than the otlp endpoint, as it is better suited for that purpose. I see that it is already included in the sample config; the only change required is to also use the 0.0.0.0 endpoint, as with the otlp receiver:

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
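
Once the extension binds to 0.0.0.0:13133, you can verify it from outside the container with a plain HTTP GET, e.g. curl -i http://<service-host>:13133/ (the host is a placeholder, and / is the extension's default path); it should return HTTP 200 once the collector is ready.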

@garysassano
Author

So you suggest that I should change the AWS App Runner health check listener from TCP/4318 to TCP/13133 instead?

What is the primary advantage of performing the health check on the endpoint provided by the health_check service extension instead of checking the OTLP/HTTP endpoint directly?

@bacherfl
Contributor

bacherfl commented Jan 8, 2025

Exactly, setting it to TCP/13133 will change the health check to use the health check extension.

I think the most notable difference is that the health check extension only reports healthy once all of the components, and thus the complete pipeline, are ready. For example, the otlp receiver endpoint could already be up and running while a processor or exporter is not yet fully initialised.
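
Putting both suggestions together, a minimal OTLP-only setup that should pass the App Runner health check on port 13133 might look like this (a sketch; the Honeycomb exporter is carried over from your earlier examples, and note that the extension must also be listed under service.extensions to be started):

receivers:
  otlp:
    protocols:
      http:
        # reachable from outside the container
        endpoint: 0.0.0.0:4318

exporters:
  otlp:
    endpoint: api.honeycomb.io:443
    headers:
      x-honeycomb-team: XXX

extensions:
  health_check:
    # App Runner health checks hit this port instead of the otlp receiver
    endpoint: 0.0.0.0:13133

service:
  # the extension is only started if listed here
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]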

@bacherfl added the question (Further information is requested) label and removed the bug and needs triage labels Jan 8, 2025
@garysassano
Author

One issue with App Runner is that it doesn’t allow specifying a separate port for health checks; it requires using the same port the service is listening on. However, you can configure the health check to use a specific path. In this case, could I set the health check extension to a path like /health on port 4318, or would that conflict with the otlp/http receiver?
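
For context, the attempted configuration would look something like this (a sketch; the health_check extension does accept a path setting, but its endpoint would have to share port 4318 with the otlp/http receiver):

extensions:
  health_check:
    # same port as the otlp http receiver -- both servers try to bind 4318
    endpoint: 0.0.0.0:4318
    path: /health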

@garysassano
Author

Never mind, it's not possible:

01-09-2025 03:29:23 PM Error: cannot start pipelines: listen tcp 0.0.0.0:4318: bind: address already in use
01-09-2025 03:29:23 PM 2025/01/09 14:29:23 collector server run finished with error: cannot start pipelines: listen tcp 0.0.0.0:4318: bind: address already in use
