OTel Collector Fails Health Check on AWS App Runner with Minimal Config #37081

Closed
garysassano opened this issue Jan 8, 2025 · 7 comments

@garysassano

garysassano commented Jan 8, 2025

Problem Summary

I spent half a day trying to get opentelemetry-collector-contrib working on AWS App Runner with its default health check configuration (path / on port 4318):

(screenshot of the App Runner health check settings)

Despite multiple attempts with minimal configurations, the health check consistently failed. Eventually, I was only able to make it work by reverse-engineering the opentelemetry-collector-config default confmap from the logs and tweaking it. However, this resulted in a bloated configuration containing many components I don’t need.


Investigation Process

  1. Extracting Default Configuration
    Since I couldn’t find the default configuration in this repository, I used the logs generated by the opentelemetry-collector-config default confmap and reverse-engineered it with Claude 3.5 Sonnet.
    (screenshot of the log output omitted)

  2. Reverse-Engineered Configuration
    The resulting configuration included multiple receivers, exporters, and extensions, even though I only required OTLP.
    Here’s the output I obtained (excluding Prometheus):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  zipkin:
    endpoint: 0.0.0.0:9411
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
  opencensus:
    endpoint: 0.0.0.0:55678

exporters:
  debug:
    verbosity: detailed

extensions:
  health_check:
    endpoint: localhost:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

service:
  telemetry:
    metrics:
      address: localhost:8888
      level: normal

  extensions: [health_check, pprof, zpages]

  pipelines:
    traces:
      receivers: [otlp, zipkin, jaeger, opencensus]
      exporters: [debug]
    metrics:
      receivers: [otlp, opencensus]
      exporters: [debug]
    logs:
      receivers: [otlp]
      exporters: [debug]

  3. Final Working Configuration
    I then replaced the debug exporter with my actual otlp exporter pointing to Honeycomb and finally got the setup working:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  zipkin:
    endpoint: 0.0.0.0:9411
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
  opencensus:
    endpoint: 0.0.0.0:55678

exporters:
  otlp:
    endpoint: api.honeycomb.io:443
    headers:
      x-honeycomb-team: XXX

extensions:
  health_check:
    endpoint: localhost:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

service:
  telemetry:
    metrics:
      address: localhost:8888
      level: normal

  extensions: [health_check, pprof, zpages]

  pipelines:
    traces:
      receivers: [otlp, zipkin, jaeger, opencensus]
      exporters: [otlp]
    metrics:
      receivers: [otlp, opencensus]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      exporters: [otlp]

  4. Minimal Config Attempt and Failure

I initially tried simplifying the configuration to only include the required OTLP components, as shown below:

receivers:
  otlp:
    protocols:
      http:
        endpoint: localhost:4318

exporters:
  otlp:
    endpoint: api.honeycomb.io:443
    headers:
      x-honeycomb-team: XXX

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]

extensions:
  health_check:

However, this configuration failed the AWS App Runner health check.


Open Question

Why does the simplified configuration fail the App Runner health check while the bloated configuration doesn't?


Collector version

0.112.0

@garysassano added the bug (Something isn't working) and needs triage (New item requiring triage) labels Jan 8, 2025
@bacherfl
Contributor

bacherfl commented Jan 8, 2025

Hi @garysassano! It looks like one difference between the minimal config and the working one is the address of the otlp receiver. In the minimal config it is set to localhost:4318, which means the receiver only accepts requests from the same machine. In the working configuration it is set to 0.0.0.0:4318, so it accepts connections from all sources.
I do not know much about the architecture of AWS App Runner, but this may explain why the health checks are not working with the minimal config.

Therefore, could you try changing the receivers.otlp.protocols.http.endpoint attribute in the minimal config to 0.0.0.0:4318 as well and see if that helps?
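
For reference, that single change applied to the minimal config would look like this (a sketch; everything else stays as in your example):

receivers:
  otlp:
    protocols:
      http:
        # bind to all interfaces so requests from outside the container are accepted
        endpoint: 0.0.0.0:4318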

@garysassano
Author

@bacherfl Your assumption was correct: changing receivers.otlp.protocols.http.endpoint from localhost:4318 to 0.0.0.0:4318 resolved the health check issue.

Does this imply that the health check mechanism used by AWS App Runner sends requests from outside the machine? And would it be a bad thing to leave the OTLP/HTTP receiver with 0.0.0.0:4318?

@bacherfl
Contributor

bacherfl commented Jan 8, 2025

Does this imply that the health check mechanism used by AWS App Runner sends requests from outside the machine? And would it be a bad thing to leave the OTLP/HTTP receiver with 0.0.0.0:4318?

Yes, it seems the health check requests are sent from outside the machine in this case. Leaving the endpoint at 0.0.0.0:4318 is a valid solution, especially since services sending data to the otlp receiver would most likely not be able to reach it otherwise.

One more thing though: if possible, I would recommend using the health_check extension running on port 13133 for the health checks rather than the otlp endpoint, as it is better suited for that purpose. I see that it is already included in the sample config; the only change required is to also use the 0.0.0.0 endpoint, as with the otlp receiver:

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
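
Once the extension binds to 0.0.0.0:13133, you can verify it from outside the container with a plain HTTP GET, e.g. curl -i http://<service-host>:13133/ (the host is a placeholder, and / is the extension's default path); it should return HTTP 200 once the collector is ready.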

@garysassano
Author

So you suggest that I should change the AWS App Runner health check listener from TCP/4318 to TCP/13133 instead?

What is the primary advantage of performing the health check on the endpoint provided by the health_check service extension instead of checking the OTLP/HTTP endpoint directly?

@bacherfl
Contributor

bacherfl commented Jan 8, 2025

Exactly, setting it to TCP/13133 will change the health check to use the health check extension.

I think the most notable difference is that the health check extension only reports healthy once all of the components, and thus the complete pipeline, are ready. For example, the otlp receiver endpoint could already be up and running while a processor or exporter is not yet fully initialised.
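
Putting both suggestions together, a minimal OTLP-only setup that should pass the App Runner health check on port 13133 might look like this (a sketch; the Honeycomb exporter is carried over from your earlier examples, and note that the extension must also be listed under service.extensions to be started):

receivers:
  otlp:
    protocols:
      http:
        # reachable from outside the container
        endpoint: 0.0.0.0:4318

exporters:
  otlp:
    endpoint: api.honeycomb.io:443
    headers:
      x-honeycomb-team: XXX

extensions:
  health_check:
    # App Runner health checks hit this port instead of the otlp receiver
    endpoint: 0.0.0.0:13133

service:
  # the extension is only started if listed here
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]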

@bacherfl added the question (Further information is requested) label and removed the bug and needs triage labels Jan 8, 2025
@garysassano
Author

One issue with App Runner is that it doesn’t allow specifying a separate port for health checks; it requires using the same port the service is listening on. However, you can configure the health check to use a specific path. In this case, could I set the health check extension to a path like /health on port 4318, or would that conflict with the otlp/http receiver?
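
For context, the attempted configuration would look something like this (a sketch; the health_check extension does accept a path setting, but its endpoint would have to share port 4318 with the otlp/http receiver):

extensions:
  health_check:
    # same port as the otlp http receiver -- both servers try to bind 4318
    endpoint: 0.0.0.0:4318
    path: /health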

@garysassano
Author

Never mind, it's not possible:

01-09-2025 03:29:23 PM Error: cannot start pipelines: listen tcp 0.0.0.0:4318: bind: address already in use
01-09-2025 03:29:23 PM 2025/01/09 14:29:23 collector server run finished with error: cannot start pipelines: listen tcp 0.0.0.0:4318: bind: address already in use
