AccessDeniedException when using ADOT with an EKS cluster #229

Open
fpaparoni opened this issue Jan 26, 2023 · 13 comments
@fpaparoni

I'm getting an error with a basic ADOT setup, so I'm probably missing something. I created a new EKS cluster and added ADOT as an add-on. The next step was to add a ClusterConfig like this:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: develop
  region: us-east-1

iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: adot-collector
        namespace: testnamespace
      attachPolicyARNs:
      - "arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess"
      - "arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess"
      - "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"

After that, I created the following OpenTelemetryCollector in sidecar mode:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: develop-collector-xray
spec:
  mode: sidecar 
  resources:
    requests:
      cpu: "1"
    limits:
      cpu: "1"
  serviceAccount: adot-collector
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
            
    processors:
      batch:

    exporters:
      logging:
        loglevel: debug
      awsxray:
        region: 'us-east-1'

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [awsxray]
      telemetry:
        logs:
          level: debug

I added the annotation

sidecar.opentelemetry.io/inject: "true"

to my pod definition. I started the application with the Java agent, passing the required environment variables:

ENV OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
ENV OTEL_RESOURCE_ATTRIBUTES=service.namespace=test-be,service.name=test-be
ENV AWS_REGION=us-east-1
ENV OTEL_METRICS_EXPORTER=otlp
CMD java -javaagent:/app/bin/aws-opentelemetry-agent.jar -jar /app/bin/registry.jar

Once started, I can see the injected sidecar container, but tracing doesn't work, and in the logs I can see the following error:

2023-01-26T12:05:36.665Z	debug	[email protected]/awsxray.go:70	response error	{"kind": "exporter", "data_type": "traces", "name": "awsxray", "error": "AccessDeniedException: \n\tstatus code: 403, request id: c3c8ff28-18c5-4c2c-a5b5-e48b93b020c4"}
2023-01-26T12:05:36.665Z	debug	[email protected]/awsxray.go:74	response: {

}	{"kind": "exporter", "data_type": "traces", "name": "awsxray"}
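For anyone triaging similar output, here is a minimal Python sketch that pulls the structured fields out of a log line shaped like the one above. It assumes the line is tab-separated (timestamp, level, caller, message, JSON fields); the function names and the caller string in the sample are hypothetical, and this is not part of the collector itself.

```python
import json
import re
from typing import Optional


def parse_collector_log(line: str) -> dict:
    """Split one tab-separated log line; the fifth field is expected to be JSON."""
    timestamp, level, caller, message, fields = line.rstrip("\n").split("\t", 4)
    return {
        "timestamp": timestamp,
        "level": level,
        "caller": caller,
        "message": message,
        "fields": json.loads(fields),
    }


def status_code(entry: dict) -> Optional[int]:
    """Extract the HTTP status code from the exporter's error text, if present."""
    match = re.search(r"status code: (\d+)", entry["fields"].get("error", ""))
    return int(match.group(1)) if match else None


# Sample line modeled on the log above; the caller path is a placeholder.
SAMPLE = "\t".join([
    "2023-01-26T12:05:36.665Z",
    "debug",
    "exporter/awsxray.go:70",
    "response error",
    r'{"kind": "exporter", "data_type": "traces", "name": "awsxray", '
    r'"error": "AccessDeniedException: \n\tstatus code: 403, request id: abc"}',
])

entry = parse_collector_log(SAMPLE)
print(entry["fields"]["name"], status_code(entry))
```

A 403 with `AccessDeniedException` means the request reached X-Ray with credentials that are not authorized for the call, so the next question is which identity the sidecar is actually using.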

I'm probably missing some authorization somewhere, but I have no idea where, because I followed the official guide:

https://docs.aws.amazon.com/eks/latest/userguide/opentelemetry.html

Any ideas?

Thanks

@mhausenblas
Member

Looks like IRSA is not working as expected. You want to make sure that the service account adot-collector that you created via eksctl is in the same namespace as the ADOT collector.
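To illustrate why the namespace matters: with IRSA, the role's trust policy is normally restricted to a token subject of the form `system:serviceaccount:<namespace>:<name>`, so the same account name in another namespace does not match. The sketch below is a simplified check, assuming the trust policy uses a `StringEquals` condition on a key ending in `:sub`; the OIDC provider ID in the example is a placeholder and the function names are hypothetical.

```python
def irsa_subject(namespace: str, service_account: str) -> str:
    """Subject claim carried by a Kubernetes service account token."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def trust_policy_allows(trust_policy: dict, namespace: str, service_account: str) -> bool:
    """Return True if an Allow statement pins the ':sub' claim to this account.

    Simplification: only StringEquals conditions are inspected (no StringLike).
    """
    expected = irsa_subject(namespace, service_account)
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        conditions = statement.get("Condition", {}).get("StringEquals", {})
        for key, value in conditions.items():
            if not key.endswith(":sub"):
                continue
            values = value if isinstance(value, list) else [value]
            if expected in values:
                return True
    return False


# Hypothetical trust policy; "EXAMPLE" stands in for the cluster's OIDC ID.
EXAMPLE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {"StringEquals": {
            "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:sub":
                "system:serviceaccount:testnamespace:adot-collector",
        }},
    }],
}

print(trust_policy_allows(EXAMPLE_TRUST_POLICY, "testnamespace", "adot-collector"))
print(trust_policy_allows(EXAMPLE_TRUST_POLICY, "default", "adot-collector"))
```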

@fpaparoni
Author

Yes I already checked, it's in the same namespace.

The only additional thing (I don't know whether it could be a problem) is that my account has a constraint: every IAM role must have a permission boundary attached. Of course I added it to the ClusterConfig, otherwise I couldn't create the role (I left it out of the example). I don't know whether this constraint can block the standard flow elsewhere, but it is present on the service account.
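As background on how a permission boundary can produce an AccessDeniedException: an action is only allowed when both the role's policies and the boundary allow it. The following is a deliberately simplified Python sketch of that intersection. It only looks at allowed action patterns and ignores explicit denies, resources and conditions; the boundary contents are invented for illustration, since the actual boundary is not shown in this issue.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def action_matches(action: str, patterns: Iterable[str]) -> bool:
    """IAM action names are case-insensitive and may contain * and ? wildcards."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effectively_allowed(action: str, role_actions: list, boundary_actions: list) -> bool:
    """Allowed only if the role policy AND the permission boundary both allow it."""
    return action_matches(action, role_actions) and action_matches(action, boundary_actions)


# Two of the X-Ray write actions a collector role would typically need.
ROLE_ACTIONS = ["xray:PutTraceSegments", "xray:PutTelemetryRecords"]

# Hypothetical boundaries: one that forgets X-Ray, one that includes it.
NARROW_BOUNDARY = ["s3:*", "logs:*"]
WIDE_BOUNDARY = ["s3:*", "logs:*", "xray:*"]

print(effectively_allowed("xray:PutTraceSegments", ROLE_ACTIONS, NARROW_BOUNDARY))
print(effectively_allowed("xray:PutTraceSegments", ROLE_ACTIONS, WIDE_BOUNDARY))
```

If the boundary does not cover the X-Ray write actions, the attached managed policy alone is not enough.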

@mhausenblas
Member

Oh the wonderful world of permission boundaries. Not sure if we have the complete picture, knowing this now. Two options: if you have Enterprise support, please cut us a ticket via your TAM or SA. If not, I'd work from left, that is, check: serviceaccount -> pod -> IAM role or try out a different mode (deployment).

Please note that we offer support via GitHub on a best effort basis, so could take some time (hence, suggesting the support route).
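One way to read the serviceaccount -> pod -> IAM role chain is sketched below in Python. It assumes the inputs are the JSON documents returned by `kubectl get serviceaccount <name> -o json` and `kubectl get pod <name> -o json`, and relies on the `eks.amazonaws.com/role-arn` annotation and the `AWS_ROLE_ARN` / `AWS_WEB_IDENTITY_TOKEN_FILE` variables that EKS injects for IRSA. The function names, the role ARN and the container names in the example are hypothetical.

```python
from typing import List, Optional

ROLE_ANNOTATION = "eks.amazonaws.com/role-arn"
IRSA_ENV = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def annotated_role(service_account: dict) -> Optional[str]:
    """Step 1: the IAM role ARN annotated on the service account, if any."""
    annotations = service_account.get("metadata", {}).get("annotations") or {}
    return annotations.get(ROLE_ANNOTATION)


def container_env(container: dict) -> dict:
    """Map environment variable names to their literal values."""
    return {e["name"]: e.get("value") for e in container.get("env") or []}


def check_chain(service_account: dict, pod: dict) -> List[str]:
    """Walk serviceaccount -> pod -> IAM role and return human-readable findings."""
    findings = []
    sa_name = service_account.get("metadata", {}).get("name")
    role = annotated_role(service_account)
    if role is None:
        findings.append(f"service account {sa_name} has no {ROLE_ANNOTATION} annotation")

    # Step 2: the pod has to run under that service account.
    pod_sa = pod.get("spec", {}).get("serviceAccountName") or "default"
    if pod_sa != sa_name:
        findings.append(f"pod runs as service account {pod_sa}, not {sa_name}")

    # Step 3: every container should carry the IRSA variables for the same role.
    for container in pod.get("spec", {}).get("containers", []):
        env = container_env(container)
        missing = [name for name in IRSA_ENV if name not in env]
        if missing:
            findings.append(f"container {container['name']} lacks {', '.join(missing)}")
        elif role is not None and env["AWS_ROLE_ARN"] != role:
            findings.append(f"container {container['name']} assumes {env['AWS_ROLE_ARN']}, not {role}")
    return findings


# Hypothetical objects, trimmed to the fields the checks read.
ROLE = "arn:aws:iam::111122223333:role/adot-collector-role"
SERVICE_ACCOUNT = {"metadata": {"name": "adot-collector",
                                "annotations": {ROLE_ANNOTATION: ROLE}}}
IRSA_VARS = [{"name": "AWS_ROLE_ARN", "value": ROLE},
             {"name": "AWS_WEB_IDENTITY_TOKEN_FILE", "value": "/var/run/token"}]
POD = {"spec": {"serviceAccountName": "adot-collector",
                "containers": [{"name": "app", "env": IRSA_VARS},
                               {"name": "sidecar"}]}}

for finding in check_chain(SERVICE_ACCOUNT, POD):
    print(finding)
```

An empty result means the chain looks consistent from the Kubernetes side; whether the role itself may call X-Ray still has to be checked in IAM.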

@fpaparoni
Author

> Oh the wonderful world of permission boundaries.

Yes I know :(

> Not sure if we have the complete picture, knowing this now. Two options: if you have Enterprise support, please cut us a ticket via your TAM or SA.

Unfortunately, on this account we have a Basic plan for the moment.

> If not, I'd work from left, that is, check: serviceaccount -> pod -> IAM role or try out a different mode (deployment).

What do you mean by "check"? Anyway, I have now tried deployment mode and it works. For development purposes that's OK, but we would like to use sidecar mode. Is it possible that something is missing from the pod configuration?

> Please note that we offer support via GitHub on a best effort basis, so could take some time (hence, suggesting the support route).

I know of course ;)

@mhausenblas
Member

> Anyway, I have now tried deployment mode and it works. For development purposes that's OK, but we would like to use sidecar mode. Is it possible that something is missing from the pod configuration?

Interesting. Let me look into this (note that the add-on is using upstream OpenTelemetry operator) and get back to you.

Would you mind expanding on why you prefer sidecar over deployment or other non-sidecar modes?

@fpaparoni
Author

> Interesting. Let me look into this (note that the add-on is using upstream OpenTelemetry operator) and get back to you.

great

> Would you mind expanding on why you prefer sidecar over deployment or other non-sidecar modes?

It's a consideration based on a previous environment with Jaeger, where we switched from a single collector (it sometimes had problems, but I really don't remember the specific cause) to a sidecar container. Of course we can evaluate a different mode if it works :)

@mhausenblas
Member

Thanks for the context @fpaparoni, and yes, I would recommend evaluating other modes. Depending on your workload (number of pods), sidecar mode can be a rather resource-intensive option.

@erichsueh3
Contributor

Hi @fpaparoni, can you confirm that the Collector and the Pod you are annotating are in the same namespace? That may be a reason why the sidecar mode doesn't seem to be working.

@fpaparoni
Author

Yes, in both modes the Collector, Pod and Service Account are in the same namespace. Deployment mode now works; if I switch to sidecar, I receive an AccessDeniedException.
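One difference between the two modes that may be worth checking, although nothing in this thread confirms it as the cause: in deployment mode the operator creates a separate collector pod, while in sidecar mode the collector container is added to the application pod, and all containers in a pod share that pod's single service account. The Python sketch below compares the two, assuming the inputs are the JSON of the application pod and of the OpenTelemetryCollector resource; the function names and the example pod are hypothetical.

```python
def pod_service_account(pod: dict) -> str:
    """Service account every container in the pod runs under ('default' if unset)."""
    return pod.get("spec", {}).get("serviceAccountName") or "default"


def sidecar_uses_collector_account(pod: dict, collector: dict) -> bool:
    """True if the application pod runs under the account named in the collector spec."""
    wanted = collector.get("spec", {}).get("serviceAccount")
    return wanted is not None and pod_service_account(pod) == wanted


# Trimmed version of the collector resource shown earlier in this issue.
COLLECTOR = {"spec": {"mode": "sidecar", "serviceAccount": "adot-collector"}}

# Hypothetical application pods: one without an explicit service account,
# one that explicitly uses adot-collector.
POD_DEFAULT = {"spec": {"containers": [{"name": "app"}]}}
POD_IRSA = {"spec": {"serviceAccountName": "adot-collector",
                     "containers": [{"name": "app"}]}}

print(sidecar_uses_collector_account(POD_DEFAULT, COLLECTOR))
print(sidecar_uses_collector_account(POD_IRSA, COLLECTOR))
```

If the first case applies, the sidecar would be calling X-Ray with whatever credentials the application pod has, not with the role attached to adot-collector.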

@erichsueh3
Contributor

erichsueh3 commented Feb 22, 2023

Hey @fpaparoni, any updates here? Were you ever able to get sidecar deployment of the Collector working? If not, I'd like to dive a bit deeper into why this issue might be happening.

@fpaparoni
Author

We are using the deployment mode without problems and never switched back to sidecar. If it can be useful I can make some specific tests

@erichsueh3
Contributor

> We are using the deployment mode without problems and never switched back to sidecar. If it can be useful I can make some specific tests

I see - I've been trying to replicate your issue with no luck, but I haven't involved permission boundaries at all so that might be where the issue lies.

Also, when you say you can make specific tests, what are you referring to? What tests do you think would be useful to create?

@fpaparoni
Author

I was thinking of looking at specific logs, if that would be useful. Anyway, we are now using deployment mode without problems and we won't go back.
