
ec2tagger: Unable to retrieve InstanceId. #367

Closed
rdonadono opened this issue Feb 19, 2022 · 8 comments
Labels
aws/eks Amazon Elastic Kubernetes Service Stale

Comments

@rdonadono

Hi team,

I'm trying to migrate my EKS metrics and logs from Prometheus to CloudWatch using this agent, but I'm running into a problem.

I followed this doc and, first of all, attached the CloudWatchAgentServerPolicy policy to my node groups' IAM role.
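For reference, this is the AWS CLI equivalent of what I did to attach the managed policy (the role name below is just a placeholder for my actual node group role):

# Attach the managed CloudWatchAgentServerPolicy to the node group's IAM role.
# <node-instance-role-name> is a placeholder.
aws iam attach-role-policy \
  --role-name <node-instance-role-name> \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy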

Then I ran this command as indicated in the doc.

ClusterName="<...>"
RegionName="<...>"
FluentBitHttpPort='2020'
FluentBitReadFromHead='Off'
[[ ${FluentBitReadFromHead} = 'On' ]] && FluentBitReadFromTail='Off'|| FluentBitReadFromTail='On'
[[ -z ${FluentBitHttpPort} ]] && FluentBitHttpServer='Off' || FluentBitHttpServer='On'
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml | sed 's/{{cluster_name}}/'${ClusterName}'/;s/{{region_name}}/'${RegionName}'/;s/{{http_server_toggle}}/"'${FluentBitHttpServer}'"/;s/{{http_server_port}}/"'${FluentBitHttpPort}'"/;s/{{read_from_head}}/"'${FluentBitReadFromHead}'"/;s/{{read_from_tail}}/"'${FluentBitReadFromTail}'"/' | kubectl apply -f - 

After this step I checked whether the DaemonSet pods were running properly on my EKS cluster, but I see them restarting in a loop.
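In case it helps, this is roughly how I'm checking them (assuming the quickstart's default amazon-cloudwatch namespace; the pod name below is just an example):

# List the Container Insights pods deployed by the quickstart manifest.
kubectl get pods -n amazon-cloudwatch

# Inspect the logs of a restarting cloudwatch-agent pod (example pod name).
kubectl logs -n amazon-cloudwatch cloudwatch-agent-xxxxx --previous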

The logs of the cloudwatch-agent pods all show the same error:

2022/02/19 15:26:45 I! 2022/02/19 15:26:42 E! ec2metadata is not available
2022/02/19 15:26:42 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
2022/02/19 15:26:43 W! retry [0/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2022/02/19 15:26:44 W! retry [1/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2022/02/19 15:26:45 W! retry [2/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2022/02/19 15:26:45 I! access ECS task metadata fail with response unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers), assuming I'm not running in ECS.
I! Detected the instance is OnPrem
2022/02/19 15:26:45 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
2022/02/19 15:26:45 Reading json config file path: /etc/cwagentconfig/..2022_02_19_15_26_37.192607950/cwagentconfig.json ...
2022/02/19 15:26:45 Find symbolic link /etc/cwagentconfig/..data 
2022/02/19 15:26:45 Find symbolic link /etc/cwagentconfig/cwagentconfig.json 
2022/02/19 15:26:45 Reading json config file path: /etc/cwagentconfig/cwagentconfig.json ...
Valid Json input schema.
Got Home directory: /root
No csm configuration found.
No metric configuration found.
Configuration validation first phase succeeded
 
2022/02/19 15:26:45 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml 
2022-02-19T15:26:45Z I! Starting AmazonCloudWatchAgent 1.247348.0
2022-02-19T15:26:45Z I! Loaded inputs: cadvisor k8sapiserver
2022-02-19T15:26:45Z I! Loaded aggregators: 
2022-02-19T15:26:45Z I! Loaded processors: ec2tagger k8sdecorator
2022-02-19T15:26:45Z I! Loaded outputs: cloudwatchlogs
2022-02-19T15:26:45Z I! Tags enabled: 
2022-02-19T15:26:45Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-192-168-78-202.eu-central-1.compute.internal", Flush Interval:1s
2022-02-19T15:26:45Z I! [logagent] starting
2022-02-19T15:26:45Z I! [logagent] found plugin cloudwatchlogs is a log backend
2022-02-19T15:30:46Z E! [processors.ec2tagger] ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance
2022-02-19T15:30:46Z E! [telegraf] Error running agent: could not initialize processor ec2tagger: ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance

I have already checked whether this is a network problem: with this command, run from inside each node, I can retrieve the EC2 instance metadata correctly:

TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"` && curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/instance-id
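For completeness, the same check can also be run from inside a pod rather than on the node itself (the pod name and image below are arbitrary examples), since pod traffic may traverse an extra network hop on its way to IMDS compared to the node:

# Run the IMDSv2 token request from inside a throwaway pod.
kubectl run imds-test --rm -it --restart=Never --image=curlimages/curl -- sh -c '
  TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") &&
  curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
    http://169.254.169.254/latest/meta-data/instance-id'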

The same kind of error shows up in the fluent-bit pods too.

AWS for Fluent Bit Container Image Version 2.10.0
Fluent Bit v1.6.8
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/02/19 15:26:40] [ info] [engine] started (pid=1)
[2022/02/19 15:26:40] [ info] [storage] version=1.0.6, initializing...
[2022/02/19 15:26:40] [ info] [storage] root path '/var/fluent-bit/state/flb-storage/'
[2022/02/19 15:26:40] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/02/19 15:26:40] [ info] [storage] backlog input plugin: storage_backlog.8
[2022/02/19 15:26:40] [ info] [input:systemd:systemd.3] seek_cursor=s=82a20e741bc74377ba38eb0d776ad4dd;i=cb7... OK
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] queue memory limit: 4.8M
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284109.541128008.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284111.242931767.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284111.243140450.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284111.922505134.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284112.23870133.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284112.284614847.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284116.263762778.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284116.263971979.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284116.284565147.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284116.646807894.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284116.647037408.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284116.647200365.flb
[2022/02/19 15:26:40] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2022/02/19 15:26:40] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2022/02/19 15:26:40] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/02/19 15:26:45] [ info] [filter:kubernetes:kubernetes.0] API server connectivity OK
[2022/02/19 15:26:45] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2022/02/19 15:26:45] [ info] [sp] stream processor started
[2022/02/19 15:26:45] [ info] [input:tail:tail.0] inotify_fs_add(): inode=108010284 watch_fd=1 name=/var/log/containers/aws-load-balancer-controller-859586cf74-rt9ls_kube-system_aws-load-balancer-controller-562ab1af9253b5ca83a3c8acef612683698b7f7ce6ac89da42c1d1277c181f00.log
[...]
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [ info] [input:tail:tail.4] inotify_fs_add(): inode=52448822 watch_fd=1 name=/var/log/containers/aws-node-mzs4r_kube-system_aws-node-b2ae85e13ca72e02a42ffd3d1832a691a037355de0770945feec31894f27ef3a.log
[...]
[2022/02/19 15:26:46] [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284109.541128008.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284111.242931767.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284111.243140450.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284111.922505134.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284112.23870133.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284112.284614847.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284116.263762778.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284116.263971979.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284116.284565147.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284116.646807894.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284116.647037408.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284116.647200365.flb
[2022/02/19 15:26:47] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:47] [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:50] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log group /aws/containerinsights/<...>/application
[2022/02/19 15:26:50] [error] [aws_credentials] Could not read shared credentials file /root/.aws/credentials
[2022/02/19 15:26:50] [error] [aws_credentials] Failed to retrieve credentials for AWS Profile default
[2022/02/19 15:26:50] [ warn] [aws_credentials] No cached credentials are available and a credential refresh is already in progress. The current co-routine will retry.
[2022/02/19 15:26:50] [error] [signv4] Provider returned no credentials, service=logs
[2022/02/19 15:26:50] [error] [aws_client] could not sign request

I have an EKS v1.20 cluster created by eksctl, with 2 node groups: one On-Demand and one Spot, both with the same configuration.

What can I do to understand the problem?

Thanks!

@rdonadono
Author

I believe I have found the source of the problem in this issue.

@github-actions
Contributor

This issue was marked stale due to lack of activity.

@github-actions github-actions bot added the Stale label May 21, 2022
@SaxyPandaBear SaxyPandaBear added aws/eks Amazon Elastic Kubernetes Service and removed Stale labels May 21, 2022
@github-actions
Contributor

This issue was marked stale due to lack of activity.

@dtna7

dtna7 commented Nov 1, 2022

I have the exact same issue: new Launch Templates recently started disabling IMDSv1 and only enabling IMDSv2. Is there an option we can set to disable or bypass the reliance on IMDS?
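The workaround I'm looking at in the meantime (a standard EC2 setting, not something from this agent) is raising the IMDS hop limit on the instances or in the launch template so pods can still reach IMDSv2, e.g.:

# Raise the IMDSv2 hop limit so requests coming from pods (one hop further
# away than the node) still get a response. The instance ID is a placeholder.
aws ec2 modify-instance-metadata-options \
  --instance-id <instance-id> \
  --http-tokens required \
  --http-put-response-hop-limit 2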

@Tomer20

Tomer20 commented Dec 4, 2022

I'm experiencing the same issue. Any progress on this one?

@wolviecb

wolviecb commented Dec 5, 2022

I get the same issue when the launch template only enables IMDSv2.

@github-actions
Contributor

github-actions bot commented Mar 6, 2023

This issue was marked stale due to lack of activity.

@github-actions github-actions bot added the Stale label Mar 6, 2023
@github-actions
Contributor

Closing this because it has stalled. Feel free to reopen if this issue is still relevant, or to ping the collaborator who labeled it stalled if you have any questions.

@github-actions github-actions bot closed this as not planned Apr 17, 2023