Custom network ENI fails silently due to lack of detailed spec definition for ENIConfig CRD #2416

Closed
wdai9162 opened this issue Jun 11, 2023 · 5 comments

wdai9162 commented Jun 11, 2023

What happened: When spec.securityGroups in the ENIConfig custom resource is defined as a single string value instead of a list, the creation and attachment of the EKS node's secondary ENI fails silently during tryAllocateENI(). The EKS node remains in Ready status but has no data plane resources on which to schedule pods.

Attach logs:

In /var/log/aws-routed-eni/ipamd.log, the node acknowledges "Custom networking enabled true" and "Found ENI Config Name: ap-southeast-2c". However, nothing happens after that and no error is logged. See the log excerpt below:

{"level":"info","ts":"2023-05-31T10:56:23.754Z","caller":"ipamd/ipamd.go:387","msg":"Custom networking enabled true"}
....
....
{"level":"info","ts":"2023-05-31T10:56:26.758Z","caller":"ipamd/ipamd.go:845","msg":"Found ENI Config Name: ap-southeast-2c"} 
<nothing about custom network is logged hereafter, no confirmation of security group ID nor subnet ID> 

What you expected to happen:
The EKS node's secondary ENI is successfully created, attached, and assigned custom IP addresses from the subnet defined in ENIConfig. A successful /var/log/aws-routed-eni/ipamd.log should look like the following:

{"level":"info","ts":"2023-06-11T04:28:13.649Z","caller":"ipamd/ipamd.go:838","msg":"Found ENI Config Name: ap-southeast-2a"}
{"level":"info","ts":"2023-06-11T04:28:13.751Z","caller":"ipamd/ipamd.go:812","msg":"ipamd: using custom network config: [sg-046e1029d6f0d8552], subnet-061ab7a98c5f1b531"}
{"level":"debug","ts":"2023-06-11T04:28:13.751Z","caller":"ipamd/ipamd.go:812","msg":"Found security-group id: sg-046e1029d6f0d8552"}
{"level":"info","ts":"2023-06-11T04:28:13.751Z","caller":"awsutils/awsutils.go:733","msg":"Using a custom network config for the new ENI"}
{"level":"info","ts":"2023-06-11T04:28:13.751Z","caller":"awsutils/awsutils.go:733","msg":"Creating ENI with security groups: [sg-046e1029d6f0d8552] in subnet: subnet-061ab7a98c5f1b531"}
{"level":"info","ts":"2023-06-11T04:28:14.146Z","caller":"awsutils/awsutils.go:733","msg":"Created a new ENI: eni-0bdaf3f4cac1d5c9c"}
{"level":"debug","ts":"2023-06-11T04:28:14.234Z","caller":"awsutils/awsutils.go:776","msg":"Discovered device number is used: 0"}
{"level":"debug","ts":"2023-06-11T04:28:14.234Z","caller":"awsutils/awsutils.go:776","msg":"Found a free device number: 1"}
{"level":"info","ts":"2023-06-11T04:28:15.305Z","caller":"ipamd/ipamd.go:853","msg":"Successfully created and attached a new ENI eni-0bdaf3f4cac1d5c9c to instance"}
(allocating IP addresses hereafter...) 
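
One way to spot this silent failure is to scan ipamd.log for a "Found ENI Config Name" entry that is never followed by a "using custom network config" entry. Below is a minimal Python sketch under that assumption. The function name `find_unresolved_eniconfig` is hypothetical, and the message texts are taken only from the two log excerpts in this issue, so they may differ in other CNI versions.

```python
import json

FOUND_PREFIX = "Found ENI Config Name: "
APPLIED_MARKER = "using custom network config"


def find_unresolved_eniconfig(log_lines):
    """Return the ENIConfig name that was found but never applied, else None.

    Assumes the JSON-lines format and the message texts shown in the
    excerpts in this issue (an assumption, not a documented log contract).
    """
    pending = None
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip elisions and free-text notes between entries
        try:
            msg = json.loads(line).get("msg", "")
        except json.JSONDecodeError:
            continue  # ignore truncated or malformed entries
        if msg.startswith(FOUND_PREFIX):
            pending = msg[len(FOUND_PREFIX):]
        elif APPLIED_MARKER in msg:
            pending = None  # the custom network config was picked up
    return pending
```

On a node, this could be called with the open log file, for example `find_unresolved_eniconfig(open("/var/log/aws-routed-eni/ipamd.log"))`. A non-None result only suggests that the ENIConfig was found but not applied; it does not identify the cause.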

How to reproduce it (as minimally and precisely as possible): The issue can be reproduced by creating an ENIConfig object from the BAD example template below.

========BAD example========
$ k get ENIconfig ap-southeast-2a -o yaml
apiVersion: crd.k8s.amazonaws.com/v1alpha1 
kind: ENIConfig
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration : |
      {"apiVersion":"crd.k8s.amazonaws.com/v1alpha1","kind":"ENIConfig","metadata":{"annotations":{},"name":"ap-southeast-2a"},"spec":{"securityGroups":"sg-046e1029d6f0d8552","subnet":"subnet-061ab7a98c5f1b531"} }
  creationTimestamp: "2023-05-30T05:47:09Z"
  generation: 2
  name: ap-southeast-2a
  resourceVersion: "15824196"
  uid: fdee5606-1a62-4c58-8c6b-f21af6fb39c7
spec:
  securityGroups: sg-046e1029d6f0d8552                              <---------------- THIS NEEDS TO BE A YAML LIST
  subnet: subnet-061ab7a98c5f1b531
========BAD example========
========GOOD example========
$ k get ENIconfig ap-southeast-2a -o yaml
apiVersion: crd.k8s.amazonaws.com/v1alpha1 
kind: ENIConfig
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration : |
      {"apiVersion":"crd.k8s.amazonaws.com/v1alpha1","kind":"ENIConfig","metadata":{"annotations":{},"name":"ap-southeast-2a"},"spec":{"securityGroups":["sg-046e1029d6f0d8552"],"subnet":"subnet-061ab7a98c5f1b531"} }
  creationTimestamp: "2023-05-30T05:47:09Z"
  generation: 3
  name: ap-southeast-2a
  resourceVersion: "15827895"
  uid: fdee5606-1a62-4c58-8c6b-f21af6fb39c7
spec:
  securityGroups:
  - sg-046e1029d6f0d8552
  subnet: subnet-061ab7a98c5f1b531
========GOOD example========
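
Until the CRD itself rejects the malformed spec, the difference between the two examples can be caught before applying the manifest. The following Python sketch is illustrative only and is not the CNI's own validation. The function name `check_eniconfig_spec` is hypothetical, and it assumes that `subnet` must be a string and that `securityGroups`, when present, must be a list of strings.

```python
import json


def check_eniconfig_spec(spec):
    """Return a list of problems found in an ENIConfig spec dict.

    Checks only the two fields shown in this issue. Assumption:
    securityGroups may be omitted, but if present it must be a list.
    """
    problems = []
    sgs = spec.get("securityGroups")
    if sgs is not None:
        if not isinstance(sgs, list):
            problems.append(
                f"spec.securityGroups must be a list, got {type(sgs).__name__}"
            )
        elif not all(isinstance(sg, str) for sg in sgs):
            problems.append("spec.securityGroups must contain only strings")
    if not isinstance(spec.get("subnet"), str):
        problems.append("spec.subnet must be a string")
    return problems


# The two spec bodies from the BAD and GOOD examples in this issue.
bad = json.loads(
    '{"securityGroups":"sg-046e1029d6f0d8552","subnet":"subnet-061ab7a98c5f1b531"}'
)
good = json.loads(
    '{"securityGroups":["sg-046e1029d6f0d8552"],"subnet":"subnet-061ab7a98c5f1b531"}'
)

print(check_eniconfig_spec(bad))   # ['spec.securityGroups must be a list, got str']
print(check_eniconfig_spec(good))  # []
```

The same type constraint could instead be expressed in the CRD's schema so that the API server rejects the BAD manifest at apply time, which is what the issue title asks for.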

Anything else we need to know?:

Environment: tested on both 1.24 and 1.26

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.2", GitCommit:"7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647", GitTreeState:"clean", BuildDate:"2023-05-17T14:20:07Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
  • CNI Version: tested on v1.12.5-eksbuild.2 and v1.10.4-eksbuild.1
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
@wdai9162 wdai9162 added the bug label Jun 11, 2023
@jdn5126 jdn5126 self-assigned this Jun 11, 2023
@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@github-actions github-actions bot added the stale Issue or PR is stale label Aug 11, 2023
@jdn5126 jdn5126 added enhancement and removed stale Issue or PR is stale labels Aug 11, 2023
@github-actions github-actions bot added the stale Issue or PR is stale label Oct 11, 2023
@jdn5126 jdn5126 removed the stale Issue or PR is stale label Oct 11, 2023
@jdn5126 jdn5126 assigned jchen6585 and unassigned jdn5126 Oct 13, 2023
@github-actions github-actions bot added the stale Issue or PR is stale label Dec 13, 2023
@jdn5126 jdn5126 removed the stale Issue or PR is stale label Dec 14, 2023
jdn5126 (Contributor) commented Jan 25, 2024

Closing this in favor of the container roadmap tracking issue: aws/containers-roadmap#867

This will be referenced in that issue and in aws/containers-roadmap#1709

@jdn5126 jdn5126 closed this as completed Jan 25, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
