Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issues with handling unmanaged ENIs with IPv6 only #3122

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gavinbunney
Copy link
Contributor

@gavinbunney gavinbunney commented Nov 22, 2024

What type of PR is this?
bug

Which issue does this PR fix?:

We have unmanaged trunk ENIs attached to our worker nodes which are designed for ipv6 only networks. These ENIs are tagged with node.k8s.amazonaws.com/no_manage, however the listing of attached ENIs happens before the IPAMD process filters those ENIs. As such, the metadata retrieval process fails when looking up the ipv4 address details for these ENIs, and causes the aws-k8s-agent process to exit with an initialization failure.

In this log, eni-084b51xxxxxx / 0e:7c:f9:xx:xx:xx is the aws-vpc-cni managed ENI, and eni-0b2cf8b6xxxxxx / 0e:f3:73:xx:xx:xx is our managed eni:

{"level":"debug","ts":"2024-11-22T16:49:26.314Z","caller":"awsutils/awsutils.go:1317","msg":"Total number of interfaces found: 2 "}
{"level":"debug","ts":"2024-11-22T16:49:26.314Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI MAC address: 0e:7c:f9:xx:xx:xx"}
{"level":"debug","ts":"2024-11-22T16:49:26.316Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI: eni-084b51xxxxxx, MAC 0e:7c:f9:xx:xx:xx, device 0"}
{"level":"debug","ts":"2024-11-22T16:49:26.318Z","caller":"awsutils/awsutils.go:567","msg":"Found IPv6 addresses associated with interface. This is not efa-only interface"}
{"level":"debug","ts":"2024-11-22T16:49:26.322Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI MAC address: 0e:f3:73:xx:xx:xx"}
{"level":"debug","ts":"2024-11-22T16:49:26.324Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI: eni-0b2cf8b6xxxxxx, MAC 0e:f3:73:xx:xx:xx, device 1"}
{"level":"debug","ts":"2024-11-22T16:49:26.326Z","caller":"awsutils/awsutils.go:567","msg":"Found IPv6 addresses associated with interface. This is not efa-only interface"}
{"level":"warn","ts":"2024-11-22T16:49:26.327Z","caller":"awsutils/imds.go:376","msg":"failed to retrieve network/interfaces/macs/0e:f3:73:xx:xx:xx/subnet-ipv4-cidr-block from instance metadata EC2MetadataError: failed to make EC2Metadata request\n<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n\t\t \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n <head>\n  <title>404 - Not Found</title>\n </head>\n <body>\n  <h1>404 - Not Found</h1>\n </body>\n</html>\n\n\tstatus code: 404, request id: "}
{"level":"error","ts":"2024-11-22T16:49:26.327Z","caller":"aws-k8s-agent/main.go:42","msg":"Initialization failure: ipamd init: failed to retrieve attached ENIs info: DescribeAllENIs: failed to get local ENI metadata: get attached ENIs: failed to retrieve ENI metadata for ENI: 0e:f3:73:xx:xx:xx: EC2MetadataError: failed to make EC2Metadata request\n<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n\t\t \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n <head>\n  <title>404 - Not Found</title>\n </head>\n <body>\n  <h1>404 - Not Found</h1>\n </body>\n</html>\n\n\tstatus code: 404, request id: "}

What does this PR do / Why do we need it?:

This PR:

  • Gracefully handles missing IPv4 information for ENIs which are IPv6 only (using same pattern that the IPv6 IPs lookup uses)
  • Selects the trunk ENI by only including enis with IPv4 addresses (this isn't much better than the current select any trunk logic, but gets around the issue for now)

Testing done on this change:

Added unit tests to cover the new paths. Running in our EKS cluster the ENIs are now retrieved successfully (In this log, eni-084b51xxxxxx / 0e:7c:f9:xx:xx:xx is the aws-vpc-cni managed ENI, and eni-0b2cf8b6xxxxxx / 0e:f3:73:xx:xx:xx is our managed eni):

{"level":"debug","ts":"2024-11-22T17:45:13.815Z","caller":"awsutils/awsutils.go:1325","msg":"Total number of interfaces found: 2 "}
{"level":"debug","ts":"2024-11-22T17:45:13.815Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI MAC address: 0e:7c:f9:8b:06:fd"}
{"level":"debug","ts":"2024-11-22T17:45:13.817Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI: eni-084b51xxxxxx, MAC 0e:7c:f9:xx:xx:xx, device 0"}
{"level":"debug","ts":"2024-11-22T17:45:13.818Z","caller":"awsutils/awsutils.go:567","msg":"Found IPv6 addresses associated with interface. This is not efa-only interface"}
{"level":"debug","ts":"2024-11-22T17:45:13.819Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI MAC address: 0e:f3:73:xx:xx:xx"}
{"level":"debug","ts":"2024-11-22T17:45:13.821Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI: eni-0b2cf8b6xxxxxx, MAC 0e:f3:73:xx:xx:xx, device 1"}
{"level":"debug","ts":"2024-11-22T17:45:13.822Z","caller":"awsutils/awsutils.go:567","msg":"Found IPv6 addresses associated with interface. This is not efa-only interface"}
{"level":"info","ts":"2024-11-22T17:45:14.003Z","caller":"ipamd/ipamd.go:424","msg":"Got network card index 0 for ENI eni-084b51xxxxxx"}
{"level":"info","ts":"2024-11-22T17:45:14.004Z","caller":"ipamd/ipamd.go:424","msg":"eni-084b51xxxxxx is of type: interface"}
{"level":"info","ts":"2024-11-22T17:45:14.004Z","caller":"ipamd/ipamd.go:424","msg":"Got network card index 0 for ENI eni-0b2cf8b6xxxxxx"}
{"level":"info","ts":"2024-11-22T17:45:14.004Z","caller":"ipamd/ipamd.go:424","msg":"eni-0b2cf8b6xxxxxx is of type: trunk"}
{"level":"debug","ts":"2024-11-22T17:45:14.004Z","caller":"ipamd/ipamd.go:385","msg":"DescribeAllENIs success: ENIs: 2, tagged: 2"}

Will this PR introduce any new dependencies?:
n/a

Will this break upgrades or downgrades? Has updating a running cluster been tested?:
Tested with upgrading inplace without issues

Does this change require updates to the CNI daemonset config files to work?:
n/a

Does this PR introduce any user-facing change?:
n/a

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@gavinbunney gavinbunney requested a review from a team as a code owner November 22, 2024 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant