EKS Managed Node Group with CUSTOM AMI CoreDNS issue #3272

Open
1 task done
roman5595 opened this issue Jan 10, 2025 · 7 comments

@roman5595

Description

I am trying to deploy EKS with managed node groups that use a custom AMI. I am also trying to deploy the kube-proxy, vpc-cni, and coredns addons via this module. The issue I am experiencing is that coredns stays in a degraded state and Terraform times out after 20 minutes.

Warning FailedScheduling 8s (x21 over 3m3s) default-scheduler no nodes available to schedule pods

  • ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]: 20.14.0

  • Terraform version: v1.6.4

  • Provider version(s): v5.82.2

Reproduction Code

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.14.0"

  cluster_name                     = var.cluster_name
  cluster_version                  = var.cluster_version
  vpc_id                           = var.vpc_id
  subnet_ids                       = var.subnet_ids
  control_plane_subnet_ids         = var.control_plane_subnet_ids

  cloudwatch_log_group_class = "STANDARD" 
  cluster_enabled_log_types = ["audit" , "api" , "authenticator", "scheduler" , "controllerManager" ]

  cluster_addons = {
    kube-proxy = {}
    vpc-cni = {}
    coredns = {}
  }

  enable_cluster_creator_admin_permissions = true

  cluster_endpoint_public_access           = true
  cluster_endpoint_private_access          = false
  authentication_mode                      = "API_AND_CONFIG_MAP"


  access_entries = {


    EKS_DEPLOYER  = {
      kubernetes_groups = []
      principal_arn     = "arn:aws:iam::xxx:role/eks-cluster-role"

      policy_associations = {
        eks_cluster_admin = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            type = "cluster"
          }
        }
      }
    }
  }

  tags = var.tags
} 
module "logging-nodes" {
  source = "terraform-aws-modules/eks/aws//modules/eks-managed-node-group"

  name            = "logging-nodes"
  cluster_name    = module.eks.cluster_name
  cluster_version = var.worker_node_k8s_version
  cluster_ip_family    = "ipv4"

  subnet_ids = var.subnet_ids
  ami_type = "CUSTOM"
  ami_id  = var.worker_AMI_id

  #####################################################
  # CUSTOM Launch Template 
  #####################################################
  create_launch_template = false
  use_custom_launch_template = true
  launch_template_id = resource.aws_launch_template.custom_ami.id
  cluster_service_cidr= module.eks.cluster_service_cidr

  cluster_primary_security_group_id = module.eks.cluster_primary_security_group_id
  vpc_security_group_ids            = [module.eks.node_security_group_id]


  min_size     = var.min_size
  max_size     = var.max_size
  desired_size = var.desired_size


  update_config = {
    max_unavailable = 2
    # max_unavailable_percentage = 33
  }

  tags = var.tags

 depends_on = [ module.eks,resource.aws_launch_template.custom_ami ]
}
resource "aws_launch_template" "custom_ami" {
   depends_on = [ module.eks ]
   image_id = var.worker_AMI_id
   instance_type = var.instance_type
   update_default_version= true

  block_device_mappings {
    device_name = "/dev/xvda"
    ebs {
      volume_size           = 20
      volume_type           = "gp3"
      iops                  = 3000
      throughput            = 125
      delete_on_termination = true
    }
  }

   user_data = base64encode(<<-EOT
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="eks-user-data-boundary"

    --eks-user-data-boundary
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash
    set -o errexit
    set -o pipefail
    set -o nounset

    sudo yum install -y https://s3.eu-west-1.amazonaws.com/amazon-ssm-eu-west-1/latest/linux_arm64/amazon-ssm-agent.rpm

    systemctl enable amazon-ssm-agent
    systemctl start amazon-ssm-agent

    touch /run/xtables.lock

    --eks-user-data-boundary
    Content-Type: application/node.eks.aws

    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    metadata:
      creationTimestamp: null
    spec:
      cluster:
        apiServerEndpoint: "${module.eks.cluster_endpoint}"
        certificateAuthority: "${module.eks.cluster_certificate_authority_data}"
        cidr: "${module.eks.cluster_service_cidr}"
        name: "${module.eks.cluster_name}"
      containerd: {}
      instance:
        localStorage: {}
      kubelet:
        config:        
          clusterDNS:
          - "10.100.0.10"

        flags:
        - "--node-labels=alpha.eksctl.io/cluster-name=${module.eks.cluster_name},alpha.eksctl.io/nodegroup-name=worker-node"

    --eks-user-data-boundary--
  EOT
  )
}

Steps to reproduce the behavior:

Expected behavior

I expected that once I ran terraform apply, EKS would be created successfully with a managed node group using the CUSTOM AMI, and all 3 addons would be in an active state.

Actual behavior

The CoreDNS addon is in a degraded state due to: no nodes available to schedule pods. The coredns replicas on the cluster are in a pending state.

Additional context

If I use the following and exclude coredns, coredns is nevertheless deployed and the deployment works without any issues. However, I need to manage this addon through the module because I want to add additional configuration values:

  cluster_addons = {
    kube-proxy = {}
    vpc-cni = {}
#    coredns = {}
  }
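For illustration, a minimal sketch of what passing configuration values to the coredns addon could look like, placed inside the module "eks" block. It assumes the module's per-addon `configuration_values` input, which takes a JSON-encoded string. The `replicaCount` key is an assumption for the example and has to match the configuration schema of the CoreDNS addon version in use (the schema can be inspected with `aws eks describe-addon-configuration`).

```hcl
  # Fragment of the module "eks" block shown above; not a complete file.
  cluster_addons = {
    kube-proxy = {}
    vpc-cni    = {}
    coredns = {
      # Assumption: replicaCount is accepted by the addon version in use.
      # Verify the key against the addon's configuration schema first.
      configuration_values = jsonencode({
        replicaCount = 2
      })
    }
  }
```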

Similar GitHub issue: #3062 (resolved)

AMI Details:
Architecture: ARM64
OS: Amazon Linux 2023

@bryantbiggs
Member

What's shown is not valid Terraform

@roman5595
Author

What exactly is not valid Terraform? I am able to successfully deploy an EKS cluster with node groups using this code; the issue I have is with the coredns addon.

@bryantbiggs
Member

launch_template_id = resource.aws_launch_template.custom_ami.id

Is not valid

@bryantbiggs
Member

also, more importantly - what are you trying to do?

  1. It looks like the custom AMI is a derivative of our AL2023 EKS optimized AMI? If so, just set the correct AMI type and specify that you want the bootstrap user data, no need for the externally created launch template
  2. SSM is already installed on the EKS AMIs, no need to try to re-install
  3. This depends_on = [ module.eks,resource.aws_launch_template.custom_ami ] is bad and should not be used, there are implicit dependency relationships established
  4. Why not define the node group in the cluster definition? This provides better support for correct ordering. Moving the nodegroup out of the main definition means you now need to somehow control ordering (if possible)
  5. Is this really a custom AMI or are you just pulling one of the EKS AMI IDs from SSM?
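Points 1 and 4 above could be sketched roughly as follows: the node group is declared inside the cluster definition, the AMI type names the AL2023 family the custom image derives from, and the module renders the bootstrap user data, so no external launch template or depends_on is needed. This is an untested sketch under stated assumptions: the ARM64 AMI type is inferred from the AMI details in the issue, and the variable names are reused from the reproduction code.

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.14.0"

  cluster_name    = var.cluster_name
  cluster_version = var.cluster_version
  vpc_id          = var.vpc_id
  subnet_ids      = var.subnet_ids

  cluster_addons = {
    kube-proxy = {}
    vpc-cni    = {}
    coredns    = {}
  }

  # Node group defined alongside the cluster so the module controls ordering.
  eks_managed_node_groups = {
    logging-nodes = {
      # Assumption: the image is an ARM64 derivative of the AL2023 EKS optimized AMI.
      ami_type = "AL2023_ARM_64_STANDARD"
      ami_id   = var.worker_AMI_id

      # Ask the module to generate the bootstrap user data for the given AMI.
      enable_bootstrap_user_data = true

      instance_types = [var.instance_type]

      min_size     = var.min_size
      max_size     = var.max_size
      desired_size = var.desired_size
    }
  }

  tags = var.tags
}
```

Any extra settings from the external launch template (block device mappings, additional user data) would still need to be carried over through the corresponding node group inputs.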

@roman5595
Author

Hi,

First of all, thank you for your reply and assistance.

The AMI I will be using for my EKS cluster is a custom company AMI, but it is derived from the EKS-optimized AMI. I managed to get the setup working by specifying the AMI type directly instead of using the CUSTOM AMI type.

Could you clarify why it is recommended to define node groups directly in the cluster definition instead of using a separate module for node groups? I went through the FAQs but couldn’t find anything suggesting that this approach is preferred. Let's say I would like to upgrade the ingress and system node groups first, and only after that the application node groups and other node groups. Isn't it better in this case to use the module for managed node groups?

@jonassteinberg1

@roman5595 What do you mean when you say, "I managed to get the setup working by specifying the AMI type directly instead of using the CUSTOM AMI type."? Can you be specific here? I'm facing a very similar issue where I'm simply trying to use the Ubuntu EKS ami and it seems fairly straightforward on what to do, i.e. use a CUSTOM ami_type and then simply supply the ami_id in the aws eks module node group, but my nodes always fail to join the cluster. I'm thinking maybe the resolution you arrived at may solve my issue?

@roman5595
Author

In my case I use an Amazon Linux 2023 EKS optimized AMI (a derivative, not the official AWS AMI directly), so my config can look like this:

eks_managed_node_group_defaults = {
  ami_type                   = "AL2023_x86_64_STANDARD" # you can specify a particular ami_type if you are going to use a derivative of an EKS optimized image
  ami_id                     = "ami-xxxx"
  create_launch_template     = true
  enable_bootstrap_user_data = true
}

According to the docs (https://docs.aws.amazon.com/eks/latest/APIReference/API_Nodegroup.html#AmazonEKS-Type-Nodegroup-amiType), I don't see an Ubuntu EKS AMI option there, so I think you would have to use the CUSTOM ami_type, which makes your case different from mine.


3 participants