Terraform shows false changes in user_data #3227

Open
anzinchenko89 opened this issue Dec 3, 2024 · 4 comments
@anzinchenko89

anzinchenko89 commented Dec 3, 2024

Description

We are using an EKS managed node group with AL2023. Whenever we change Terraform code in the repository, even code unrelated to the node groups or user_data, the plan shows user_data being updated in place. Nothing has actually changed, so it looks as if Terraform is going to update the node group, but the change is a false one.

  • [+] ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]:
    20.30.1

  • Terraform version:
    1.9.4

  • Provider version(s):

  • provider registry.terraform.io/hashicorp/aws v5.78.0
  • provider registry.terraform.io/hashicorp/cloudinit v2.3.5
  • provider registry.terraform.io/hashicorp/local v2.5.2
  • provider registry.terraform.io/hashicorp/null v3.2.3
  • provider registry.terraform.io/hashicorp/time v0.12.1
  • provider registry.terraform.io/hashicorp/tls v4.0.6
  • provider registry.terraform.io/hashicorp/vault v4.5.0
  • provider registry.terraform.io/integrations/github v6.4.0

Reproduction Code [Required]

module "nodegroup" {
  source   = "./node_group"
  for_each = local.worker_node_groups

  cluster_name              = var.cluster_name
  eks_cluster_name          = time_sleep.cluster.triggers["cluster_name"]
  cluster_endpoint          = time_sleep.cluster.triggers["cluster_endpoint"]
  cluster_auth_base64       = time_sleep.cluster.triggers["cluster_certificate_authority_data"]
  cluster_service_ipv4_cidr = var.eks_cluster_service_ipv4_cidr
  instance_type             = each.value.instance_type
  max_nodes                 = each.value.max_nodes
  min_nodes                 = each.value.min_nodes
  name                      = each.value.name

  block_device_mappings = {
    # Root volume
    xvda = {
      device_name = "/dev/xvda"
      ebs = {
        volume_size           = 50
        volume_type           = "gp3"
        iops                  = 3000
        throughput            = 125
        delete_on_termination = true
        encrypted             = true
      }
    }
    xvdb = {
      device_name = local.second_volume_name
      ebs = {
        volume_size           = each.value.root_volume_size
        volume_type           = "gp3"
        iops                  = 3000
        throughput            = 125
        delete_on_termination = true
        encrypted             = true
      }
    }
  }
  security_groups            = each.value.security_groups
  subnet_ids                 = var.worker_subnet_ids
  eks_worker_arn             = each.value.eks_worker_arn
  eks_node_group_ami_id      = var.eks_node_group_ami_id
  enable_bootstrap_user_data = true
  ami_type                   = "AL2023_x86_64_STANDARD"
  cloudinit_pre_nodeadm = [{
    content_type = "text/x-shellscript; charset=\"us-ascii\""
    content      = <<-EOT
      #!/bin/bash
      # This user data mounts the containerd directories on the second EBS volume.
      # The real bash script is quite long, so this is only a placeholder showing the code structure.
    EOT
    },
    {
      content_type = "application/node.eks.aws"
      content      = <<-EOT
        ---
        apiVersion: node.eks.aws/v1alpha1
        kind: NodeConfig
        spec:
          kubelet:
            config:
              registerWithTaints:
                - key: "ebs.csi.aws.com/agent-not-ready"
                  effect: "NoExecute"
                  value: "NoExecute"
                - key: "efs.csi.aws.com/agent-not-ready"
                  effect: "NoExecute"
                  value: "NoExecute"
              evictionHard:
                memory.available: "100Mi"
                nodefs.available: "10%"
                nodefs.inodesFree: "5%"
                imagefs.available: "15%"
                imagefs.inodesFree: "5%"
              evictionSoft:
                nodefs.available: "15%"
                nodefs.inodesFree: "10%"
                imagefs.available: "20%"
                imagefs.inodesFree: "10%"
              evictionSoftGracePeriod:
                nodefs.available: 60s
                nodefs.inodesFree: 60s
                imagefs.available: 60s
                imagefs.inodesFree: 60s
              evictionMaxPodGracePeriod: 180
              evictionPressureTransitionPeriod: 5m
              evictionMinimumReclaim:
                nodefs.available: 1Gi
                imagefs.available: 1Gi
      EOT
  }]
  aws_tags     = merge(var.aws_tags, each.value["tags"])
  default_tags = var.default_tags
  labels       = each.value.node_labels
  taints       = each.value.node_taints

  depends_on = [
    aws_eks_cluster.cluster
  ]
}

The node_group module lives in our local repo and contains the following:

module "user_data" {
  source                     = "terraform-aws-modules/eks/aws//modules/_user_data"
  version                    = "~> 20.0"
  create                     = true
  ami_type                   = var.ami_type
  cluster_name               = var.eks_cluster_name
  cluster_endpoint           = var.cluster_endpoint
  cluster_auth_base64        = var.cluster_auth_base64
  cluster_service_ipv4_cidr  = var.cluster_service_ipv4_cidr
  enable_bootstrap_user_data = var.enable_bootstrap_user_data
  pre_bootstrap_user_data    = var.pre_bootstrap_user_data
  post_bootstrap_user_data   = var.post_bootstrap_user_data
  bootstrap_extra_args       = var.bootstrap_extra_args
  user_data_template_path    = var.user_data_template_path
  cloudinit_pre_nodeadm      = var.cloudinit_pre_nodeadm
}
resource "aws_launch_template" "workers" {
  name_prefix   = "${var.name}.${var.cluster_name}-"
  image_id      = var.eks_node_group_ami_id
  instance_type = var.instance_type
  ebs_optimized = true
  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required"
    http_put_response_hop_limit = 2
  }
  monitoring {
    enabled = false
  }
  network_interfaces {
    device_index          = 0
    security_groups       = var.security_groups
    delete_on_termination = true
  }
  user_data = module.user_data.user_data
  dynamic "tag_specifications" {
    for_each = toset(var.tag_specifications)
    content {
      resource_type = tag_specifications.key
      tags = merge(
        var.aws_tags,
        var.default_tags,
        {
          Name = "${var.name}.${var.cluster_name}"
        }
      )
    }
  }
  tags = merge(
    var.aws_tags,
    {
      Name = "${var.name}.${var.cluster_name}"
    }
  )
  dynamic "block_device_mappings" {
    for_each = var.block_device_mappings

    content {
      device_name = try(block_device_mappings.value.device_name, null)

      dynamic "ebs" {
        for_each = try([block_device_mappings.value.ebs], [])

        content {
          delete_on_termination = try(ebs.value.delete_on_termination, null)
          encrypted             = try(ebs.value.encrypted, null)
          iops                  = try(ebs.value.iops, null)
          kms_key_id            = try(ebs.value.kms_key_id, null)
          snapshot_id           = try(ebs.value.snapshot_id, null)
          throughput            = try(ebs.value.throughput, null)
          volume_size           = try(ebs.value.volume_size, null)
          volume_type           = try(ebs.value.volume_type, null)
        }
      }

      no_device    = try(block_device_mappings.value.no_device, null)
      virtual_name = try(block_device_mappings.value.virtual_name, null)
    }
  }
}

Steps to reproduce the behavior:

Adding a new Terraform resource, or changing any code unrelated to the launch template or user_data, still produces "changes" in user_data:

# module.cluster.module.nodegroup["default"].aws_eks_node_group.workers will be updated in-place
  ~ resource "aws_eks_node_group" "workers" {
        id                     = "###########"
        tags                   = {
            "Name"                                                = "eks_cluster.net"
            "k8s.io/cluster-autoscaler/eks_cluster" = "owned"
            "k8s.io/cluster-autoscaler/enabled"                   = "true"
        }
      ~ launch_template {
            id      = "lt-085883b0718ea3681"
            name    = "default.eks_cluster.net-2024060614060058700000002a"
          ~ version = "24" -> (known after apply)
        }

        # (3 unchanged blocks hidden)
    }
  ~ resource "aws_launch_template" "workers" {
        id                                   = "lt-085883b0718ea3681"
      ~ latest_version                       = 24 -> (known after apply)
        name                                 = "default.eks_cluster.net-2024060614060058700000002a"
        tags                                 = {
            "Name" = "default.eks_cluster.net"
        }
      ~ user_data                            = "Q29udGVudC1UeXBlOiBtdWx0aXBhcnQvbWl4ZWQ7IGJvdW5kYXJ5PSJNSU1FQk9VTkRBUlkiCk1JTUUtVmVyc2lvbjogMS4wDQoNCi0tTUlNRUJPVU5EQVJZDQpDb250ZW50LVRyYW5zZmVyLUVuY29kaW5nOiA3Yml0DQpDb250ZW50LVR5cGU6IHRleHQveC..........

After applying this plan, the launch template is not actually updated and its latest version stays the same (in this particular case the launch template version remains 24).
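One way to check whether the rendered content really differs between plans is to expose the decoded user data and diff it by hand. The following is a minimal sketch, assuming it is added to the local node_group module shown above; the output name `user_data_decoded` is hypothetical and not part of the original code.

```hcl
# Hypothetical debugging output, not part of the original module.
# Decodes the base64 user data so that two runs can be compared as plain text.
output "user_data_decoded" {
  description = "Decoded user data, for diffing between plans"
  value       = base64decode(module.user_data.user_data)
}
```

Because node_group is itself called as a module, the value would also need to be re-exported from the calling module to appear in `terraform output`. If Terraform reports the value as sensitive, the output needs `sensitive = true`.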

Expected behavior

After changes unrelated to user_data, the launch template, or the node groups, Terraform should not plan an in-place update of user_data.

Actual behavior

Terraform always detects user_data drift, even when the changes in the repository affect only resources unrelated to user_data, the launch template, or the EKS node groups.

@bryantbiggs
Member

Why not use the module as it's designed? This is far from what we provide in this module, so we won't be able to troubleshoot.

@anzinchenko89
Author

anzinchenko89 commented Dec 3, 2024

Yep, we're not using the module the way it was designed; we only need the "user_data" submodule. Is that considered the "wrong" way to use the module?
This kind of usage didn't cause any issues until we upgraded EKS to 1.30 and migrated to AL2023 (we did that in September, with AWS provider version 5.66.0). It seems unlikely that moving to the new AMI caused this "weird" Terraform behavior, but "something" makes Terraform think the content of user_data has been modified when it hasn't.

@prajwalakhuj-hcx

Hi @bryantbiggs @anzinchenko89, I am facing the exact same issue: Terraform always detects user_data drift, even when the changes in the repository affect only resources unrelated to user_data, the launch template, or the EKS node groups.
I am using EKS module version "17.24.0".

@anzinchenko89, please let me know if you found a fix for this! Thanks!

@anzinchenko89
Author

Hi @prajwalakhuj-hcx, instead of using the "user_data" module, I put the user_data templatefile call directly in the launch template, like this:

  user_data = base64encode(templatefile("${path.module}/templates/user_data_config.tpl", {
    cluster_name         = var.eks_cluster_name,
    cluster_endpoint     = var.cluster_endpoint,
    cluster_auth_base64  = var.cluster_auth_base64,
    cluster_service_cidr = var.cluster_service_ipv4_cidr,
    region               = var.region,
    volume_name          = var.second_volume_name
  }))

I assume that the data source data "cloudinit_config" is what causes the false changes in user_data. There is an open issue on the provider side that is similar to ours: hashicorp/terraform-provider-cloudinit#254

After I removed the cloudinit_config data source from the code, terraform plan started showing the expected output. There may be a bug in terraform-provider-cloudinit, but it is definitely not an issue within the terraform-aws-eks module.
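For context, a cloudinit_config data source assembles its parts into a MIME multipart document. The visible prefix of the user_data value in the plan output above decodes to `Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"`, which is consistent with that. The following is a minimal sketch of such a data source, assuming the hashicorp/cloudinit provider and the variables of the node_group module; the resource name `example` is hypothetical, and this is not a copy of the submodule's internals.

```hcl
# Hypothetical illustration of the kind of data source being discussed.
data "cloudinit_config" "example" {
  gzip          = false
  base64_encode = true
  boundary      = "MIMEBOUNDARY"

  # A single nodeadm NodeConfig part; further parts could be added the same way.
  part {
    content_type = "application/node.eks.aws"
    content      = <<-EOT
      ---
      apiVersion: node.eks.aws/v1alpha1
      kind: NodeConfig
      spec:
        cluster:
          name: ${var.eks_cluster_name}
          apiServerEndpoint: ${var.cluster_endpoint}
          certificateAuthority: ${var.cluster_auth_base64}
          cidr: ${var.cluster_service_ipv4_cidr}
    EOT
  }
}

# The rendered, base64-encoded document is exposed as:
#   data.cloudinit_config.example.rendered
```

Replacing this with `base64encode(templatefile(...))`, as in the snippet above, removes the data source from the graph, so the value is computed from a plain function call instead.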
