-
Notifications
You must be signed in to change notification settings - Fork 649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question: How to avoid drift in AMI IDs #4460
Comments
The only option the module currently supportss is to refer the AMI ref via an SSM paramater. In that case the one in the launch template is ingored. Terraform apply will still drift, but the AMI can be managed without runnig terraform. |
I have written a lambda function that triggers daily and updates the ami ref in the ssm parameter. As you said, there is still drift but at least I don't have to remember updating AMI IDs every now and then. @npalm do you think that this is a good approach and could be added to the base module? I can work on a PR if you think this is a good addition to the module capabilities. |
I think we should make, also have a look at #4464
Can you eloborate a bit more what the Lambda function does. Adding it to the base module could be fine. But let's also see if we can work on simply it all. @claytonolley any thoughts? |
Some background on my use case:
|
@npalm I reviewed #4464 and although I think the usage with Attaching my workaround function and terraform resources. AMI Updater Functionimport boto3
import logging
import os
from botocore.exceptions import ClientError
# Configure logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
# Get dry run mode from environment variable
DRY_RUN = os.environ.get("DRY_RUN", "true").lower() == "true"
# SSM parameter name
SSM_PARAMETER_NAME = "/github-action-runners/latest_ami_id"
def get_latest_ami(ec2_client):
"""Get the latest AMI ID based on the filter criteria."""
try:
response = ec2_client.describe_images(
Owners=["135269210855"],
Filters=[
{"Name": "name", "Values": ["runs-on-v2.2-ubuntu24-full-x64-*"]},
{"Name": "state", "Values": ["available"]},
],
)
if not response["Images"]:
raise Exception("No matching AMIs found")
# Sort by creation date to get the latest
sorted_images = sorted(
response["Images"], key=lambda x: x["CreationDate"], reverse=True
)
return sorted_images[0]["ImageId"]
except ClientError as e:
logger.error(f"Error getting latest AMI: {str(e)}")
raise
def get_current_ami_id(ssm_client):
"""Get the current AMI ID from SSM Parameter Store."""
try:
response = ssm_client.get_parameter(Name=SSM_PARAMETER_NAME)
return response["Parameter"]["Value"]
except ClientError as e:
if e.response["Error"]["Code"] == "ParameterNotFound":
logger.info(f"Parameter {SSM_PARAMETER_NAME} not found")
return None
logger.error(f"Error getting current AMI ID from SSM: {str(e)}")
raise
def update_ami_parameter(ssm_client, ami_id):
"""Update the SSM parameter with the new AMI ID."""
try:
current_ami = get_current_ami_id(ssm_client)
if current_ami == ami_id:
logger.info(f"SSM parameter already contains latest AMI ID: {ami_id}")
return True, "Already using latest AMI"
if DRY_RUN:
logger.info(
f"[DRY RUN] Would update SSM parameter from AMI {current_ami} to {ami_id}"
)
return True, "Would update AMI (Dry Run)"
ssm_client.put_parameter(
Name=SSM_PARAMETER_NAME, Value=ami_id, Type="String", Overwrite=True
)
logger.info(
f"Successfully updated SSM parameter from AMI {current_ami} to {ami_id}"
)
return True, "Updated successfully"
except ClientError as e:
error_msg = str(e)
logger.error(f"Error updating SSM parameter: {error_msg}")
return False, f"Error: {error_msg}"
def lambda_handler(event, context):
"""Main Lambda handler."""
ec2_client = boto3.client("ec2")
ssm_client = boto3.client("ssm")
logger.info(f"Running in {'DRY RUN' if DRY_RUN else 'LIVE'} mode")
try:
# Get the latest AMI ID
latest_ami_id = get_latest_ami(ec2_client)
logger.info(f"Found latest AMI: {latest_ami_id}")
# Update SSM parameter
success, message = update_ami_parameter(ssm_client, latest_ami_id)
return {
"statusCode": 200 if success else 500,
"body": {
"dry_run": DRY_RUN,
"overall_success": success,
"latest_ami": latest_ami_id,
"message": f"{'[DRY RUN] ' if DRY_RUN else ''}{message}",
},
}
except Exception as e:
logger.error(f"Error in lambda execution: {str(e)}")
return {
"statusCode": 500,
"body": {"dry_run": DRY_RUN, "error": str(e), "overall_success": False},
} Terraform Resourcesdata "aws_iam_policy_document" "ami_updater_assume_role" {
statement {
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["lambda.amazonaws.com"]
}
}
}
data "aws_iam_policy_document" "ami_updater_permissions" {
statement {
actions = [
"ec2:DescribeImages"
]
resources = ["*"]
}
statement {
actions = [
"ssm:PutParameter",
"ssm:GetParameter"
]
resources = ["arn:aws:ssm:*:*:parameter/github-action-runners/latest_ami_id"]
}
statement {
actions = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
resources = ["arn:aws:logs:*:*:*"]
}
}
resource "aws_iam_role" "ami_updater" {
name = "${var.runner_prefix}-ami-updater"
assume_role_policy = data.aws_iam_policy_document.ami_updater_assume_role.json
}
resource "aws_iam_role_policy" "ami_updater" {
name = "${var.runner_prefix}-ami-updater-policy"
role = aws_iam_role.ami_updater.id
policy = data.aws_iam_policy_document.ami_updater_permissions.json
}
data "archive_file" "ami_updater" {
type = "zip"
source_file = "${path.module}/ami_updater.py"
output_path = "${path.module}/ami_updater.zip"
}
resource "aws_lambda_function" "ami_updater" {
filename = data.archive_file.ami_updater.output_path
source_code_hash = data.archive_file.ami_updater.output_base64sha256
function_name = "${var.runner_prefix}-ami-updater"
role = aws_iam_role.ami_updater.arn
handler = "ami_updater.lambda_handler"
runtime = "python3.9"
timeout = 30
environment {
variables = {
LOG_LEVEL = "INFO"
DRY_RUN = "false" # Set to "false" to enable actual updates
}
}
tags = merge(local.github_runner_additional_tags, var.tags)
}
resource "aws_cloudwatch_event_rule" "ami_updater" {
name = "${var.runner_prefix}-ami-updater"
description = "Trigger AMI updater Lambda function daily"
schedule_expression = "rate(1 day)"
}
resource "aws_cloudwatch_event_target" "ami_updater" {
rule = aws_cloudwatch_event_rule.ami_updater.name
target_id = "TriggerAMIUpdaterLambda"
arn = aws_lambda_function.ami_updater.arn
}
resource "aws_lambda_permission" "ami_updater" {
statement_id = "AllowEventBridgeInvoke"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.ami_updater.function_name
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.ami_updater.arn
}
resource "aws_ssm_parameter" "latest_ami_id" {
name = "/github-action-runners/latest_ami_id"
description = "Latest AMI ID for GitHub runners"
type = "String"
value = "placeholder" # Will be updated by Lambda function
tags = merge(local.github_runner_additional_tags, var.tags)
} I also have created a draft PR in #4488 and would love to contribute. I tried making the code look as close as possible to other functions but will probably need some guidance in testing. (I am still using an old version of the module so didn't want to apply and change in my running infrastructure.) Looking forward for your input! |
Here my thought for the direction we should move witht the module
Wondering what you think about this direction? |
We do more or less the same. We have automated piplelines that runs on our test environment piples after updating our AMI's once all good. THe prod got updated with the the new AMI id via SSM. |
Created a PR to let the launch template use the AMI ID via SSM, See #4517 |
Hello and thank you for this great project. I have the following multi-runner configuration where I use an AMI by runs-on since they are really close to the GitHub Actions runners and are updated regularly. One thing I have noticed is once the terraform module is applied, if the ami-id is deleted by the ami provider, the builds get stuck. And I need to re-apply the module to update the id in the launch template.
Is there a way to make this more dynamic or easier to maintain?
Thanks!
The text was updated successfully, but these errors were encountered: