terragrunt does not renew the AWS session token automatically when the token is expired #3817
Comments
Hey @gqrlt1207, how are you assuming the role? We might need more details. Generally speaking, if the assumed-role session expires in the middle of an OpenTofu/Terraform run and prevents OpenTofu/Terraform from pushing state after an apply, there isn't anything Terragrunt can do. At that stage, OpenTofu/Terraform is in control of the run. By the way, I believe you can raise the limit so that roles are assumable for longer than an hour:
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_manage-assume.html
Thanks, @yhakbar. We actually run everything in a Kubernetes container and use the IRSA approach to get access to AWS. AWS sets a 1-hour limit on the generated temporary credentials, which cannot be extended. Given that restriction, would it be possible to start a goroutine when the 'terragrunt' command is called? The goroutine would watch the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN, and once they are updated, 'terragrunt' would automatically assume the role again using the refreshed AWS session token. What do you think?
What you're asking for does happen when using the auth-provider-cmd, but it won't help you if your role expires in the middle of an OpenTofu/Terraform run. When Terragrunt spawns the process for OpenTofu/Terraform, the environment variables and everything else are set at that point, and OpenTofu/Terraform is in control from there until it finishes. Only then does Terragrunt do something else, such as spawning a process for another unit. The problem you're encountering is that credentials are expiring in the middle of your OpenTofu/Terraform invocation, so either OpenTofu/Terraform needs to be the thing refreshing the credentials, or you set the expiration high enough that it doesn't matter. The docs I linked above show you how to change the maximum role assumption duration for OIDC-mediated assumed roles, which is my understanding of how IRSA works, though I might be mistaken.
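To illustrate the point about process spawning, here is a minimal sketch in Python (not Terragrunt code). It assumes nothing beyond the standard library; the variable name `DEMO_SESSION_TOKEN` and the function `child_sees_token` are made up for the demonstration. It shows that a child process keeps the environment it was started with, even if the parent updates its own environment afterwards.

```python
import os
import subprocess
import sys


def child_sees_token(var: str = "DEMO_SESSION_TOKEN") -> str:
    """Spawn a child, change the parent's env afterwards, return what the child sees."""
    os.environ[var] = "old-token"

    # The child inherits a copy of the parent's environment at spawn time.
    # It waits for a line on stdin before reading its own environment.
    child = subprocess.Popen(
        [
            sys.executable,
            "-c",
            f"import os, sys; sys.stdin.readline(); print(os.environ[{var!r}])",
        ],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )

    # "Refresh" the credential in the parent after the child is already running.
    os.environ[var] = "new-token"

    # Let the child proceed and collect its output.
    out, _ = child.communicate("go\n")
    return out.strip()


if __name__ == "__main__":
    print(child_sees_token())
```

The child reports the value that was set before it was spawned, which is why a watcher inside the parent process cannot hand refreshed credentials to an OpenTofu/Terraform run that is already in progress.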
Thanks @yhakbar. In our environment, we run 'terragrunt' in a Kubernetes pod, and Kubernetes uses IRSA to generate the temporary AWS session token the pod uses to communicate with AWS. In the terragrunt.hcl file, we set the iam-role that terragrunt will assume using that temporary AWS session token. For security reasons, AWS imposes a role-chaining limit: when temporary AWS session tokens are used to assume other IAM roles, the maximum token duration is 1 hour. This is what happens in our environment. Is it possible for 'terragrunt' to use the 'role' and 'token' file from the pod to assume the role itself?
@gqrlt1207 yup! I recommend reading the docs for the auth-provider-cmd. It gives you very fine-grained control over how Terragrunt assumes roles. You can write a small script that extracts those values and sends them to Terragrunt like this:
"awsRole": {
  "roleARN": "role-acquired-from-pod",
  "sessionName": "my-session-name",
  "duration": 3600,
  "webIdentityToken": "token-acquired-from-pod"
},
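As a rough sketch of such a script, assuming the pod exposes the standard IRSA environment variables `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` and that Terragrunt reads the command's JSON from stdout as the auth-provider-cmd docs describe: the function name `build_auth_response` and the session name `terragrunt-irsa` are hypothetical, and `AWS_ROLE_ARN` is whichever role the pod's service account is annotated with, so substitute your own role ARN if it differs.

```python
import json
import os
import sys


def build_auth_response(environ, session_name="terragrunt-irsa", duration=3600):
    """Build the awsRole payload from the role and token file provided to the pod."""
    # Assumption: these two variables are the ones IRSA injects into the pod.
    role_arn = environ["AWS_ROLE_ARN"]
    token_file = environ["AWS_WEB_IDENTITY_TOKEN_FILE"]

    # Read the projected service account token fresh on every invocation,
    # since the file on disk is rotated over time.
    with open(token_file, encoding="utf-8") as handle:
        token = handle.read().strip()

    return {
        "awsRole": {
            "roleARN": role_arn,
            "sessionName": session_name,
            "duration": duration,
            "webIdentityToken": token,
        }
    }


if __name__ == "__main__":
    # Outside an IRSA pod these variables are absent; emit nothing in that case.
    if "AWS_ROLE_ARN" in os.environ and "AWS_WEB_IDENTITY_TOKEN_FILE" in os.environ:
        json.dump(build_auth_response(os.environ), sys.stdout)
```

This only changes how Terragrunt obtains credentials before it starts a run. It does not refresh credentials inside an OpenTofu/Terraform process that is already running.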
Thanks @yhakbar, I will try this.
Describe the bug
When we used 'terragrunt' to migrate an AWS RDS cluster from one region to another, the AWS session token expired because the task took more than one hour to complete. As a result, the Terraform state file was not updated in the AWS S3 bucket (we store all Terraform state files in S3).
We run the 'terragrunt' command in Kubernetes pods and use IRSA to access AWS resources, which makes it impossible to set the duration of the AWS session token to more than 1 hour.