Fargate tasks not starting with "ResourceInitializationError: unable to pull secrets or registry auth" #2194

Open
NoelLH opened this issue Oct 7, 2023 · 12 comments · Fixed by #2354


@NoelLH

NoelLH commented Oct 7, 2023

I'm trying to use the new Fargate approach in eu-central-1. (The same test repo has been used with Artillery Pro in eu-west-2 before.)

I've confirmed I have a VPC in the region with 3 public subnets, and that Artillery is automatically and correctly using those subnets, so I don't think networking per se is the problem.

I've set up a user with the permissions as documented today. I created what's in the docs as a policy and attached this directly to a user group which my user is in – I wasn't totally clear how a role should be used if created.

It seems the ecr:GetAuthorizationToken is part of a policy that Artillery itself sets up via a worker role, and that's why it isn't in the documented policy to be set up manually in AWS. But I'm not sure what to try now to get it to run.

Version info:

Artillery: 2.0.0-37
Node.js:   v18.18.0
OS:        darwin

Running this command:

artillery run-fargate --count 1 --region eu-central-1 --overrides '{"config": {"phases": [{"duration": 1, "arrivalRate": 1}]}}' --output reports/report.json --record api-donations.yaml
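
Quoting the `--overrides` JSON by hand is easy to get wrong, since a single-quoted shell string keeps any backslashes literally. A minimal sketch, assuming Python 3.8+ is available, that builds the same invocation with `json.dumps` and `shlex.join`. The helper name `build_run_fargate_args` is made up for illustration; the flags are only the ones used in the command above.

```python
import json
import shlex


def build_run_fargate_args(script, region, count, overrides, output):
    """Return the argv list for the `artillery run-fargate` call shown above."""
    return [
        "artillery", "run-fargate",
        "--count", str(count),
        "--region", region,
        # json.dumps produces valid JSON, so no manual escaping is needed.
        "--overrides", json.dumps(overrides),
        "--output", output,
        "--record",
        script,
    ]


args = build_run_fargate_args(
    script="api-donations.yaml",
    region="eu-central-1",
    count=1,
    overrides={"config": {"phases": [{"duration": 1, "arrivalRate": 1}]}},
    output="reports/report.json",
)

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(args))
```

The list form can also be handed to `subprocess.run(args)` directly, which avoids shell quoting altogether.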

I expected to see this happen:

A test run on Fargate

Instead, this happened:

Test stopped with:

Launching workers... [14:23:04]
Waiting for Fargate... [14:23:05]
Waiting for workers to start: deprovisioning: 1 [14:23:37]
[
  {
    attachments: [ [Object] ],
    attributes: [ [Object] ],
    availabilityZone: 'eu-central-1a',
    clusterArn: 'arn:aws:ecs:eu-central-1:[AWS_ACCT_ID]:cluster/artilleryio-cluster',
    connectivity: 'CONNECTED',
    connectivityAt: 2023-10-07T13:23:09.090Z,
    containers: [ [Object] ],
    cpu: '4096',
    createdAt: 2023-10-07T13:23:05.779Z,
    desiredStatus: 'STOPPED',
    enableExecuteCommand: false,
    executionStoppedAt: 2023-10-07T13:23:15.947Z,
    group: 'family:artilleryio-loadgen-worker_fargate_artilleryio-cluster_8fa978b3a50ce517e081ee7c126a354204807b1b_155552',
    healthStatus: 'UNKNOWN',
    lastStatus: 'STOPPED',
    launchType: 'FARGATE',
    memory: '8192',
    overrides: {
      containerOverrides: [Array],
      inferenceAcceleratorOverrides: [],
      taskRoleArn: 'arn:aws:iam::[AWS_ACCT_ID]:role/artilleryio-ecs-worker-role'
    },
    platformVersion: '1.4.0',
    platformFamily: 'Linux',
    stopCode: 'TaskFailedToStart',
    stoppedAt: 2023-10-07T13:23:39.068Z,
    stoppedReason: 'ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 1 time(s): AccessDeniedException: User: arn:aws:sts::[AWS_ACCT_ID]:assumed-role/artilleryio-ecs-worker-role/1c0acd7ed5f84dff888dd1811f2922ce is not authorized to perform: ecr:GetAuthorizationToken on resource: * because no identity-based policy allows the ecr:GetAuthorizationToken action status code: 400, request id: ed9fb8dd-d720-4607-986d-8790c14d35b9',
    stoppingAt: 2023-10-07T13:23:25.972Z,
    tags: [],
    taskArn: 'arn:aws:ecs:eu-central-1:[AWS_ACCT_ID]:task/artilleryio-cluster/1c0acd7ed5f84dff888dd1811f2922ce',
    taskDefinitionArn: 'arn:aws:ecs:eu-central-1:[AWS_ACCT_ID]:task-definition/artilleryio-loadgen-worker_fargate_artilleryio-cluster_8fa978b3a50ce517e081ee7c126a354204807b1b_155552:1',
    version: 4,
    ephemeralStorage: { sizeInGiB: 20 }
  }
]
Error: Worker init failure, aborting test
Error: Worker init failure, aborting test
    at waitForTasks2 ([project-dir]/node_modules/@artilleryio/platform-fargate/lib/commands/run-cluster.js:14:19311)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async [project-dir]/node_modules/@artilleryio/platform-fargate/lib/commands/run-cluster.js:13:873
Error: Worker init failure, aborting test
    at waitForTasks2 ([project-dir]/node_modules/@artilleryio/platform-fargate/lib/commands/run-cluster.js:14:19311)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async [project-dir]/node_modules/@artilleryio/platform-fargate/lib/commands/run-cluster.js:13:873
Cleaning up... [14:23:47]
⠼ Error: error sending test data to Artillery Cloud
Test report may be incomplete
Request ID: a8b4c10f-6219-49a3-b989-0a389ad947ff
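
The useful part of the output above is buried in `stoppedReason`: it names the principal that was denied (the assumed worker role) and the action it lacked. A rough sketch, assuming the message keeps the wording shown in this log, that pulls those fields out with a regular expression. The function name and the shortened sample string (placeholder account and session IDs) are illustrative only.

```python
import re

# Matches the IAM part of an AccessDeniedException message like the one above.
ACCESS_DENIED = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: "
    r"(?P<action>[\w-]+:\w+) on resource: (?P<resource>.+?)"
    r"(?= because | status code|$)"
)


def parse_access_denied(stopped_reason):
    """Return principal, action and resource from a stoppedReason, or None."""
    match = ACCESS_DENIED.search(stopped_reason)
    return match.groupdict() if match else None


# Shortened sample with placeholder IDs, not the full message from the log.
reason = (
    "ResourceInitializationError: unable to pull secrets or registry auth: "
    "AccessDeniedException: User: "
    "arn:aws:sts::123456789000:assumed-role/artilleryio-ecs-worker-role/abc123 "
    "is not authorized to perform: ecr:GetAuthorizationToken on resource: * "
    "because no identity-based policy allows the ecr:GetAuthorizationToken action"
)
print(parse_access_denied(reason))
```
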
@hassy
Member

hassy commented Oct 9, 2023

Thanks @NoelLH! Looking into it - that permission should be added automatically without you needing to do anything.

@peldax

peldax commented Oct 13, 2023

Hi, we encountered the same issue today. We resolved it by manually removing all resources on AWS, which forced Artillery to recreate everything, so it looks like some old setting got cached somewhere.

@hassy
Member

hassy commented Oct 20, 2023

thanks for chiming in @peldax! @NoelLH - could you try one of:

  1. running the test in a different AWS account, or
  2. removing the old Artillery Pro CloudFormation stack, and then trying again

Everything is working as expected on my end, I've not been able to reproduce the issue.

@NoelLH
Author

NoelLH commented Oct 20, 2023

Thanks both!

I'm tight for time at the moment so trying to avoid setting up a distinct AWS account for this if possible @hassy.

I first removed all CloudFormation stacks I could find in all relevant regions & waited for the resource deletions (there was stuff from Artillery Pro and also old Serverless Artillery experiments), but this seemed to make no difference.

I then deleted the IAM role "artilleryio-ecs-worker-role", which had no permissions attached, and that changed the AccessDenied detail to:

    authorized to perform: iam:CreatePolicy on resource: policy 
    artilleryio-ecs-worker-policy because no identity-based policy allows the 
    iam:CreatePolicy action

Each time, it seems to create the worker role again OK but not any permissions/policies for it.
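
One way to confirm the "role exists but has no policies" state described here is to list what is attached to the role. A minimal sketch, assuming a boto3 IAM client is passed in as `iam`. It uses the real `list_attached_role_policies` and `list_role_policies` calls but ignores pagination, and the helper names are invented for this example.

```python
def role_policy_names(iam, role_name):
    """Return (managed, inline) policy names for a role via an IAM client."""
    attached = iam.list_attached_role_policies(RoleName=role_name)
    inline = iam.list_role_policies(RoleName=role_name)
    managed = [p["PolicyName"] for p in attached["AttachedPolicies"]]
    return managed, list(inline["PolicyNames"])


def role_is_bare(iam, role_name):
    """True when the role has neither managed nor inline policies."""
    managed, inline = role_policy_names(iam, role_name)
    return not managed and not inline


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   print(role_is_bare(boto3.client("iam"), "artilleryio-ecs-worker-role"))
```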

@NoelLH
Author

NoelLH commented Oct 26, 2023

I think I've sorted this for our account.

I believe the problem was a combination of the all-or-nothing approach to worker role creation and 2 errors in the Artillery docs for Fargate. Because of those errors, some of the required permissions were missing once enough of the IAM resources had been deleted for Artillery to attempt to recreate them:

  1. arn:aws:iam::123456789000:policy/ecs-worker-policy should be arn:aws:iam::123456789000:policy/artilleryio-ecs-worker-policy
  2. iam:AttachRolePolicy is required for resource arn:aws:iam::123456789000:role/artilleryio-ecs-worker-role, not [just] for the policy
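
The two corrections above can be sketched as IAM policy statements. This is an illustration only, not the documented Artillery policy: the split into two statements is an assumption, `iam:CreatePolicy` is taken from the earlier error message, and the rest of the documented policy is not reproduced. The account ID is the placeholder used in the list above.

```python
import json

ACCOUNT_ID = "123456789000"  # placeholder account ID from the thread


def corrected_iam_statements(account_id):
    """Build statements reflecting the two corrections (layout is assumed)."""
    role_arn = f"arn:aws:iam::{account_id}:role/artilleryio-ecs-worker-role"
    # Correction 1: the policy name carries the "artilleryio-" prefix.
    policy_arn = f"arn:aws:iam::{account_id}:policy/artilleryio-ecs-worker-policy"
    return [
        {
            "Effect": "Allow",
            "Action": ["iam:CreatePolicy"],
            "Resource": [policy_arn],
        },
        {
            "Effect": "Allow",
            "Action": ["iam:AttachRolePolicy"],
            # Correction 2: the role ARN is listed, not just the policy ARN.
            "Resource": [role_arn, policy_arn],
        },
    ]


print(json.dumps(corrected_iam_statements(ACCOUNT_ID), indent=2))
```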

@zeeshanpolaris

I am unable to use Fargate now with the new task definitions that have Parameter Store secrets. I was able to run on Fargate a few months ago. This is what I am getting as the reason for the task stopping:

ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secrets from ssm: service call has been retried 1 time(s): invalid ssm parameters: /artilleryio/ARTIFACTORY_AUTH,/artilleryio/ARTIFACTORY_EMAIL,/artilleryio/NPMRC,/artilleryio/NPM_REGISTRY,/artilleryio/NPM_SCOPE,/artilleryio/NPM_SCOPE_REGISTRY,/artilleryio/NPM_TOKEN

@RobMullen

Attempting use of Artillery for the first time in an AWS account, and experiencing the same thing as @zeeshanpolaris above.

Looks like there is a function ensureParameterExists that is likely intended to do this conditional parameter creation. But, I see no code references invoking it.

Perhaps this was missed in testing of the migration of the Fargate support code in #2297? (Parameters already existing in the test environment?)
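
For anyone working around this by hand, conditional parameter creation could look roughly like the sketch below. It is not Artillery's `ensureParameterExists`, only an illustration of the idea in Python, assuming a boto3 SSM client is passed in as `ssm`. The parameter names come from the error message above; the default value `"placeholder"` is hypothetical, since the thread does not say which defaults Artillery expects.

```python
# Parameter names listed in the "invalid ssm parameters" error above.
PARAMETER_NAMES = [
    "/artilleryio/ARTIFACTORY_AUTH",
    "/artilleryio/ARTIFACTORY_EMAIL",
    "/artilleryio/NPMRC",
    "/artilleryio/NPM_REGISTRY",
    "/artilleryio/NPM_SCOPE",
    "/artilleryio/NPM_SCOPE_REGISTRY",
    "/artilleryio/NPM_TOKEN",
]


def ensure_parameter_exists(ssm, name, default_value, param_type="String"):
    """Create the SSM parameter only if it is missing.

    Returns True if the parameter was created, False if it already existed.
    """
    try:
        ssm.get_parameter(Name=name)
        return False
    except ssm.exceptions.ParameterNotFound:
        ssm.put_parameter(Name=name, Value=default_value, Type=param_type)
        return True


def ensure_all(ssm, names, default_value="placeholder"):
    """Ensure every name exists; return the names that had to be created.

    "placeholder" is a hypothetical default, not a value Artillery documents.
    """
    return [n for n in names if ensure_parameter_exists(ssm, n, default_value)]


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   created = ensure_all(boto3.client("ssm"), PARAMETER_NAMES)
```

Existing parameters are left untouched because `put_parameter` is only called after `get_parameter` reports the parameter as missing.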

@hassy
Member

hassy commented Dec 5, 2023

@RobMullen @zeeshanpolaris apologies, fix incoming

@zeeshanpolaris

    @RobMullen @zeeshanpolaris apologies, fix incoming

Thank you. Appreciate it.

@hassy
Member

hassy commented Dec 5, 2023

Thanks again for reporting the issue @zeeshanpolaris @RobMullen

Fix is in this PR: #2354

A canary version of Artillery will be published once we merge to main, which you can use to check whether running a test works. (You can install the canary with npm install -g artillery@canary.) We'll also publish v2.0.3 later today.

@zeeshanpolaris

Thanks. I added those default values manually and got it working. However, I had to add these two additional permissions for CloudWatch Logs to the policy used by the role.

{
  "Effect": "Allow",
  "Action": [
    "logs:CreateLogStream",
    "logs:CreateLogGroup"
  ],
  "Resource": [
    "arn:aws:logs:RegionHiddenForSecurity:AcountNumberHiddenForSecurity:log-group:artilleryio-log-group/*"
  ]
}
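
The redacted region and account number in the statement above have to be filled in per account. A small sketch that generates the same statement from those two values. The function name is invented, and `eu-central-1` and `123456789000` are placeholders rather than values from this comment.

```python
import json


def log_permissions_statement(region, account_id, log_group="artilleryio-log-group"):
    """Build the CloudWatch Logs statement shown above for one region/account."""
    resource = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}/*"
    return {
        "Effect": "Allow",
        "Action": ["logs:CreateLogStream", "logs:CreateLogGroup"],
        "Resource": [resource],
    }


# Placeholder region and account ID; substitute your own.
print(json.dumps(log_permissions_statement("eu-central-1", "123456789000"), indent=2))
```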

@RobMullen

Thank you very much, @hassy, for jumping on this quickly!!
I too have worked around this by manually creating the default Parameter Store entries.
I will remove the Parameter Store entries and try out the canary when it becomes available.

@hassy hassy reopened this Dec 6, 2023

5 participants