This issue was moved to a discussion.

Resources that produce a diff after being applied reconcile continuously in a tight loop in the no-fork architecture #306

Closed
mbbush opened this issue Dec 1, 2023 · 3 comments

Comments

@mbbush
Contributor

mbbush commented Dec 1, 2023

What happened?

The no-fork architecture seems to put itself in an infinite loop, updating the same resource as fast as it can, whenever the state after apply does not match the desired state. In the fork architecture, the same update ran only once every 10 minutes. In both architectures this is invisible in the resource's status, but an event, "Successfully requested update of external resource", fires for every update.

How can we reproduce it?

apiVersion: iam.aws.upbound.io/v1beta1
kind: Role
metadata:
  name: sample-role
spec:
  forProvider:
    assumeRolePolicy: |
      {
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "Service": "eks.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }

When applied, AWS sets the assumeRolePolicy to

      {
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "Service": "eks.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
          }
        ],
        "Version": "2008-10-17"
      }

(ignoring whitespace differences, which the Terraform provider correctly ignores)
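
To confirm where the perpetual diff comes from, the two policy documents can be compared after parsing, so that whitespace plays no part. The following is a minimal Python sketch, not anything the provider runs; the helper name `policy_diff` is made up for illustration, and pinning `Version` to the value AWS returned is an assumption about how to make the desired and observed documents match.

```python
import json

# Policy as written in spec.forProvider.assumeRolePolicy.
desired = json.loads("""
{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "eks.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}
""")

# Policy as reported by AWS after the role is applied.
observed = json.loads("""
{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "eks.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ],
  "Version": "2008-10-17"
}
""")


def policy_diff(want: dict, have: dict) -> dict:
    """Return the top-level keys whose values differ, as (want, have) pairs."""
    keys = set(want) | set(have)
    return {k: (want.get(k), have.get(k)) for k in keys if want.get(k) != have.get(k)}


diff = policy_diff(desired, observed)
print(diff)  # {'Version': (None, '2008-10-17')}

# Hypothetical fix: state the version explicitly so nothing is left to default.
pinned = dict(desired, Version="2008-10-17")
print(policy_diff(pinned, observed))  # {}
```

The only difference is the `Version` key that AWS fills in, which is why every apply leaves a diff behind.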

In provider-aws version 0.43, this update happens once every 10 minutes (the default drift detection interval). In provider-aws version 0.45 (using Upjet 1.0.0), it happens roughly once per second, about as fast as the API calls to update the resource can complete.

I'm honestly not sure if this is a feature or a bug. Certainly the best outcome would be for me to not have resources like this, with missing defaults that create a constant diff, and I will fix mine. But I think this should at least be called out in the migration documentation: users may not be aware of resources that were doing a no-op update every 10 minutes (which is mostly harmless), whereas the tight loop is more of a resource-usage issue.

@mbbush mbbush added the bug label Dec 1, 2023
@jeanduplessis
Collaborator

@mbbush In the previous architecture, the problem was that in some cases, even though an update was needed (because the desired state and the actual state differed), the provider did not trigger it. It acted as if everything was normal and waited for the next reconciliation at the poll interval.

Therefore, update requests were made in the next reconciliation loop rather than immediately, and ideally you want the update request to be made immediately.

After discovering this problem, we talked about it and the team addressed it in the new architecture.

So this feels more like a feature that fixes a bug, rather than a new bug.
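
The difference between the two behaviours can be illustrated with a toy model. This is a sketch under stated assumptions, not Upjet's actual code: `count_updates` and its parameters are hypothetical, the diff is assumed never to clear (as with the role in this issue), and an update call is assumed to take about one second, matching the rate reported above.

```python
def count_updates(window_s: int, poll_interval_s: int,
                  update_latency_s: int, update_immediately: bool) -> int:
    """Count update calls issued within window_s for a resource whose diff never clears."""
    # Assumed spacing between consecutive updates:
    # - deferred: one update per poll interval
    # - immediate: a new update as soon as the previous call returns
    step = update_latency_s if update_immediately else poll_interval_s
    elapsed = 0
    updates = 0
    while elapsed + step <= window_s:
        elapsed += step
        updates += 1
    return updates


one_hour = 60 * 60
deferred = count_updates(one_hour, poll_interval_s=600, update_latency_s=1,
                         update_immediately=False)
immediate = count_updates(one_hour, poll_interval_s=600, update_latency_s=1,
                          update_immediately=True)
print(deferred, immediate)  # 6 3600
```

For a resource whose diff does clear after the first update, both variants issue a single update; the immediate variant simply issues it sooner. The tight loop only shows up when applying the update leaves a diff behind.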

@jeanduplessis jeanduplessis added the question label and removed the bug label Dec 1, 2023
@mbbush
Contributor Author

mbbush commented Dec 1, 2023

I'm inclined to agree that this is a feature, but it was a surprising one to discover, with some negative consequences in certain cases. The benefits outweigh the problems, so I think the only "fix" needed is better documentation and communication about this change and about the scenario in which it could be a problem.

@jeanduplessis
Collaborator

We are aiming to make documentation improvements as part of the Upjet 1.2 release and will include this in that scope.

@jeanduplessis jeanduplessis added this to the 1.2 milestone Feb 3, 2024
@jeanduplessis jeanduplessis removed this from the 1.3 milestone Oct 20, 2024
@crossplane crossplane locked and limited conversation to collaborators Oct 20, 2024
@jeanduplessis jeanduplessis converted this issue into discussion #445 Oct 20, 2024
