Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v0.10] Backport of Updating Ports with correctDrift enabled using multiple-paths repo triggers an error #2834

Closed
weyfonk opened this issue Sep 11, 2024 · 7 comments

Comments

@weyfonk
Copy link
Contributor

weyfonk commented Sep 11, 2024

This is a backport of #2609 to v0.10.

@weyfonk weyfonk self-assigned this Sep 11, 2024
@weyfonk weyfonk added this to Fleet Sep 11, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Fleet Sep 11, 2024
@weyfonk weyfonk moved this from 🆕 New to 🏗 In progress in Fleet Sep 11, 2024
@weyfonk weyfonk added this to the v2.9.2 milestone Sep 11, 2024
@weyfonk weyfonk moved this from 🏗 In progress to 👀 In review in Fleet Sep 11, 2024
@manno manno modified the milestones: v2.9.2, v2.9-Next1 Sep 12, 2024
@weyfonk
Copy link
Contributor Author

weyfonk commented Sep 16, 2024

(copied from #2609)

Additional QA

Problem

When failing to correct drift on a resource (eg. modified ports array on a service), Fleet would leave a GitRepo in Modified state, with no error on the corresponding bundle deployment status.

Solution

  • The drift correction error is now reflected in the bundle deployment status, which should in turn propagate it to the GitRepo status
  • Setting force: true on the GitRepo resolves the error by deleting and recreating the Helm release for the bundle deployment, hence recreating the service in this case.

Testing

See reproduction steps above, in the issue description.

Engineering Testing

Manual Testing

  1. Created a GitRepo with drift correction enabled (but not set to force mode) pointing to rancher/fleet-test-data's multiple-paths
  2. Edited the created service ports
  3. Checked status of the GitRepo and bundle deployment
  4. Updated the GitRepo drift correction mode to true
  5. Saw the GitRepo and bundle deployment status error disappear, once the service had been recreated.

Automated Testing

  • Integration tests cover propagation of the drift correction error to the bundle deployment status
  • End-to-end tests verify that when a bundle is marked as modified, patching a GitRepo to set its correctDrift.force option to true eventually updates the bundle status, in that the bundle will no longer appear as modified.

QA Testing Considerations

  • Test how the GitRepo status is reflected in the Rancher UI

Regressions Considerations

N/A

@weyfonk weyfonk moved this from 👀 In review to Needs QA review in Fleet Sep 16, 2024
@mmartin24 mmartin24 self-assigned this Oct 3, 2024
@mmartin24
Copy link
Collaborator

mmartin24 commented Oct 4, 2024

I am still observing this in Rancher v2.9-2d10b66bb2e1e5fc7568591ba41648002cf29b20-head with fleet:104.1.0+up0.10.4-rc.1 following reproduction steps on original ticket.

@weyfonk, am I perhaps not looking at something well or is it still this fix not being propagated into the above-mentioned fleet version?

image

Fleet reconciliation error log
{
  "level": "error",
  "ts": "2024-10-04T14:28:04Z",
  "logger": "bundledeployment",
  "msg": "Failed to deploy bundle",
  "controller": "bundledeployment",
  "controllerGroup": "fleet.cattle.io",
  "controllerKind": "BundleDeployment",
  "BundleDeployment": {
    "name": "ds-cluster-correct-79-multiple-paths-service",
    "namespace": "cluster-fleet-default-imported-0-f1158df57e01"
  },
  "namespace": "cluster-fleet-default-imported-0-f1158df57e01",
  "name": "ds-cluster-correct-79-multiple-paths-service",
  "reconcileID": "0b51e68c-b581-44cb-8a9d-4a00bc09c8ae",
  "status": {
    "conditions": [
      {
        "type": "Installed",
        "status": "True",
        "lastUpdateTime": "2024-10-04T14:09:56Z"
      },
      {
        "type": "Deployed",
        "status": "False",
        "lastUpdateTime": "2024-10-04T14:11:18Z",
        "reason": "Error",
        "message": "cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\""
      },
      {
        "type": "Ready",
        "status": "True",
        "lastUpdateTime": "2024-10-04T14:10:04Z"
      },
      {
        "type": "Monitored",
        "status": "True",
        "lastUpdateTime": "2024-10-04T14:09:56Z"
      }
    ],
    "appliedDeploymentID": "s-e900fb60b86d8593e95a733a0c0d1794f2d71a00910f794d19bcd4d57deca:aa73273923fd2b194b95dc51be330a7b1be92dafa689e0afb400abda8b37d8c0",
    "release": "test-fleet-mp-service/ds-cluster-correct-79-multiple-paths-service:1",
    "ready": true,
    "nonModified": true,
    "display": {
      "deployed": "Error: cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"",
      "monitored": "True",
      "state": "ErrApplied"
    },
    "syncGeneration": 0
  },
  "error": "cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"",
  "errorVerbose": "cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"\nhelm.sh/helm/v3/pkg/kube.(*Client).Update\n\t/home/runner/go/pkg/mod/github.com/rancher/helm/[email protected]/pkg/kube/client.go:438\nhelm.sh/helm/v3/pkg/action.(*Install).performInstall\n\t/home/runner/go/pkg/mod/github.com/rancher/helm/[email protected]/pkg/action/install.go:456\nhelm.sh/helm/v3/pkg/action.(*Install).performInstallCtx.func1\n\t/home/runner/go/pkg/mod/github.com/rancher/helm/[email protected]/pkg/action/install.go:421\nruntime.goexit\n\t/home/runner/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695",
  "stacktrace": "github.com/rancher/fleet/internal/cmd/agent/controller.(*BundleDeploymentReconciler).Reconcile\n\t/home/runner/work/fleet/fleet/internal/cmd/agent/controller/bundledeployment_controller.go:131\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"
}
{
  "level": "error",
  "ts": "2024-10-04T14:28:04Z",
  "msg": "Reconciler error",
  "controller": "bundledeployment",
  "controllerGroup": "fleet.cattle.io",
  "controllerKind": "BundleDeployment",
  "BundleDeployment": {
    "name": "ds-cluster-correct-79-multiple-paths-service",
    "namespace": "cluster-fleet-default-imported-0-f1158df57e01"
  },
  "namespace": "cluster-fleet-default-imported-0-f1158df57e01",
  "name": "ds-cluster-correct-79-multiple-paths-service",
  "reconcileID": "0b51e68c-b581-44cb-8a9d-4a00bc09c8ae",
  "error": "failed deploying bundle: cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"",
  "errorCauses": [
    {
      "error": "failed deploying bundle: cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\""
    }
  ],
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"
}

@weyfonk
Copy link
Contributor Author

weyfonk commented Oct 11, 2024

Thanks @mmartin24 for raising this.

I am still able to reproduce issues with updating service ports on Rancher v2.9.3-alpha4 with Fleet v0.10.4-rc.1.
After a few seconds, although the bundle deployment containing the service appears as modified, the corresponding bundle sees its status updated to Ready, as if the bundle deployment were ready too. This in turn is reflected in the GitRepo owning that bundle.
This happens because the resources fields are cleared from the bundle deployment's status. Why that happens is still unclear, although this code is a prime suspect.

@sbulage sbulage modified the milestones: v2.9.3, v2.9.4 Oct 16, 2024
@weyfonk
Copy link
Contributor Author

weyfonk commented Oct 28, 2024

Confirmed: this line calls a DryRun on a Wrangler apply.Apply, which returns an empty set of objects.
In turn, that set is used to populate resources in the bundle deployment status, which explains why those resources don't appear in the status from that point onwards.

Fixing this would require either:

  • troubleshooting this call to Wrangler code
  • or rewriting logic where Fleet truncates the list of resources in the bundle deployment status before re-populating it
  • skipping that logic when there is no data to re-populate it with
    • Edit: I have tried this, and while resources are no longer cleared in the bundle deployment status, its nonModified field is set to true. I thought that this might be caused by a conflict between the Fleet controller and agent over bundle deployment status updates, but this has been fixed in v0.10 as well.

@weyfonk weyfonk moved this from 🏗 In progress to 👀 In review in Fleet Oct 31, 2024
@mmartin24
Copy link
Collaborator

Tested in v2.9-1d1065cd5bf09c23834720420e1153712fa43439-head with fleet:104.1.1+up0.10.5-rc.2 and still errorring. Same issue as in #2609 (comment).

Setting back to backlog

@mmartin24 mmartin24 moved this from 👀 In review to 📋 Backlog in Fleet Nov 4, 2024
@manno manno modified the milestones: v2.9.4, v2.9.5 Nov 5, 2024
@manno
Copy link
Member

manno commented Nov 15, 2024

This was never in a RC

@mmartin24
Copy link
Collaborator

Rechecked in 2.9 with Fleet fleet:104.1.1+up0.10.5-rc.2 and still occurring. Nevertheless agreed to close as it is a corner case (same as in #2609 (comment))

@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in Fleet Nov 27, 2024
@manno manno moved this from ✅ Done to Needs QA review in Fleet Dec 3, 2024
@manno manno reopened this Dec 3, 2024
@github-project-automation github-project-automation bot moved this from Needs QA review to 🆕 New in Fleet Dec 3, 2024
@manno manno closed this as completed Dec 4, 2024
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Fleet Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

No branches or pull requests

4 participants