Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNI: use tmpfs location for ipam plugin #24650

Merged
merged 1 commit into from
Dec 16, 2024
Merged

CNI: use tmpfs location for ipam plugin #24650

merged 1 commit into from
Dec 16, 2024

Conversation

tgross
Copy link
Member

@tgross tgross commented Dec 11, 2024

When a Nomad host reboots, the network namespace files in the tmpfs in /var/run are wiped out. So when we restore allocations after a host reboot, we need to be able to restore both the network namespace and the network configuration. But because the netns is newly created and we need to run the CNI plugins again, this create potential conflicts with the IPAM plugin which has written state to persistent disk at /var/lib/cni. These IPs aren't the ones advertised to Consul, so there's no particular reason to keep them around after a host reboot because all virtual interfaces need to be recreated too.

Reconfigure the CNI bridge configuration to use /var/run/cni as its state directory. We already expect this location to be created by CNI because the netns files are hard-coded to be created there too in libcni.

Note this does not fix the problem described for Docker in #24292 because that appears to be related to the netns itself being restored unexpectedly from Docker's state.

Ref: #24292 (comment)
Ref: https://www.cni.dev/plugins/current/ipam/host-local/#files

Testing & Reproduction steps

Run a cluster on a set of VMs, with at least one client. This can't be a server+client because we need to reboot the hosts. You should probably set the server.heartbeat_grace = "5m" to give yourself time to work.

  • Run a non-Docker task with network.mode = "bridge". Wait for it to be healthy.
  • Reboot the client host.
  • Make sure the alloc is restored, the tasks are restarted, and networking works.

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.

When a Nomad host reboots, the network namespace files in the tmpfs in
`/var/run` are wiped out. So when we restore allocations after a host reboot, we
need to be able to restore both the network namespace and the network
configuration. But because the netns is newly created and we need to run the CNI
plugins again, this create potential conflicts with the IPAM plugin which has
written state to persistent disk at `/var/lib/cni`. These IPs aren't the ones
advertised to Consul, so there's no particular reason to keep them around after
a host reboot because all virtual interfaces need to be recreated too.

Reconfigure the CNI bridge configuration to use `/var/run/cni` as its state
directory. We already expect this location to be created by CNI because the
netns files are hard-coded to be created there too in `libcni`.

Note this does not fix the problem described for Docker in #24292 because that
appears to be related to the netns itself being restored unexpectedly from
Docker's state.

Ref: #24292 (comment)
Ref: https://www.cni.dev/plugins/current/ipam/host-local/#files
@tgross
Copy link
Member Author

tgross commented Dec 12, 2024

I'm going to move this back into draft until we've had a bit more time to look into the CNI check command workflow being discussed in #24292 (comment)

tgross added a commit that referenced this pull request Dec 12, 2024
When the Nomad client restarts and restores allocations, the network namespace
for an allocation may exist but no longer be correctly configured. For example,
if the host is rebooted and the task was a Docker task using a pause container,
the network namespace may be recreated by the docker daemon.

When we restore an allocation, use the CNI "check" command to verify that any
existing network namespace matches the expected configuration. This requires CNI
plugins of at least version 1.2.0 to avoid a bug in older plugin versions that
would cause the check to fail. If the check fails, fail the restore so that the
allocation can be recreated (rather than silently having networking fail).

This should fix the gap left #24650 for Docker task drivers and any other
drivers with the `MustInitiateNetwork` capability.

Fixes: #24292
Ref: #24650
@tgross tgross changed the title cni: use tmpfs location for ipam plugin CNI: use tmpfs location for ipam plugin Dec 12, 2024
tgross added a commit that referenced this pull request Dec 12, 2024
When the Nomad client restarts and restores allocations, the network namespace
for an allocation may exist but no longer be correctly configured. For example,
if the host is rebooted and the task was a Docker task using a pause container,
the network namespace may be recreated by the docker daemon.

When we restore an allocation, use the CNI "check" command to verify that any
existing network namespace matches the expected configuration. This requires CNI
plugins of at least version 1.2.0 to avoid a bug in older plugin versions that
would cause the check to fail. If the check fails, fail the restore so that the
allocation can be recreated (rather than silently having networking fail).

This should fix the gap left #24650 for Docker task drivers and any other
drivers with the `MustInitiateNetwork` capability.

Fixes: #24292
Ref: #24650
@tgross tgross removed the request for review from gulducat December 12, 2024 20:52
tgross added a commit that referenced this pull request Dec 12, 2024
When the Nomad client restarts and restores allocations, the network namespace
for an allocation may exist but no longer be correctly configured. For example,
if the host is rebooted and the task was a Docker task using a pause container,
the network namespace may be recreated by the docker daemon.

When we restore an allocation, use the CNI "check" command to verify that any
existing network namespace matches the expected configuration. This requires CNI
plugins of at least version 1.2.0 to avoid a bug in older plugin versions that
would cause the check to fail. If the check fails, fail the restore so that the
allocation can be recreated (rather than silently having networking fail).

This should fix the gap left #24650 for Docker task drivers and any other
drivers with the `MustInitiateNetwork` capability.

Fixes: #24292
Ref: #24650
Copy link
Member

@jrasell jrasell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I tested this using an AWS environment with 1 server and 1 client instance. The client instance has CNI v1.3.0 plugins installed.

JobSpec:

job "example" {

  group "webserver" {
    network {
      mode = "bridge"
      port "http" {
        to = 80
      }
    }

    task "python3" {
      driver = "exec"

      config {
        command = "python3"
        args    = ["-m", "http.server", "80"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Running through the steps using a build from main f4529485563924462dbdccdd1b4cacbd9d68616e when I rebooted the instance the allocation failed with the error detailed below:

2024-12-13T10:57:44Z  Setup Failure          failed to setup alloc: pre-run hook "network" failed: failed to configure networking for alloc: failed to configure network: plugin type="bridge" failed (add): failed to allocate for range 0: 172.26.64.12 has been allocated to 0dc525f4-c491-7e25-a67d-0121069ad55e, duplicate allocation is not allowed
2024-12-13T10:57:41Z  Failed Restoring Task  failed to restore task; will not run until server is contacted

I then tested this patch 9e1a365ae3d22f04c6bb8aa0ce0fee6d1f83ae6f I performed the same steps as detailed in the PR and performed in the previous test. When the instance was rebooted the following task events are recorded:

2024-12-13T11:08:42Z  Started                Task started by client
2024-12-13T11:08:41Z  Failed Restoring Task  failed to restore task; will not run until server is contacted

The allocation HTTP server is accessible and responds as it did before the reboot.

@tgross tgross marked this pull request as ready for review December 13, 2024 15:55
@tgross tgross marked this pull request as draft December 13, 2024 15:55
tgross added a commit that referenced this pull request Dec 13, 2024
When the Nomad client restarts and restores allocations, the network namespace
for an allocation may exist but no longer be correctly configured. For example,
if the host is rebooted and the task was a Docker task using a pause container,
the network namespace may be recreated by the docker daemon.

When we restore an allocation, use the CNI "check" command to verify that any
existing network namespace matches the expected configuration. This requires CNI
plugins of at least version 1.2.0 to avoid a bug in older plugin versions that
would cause the check to fail.

If the check fails, destroy the network namespace and try to recreate it from
scratch once. If that fails in the second pass, fail the restore so that the
allocation can be recreated (rather than silently having networking fail).

This should fix the gap left #24650 for Docker task drivers and any other
drivers with the `MustInitiateNetwork` capability.

Fixes: #24292
Ref: #24650

retry and tests

[squashme]
tgross added a commit that referenced this pull request Dec 13, 2024
When the Nomad client restarts and restores allocations, the network namespace
for an allocation may exist but no longer be correctly configured. For example,
if the host is rebooted and the task was a Docker task using a pause container,
the network namespace may be recreated by the docker daemon.

When we restore an allocation, use the CNI "check" command to verify that any
existing network namespace matches the expected configuration. This requires CNI
plugins of at least version 1.2.0 to avoid a bug in older plugin versions that
would cause the check to fail.

If the check fails, destroy the network namespace and try to recreate it from
scratch once. If that fails in the second pass, fail the restore so that the
allocation can be recreated (rather than silently having networking fail).

This should fix the gap left #24650 for Docker task drivers and any other
drivers with the `MustInitiateNetwork` capability.

Fixes: #24292
Ref: #24650
@tgross tgross marked this pull request as ready for review December 13, 2024 18:42
tgross added a commit that referenced this pull request Dec 13, 2024
When the Nomad client restarts and restores allocations, the network namespace
for an allocation may exist but no longer be correctly configured. For example,
if the host is rebooted and the task was a Docker task using a pause container,
the network namespace may be recreated by the docker daemon.

When we restore an allocation, use the CNI "check" command to verify that any
existing network namespace matches the expected configuration. This requires CNI
plugins of at least version 1.2.0 to avoid a bug in older plugin versions that
would cause the check to fail.

If the check fails, destroy the network namespace and try to recreate it from
scratch once. If that fails in the second pass, fail the restore so that the
allocation can be recreated (rather than silently having networking fail).

This should fix the gap left #24650 for Docker task drivers and any other
drivers with the `MustInitiateNetwork` capability.

Fixes: #24292
Ref: #24650
@tgross tgross merged commit 24fa743 into main Dec 16, 2024
26 checks passed
@tgross tgross deleted the b-24292-ipam branch December 16, 2024 14:36
tgross added a commit that referenced this pull request Dec 18, 2024
When the Nomad client restarts and restores allocations, the network namespace
for an allocation may exist but no longer be correctly configured. For example,
if the host is rebooted and the task was a Docker task using a pause container,
the network namespace may be recreated by the docker daemon.

When we restore an allocation, use the CNI "check" command to verify that any
existing network namespace matches the expected configuration. This requires CNI
plugins of at least version 1.2.0 to avoid a bug in older plugin versions that
would cause the check to fail.

If the check fails, destroy the network namespace and try to recreate it from
scratch once. If that fails in the second pass, fail the restore so that the
allocation can be recreated (rather than silently having networking fail).

This should fix the gap left #24650 for Docker task drivers and any other
drivers with the `MustInitiateNetwork` capability.

Fixes: #24292
Ref: #24650
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/1.9.x backport to 1.9.x release line type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants