Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNI: use check command when restoring from restart #24658

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tgross
Copy link
Member

@tgross tgross commented Dec 12, 2024

When the Nomad client restarts and restores allocations, the network namespace for an allocation may exist but no longer be correctly configured. For example, if the host is rebooted and the task was a Docker task using a pause container, the network namespace may be recreated by the docker daemon.

When we restore an allocation, use the CNI "check" command to verify that any existing network namespace matches the expected configuration. This requires CNI plugins of at least version 1.2.0 to avoid a bug in older plugin versions that would cause the check to fail. So we check the plugin fingerprint before performing the check.

If the check fails, destroy the network namespace and try to recreate it from scratch once. If that fails in the second pass, fail the restore so that the allocation can be recreated (rather than silently having networking fail).

This should fix the gap left #24650 for Docker task drivers and any other drivers with the MustInitiateNetwork capability.

Fixes: #24292
Ref: #24650

Testing & Reproduction steps

Run a cluster on a set of VMs, with at least one client. This can't be a server+client because we need to reboot the hosts. You should probably set the server.heartbeat_grace = "5m" to give yourself time to work.

  • Run a Docker task with network.mode = "bridge". Wait for it to be healthy.
  • Restart the agent.
  • Make sure the alloc is untouched and networking works.
  • Reboot the client host.
  • Make sure the alloc is restored, the tasks are restarted, and networking works.

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.

@tgross tgross changed the title cni: use check command when restoring from restart CNI: use check command when restoring from restart Dec 12, 2024
@tgross tgross added theme/networking type/bug backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/1.9.x backport to 1.9.x release line labels Dec 12, 2024
@tgross tgross added this to the 1.9.x milestone Dec 12, 2024
@tgross tgross force-pushed the b-24292-netns-restore branch from 98343be to 66dcfcb Compare December 12, 2024 20:52
@tgross tgross requested review from jrasell and shoenig December 12, 2024 20:52
@tgross tgross force-pushed the b-24292-netns-restore branch from 66dcfcb to 9daf267 Compare December 12, 2024 21:12
@apollo13
Copy link
Contributor

Your reproduction steps say:

Run a Docker/Podman task with network.mode = "bridge". Wait for it to be healthy.

is the podman driver actually affected by it? I thought that one would not use a pause container and always let nomad configure the network namespaces on it's own.

@tgross
Copy link
Member Author

tgross commented Dec 12, 2024

is the podman driver actually affected by it? I thought that one would not use a pause container and always let nomad configure the network namespaces on it's own.

You know what, I assumed that it used pause containers too (I hardly ever have run it myself) but that's not the case: https://github.com/hashicorp/nomad-driver-podman/blob/main/driver.go#L87. So yeah, just Docker here. Will edit the repro steps.

@apollo13
Copy link
Contributor

apollo13 commented Dec 12, 2024

You know what, I assumed that it used pause containers too

Luckily no, podman run supports --network=ns:/path/for/ns/to/join in addition to the half-baked --network=container:id that docker has (and obviously via the API as well).

Copy link
Member

@jrasell jrasell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this using an AWS environment with 1 server and 1 client instance. The client instance has CNI v1.3.0 plugins installed.

JobSpec:

job "example" {

  group "nginx" {
    network {
      mode = "bridge"
      port "http" {
        to = 80
      }
    }

    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:latest"
        ports = ["http"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Running through the steps using a build from main f4529485563924462dbdccdd1b4cacbd9d68616e when I rebooted the instance the allocation was restarted, but the networking was broken and I was unable to perform a simple curl to NGINX.

I then tested this patch 9daf267e33bd502d4a386dc1dd6d4a728894c194 I performed the same steps as detailed in the PR and performed in the previous test. When the instance was rebooted, the allocation failed and a new allocation scheduled in its place.

Failed allocation events:

2024-12-13T08:59:18Z  Terminated     Exit Code: 0
2024-12-13T08:59:18Z  Setup Failure  failed to setup alloc: pre-run hook "network" failed: failed to configure networking for alloc: Interface name nomad not found
2024-12-13T08:57:08Z  Started        Task started by client
2024-12-13T08:57:08Z  Driver         Downloading image
2024-12-13T08:57:07Z  Task Setup     Building Task Directory
2024-12-13T08:57:07Z  Received       Task received by client

Client logs around time of failure marking:

2024-12-13T08:59:18.385Z [ERROR] allocrunner/alloc_runner.go:403: client.alloc_runner: prerun failed: alloc_id=8f3b3d5a-906a-7f53-8d72-e9c782bb4dee error="pre-run hook \"network\" failed: failed to configure networking for alloc: Interface name nomad not found"
2024-12-13T08:59:18.385Z [INFO]  taskrunner/task_runner.go:1468: client.alloc_runner.task_runner: Task event: alloc_id=8f3b3d5a-906a-7f53-8d72-e9c782bb4dee task=nginx type="Setup Failure" msg="failed to setup alloc: pre-run hook \"network\" failed: failed to configure networking for alloc: Interface name nomad not found" failed=true
2024-12-13T08:59:18.387Z [INFO]  docker/handle.go:229: client.driver_mgr.docker: stopped container: container_id=9d6cc3a56a51596885f36fec7c4d74ea1002e8727c1fc213851ac42fed53e8cb driver=docker

@tgross
Copy link
Member Author

tgross commented Dec 13, 2024

I've updated this PR with a retry where if the check fails the first time, we attempt to recreate the network namespace (attempting this once). This should reduce the cases where, after a reboot, the allocation cannot be restored and gets failed. And I've added tests covering that logic.

@tgross tgross marked this pull request as ready for review December 13, 2024 18:42
@tgross tgross requested review from a team as code owners December 13, 2024 18:42
@tgross tgross force-pushed the b-24292-netns-restore branch from 04a444a to 137e7d6 Compare December 13, 2024 18:53
When the Nomad client restarts and restores allocations, the network namespace
for an allocation may exist but no longer be correctly configured. For example,
if the host is rebooted and the task was a Docker task using a pause container,
the network namespace may be recreated by the docker daemon.

When we restore an allocation, use the CNI "check" command to verify that any
existing network namespace matches the expected configuration. This requires CNI
plugins of at least version 1.2.0 to avoid a bug in older plugin versions that
would cause the check to fail.

If the check fails, destroy the network namespace and try to recreate it from
scratch once. If that fails in the second pass, fail the restore so that the
allocation can be recreated (rather than silently having networking fail).

This should fix the gap left #24650 for Docker task drivers and any other
drivers with the `MustInitiateNetwork` capability.

Fixes: #24292
Ref: #24650
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/1.9.x backport to 1.9.x release line theme/networking type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CNI: Docker container lose network configuration after host reboot
4 participants