nomad 1.9.0 fails to negotiate docker API version if endpoint argument is set #24212

Closed
waldemarmeier opened this issue Oct 15, 2024 · 4 comments · Fixed by #24237
@waldemarmeier

Hi there,

It seems that Nomad 1.9.0 fails to negotiate the Docker API version with the daemon if the endpoint argument is set in the task driver configuration. This breaks the Docker task driver.

The failure correlates with setting the endpoint argument in the plugin stanza:
https://github.com/hashicorp/nomad/blob/61dd1f3f1090f8ba74be58afbfacd586f472b8b2/drivers/docker/driver.go#L1943C27-L1943C44

Furthermore, the DOCKER_API_VERSION environment variable is ignored as well. If this behavior is intentional, it should be documented somewhere.

Nomad version

Nomad v1.9.0
BuildDate 2024-10-10T07:13:43Z
Revision 7ad36851ec02f875e0814775ecf1df0229f0a615

Operating system and Environment details

AWS EC2 running Amazon Linux 2023

NAME="Amazon Linux"
VERSION="2023"
ID="amzn"
ID_LIKE="fedora"
VERSION_ID="2023"
PLATFORM_ID="platform:al2023"
PRETTY_NAME="Amazon Linux 2023.5.20241001"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2023"
HOME_URL="https://aws.amazon.com/linux/amazon-linux-2023/"
DOCUMENTATION_URL="https://docs.aws.amazon.com/linux/"
SUPPORT_URL="https://aws.amazon.com/premiumsupport/"
BUG_REPORT_URL="https://github.com/amazonlinux/amazon-linux-2023"
VENDOR_NAME="AWS"
VENDOR_URL="https://aws.amazon.com/"
SUPPORT_END="2028-03-15"
[root@test]# docker version
Client:
 Version:           25.0.5
 API version:       1.44
 Go version:        go1.22.5
 Git commit:        5dc9bcc
 Built:             Wed Aug 21 00:00:00 2024
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          25.0.6
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.22.5
  Git commit:       b08a51f
  Built:            Wed Aug 21 00:00:00 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.20
  GitCommit:        8fc6bcff51318944179630522a095cc9dbf9f353
 runc:
  Version:          1.1.14
  GitCommit:        2c9f5602f0ba3d9da1c2596322dfc4e156844890
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
[root@test]# cat /etc/nomad.d/nomad-client.hcl
data_dir  = "/var/lib/nomad/"


client {
    enabled = true
    disable_remote_exec = true
    servers = []

    # config for same node
    server_join {
        retry_join = ["127.0.0.1:4647"]
    }
}

# Modify our port to avoid a collision with server1
ports {
  http = 5656
}
[root@test]# cat /etc/nomad.d/nomad-shared.hcl

name = "test-host"

bind_addr = "0.0.0.0" # the default

disable_update_check = true
enable_syslog = true

log_level = "TRACE"

leave_on_interrupt = true
leave_on_terminate = true

# docker stuff
plugin "docker" {
  config {
    # if you comment out the following line the driver will work properly
    endpoint = "unix:///var/run/docker.sock"

    allow_privileged = false
    pull_activity_timeout = "1m"


    volumes {
      enabled      = true
    }

    auth {
      config = "/root/.docker/config.json"
    }
  }
}

Issue

Reproduction steps

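  1. Use the Amazon Linux 2023 version mentioned above.
  2. Install Nomad 1.9.0 and the Docker version shown above (docker-25.0.6-1.amzn2023.0.2.src.rpm).
  3. Set endpoint = "unix:///var/run/docker.sock" in the docker plugin config block and start the client agent.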

Expected Result

If the endpoint argument is set, the Nomad Docker task driver should still negotiate the API version with the Docker daemon, or at least respect the DOCKER_API_VERSION environment variable. If this behavior is intentional, it should be documented.
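The expected behavior can be sketched as follows. This is a minimal illustration, not Nomad's implementation: it assumes API versions of the form MAJOR.MINOR, compares them numerically (so 1.9 sorts before 1.10), and lets an explicit DOCKER_API_VERSION pin take precedence over negotiation. The helper names compareAPIVersions, negotiateAPIVersion and resolveAPIVersion are hypothetical, and the daemon's minimum version (1.24 in the output above) is ignored for brevity.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// compareAPIVersions (hypothetical helper) compares two "MAJOR.MINOR" API
// versions numerically, component by component, so "1.9" < "1.10".
// It returns -1, 0 or 1. Malformed components are treated as 0 in this sketch.
func compareAPIVersions(a, b string) int {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		var x, y int
		if i < len(as) {
			x, _ = strconv.Atoi(as[i])
		}
		if i < len(bs) {
			y, _ = strconv.Atoi(bs[i])
		}
		switch {
		case x < y:
			return -1
		case x > y:
			return 1
		}
	}
	return 0
}

// negotiateAPIVersion picks the highest version both sides support,
// i.e. the lower of the client's and the daemon's maximum API versions.
func negotiateAPIVersion(clientMax, daemonMax string) string {
	if compareAPIVersions(clientMax, daemonMax) > 0 {
		return daemonMax
	}
	return clientMax
}

// resolveAPIVersion applies the precedence the report expects (an assumption,
// not Nomad's code): an explicit DOCKER_API_VERSION pin wins; otherwise the
// version is negotiated, however the endpoint was configured.
func resolveAPIVersion(envPin, clientMax, daemonMax string) string {
	if envPin != "" {
		return envPin
	}
	return negotiateAPIVersion(clientMax, daemonMax)
}

func main() {
	// 1.46 is the version the failing agent requested; 1.44 is the daemon's
	// maximum from the `docker version` output in this issue.
	v := resolveAPIVersion(os.Getenv("DOCKER_API_VERSION"), "1.46", "1.44")
	fmt.Println("using Docker API version", v)
}
```

With DOCKER_API_VERSION unset, the sketch settles on 1.44, the version this daemon actually supports, instead of the 1.46 the failing agent requested. For a real client, the Docker Go SDK offers the client.FromEnv and client.WithAPIVersionNegotiation() options for these two behaviors.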

Actual Result

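Nomad fails to negotiate the API version if the endpoint argument is set: the driver requests API version 1.46, the daemon rejects it (its maximum is 1.44), and the docker driver is reported as undetected.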

Job file (if appropriate)

not needed

Nomad Server logs (if appropriate)

not needed

Nomad Client logs (if appropriate)

With endpoint argument set

[root@test]# /usr/bin/nomad agent -config=/etc/nomad.d/nomad-shared.hcl -config=/etc/nomad.d/nomad-client.hcl | grep docker
    2024-10-15T15:38:49.011+0200 [DEBUG] agent.plugin_loader.docker: using standard client connection: plugin_dir=/var/lib/nomad/plugins endpoint=unix:///var/run/docker.sock
    2024-10-15T15:38:49.011+0200 [TRACE] agent.plugin_loader.docker: task event loop shutdown: plugin_dir=/var/lib/nomad/plugins
    2024-10-15T15:38:49.012+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2024-10-15T15:38:49.012+0200 [TRACE] agent.plugin_loader.docker: task event loop shutdown: plugin_dir=/var/lib/nomad/plugins
    2024-10-15T15:38:49.057+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=docker0
    2024-10-15T15:38:49.057+0200 [DEBUG] client.fingerprint_mgr.network: unable to parse link speed: path=/sys/class/net/docker0/speed device=docker0
    2024-10-15T15:38:49.057+0200 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=docker0 mbits=1000
    2024-10-15T15:38:49.098+0200 [DEBUG] client.driver_mgr.docker: using standard client connection: driver=docker endpoint=unix:///var/run/docker.sock
    2024-10-15T15:38:49.101+0200 [ERROR] client.driver_mgr.docker: failed to list pause containers for recovery: driver=docker error="Error response from daemon: client version 1.46 is too new. Maximum supported API version is 1.44"
    2024-10-15T15:38:49.101+0200 [DEBUG] client.driver_mgr.docker: could not connect to docker daemon: driver=docker endpoint=unix:///var/run/docker.sock error="Error response from daemon: client version 1.46 is too new. Maximum supported API version is 1.44"
    2024-10-15T15:38:49.101+0200 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=docker health=undetected description="Failed to connect to docker daemon"
    2024-10-15T15:38:49.101+0200 [DEBUG] client.driver_mgr: detected drivers: drivers="map[healthy:[exec] undetected:[raw_exec qemu java docker]]"
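To confirm what the daemon behind the configured endpoint supports, you can ask it directly. The sketch below is a diagnostic under stated assumptions, not something Nomad ships: it sends GET /version over the Unix socket named in the endpoint setting and reads the ApiVersion and MinAPIVersion fields of the Docker Engine API response. It relies on the Engine API serving requests that have no /vX.Y path prefix at the daemon's own current version, so this call should not trigger the "client version is too new" error. The function and type names are hypothetical.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net"
	"net/http"
)

// versionInfo (hypothetical) holds the two fields of the Engine API's
// GET /version response that matter for negotiation.
type versionInfo struct {
	APIVersion    string `json:"ApiVersion"`
	MinAPIVersion string `json:"MinAPIVersion"`
}

// parseVersionResponse decodes the JSON body returned by GET /version.
func parseVersionResponse(body []byte) (versionInfo, error) {
	var v versionInfo
	if err := json.Unmarshal(body, &v); err != nil {
		return versionInfo{}, err
	}
	if v.APIVersion == "" {
		return versionInfo{}, fmt.Errorf("response has no ApiVersion field")
	}
	return v, nil
}

// unixSocketClient returns an HTTP client that dials the given Unix socket
// for every request, ignoring the host part of the URL.
func unixSocketClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// queryDaemonVersion asks the daemon which API versions it supports,
// using the unversioned /version path.
func queryDaemonVersion(socketPath string) (versionInfo, error) {
	resp, err := unixSocketClient(socketPath).Get("http://docker/version")
	if err != nil {
		return versionInfo{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return versionInfo{}, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return versionInfo{}, err
	}
	return parseVersionResponse(body)
}

func main() {
	// Same socket as endpoint = "unix:///var/run/docker.sock" in the plugin config.
	v, err := queryDaemonVersion("/var/run/docker.sock")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("daemon API version %s (minimum %s)\n", v.APIVersion, v.MinAPIVersion)
}
```

Against the daemon in this report it should print API version 1.44 with minimum 1.24, matching the docker version output above.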

Without endpoint argument

[root@test]# /usr/bin/nomad agent -config=/etc/nomad.d/nomad-shared.hcl -config=/etc/nomad.d/nomad-client.hcl | grep docker
    2024-10-15T15:41:39.926+0200 [TRACE] agent.plugin_loader.docker: task event loop shutdown: plugin_dir=/var/lib/nomad/plugins
    2024-10-15T15:41:39.930+0200 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/var/lib/nomad/plugins
    2024-10-15T15:41:39.931+0200 [TRACE] agent.plugin_loader.docker: task event loop shutdown: plugin_dir=/var/lib/nomad/plugins
    2024-10-15T15:41:39.931+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2024-10-15T15:41:39.992+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=docker0
    2024-10-15T15:41:39.992+0200 [DEBUG] client.fingerprint_mgr.network: unable to parse link speed: path=/sys/class/net/docker0/speed device=docker0
    2024-10-15T15:41:39.992+0200 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=docker0 mbits=1000
    2024-10-15T15:41:40.027+0200 [DEBUG] client.driver_mgr.docker: using client connection initialized from environment: driver=docker
    2024-10-15T15:41:40.067+0200 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=docker health=healthy description=Healthy
    2024-10-15T15:41:40.067+0200 [DEBUG] client.driver_mgr: detected drivers: drivers="map[healthy:[exec docker] undetected:[raw_exec qemu java]]"
    2024-10-15T15:41:40.095+0200 [DEBUG] client.driver_mgr.docker: using client connection initialized from environment: driver=docker

Setting the DOCKER_API_VERSION environment variable does not make a difference: the driver still requests API version 1.46.

[root@test]# export DOCKER_API_VERSION='1.44'
[root@test]# /usr/bin/nomad agent -config=/etc/nomad.d/nomad-shared.hcl -config=/etc/nomad.d/nomad-client.hcl | grep docker
    2024-10-15T15:51:50.694+0200 [TRACE] agent.plugin_loader.docker: task event loop shutdown: plugin_dir=/var/lib/nomad/plugins
    2024-10-15T15:51:50.698+0200 [DEBUG] agent.plugin_loader.docker: using standard client connection: plugin_dir=/var/lib/nomad/plugins endpoint=unix:///var/run/docker.sock
    2024-10-15T15:51:50.698+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2024-10-15T15:51:50.699+0200 [TRACE] agent.plugin_loader.docker: task event loop shutdown: plugin_dir=/var/lib/nomad/plugins
    2024-10-15T15:51:50.757+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=docker0
    2024-10-15T15:51:50.757+0200 [DEBUG] client.fingerprint_mgr.network: unable to parse link speed: path=/sys/class/net/docker0/speed device=docker0
    2024-10-15T15:51:50.757+0200 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=docker0 mbits=1000
    2024-10-15T15:51:50.795+0200 [DEBUG] client.driver_mgr.docker: using standard client connection: driver=docker endpoint=unix:///var/run/docker.sock
    2024-10-15T15:51:50.798+0200 [ERROR] client.driver_mgr.docker: failed to list pause containers for recovery: driver=docker error="Error response from daemon: client version 1.46 is too new. Maximum supported API version is 1.44"
    2024-10-15T15:51:50.799+0200 [DEBUG] client.driver_mgr.docker: could not connect to docker daemon: driver=docker endpoint=unix:///var/run/docker.sock error="Error response from daemon: client version 1.46 is too new. Maximum supported API version is 1.44"
    2024-10-15T15:51:50.799+0200 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=docker health=undetected description="Failed to connect to docker daemon"
    2024-10-15T15:51:50.799+0200 [DEBUG] client.driver_mgr: detected drivers: drivers="map[healthy:[exec] undetected:[raw_exec qemu java docker]]"
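In both failing runs the daemon error reports the same requested version, 1.46, even after exporting DOCKER_API_VERSION=1.44, which is the evidence that the variable is not applied when endpoint is set. When comparing many agent logs, a small helper like the hypothetical versionsFromError below can pull both numbers out of that error string. The regular expression is an assumption that only matches the exact wording seen in these logs.

```go
package main

import (
	"fmt"
	"regexp"
)

// tooNewRe matches the daemon error seen in the agent logs above, e.g.
// "client version 1.46 is too new. Maximum supported API version is 1.44".
var tooNewRe = regexp.MustCompile(`client version ([0-9]+\.[0-9]+) is too new\. Maximum supported API version is ([0-9]+\.[0-9]+)`)

// versionsFromError (hypothetical helper) extracts the version the client
// requested and the daemon's maximum; ok is false if the line does not match.
func versionsFromError(line string) (requested, daemonMax string, ok bool) {
	m := tooNewRe.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	line := `error="Error response from daemon: client version 1.46 is too new. Maximum supported API version is 1.44"`
	if requested, daemonMax, ok := versionsFromError(line); ok {
		fmt.Printf("client requested %s, daemon supports up to %s\n", requested, daemonMax)
	}
}
```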
@pkazmierczak
Contributor

Hi @waldemarmeier, thanks for reporting this. I'll look into the issue asap.

@roman-vynar
Contributor

Same as #24181 (comment)

@pkazmierczak
Contributor

Hi @waldemarmeier, I merged a fix to main; it will be released soon in 1.9.1.

@waldemarmeier
Author

Thanks @pkazmierczak !
