Nomad 1.7.7 running from container cgroup v2 errors #23476

Closed
sirbudd opened this issue Jul 1, 2024 · 4 comments
sirbudd commented Jul 1, 2024

Nomad version

Nomad v1.7.7
BuildDate 2024-04-16T19:26:43Z
Revision 0f34c85ee63f6472bd2db1e2487611f4b176c70c

Operating system and Environment details

3.13.12
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.13.12
PRETTY_NAME="Alpine Linux v3.13"

Issue

I'm running Nomad in a Docker container. After upgrading Nomad from 1.6.10 to 1.7.7, the container refuses to start with the following error:

[ERROR] agent: error starting agent: error="client setup failed: failed to initialize process manager: failed to create nomad cgroup: write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy"

Reproduction steps

This is the Dockerfile:

ARG CI_REGISTRY
FROM centos:7 as builder

ARG NOMAD_VERSION
ARG NOMAD_SHA256

SHELL ["/bin/bash", "-o", "pipefail", "-c"]

RUN set -xe \
    && curl -sSLO https://releases.hashicorp.com/nomad/${NOMAD_VERSION}/nomad_${NOMAD_VERSION}_linux_amd64.zip \
    && sha256sum nomad_${NOMAD_VERSION}_linux_amd64.zip | grep -q $NOMAD_SHA256 \
    && yum -y -e 0 -q install unzip \
    && unzip nomad_${NOMAD_VERSION}_linux_amd64.zip \
    && mv nomad /tmp/


FROM alpine:3.13

COPY --from=builder /tmp/nomad /usr/bin/nomad

RUN addgroup nomad && \
    adduser -S -G nomad nomad

ENV GLIBC_VERSION "2.34-r0"
ENV GOSU_VERSION 1.10
ENV DUMB_INIT_VERSION 1.2.0
ENV CGO_ENABLED=0

RUN set -x && \
    apk --update add --no-cache --virtual .gosu-deps dpkg curl gnupg && \
    apk add --no-cache ca-certificates bash && \
    curl -L -o /tmp/glibc-${GLIBC_VERSION}.apk https://github.com/andyshinn/alpine-pkg-glibc/releases/download/${GLIBC_VERSION}/glibc-${GLIBC_VERSION}.apk && \
    apk add --allow-untrusted /tmp/glibc-${GLIBC_VERSION}.apk && \
    rm -rf /tmp/glibc-${GLIBC_VERSION}.apk /var/cache/apk/* && \
    curl -L -o /usr/local/bin/dumb-init https://github.com/Yelp/dumb-init/releases/download/v${DUMB_INIT_VERSION}/dumb-init_${DUMB_INIT_VERSION}_amd64 && \
    chmod +x /usr/local/bin/dumb-init && \
    dpkgArch="$(dpkg --print-architecture | awk -F- '{ print $NF }')" && \
    curl -L -o /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$dpkgArch" && \
    curl -L -o /usr/local/bin/gosu.asc "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$dpkgArch.asc" && \
    export GNUPGHOME="$(mktemp -d)" && \
    #gpg --keyserver https://keys.openpgp.org --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4 && \
    #gpg --batch --verify /usr/local/bin/gosu.asc /usr/local/bin/gosu && \
    rm -rf "$GNUPGHOME" /usr/local/bin/gosu.asc && \
    chmod +x /usr/local/bin/gosu && \
    gosu nobody true && \
    apk del .gosu-deps && \
    apk add --no-cache tzdata

RUN mkdir -p /nomad/data && \
    mkdir -p /etc/nomad && \
    chown -R nomad:nomad /nomad

EXPOSE 4646 4647 4648 4648/udp

COPY docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh

RUN chmod +x /usr/local/bin/docker-entrypoint.sh

ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]

The built image is run using docker-compose:

---
version: '3.7'
services:
  nomad:
    network_mode: host
    privileged: true
    image: REPOSITORY_PATH
    command:
      - agent
    environment:
      - NOMAD_ADDR=https://${HOSTNAME}:4646
    volumes:
      - /etc/nomad/data:/nomad/data:rw
      - /etc/nomad/config:/nomad/config:rw

Expected Result

The Nomad server starts and runs properly.

Actual Result

The error:

[ERROR] agent: error starting agent: error="client setup failed: failed to initialize process manager: failed to create nomad cgroup: write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy"

@pkazmierczak (Contributor)

Hi @sirbudd, thanks for reporting the issue. Unfortunately, we do not support running Nomad inside a container. Nomad 1.7 introduced NUMA-aware scheduling and changed the way we fingerprint CPUs and interact with cgroups, so scenarios in which Nomad used to work may no longer work.
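
For readers who want to see why the write to `cgroup.subtree_control` may be rejected: under cgroup v2, a non-root cgroup that has processes of its own generally cannot enable controllers for its children, and the kernel answers with "device or resource busy". Inside a container with a private cgroup namespace, `/sys/fs/cgroup` is the container's own cgroup, which holds the container's processes. The sketch below is only a diagnostic under that assumption, not something Nomad itself runs; the function names `cgroup_member_count` and `describe_cgroup` are made up for illustration.

```shell
#!/bin/sh
# Diagnostic sketch (hypothetical helper names, not part of Nomad).
# Assumption: the EBUSY in this issue comes from the cgroup v2
# "no internal processes" rule.

# Print how many PIDs are listed in DIR/cgroup.procs.
cgroup_member_count() {
  procs_file="$1/cgroup.procs"
  if [ ! -r "$procs_file" ]; then
    echo "cannot read $procs_file" >&2
    return 1
  fi
  # Each non-empty line is one PID; grep exits 1 on zero matches,
  # so swallow that exit status and keep the printed count.
  count=$(grep -c . "$procs_file" || true)
  echo "$count"
}

# Summarize whether enabling controllers in DIR is likely to be rejected.
# Note: the real root cgroup on the host is exempt from this rule.
describe_cgroup() {
  dir="${1:-/sys/fs/cgroup}"
  count=$(cgroup_member_count "$dir") || return 1
  if [ "$count" -gt 0 ]; then
    echo "$dir: $count member process(es); enabling controllers here may fail with EBUSY"
  else
    echo "$dir: no member processes"
  fi
}
```

Running `describe_cgroup` with no argument inside the container inspects `/sys/fs/cgroup`. A non-zero process count there is consistent with the error above, though it does not prove that this is the only cause.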

sirbudd commented Jul 1, 2024

Thank you for the quick answer.
I moved the Nomad server to run as a service and everything is good now.

SipSeb commented Oct 29, 2024

We stumbled upon the same problem and ended up here. After playing around, we discovered that there is an option to work around this, even though the scenario is not officially supported:

--cgroupns=host

With this option set, Nomad 1.7.7 appears to run in a Podman container as well.

(For reference: taken from https://www.whexy.com/posts/cgroup-inside-containers)
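
As a rough illustration of how the flag could be applied to the setup from the original report, here is a minimal sketch that translates the compose file into a single `run` command. The helper name `run_nomad_container`, the `CONTAINER_CLI` and `DRY_RUN` variables, and the image name in the usage line are assumptions for illustration; the `NOMAD_ADDR` environment variable from the compose file is left out for brevity.

```shell
#!/bin/sh
# Sketch only: start the Nomad container in the host cgroup namespace.
# Hypothetical helper; mounts and flags mirror the compose file in this issue.
# CONTAINER_CLI selects docker (default) or podman; DRY_RUN=1 prints the
# command instead of executing it.
run_nomad_container() {
  image="$1"
  cli="${CONTAINER_CLI:-docker}"
  # Rebuild the positional parameters as the full command line.
  set -- "$cli" run --rm --privileged --network host --cgroupns=host \
    -v /etc/nomad/data:/nomad/data:rw \
    -v /etc/nomad/config:/nomad/config:rw \
    "$image" agent
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 run_nomad_container example/nomad:1.7.7` prints the command without starting anything. For docker-compose users, the Compose specification also defines a service-level `cgroup: host` key that should be the equivalent setting, but that variant was not tested in this thread, and running Nomad in a container remains unsupported either way.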
