Docker containers with non-root users will fail silently with exit code 2 if MiniWDL's run directory belongs to a secondary group and has the group sticky bit set #706

Open
adamnovak opened this issue Jul 17, 2024 · 0 comments

I tried to run a MiniWDL workflow with --dir output/miniwdl-runs, to put its work and output in that directory. That directory has permissions and ownership:

drwxr-sr-x   7 anovak patenlab   6 Jul 17 08:25 .

It belongs to the patenlab group and has the group sticky bit (setgid, the s in the group permissions) set, so files and directories I create within it will by default also belong to the patenlab group, and new directories will also have the setgid bit set.

But my current selected group is the prismuser group:

$ id -gn
prismuser
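To check for this mismatch from Python, a minimal sketch along these lines could compare the run directory's group with the process's effective group. The helper names describe_dir_group and gid_name are hypothetical and not part of MiniWDL; the sketch uses only the standard os, stat, and grp modules.

```python
import grp
import os
import stat


def describe_dir_group(path):
    """Report a directory's group, its setgid bit, and the caller's effective group."""
    st = os.stat(path)
    return {
        "dir_gid": st.st_gid,
        "setgid": bool(st.st_mode & stat.S_ISGID),
        "effective_gid": os.getegid(),
        "matches_effective": st.st_gid == os.getegid(),
    }


def gid_name(gid):
    """Best-effort group name lookup; falls back to the numeric ID as a string."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


if __name__ == "__main__":
    info = describe_dir_group(".")
    print(gid_name(info["dir_gid"]), gid_name(info["effective_gid"]), info)
```

In the situation described here, setgid would be true and matches_effective would be false for the run directory.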

When MiniWDL makes the stdout.txt and stderr.txt files for tasks under its run directory, they end up being owned by patenlab. But when MiniWDL decides what groups to grant to the containers it runs, it gets only the currently selected ("effective") group ID and grants just that group:

# add invoking user's group to ensure that command can access the mounted working
# directory even if the docker image assumes some arbitrary uid
groups = [str(os.getegid())]

If you run a container that runs as root in the container (the Docker default, usually), this works fine, because the container's root user gets to do anything it wants to any file mounted into the container. But if the Docker container image is built to run as a different user (like biocontainers/samtools@sha256:3ff48932a8c38322b0a33635957bc6372727014357b4224d420726da100f5470 which is built to run as the user biodocker with ID 1000), then the passed group does not actually grant the container permission to write to the task standard output and standard error files (or the task working directory).
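The reasoning above can be modeled with a simplified version of the classic Unix permission check, ignoring ACLs and fine-grained capabilities. This is only an illustration: can_write is a hypothetical helper, and the numeric IDs and the 0o664 mode below are made-up values, not ones taken from the actual run.

```python
import stat


def can_write(mode, owner_uid, owner_gid, uid, gids):
    """Simplified Unix write check: exactly one of owner/group/other applies."""
    if uid == 0:
        return True  # root in the container bypasses the permission bits
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# Hypothetical IDs: the invoking user is 5000, patenlab is 2000, prismuser is 3000.
# The file is assumed to be group-writable (0o664) and owned by 5000:2000.
print(can_write(0o664, 5000, 2000, uid=1000, gids={3000}))  # False: falls through to "other"
print(can_write(0o664, 5000, 2000, uid=1000, gids={2000}))  # True: group bits apply
print(can_write(0o664, 5000, 2000, uid=0, gids=set()))      # True: root
```

The first call corresponds to the failing case: the container user is neither the owner nor in the file's group, so only the "other" bits apply.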

Then you get logs like this:

2024-07-16 16:22:03.817 wdl.w:GiraffeDeepVariant.t:call-mergeAlignmentBAMChunks docker image :: tag: "biocontainers/samtools@sha256:3ff48932a8c38322b0a33635957bc6372727014357b4224d420726da100f5470", id: "sha256:dd1f04d1de562b8b97a96147ada9a84b247e470677eab4f049d997bf2654c00c", RepoDigest: true
2024-07-16 16:22:05.704 wdl.w:GiraffeDeepVariant.t:call-mergeAlignmentBAMChunks docker task failed :: service: "oz31t1kpcwdn", task: "h1hmernee7", node: "wgdl01wkys", message: "task: non-zero exit (2)"
2024-07-16 16:22:05.704 wdl.w:GiraffeDeepVariant.t:call-mergeAlignmentBAMChunks docker task exit :: state: "failed", exit_code: 2
2024-07-16 16:22:06.597 wdl.w:GiraffeDeepVariant.t:call-mergeAlignmentBAMChunks task mergeAlignmentBAMChunks (../tasks/bioinfo_utils.wdl Ln 503 Col 1) failed :: dir: "/private/home/anovak/workspace/lr-giraffe/output/miniwdl-runs/20240716_141700_GiraffeDeepVariant/call-mergeAlignmentBAMChunks", error: "CommandFailed", exit_status: 2, stderr_file: "/private/home/anovak/workspace/lr-giraffe/output/miniwdl-runs/20240716_141700_GiraffeDeepVariant/call-mergeAlignmentBAMChunks/stderr.txt", stdout_file: "/private/home/anovak/workspace/lr-giraffe/output/miniwdl-runs/20240716_141700_GiraffeDeepVariant/call-mergeAlignmentBAMChunks/stdout.txt"
2024-07-16 16:22:06.601 wdl.w:GiraffeDeepVariant call failure propagating :: from: "call-mergeAlignmentBAMChunks", dir: "/private/home/anovak/workspace/lr-giraffe/output/miniwdl-runs/20240716_141700_GiraffeDeepVariant"
2024-07-16 16:22:06.601 wdl.w:GiraffeDeepVariant aborting workflow
2024-07-16 16:22:06.604 miniwdl-run run with --verbose to include task standard error streams in this log
2024-07-16 16:22:06.604 miniwdl-run task command failed with exit status 2 :: error: "CommandFailed", exit_status: 2, stderr_file: "/private/home/anovak/workspace/lr-giraffe/output/miniwdl-runs/20240716_141700_GiraffeDeepVariant/call-mergeAlignmentBAMChunks/stderr.txt", stdout_file: "/private/home/anovak/workspace/lr-giraffe/output/miniwdl-runs/20240716_141700_GiraffeDeepVariant/call-mergeAlignmentBAMChunks/stdout.txt", dir: "/private/home/anovak/workspace/lr-giraffe/output/miniwdl-runs/20240716_141700_GiraffeDeepVariant", from_dir: "/private/home/anovak/workspace/lr-giraffe/output/miniwdl-runs/20240716_141700_GiraffeDeepVariant/call-mergeAlignmentBAMChunks"
Command exited with non-zero status 2

But if you consult stdout.txt and stderr.txt, they are empty (since the container couldn't write to them).

(This is very hard to debug; probably in the situation where MiniWDL got nothing from any of the stuff it injected into the container, it ought to show the service-level logs to the user.)

If you run with --debug (or maybe with --verbose? I used both), you get access to the actual Docker service logs, which are a little more useful:

2024-07-17 08:25:59.315 wdl.w:GiraffeDeepVariant.t:call-mergeAlignmentBAMChunks docker service logs :: stdout: [], stderr: ["/bin/sh: 1: cannot create ../stdout.txt: Permission denied"]
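The fallback suggested earlier (show the service-level logs when the task's own stderr is empty) might look roughly like the sketch below. failure_detail is a hypothetical helper, not a MiniWDL function, and it assumes the service stderr is already available as a list of lines like the one in the log above.

```python
import os


def failure_detail(stderr_file, service_stderr_lines):
    """Return the task's stderr if it has content, else the service-level stderr lines."""
    try:
        if os.path.getsize(stderr_file) > 0:
            with open(stderr_file, errors="replace") as f:
                return f.read()
    except OSError:
        pass  # missing or unreadable file: fall through to the service logs
    return "\n".join(service_stderr_lines)


if __name__ == "__main__":
    lines = ["/bin/sh: 1: cannot create ../stdout.txt: Permission denied"]
    # os.devnull stands in for an empty stderr.txt here.
    print(failure_detail(os.devnull, lines))
```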

Probably what MiniWDL should do is check which group the task working directory actually got, and pass that group along to the container, instead of or in addition to the user's primary group. If the user is relying on group-level access to read workflow inputs, and the workflow inputs get mounted into the container in place (not sure if this happens), then MiniWDL should really give the container all the user's groups.
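As a rough sketch of that idea, the group list could be built from the effective group, the run directory's actual group, and optionally all of the user's supplementary groups. container_groups and its include_all parameter are hypothetical; the sketch only mirrors the shape of the existing groups = [str(os.getegid())] line and is not MiniWDL's implementation.

```python
import os


def container_groups(run_dir, include_all=False):
    """Return group IDs (as strings) to grant to a task container, without duplicates."""
    gids = [os.getegid()]
    # The group the run directory actually has, which a setgid parent may have imposed.
    dir_gid = os.stat(run_dir).st_gid
    if dir_gid not in gids:
        gids.append(dir_gid)
    if include_all:
        # Also grant every supplementary group of the invoking process.
        for gid in os.getgroups():
            if gid not in gids:
                gids.append(gid)
    return [str(g) for g in gids]


if __name__ == "__main__":
    print(container_groups("."))
    print(container_groups(".", include_all=True))
```

Whether to grant a directory group that the invoking user does not itself belong to is a policy question this sketch does not settle.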

A different solution might be to set the group on the run's top-level directory when it is created, so then we can rely on it matching the effective group of the MiniWDL process.
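That alternative could be sketched as follows, assuming the invoking user owns the new directory and is therefore allowed to change its group to their effective group. make_run_dir is a hypothetical name, not an existing MiniWDL function.

```python
import os
import tempfile


def make_run_dir(path):
    """Create the run directory and set its group to the process's effective GID."""
    os.makedirs(path, exist_ok=True)
    # -1 leaves the owner unchanged; only the group is changed.
    os.chown(path, -1, os.getegid())
    return os.stat(path).st_gid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parent:
        run_dir = os.path.join(parent, "miniwdl-runs")
        print(make_run_dir(run_dir) == os.getegid())
```

With the top-level directory's group set this way, files and directories created beneath it should end up with the effective group whether or not the directory inherited the setgid bit, though that is worth verifying on the target filesystem.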
