tasks: Reset SELinux level of images on startup #560

Merged 1 commit on Oct 13, 2023

martinpitt
Member

In some situations, images (in particular the fedora-rawhide-boot and -payload images) gain an extra SELinux "Multi-category range" context like this:

system_u:object_r:container_file_t:s0:c220,c230  fedora-rawhide-boot-ba19e37f2b38f6578574275b7eb209e5b1850bceb9eb671cb366db16e4bf78fc.iso

This makes them undeletable by image-prune, rm, or anything else that the container could do. This causes old images to pile up, and the host eventually fails with ENOSPC.

There is nothing that the container can do to fix these files, and it's not clear what causes this in the first place. To mitigate, reset the context level in the system unit at startup.
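
As a rough illustration of what "reset the context level" can look like, here is a minimal sketch using `chcon` from GNU coreutils. The function name `reset_image_levels` is hypothetical, the default path is taken from the cache directory mentioned in this PR, and the actual line added to the system unit is not shown on this page, so treat this as an assumption rather than the merged change. It has to run on the host with enough privileges to relabel the files.

```shell
# Sketch only: reset the SELinux level of everything in the image cache to
# plain "s0", dropping any extra categories such as "c220,c230".
# reset_image_levels is a hypothetical helper name.
reset_image_levels() {
    # Default path is an assumption based on the cache directory named above.
    cache_dir=${1:-/var/cache/cockpit-tasks/images}
    # -R recurses into the directory, -l sets only the level/range field.
    chcon -R -l s0 "$cache_dir"
}

# Possible usage at service startup (as root on the host):
#   reset_image_levels /var/cache/cockpit-tasks/images
```

Only the level field is touched here; the user, role, and type parts of the context stay as they are.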


Our bots have run into ENOSPC quite frequently, but so far I had only ever quick-fixed this by mass-deleting the image cache. This should fix it more permanently.

I rolled this out to all our bots. I confirmed it fixed the broken contexts on cockpit-10, so by tomorrow, cockpit-10 should do an image-prune run and clean up its current 501 GB in /var/cache/cockpit-tasks/images/

martinpitt requested a review from jelly on October 13, 2023 07:28
jelly merged commit 67b544d into main on Oct 13, 2023
2 checks passed
jelly deleted the tasks-selinux branch on October 13, 2023 07:45

travier commented Oct 13, 2023

This looks like the category that you get when you write a file from an SELinux-confined container. Where is curl called from?

@martinpitt
Member Author

It's all the same cockpit/tasks container, and all happens in a bots/ checkout. The commands which create files are either curl -O or qemu-img create, and sometimes cp/mv. There are no context or container switches anywhere.


travier commented Oct 13, 2023

This launches multiple instances / containers and each one will get a different set of categories. If an image is written by one instance then another one won't be able to delete it.

You should tell podman that the image cache is shared between containers with :z: https://github.com/cockpit-project/cockpituous/blob/60c398425df2ad025fad8696c99055b70be99928/tasks/install-service#L60C2-L60C54

From https://docs.podman.io/en/latest/markdown/podman-run.1.html#volume-v-source-volume-host-dir-container-dir-options:

The z option tells Podman that two or more containers share the volume content.
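
For illustration, a minimal sketch of how such a volume option might be assembled. The helper name `shared_volume_arg` is hypothetical, and the host and container paths in the usage comment are assumptions pieced together from paths mentioned elsewhere in this thread; the real mapping lives in the linked install-service script.

```shell
# Sketch only: build a --volume argument whose options include "z", which
# tells podman that the volume content is shared between containers.
# shared_volume_arg is a hypothetical helper name.
shared_volume_arg() {
    host_dir=$1
    container_dir=$2
    printf '%s\n' "--volume=$host_dir:$container_dir:rw,z"
}

# Possible usage (paths and image name are assumptions):
#   podman run "$(shared_volume_arg /var/cache/cockpit-tasks/images /cache/images)" IMAGE
```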

@martinpitt
Member Author

Thanks @travier, that certainly smells related. I just think that's not the whole story -- I can perfectly reproduce this by running ./image-create fedora-rawhide-boot and trying to rm the file inside the same running container. Also, why does it only happen to a small subset of files? This really doesn't feel right to me.

That said, I'm happy to try the :z and see if that fixes it.

@martinpitt
Member Author

I tested that: I changed the --volume to :rw,z, ran ./image-create fedora-rawhide-boot, and it still creates a weird context:

cd bots
./image-create fedora-rawhide-boot
ls -lZ /cache/images/fedora*

-rw-------. 1 user user system_u:object_r:container_file_t:s0:c319,c834  720640000 Oct 13 09:03 /cache/images/fedora-rawhide-boot-ba19e37f2b38f6578574275b7eb209e5b1850bceb9eb671cb366db16e4bf78fc.iso

The difference now is that at least that container can rm the file, which didn't work before. But another container still cannot remove it.
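
To spot which cached files carry such a context, the level part of each context string can be checked for categories. This is a minimal sketch; `has_mcs_categories` is a hypothetical helper, not something from bots, and it only does string matching on contexts shaped like the ones shown in this thread.

```shell
# Sketch only: report whether an SELinux context string carries categories.
# has_mcs_categories is a hypothetical helper name.
has_mcs_categories() {
    # Drop the leading "user:role:type:" so only the level/range remains.
    level=${1#*:*:*:}
    case "$level" in
        *:c*) return 0 ;;   # e.g. "s0:c319,c834"
        *)    return 1 ;;   # e.g. plain "s0"
    esac
}

# Possible usage, assuming GNU stat (%C prints the SELinux context):
#   for f in /cache/images/*; do
#       has_mcs_categories "$(stat -c %C "$f")" && echo "$f"
#   done
```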

Still, it was a good idea, thanks @travier !
