"nomad job stop -purge" does not destroy the container #396

Open
OneOfTheJohns opened this issue Dec 13, 2024 · 6 comments

Comments

@OneOfTheJohns

Hi,

I'm running a nomad-podman environment with these versions:
Nomad v1.8.1
podman version 5.1.1

on a FreeBSD host.

The containers run fine, but when I want to remove them I run:
`nomad job stop -purge <job_id>`
The Nomad job gets removed, but the container keeps running.

Is that a bug or a feature?

@OneOfTheJohns
Author

Based on https://developer.hashicorp.com/nomad/plugins/drivers/podman#plugin-options, the `container` option under `gc` "Defaults to true. This option can be used to disable Nomad from removing a container when the task exits." So by default Nomad should destroy the container when the task exits, which makes this look like a bug.
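For reference, a minimal sketch of where that option lives in the agent configuration (this mirrors the plugin docs linked above; the value shown is the documented default, not something reported in this thread):

```hcl
plugin "nomad-driver-podman" {
  config {
    gc {
      # Defaults to true: Nomad removes the container once the task exits.
      # Setting this to false would intentionally leave containers behind.
      container = true
    }
  }
}
```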

@shoenig
Contributor

shoenig commented Dec 16, 2024

Hi @OneOfTheJohns, `nomad job stop -purge` immediately removes the job from the server's perspective, but leaves the cleanup of the allocation(s) to the normal reconciliation / garbage-collection process of each client. Are you waiting long enough for the clients to do that GC? If you do see the clients performing GC steps in the logs but the container is still running, then yeah, that would be a bug.
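The client-side GC cadence is tunable; a sketch of the standard Nomad client options that govern it (the values shown are the documented defaults, added here for illustration and not taken from this thread):

```hcl
client {
  enabled = true

  # How often the client runs its allocation garbage collector
  # (documented default: 1 minute).
  gc_interval = "1m"

  # Number of terminal allocations a client keeps around before
  # forcing garbage collection (documented default: 50).
  gc_max_allocs = 50
}
```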

@OneOfTheJohns
Author

Hi, thanks for your answer @shoenig. How long does this cleanup take? It actually might be that I'm not waiting long enough...

@courtland
Contributor

@OneOfTheJohns
Copy link
Author

Sorry for the long delay in sharing information on this ticket. I tried running `nomad system gc` (to force the cleanup), but it did not seem to help. So I ran `nomad monitor -log-level=DEBUG -node-id=<node_id>` to check the logs, and it seems I'm getting quite a bunch of errors like:

2025-01-08T08:19:41.969+0200 [DEBUG] client.driver_mgr.nomad-driver-podman: Could not get container stats, unknown error: driver=podman @module=podman.podman.default error="&errors.errorString{s:\"cannot get stats of container, status code: 400\"}" timestamp="2025-01-08T08:19:41.969+0200"

I had previously removed some of the podman containers manually with `podman rm -f <container_id>`, as I needed to clear some space. I ran those commands before running `nomad job stop -purge <job_id>`, which might be what started these errors.
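One hypothetical way to see the mismatch is to compare what Nomad still tracks on the node against what podman actually has (standard subcommands; the placeholders are illustrative):

```sh
# Allocations Nomad still knows about on this client
nomad node status -self

# Containers podman still has, including stopped ones
podman ps -a --format "{{.ID}} {{.Names}} {{.Status}}"

# Removing a container behind Nomad's back, as described above, can leave
# the driver polling stats for a container that no longer exists
podman rm -f <container_id>
```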

@OneOfTheJohns
Author

Okay, I found out that if I just run `nomad job stop <job_id>` without `-purge` and then run `nomad system gc`, the containers do get destroyed... It would be interesting to know whether it is expected behaviour that with `-purge` the podman-created containers are not destroyed.
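For anyone trying to reproduce this, a sketch of the two paths described in this thread (commands as reported; `<job_id>` is a placeholder):

```sh
# Path 1: stop without -purge, then GC — containers are removed
nomad job stop <job_id>
nomad system gc

# Path 2: stop with -purge — the container reportedly keeps running,
# and a later `nomad system gc` did not help
nomad job stop -purge <job_id>
```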
