"nomad job stop -purge" does not destroy the container #396

Open
OneOfTheJohns opened this issue Dec 13, 2024 · 6 comments

Comments

@OneOfTheJohns

Hi,

I'm running a nomad-podman environment with these versions:
Nomad v1.8.1
podman version 5.1.1

on a FreeBSD host.

The containers run fine, but when I want to remove them I run:
`nomad job stop -purge <job_id>`
The Nomad job gets removed, but the container keeps running.

Is that a bug or a feature?

@OneOfTheJohns
Author

Based on https://developer.hashicorp.com/nomad/plugins/drivers/podman#plugin-options, the `container` option under `gc` "Defaults to true. This option can be used to disable Nomad from removing a container when the task exits." So by default Nomad should destroy the container when the task exits, which makes this look like a bug.
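For reference, a minimal sketch of where that option lives in the agent configuration (this mirrors the plugin docs linked above; the value shown is the documented default, not something reported in this thread):

```hcl
plugin "nomad-driver-podman" {
  config {
    gc {
      # Defaults to true: Nomad removes the container once the task exits.
      # Setting this to false would intentionally leave containers behind.
      container = true
    }
  }
}
```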

@shoenig
Contributor

shoenig commented Dec 16, 2024

Hi @OneOfTheJohns, `nomad job stop -purge` immediately removes the job from the server's perspective, but leaves the cleanup of the allocation(s) to the normal reconciliation / garbage-collection process of each client. Are you waiting long enough for the clients to do that GC? If you do see the clients performing GC steps in the logs but the container is still running, then yeah, that would be a bug.
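The client-side GC cadence is tunable; a sketch of the standard Nomad client options that govern it (the values shown are the documented defaults, added here for illustration and not taken from this thread):

```hcl
client {
  enabled = true

  # How often the client runs its allocation garbage collector
  # (documented default: 1 minute).
  gc_interval = "1m"

  # Number of terminal allocations a client keeps around before
  # forcing garbage collection (documented default: 50).
  gc_max_allocs = 50
}
```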

@OneOfTheJohns
Author

Hi, thanks for your answer @shoenig. How long does this cleanup take? It actually might be that I'm not waiting long enough...

@courtland
Contributor

@OneOfTheJohns
Copy link
Author

Sorry for the long delay in sharing information on this ticket. I tried running `nomad system gc` (to force the cleanup), but it did not seem to help. So I ran `nomad monitor -log-level=DEBUG -node-id=<node_id>` to check the logs, and it seems I'm getting quite a bunch of errors like:

2025-01-08T08:19:41.969+0200 [DEBUG] client.driver_mgr.nomad-driver-podman: Could not get container stats, unknown error: driver=podman @module=podman.podman.default error="&errors.errorString{s:\"cannot get stats of container, status code: 400\"}" timestamp="2025-01-08T08:19:41.969+0200"

I had previously removed some of the podman containers manually with `podman rm -f <container_id>`, as I needed to clear some space. I ran those commands before running `nomad job stop -purge <job_id>`, which might be what started these errors.
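One hypothetical way to see the mismatch is to compare what Nomad still tracks on the node against what podman actually has (standard subcommands; the placeholders are illustrative):

```sh
# Allocations Nomad still knows about on this client
nomad node status -self

# Containers podman still has, including stopped ones
podman ps -a --format "{{.ID}} {{.Names}} {{.Status}}"

# Removing a container behind Nomad's back, as described above, can leave
# the driver polling stats for a container that no longer exists
podman rm -f <container_id>
```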

@OneOfTheJohns
Author

Okay, I found out that if I just run `nomad job stop <job_id>` without `-purge` and then run `nomad system gc`, the containers do get destroyed... It would be interesting to know whether it is expected behaviour that with `-purge` the podman-created containers are not destroyed.
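For anyone trying to reproduce this, a sketch of the two paths described in this thread (commands as reported; `<job_id>` is a placeholder):

```sh
# Path 1: stop without -purge, then GC — containers are removed
nomad job stop <job_id>
nomad system gc

# Path 2: stop with -purge — the container reportedly keeps running,
# and a later `nomad system gc` did not help
nomad job stop -purge <job_id>
```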
