Nomad doesn't start new allocs during GC #24778

Closed
EtienneBruines opened this issue Jan 6, 2025 · 1 comment

EtienneBruines (Contributor) commented Jan 6, 2025

Nomad version

Nomad v1.9.4
BuildDate 2024-12-18T15:16:22Z
Revision 5e49fcd+CHANGES

Operating system and Environment details

Ubuntu 22.04.5 LTS on amd64

Issue

When the Nomad client is busy doing GC (or sleeping in between those intervals), it does not start any new allocs for new jobs that were assigned to this client. The client does not even "receive" such tasks during this period.

While the client is busy with GC, it logs messages like this:

{"@level":"info","@message":"marking allocation for GC","@module":"client.gc","@timestamp":"2025-01-06T10:21:36.250494Z","alloc_id":"ac8fd9bd-39f9-133f-c1ae-eb45c1ecc275"}
{"@level":"info","@message":"garbage collecting allocation","@module":"client.gc","@timestamp":"2025-01-06T10:21:36.252995Z","alloc_id":"feb5dc4c-a549-7b82-a18e-733acd2a7013","reason":"number of allocations (68) is over the limit (50)"}
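
To see how far over the limit a client is, the `reason` field of these JSON log lines can be parsed. The following is a minimal sketch that assumes the log format shown above; the helper name `gc_overage` is hypothetical and not part of Nomad. The limit of 50 appears to correspond to the client's `gc_max_allocs` setting, but that is an assumption based on the message text.

```python
import json
import re

# Pattern for the "reason" text seen in the sample log line above.
REASON_RE = re.compile(
    r"number of allocations \((\d+)\) is over the limit \((\d+)\)"
)


def gc_overage(log_lines):
    """Return (alloc_id, alloc_count, limit) for each client.gc line with a limit reason.

    Hypothetical helper for illustration; assumes one JSON object per line.
    """
    results = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("@module") != "client.gc":
            continue
        match = REASON_RE.search(entry.get("reason", ""))
        if match:
            results.append(
                (entry.get("alloc_id"), int(match.group(1)), int(match.group(2)))
            )
    return results


# The two log lines quoted in this issue.
sample = [
    '{"@level":"info","@message":"marking allocation for GC","@module":"client.gc","@timestamp":"2025-01-06T10:21:36.250494Z","alloc_id":"ac8fd9bd-39f9-133f-c1ae-eb45c1ecc275"}',
    '{"@level":"info","@message":"garbage collecting allocation","@module":"client.gc","@timestamp":"2025-01-06T10:21:36.252995Z","alloc_id":"feb5dc4c-a549-7b82-a18e-733acd2a7013","reason":"number of allocations (68) is over the limit (50)"}',
]

for alloc_id, count, limit in gc_overage(sample):
    print(f"{alloc_id}: {count} allocations, limit {limit}, {count - limit} over")
```

Only lines that carry a limit-related `reason` are reported; the "marking allocation for GC" line has no `reason` field and is skipped.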

Reproduction steps

Have a client that is too busy with GC'ing (for example, one holding more allocations than the GC limit).

Expected Result

Allocs should still be started, at least during the "sleep" part of GC-ing the old allocs.

Actual Result

No allocs are started until the GC is complete, which may take a while. Until then they stay stuck in pending and have not yet logged "Task received".

Job file (if appropriate)

Not applicable.

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

Only logs this:

{"@level":"info","@message":"marking allocation for GC","@module":"client.gc","@timestamp":"2025-01-06T10:21:36.250494Z","alloc_id":"ac8fd9bd-39f9-133f-c1ae-eb45c1ecc275"}
{"@level":"info","@message":"garbage collecting allocation","@module":"client.gc","@timestamp":"2025-01-06T10:21:36.252995Z","alloc_id":"feb5dc4c-a549-7b82-a18e-733acd2a7013","reason":"number of allocations (68) is over the limit (50)"}

After the GC-ing is complete (perhaps 20 minutes or so later), it starts the alloc and logs things like:

{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2025-01-06T10:23:26.213687Z","alloc_id":"db84c9fb-e9e3-df5e-bc34-42e11f57a32e","failed":false,"msg":"Task received by client","task":"sync","type":"Received"}
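
The delay between GC activity and the alloc being received can be measured from the `@timestamp` fields. The following is a rough sketch that assumes the JSON log format shown above; `gc_to_received_delay` is a hypothetical helper, not a Nomad tool.

```python
import json
from datetime import datetime


def parse_ts(entry):
    """Parse the RFC 3339 "@timestamp" field (UTC, trailing "Z") into a datetime."""
    return datetime.fromisoformat(entry["@timestamp"].replace("Z", "+00:00"))


def gc_to_received_delay(log_lines):
    """Seconds from the most recent client.gc line to the first later "Received" event.

    Returns None if no such pair is found. Hypothetical helper for illustration.
    """
    last_gc = None
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if not isinstance(entry, dict) or "@timestamp" not in entry:
            continue
        if entry.get("@module") == "client.gc":
            last_gc = parse_ts(entry)
        elif entry.get("type") == "Received" and last_gc is not None:
            return (parse_ts(entry) - last_gc).total_seconds()
    return None


# One GC line and the "Task received" line quoted in this issue.
lines = [
    '{"@level":"info","@message":"garbage collecting allocation","@module":"client.gc","@timestamp":"2025-01-06T10:21:36.252995Z","alloc_id":"feb5dc4c-a549-7b82-a18e-733acd2a7013","reason":"number of allocations (68) is over the limit (50)"}',
    '{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2025-01-06T10:23:26.213687Z","alloc_id":"db84c9fb-e9e3-df5e-bc34-42e11f57a32e","failed":false,"msg":"Task received by client","task":"sync","type":"Received"}',
]

delay = gc_to_received_delay(lines)
print(f"delay between last GC line and task received: {delay:.1f}s")
```

Run against a full client log, this gives the gap between the last GC message and the first task the client receives afterwards. The excerpts quoted here are only samples, so the value computed from them is not necessarily the full length of the stall.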

Note

I understand that this might be by-design, but I'm not sure why. If the resources are available and the scheduler thought it was a good idea, those allocs should be able to start at some point.

@Juanadelacuesta (Member)

Hi @EtienneBruines, thank you for taking the time to create this ticket and report the issues you see. This one seems to be closely related to #24778. I will close this one as a duplicate and leave the other one open to help us track the status better.
