Nomad has the wrong unique.storage.bytesfree after a restart #14871

Closed
TimoWilken opened this issue Oct 11, 2022 · 2 comments

Nomad version

Output from nomad version: Nomad v1.3.3 (428b2cd8014c48ee9eae23f02712b7219da16d30)

Operating system and Environment details

CentOS 7; Nomad installed from Hashicorp Stable RPM repo and run via the bundled systemd service.

Issue

When Nomad is restarted, it resets its unique.storage.bytesfree property to however much disk space is actually free at that moment. However, if allocations are currently running, the disk space they use is counted twice -- once in the reduced unique.storage.bytesfree value, and again when Nomad subtracts each allocation's ephemeral_disk reservation while deciding whether there is enough space to place another alloc on the same host.

Would it be possible to specify the value of the unique.storage.bytesfree metric manually (e.g. a client.disk_total_free_bytes setting in /etc/nomad.d/nomad.hcl), analogous to the way we can specify the amount of memory (client.memory_total_mb) or CPU (client.cpu_total_compute) that Nomad should assume is present on the host? Or could Nomad allocate based on unique.storage.bytestotal instead, and let us specify an amount of disk space to reserve for the system, as is done with CPU and memory in client.reserved.*?
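For reference, a minimal client stanza showing the existing overrides next to the proposed one; the disk_total_free_bytes name and all values here are illustrative, and the disk setting does not exist in Nomad today:

client {
  memory_total_mb   = 8192  # override fingerprinted memory (MB)
  cpu_total_compute = 8000  # override fingerprinted CPU (MHz)

  # Hypothetical analogue proposed above; not a real Nomad option:
  # disk_total_free_bytes = 118111600640  # ~110G

  reserved {
    cpu    = 500  # MHz held back for the system
    memory = 512  # MB held back for the system
  }
}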

Reproduction steps

  1. Run a Nomad agent on a host with 110G of disk space available on an external disk, and place Nomad's state directory on that external disk.
  2. Run a job on that host that requests a large ephemeral_disk (e.g. 50G), creates a 50G file in its task directory, and sleeps forever (a sketch of such a job follows these steps).
  3. Restart the Nomad agent on that host.
  4. Now try to run a second copy of the above job on the same host.
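A minimal jobspec along the lines of steps 1-2 might look like this (the job name, driver, image, and exact dd invocation are illustrative assumptions; it writes to the shared alloc dir, which counts against ephemeral_disk just like the task dir):

job "disk-hog" {
  datacenters = ["dc1"]

  group "hog" {
    ephemeral_disk {
      size = 51200 # MB, i.e. 50G
    }

    task "fill" {
      driver = "docker"

      config {
        image   = "busybox:1.36"
        command = "/bin/sh"
        args    = ["-c", "dd if=/dev/zero of=/alloc/data/fill.bin bs=1M count=51200 && tail -f /dev/null"]
      }
    }
  }
}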

Expected Result

The second job should run in parallel to the first (as notionally 60G should still be available to allocate -- 110G minus the 50G taken by the first job).

Actual Result

unique.storage.bytesfree goes down to 60G after the restart, and Nomad thinks it only has 10G of disk space left to allocate on the host (which is 60G minus 50G "taken" by the first, running job).

tgross (Member) commented Nov 22, 2022

Hi @TimoWilken! Thanks to your clear instructions, I was able to reproduce this pretty easily. As it turns out, this is a duplicate of #6172. I'm going to rename that issue to clarify the problem a bit and then close this one out.

In short, there's an unfortunate interaction here where the fingerprint for storage is a StaticFingerprinter. That means it runs only once at startup, and the scheduler then has to account for the ephemeral disk reservations it knows about on top of that fingerprinted value.

So if we check the storage on our node:

$ nomad node status -self -verbose | grep storage
unique.storage.bytesfree              = 19552616448
unique.storage.bytestotal             = 41555521536
unique.storage.volume                 = /dev/sda1

Then create a job with:

ephemeral_disk {
  size = 5000 # MB, i.e. ~5G reserved
}

We'll exec into that allocation to create a large file:

$ nomad alloc exec f880fb6d /bin/sh
/ # dd if=/dev/urandom of=/alloc/data/example.bin bs=5MB count=100
100+0 records in
100+0 records out
500000000 bytes (476.8MB) copied, 1.996275 seconds, 238.9MB/s

/ # ls -lah /alloc/data/
total 477M
drwxrwxrwx    2 nobody   nobody      4.0K Nov 22 21:06 .
drwxrwxrwx    5 nobody   nobody      4.0K Nov 22 21:04 ..
-rw-r--r--    1 root     root      476.8M Nov 22 21:06 example.bin

But we'll see the free storage isn't changed:

$ nomad node status -self -verbose | grep storage
unique.storage.bytesfree              = 19552616448
unique.storage.bytestotal             = 41555521536
unique.storage.volume                 = /dev/sda1

Then we restart. At that point we see the reported free storage has dropped by exactly the amount we wrote to disk (500793344 bytes, or about 477.6MB). Note this isn't the same as the reserved amount, which would be 10x that.

$ nomad node status -self -verbose | grep storage
unique.storage.bytesfree              = 19051823104
unique.storage.bytestotal             = 41555521536
unique.storage.volume                 = /dev/sda1

Letting the user set this value is a nice hack and might help out with the situation where the storage fingerprinter can't read the correct value for some exotic storage configuration (I can't think of what that might be).

I suspect the right behavior here is to change the storage fingerprinter. Currently it runs df(1) against the data_dir, so we get the results of df for the file system the data_dir is on. We should also rummage around in the allocation directories and account for the amount of space we've actually used in alloc/data, so it isn't double-counted.
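A rough sketch of that idea in shell terms, assuming the packaged data_dir of /opt/nomad/data (the path and the exact accounting are assumptions on my part, not a committed design):

$ df -B1 --output=avail /opt/nomad/data | tail -n1   # free bytes on the data_dir's filesystem
$ du -sb /opt/nomad/data/alloc | cut -f1             # bytes already written by allocations

Reporting bytesfree as the df figure plus the du figure would add back what running allocations have consumed, so the scheduler's subtraction of ephemeral_disk reservations no longer double-counts it after a restart.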

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators on Mar 23, 2023