
dynamic host volumes: account for other claims in capability check #24684

Open · wants to merge 2 commits into base: dynamic-host-volumes

Conversation

@tgross (Member) commented Dec 16, 2024

When we feasibility-check a dynamic host volume against a volume request, we check the attachment mode and access mode. This only ensures that the capabilities match, but doesn't enforce the semantics of those capabilities against other claims that may already be made on the volume.

Add support for checking the requested capability against the other allocations that have claimed the volume.

Ref: #24479
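
For readers new to this code path, here is a minimal, self-contained sketch (in Go) of the kind of semantic check described above: given the claims other allocations already hold on the volume, reject a request whose access mode would be violated. All type, constant, and function names below are illustrative assumptions, not Nomad's actual API.

```go
package main

import "fmt"

// Hypothetical access modes; names are assumptions, not Nomad's identifiers.
type AccessMode string

const (
	SingleNodeReader       AccessMode = "single-node-reader-only"
	SingleNodeSingleWriter AccessMode = "single-node-single-writer"
	SingleNodeMultiWriter  AccessMode = "single-node-multi-writer"
)

// Claim represents another allocation's existing claim on the volume.
type Claim struct {
	AllocID  string
	ReadOnly bool
}

// checkCapability rejects a request whose semantics conflict with the claims
// other allocations already hold on the volume, rather than only checking
// that the requested mode appears in the volume's capability list.
func checkCapability(mode AccessMode, readOnly bool, existing []Claim) error {
	writers := 0
	for _, c := range existing {
		if !c.ReadOnly {
			writers++
		}
	}
	switch mode {
	case SingleNodeReader:
		if !readOnly {
			return fmt.Errorf("read-only access mode used by a writable request")
		}
	case SingleNodeSingleWriter:
		if !readOnly && writers > 0 {
			return fmt.Errorf("volume already claimed by %d writer(s)", writers)
		}
	case SingleNodeMultiWriter:
		// any number of writers on the node is allowed
	}
	return nil
}

func main() {
	err := checkCapability(SingleNodeSingleWriter, false, []Claim{{AllocID: "a1", ReadOnly: false}})
	fmt.Println(err) // volume already claimed by 1 writer(s)
}
```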

@tgross added this to the 1.10.0 milestone Dec 16, 2024
@tgross force-pushed the dhv-scheduling-caps-counts branch from 80e3481 to 44e88d5 on December 17, 2024 20:46
tgross added a commit that referenced this pull request Dec 17, 2024
CSI volumes support multi-node access patterns on the same volume ID, but dynamic host volumes by nature do not. The underlying volume may actually be multi-node (ex. NFS), but Nomad is ignorant of this. Remove the CSI-specific multi-node access modes and instead include the single-node access modes that are currently in the alpha edition of the CSI spec, which are better suited for DHV.

This PR has been extracted from #24684 to keep reviews manageable.

Ref: #24479
Ref: #24684
tgross added a commit that referenced this pull request Dec 17, 2024 (same commit message as above)
tgross added a commit that referenced this pull request Dec 18, 2024 (same commit message as above)
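
For context, a rough sketch of what the reduced, single-node-only access mode set might look like. The identifiers below are assumptions modeled on the CSI alpha spec's single-node modes and may not match Nomad's actual constants.

```go
package structs

// HostVolumeAccessMode is a hypothetical enum of the single-node modes kept
// for dynamic host volumes after dropping the CSI multi-node modes.
type HostVolumeAccessMode string

const (
	// one or more readers, no writers
	HostVolumeAccessModeSingleNodeReader HostVolumeAccessMode = "single-node-reader-only"
	// readers and writers allowed
	HostVolumeAccessModeSingleNodeWriter HostVolumeAccessMode = "single-node-writer"
	// at most one writer; readers allowed
	HostVolumeAccessModeSingleNodeSingleWriter HostVolumeAccessMode = "single-node-single-writer"
	// multiple writers allowed
	HostVolumeAccessModeSingleNodeMultiWriter HostVolumeAccessMode = "single-node-multi-writer"
)
```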
@tgross force-pushed the dhv-scheduling-caps-counts branch from 44e88d5 to 447e41a on December 18, 2024 14:21
@tgross marked this pull request as ready for review December 18, 2024 14:21
@tgross requested review from a team as code owners December 18, 2024 14:21
@tgross force-pushed the dhv-scheduling-caps-counts branch from 447e41a to 983591d on December 18, 2024 16:38
@tgross force-pushed the dhv-scheduling-caps-counts branch from 983591d to 366ccbf on December 18, 2024 18:59
@tgross marked this pull request as draft December 18, 2024 19:22
@tgross (Member, Author) commented Dec 18, 2024

Moving back to draft as Daniel has pointed out a fairly serious design flaw in how this works with group.count > 1 on the first deploy. Fixed!

@tgross force-pushed the dhv-scheduling-caps-counts branch from 482a6ea to bbd1e22 on December 18, 2024 21:21
@tgross marked this pull request as ready for review December 18, 2024 21:23
@tgross (Member, Author) commented Dec 18, 2024

We're now using the scheduler context's ProposedAllocs (which requires a trip back to the state store to fetch the Job structs, but that's no big deal), and that lets us drop the previously proposed tracking of the prior alloc ID, writer counts, etc.

In addition to the unit tests, I've run this across deployments of a job with group.count=2 and the single-node-single-writer mode to verify everything is working as we'd expect.
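
A rough, non-authoritative sketch of the shape of that check follows. Walking the node's proposed allocations and looking the Job up in state are described above, but the stub types and function parameters below are simplifications for illustration and don't match Nomad's exact API.

```go
package scheduler

import "fmt"

// Alloc and VolumeRequest are stand-ins for Nomad's allocation and volume
// request structs, trimmed to the fields this sketch needs.
type Alloc struct{ Namespace, JobID, TaskGroup string }

type VolumeRequest struct {
	Source   string
	ReadOnly bool
}

// checkProposedClaims walks the node's proposed allocations and checks each
// matching volume request against the requested capability. The two function
// parameters stand in for the scheduler context's ProposedAllocs and a
// state-store lookup of the Job's task group volumes.
func checkProposedClaims(
	proposedAllocs []*Alloc,
	groupVolumeRequests func(ns, job, group string) []*VolumeRequest,
	volSource string,
	checkClaim func(*VolumeRequest) error,
) error {
	seen := map[string]bool{}
	for _, alloc := range proposedAllocs {
		// all allocs for the same job have the same volume requests, so we
		// only check a given job once (the review below refines this key to
		// include the task group)
		key := alloc.Namespace + "/" + alloc.JobID
		if seen[key] {
			continue
		}
		seen[key] = true
		for _, req := range groupVolumeRequests(alloc.Namespace, alloc.JobID, alloc.TaskGroup) {
			if req.Source != volSource {
				continue
			}
			if err := checkClaim(req); err != nil {
				return fmt.Errorf("volume %q: %w", volSource, err)
			}
		}
	}
	return nil
}
```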

@gulducat (Member) left a comment

lgtm! just one bit of comment clarity

Comment on lines 296 to 298:

```go
// all allocs for the same job will have the same read-only flag
// and capabilities, so we only need to check a given job once
continue
```

I see what this is doing, but I don't follow the reasoning in this comment. I read it as meaning that different groups in the job will all have the same volume params, but this is more about allocs per group, specifically count, right?

Suggested change:

```diff
-// all allocs for the same job will have the same read-only flag
-// and capabilities, so we only need to check a given job once
+// all allocs for the same task group will have the same read-only
+// flag and capabilities, so we only need to check a given job once
 continue
```

using the job's ns+id as the seen key, without any reference to group, threw me a bit, I think. not that you should change that, just that the comment is pretty important to clarify this.

@tgross (Member, Author) replied:

Yeah, very good point. So that's actually a bug here because if a job has multiple allocs for different task groups on the same node, then we'd potentially miss volume requests to check. Ex. a job has alloc A for group A and alloc B for group B on the same node, and only group B has a volume request. If we happen to check alloc A first we'd miss that.

Will fix.
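
For illustration only, the fix described above might look roughly like this, reusing the assumed names from the sketch earlier in this thread:

```go
// Hypothetical fix: key the "seen" set by namespace, job, and task group so
// that a second alloc from a different group of the same job is still checked.
key := alloc.Namespace + "/" + alloc.JobID + "/" + alloc.TaskGroup
if seen[key] {
	// all allocs for the same task group have the same volume request,
	// so each group only needs to be checked once
	continue
}
seen[key] = true
```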
