-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dynamic host volumes: account for other claims in capability check #24684
base: dynamic-host-volumes
Are you sure you want to change the base?
Conversation
80e3481
to
44e88d5
Compare
CSI volumes support multi-node access patterns on the same volume ID, but dynamic host volumes by nature do not. The underlying volume may actually be multi-node (ex. NFS), but Nomad is ignorant of this. Remove the CSI-specific multi-node access modes and instead include the single-node access modes intended that are currently in the alpha edition of the CSI spec but which are better suited for DHV. This PR has been extracted from #24684 to keep reviews manageable. Ref: #24479 Ref: #24684
CSI volumes support multi-node access patterns on the same volume ID, but dynamic host volumes by nature do not. The underlying volume may actually be multi-node (ex. NFS), but Nomad is ignorant of this. Remove the CSI-specific multi-node access modes and instead include the single-node access modes intended that are currently in the alpha edition of the CSI spec but which are better suited for DHV. This PR has been extracted from #24684 to keep reviews manageable. Ref: #24479 Ref: #24684
CSI volumes support multi-node access patterns on the same volume ID, but dynamic host volumes by nature do not. The underlying volume may actually be multi-node (ex. NFS), but Nomad is ignorant of this. Remove the CSI-specific multi-node access modes and instead include the single-node access modes intended that are currently in the alpha edition of the CSI spec but which are better suited for DHV. This PR has been extracted from #24684 to keep reviews manageable. Ref: #24479 Ref: #24684
44e88d5
to
447e41a
Compare
447e41a
to
983591d
Compare
983591d
to
366ccbf
Compare
|
366ccbf
to
f33b7a5
Compare
f33b7a5
to
2294a80
Compare
2294a80
to
038a626
Compare
038a626
to
482a6ea
Compare
When we feasibility check a dynamic host volume against a volume request, we check the attachment mode and access mode. This only ensures that the capabilities match, but doesn't enforce the semantics of the capabilities against other claims that may be made on the allocation. Add support for checking the requested capability against other allocations that the volume claimed. Ref: #24479
482a6ea
to
bbd1e22
Compare
We're now using the scheduler's context's In addition to the unit tests, I've run this across deployments of a job with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! just one bit of comment clarity
scheduler/feasible.go
Outdated
// all allocs for the same job will have the same read-only flag | ||
// and capabilities, so we only need to check a given job once | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what this is doing, but I don't follow the reasoning in this comment. I read it as meaning that different group
s in the job
will all have the same volume params, but this is more about allocs per group
, specifically count
, right?
// all allocs for the same job will have the same read-only flag | |
// and capabilities, so we only need to check a given job once | |
continue | |
// all allocs for the same task group will have the same read-only | |
// flag and capabilities, so we only need to check a given job once | |
continue |
using the job
's ns+id as the seen
key, without any reference to group
, threw me a bit, I think. not that you should change that, just that the comment is pretty important to clarify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, very good point. So that's actually a bug here because if a job has multiple allocs for different task groups on the same node, then we'd potentially miss volume requests to check. Ex. a job has alloc A for group A and alloc B for group B on the same node, and only group B has a volume request. If we happen to check alloc A first we'd miss that.
Will fix.
When we feasibility check a dynamic host volume against a volume request, we check the attachment mode and access mode. This only ensures that the capabilities match, but doesn't enforce the semantics of the capabilities against other claims that may be made on the allocation.
Add support for checking the requested capability against other allocations that the volume claimed.
Ref: #24479