Discussion for Adding a Batch Scheduling Sub-Landscape #3761
Comments
Instead of a sub landscape for batch scheduling, I'd really be in favor of
just adding it to the main landscape (in orchestration + management).
Are you cool with that? Then you can PR it in the main landscape.
On Fri, Feb 23, 2024 at 4:17 PM Alexander Scammon wrote:
As part of the CNCF Batch Working Group
<https://github.com/cncf/landscape#new-entries> (part of the TAG Runtime
<https://github.com/cncf/tag-runtime>), we'd like to discuss adding a sub
landscape focused on Batch Scheduling similar to the wasm sub landscape
<#2387>.
Example Draft
To illustrate what we were hoping to do, we worked up an example Batch
Scheduling landscape here:
- https://nimble-crisp-0235dd.netlify.app/?group=batch-scheduling
Please note that this is merely a rough draft of what a Batch Scheduling
landscape could look like. We anticipate more projects will be added as we
socialize this landscape throughout the community.
If this discussion would be better in a PR, we'd be happy to submit the
changes that would be necessary and we can have the discussion there.
Rationale
The conversation around Batch Schedulers in the context of cloud and
Kubernetes has been a complicated one over the last couple of years. As
AI/ML continues to dominate discussions, the desire for solutions in this
space has amplified. However, we find that people who want to solve this
particular challenge often don't know where to start and don't know that
there are existing options available.
As a result, companies often create their own bespoke solutions. Just
about every KubeCon, another company announces that they are planning to
open-source their new Batch Scheduler, often with extremely similar
properties to the existing solutions. We'd much prefer to guide people to
join forces on the existing solutions, ideally contributing to the
conversations ongoing in the Kubernetes Batch Working Group
<https://github.com/kubernetes/community/blob/master/wg-batch/README.md>
(a sister working group to the CNCF group, working on k8s-specific issues)
around Kueue <https://github.com/kubernetes-sigs/kueue> and improving the
core of Kubernetes to be more Batch Scheduling-friendly.
We think adding a landscape for Batch Scheduling could help bring
awareness to the community that potential solutions already exist and that
they have a place to start from.
We don't intend for the landscape to answer every question people have
about Batch Scheduling on Kubernetes. Much like the vast CNCF landscape
itself, it will be a starting point for people to work from and do their
own diligence on what will work for them.
We don't relish bringing more complexity to an already overwhelming array
of options on the existing landscape (and we really appreciate the
improvements and simplifications in the recent update). However, there did
not seem to be any meaningful way of describing the current landscape of
Batch Schedulers within the context of the larger landscape. We are open to
ideas, of course, which is why we're reaching out for discussion.
--
Cheers,
Chris Aniszczyk
https://aniszczyk.org
We tried to make that approach work at first and it really didn't fit. The problem is that there are a bunch of batch schedulers that need to be mentioned (Slurm/SUNK, LSF, PBS, etc.) that don't really belong in the larger cloud landscape. Yet, in terms of batch scheduling, we'd like to acknowledge that there are ways of using these more traditional batch schedulers in the context of k8s.
Honestly, I don't mind listing SLURM and some of that in the larger landscape,
but that's just me.
I need to figure out how to come up with some rules about sub landscapes
and how to ensure we don't have TOO MANY of them.
On Fri, Feb 23, 2024 at 5:10 PM Alexander Scammon wrote:
We tried to make that approach work at first and it really didn't fit. The
problem is that there are a bunch of batch schedulers that need to be
mentioned (Slurm/SUNK, LSF, PBS, etc.) that don't really belong in the
larger cloud landscape. Yet, in terms of batch scheduling we'd like to
acknowledge that there are ways of using these more traditional batch
schedulers in the context of k8s.
--
Cheers,
Chris Aniszczyk
https://aniszczyk.org
IMO, it's better to identify the projects that provide batch systems for HPC & AI, as some projects have dependencies, e.g. volcano/kueue vs. k8s. We're trying to propose different solutions in the ecosystem with those projects.
Do we have something like a label/tag to filter projects? That may make it easier.
I am curious about where and why it didn't fit.
I agree, once you open the gates, there's no going back. Projects like Slurm do belong, my 2 cents...
+1 to melding this (and most other notions of "sub-landscape") into the larger one, albeit w/ appropriate tags/labels and the ability to depict it through a lens (e.g. batch) based on a query filter. Even within the context of "batch" there are a number of things that IMO ought to show up that are substantial but don't wholly live within the category (e.g. multi-cluster scheduling, gang / co-scheduling, feature discovery, DRA, et cetera are all relevant but certainly not confined to batch).
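For illustration only, the tag-plus-filter idea suggested in the comments above could be sketched roughly like this. Nothing here reflects how the landscape is actually stored or queried: the `Entry` type, the `lens` helper, the category strings, and every tag assignment are hypothetical, chosen just to show entries staying in one main category while a "batch" view is derived from tags.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    """Hypothetical landscape entry: one main category plus free-form tags."""
    name: str
    category: str
    tags: frozenset[str] = field(default_factory=frozenset)


def lens(entries: list[Entry], tag: str) -> list[Entry]:
    """Return the entries carrying `tag`, regardless of their main category."""
    return [e for e in entries if tag in e.tags]


# Illustrative data only; the categories and tags are assumptions, not the
# landscape's real classification of these projects.
ENTRIES = [
    Entry("Kueue", "Orchestration & Management", frozenset({"batch"})),
    Entry("Volcano", "Orchestration & Management", frozenset({"batch", "gang-scheduling"})),
    Entry("Slurm", "Orchestration & Management", frozenset({"batch", "hpc"})),
    Entry("Kubernetes", "Orchestration & Management", frozenset({"orchestration"})),
]

# A "batch" lens over the single main landscape.
batch_view = [e.name for e in lens(ENTRIES, "batch")]
print(batch_view)
```

With this shape, topics that are relevant to batch but not confined to it (gang scheduling, for example) would simply be additional tags on the same entries rather than a separate sub landscape.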