-
Notifications
You must be signed in to change notification settings - Fork 724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-2170: Add the manifests overlay for Kubeflow Training V2 #2382
base: master
Are you sure you want to change the base?
Changes from all commits
a46294f
963bfbc
1100986
a038ef2
1f1b0c2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,2 @@ | ||
resources: | ||
- manager.yaml | ||
# TODO (andreyvelich): Move it to overlays once we copy the JobSet manifests. | ||
namespace: kubeflow-system | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
apiVersion: rbac.authorization.k8s.io/v1 | ||
kind: ClusterRole | ||
metadata: | ||
name: kubeflow-training-admin-v2 | ||
labels: | ||
rbac.authorization.kubeflow.org/aggregate-to-kubeflow-admin-v2: "true" | ||
aggregationRule: | ||
clusterRoleSelectors: | ||
- matchLabels: | ||
rbac.authorization.kubeflow.org/aggregate-to-kubeflow-training-admin-v2: "true" | ||
rules: [] | ||
|
||
--- | ||
apiVersion: rbac.authorization.k8s.io/v1 | ||
kind: ClusterRole | ||
metadata: | ||
name: kubeflow-training-edit-v2 | ||
labels: | ||
rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit-v2: "true" | ||
rbac.authorization.kubeflow.org/aggregate-to-kubeflow-training-admin-v2: "true" | ||
rules: | ||
- apiGroups: | ||
- kubeflow.org | ||
resources: | ||
- clustertrainingruntimes | ||
- trainingruntimes | ||
- trainjobs | ||
verbs: | ||
- create | ||
- delete | ||
- get | ||
- list | ||
- patch | ||
- update | ||
- watch | ||
- apiGroups: | ||
- kubeflow.org | ||
resources: | ||
- trainjobs/status | ||
verbs: | ||
- get | ||
- apiGroups: | ||
- "" | ||
resources: | ||
- persistentvolumeclaims | ||
verbs: | ||
- create | ||
- delete | ||
- get | ||
- list | ||
- watch | ||
- apiGroups: | ||
- "" | ||
resources: | ||
- events | ||
verbs: | ||
- get | ||
- list | ||
- watch | ||
|
||
--- | ||
apiVersion: rbac.authorization.k8s.io/v1 | ||
kind: ClusterRole | ||
metadata: | ||
name: kubeflow-training-view-v2 | ||
labels: | ||
rbac.authorization.kubeflow.org/aggregate-to-kubeflow-view-v2: "true" | ||
rules: | ||
- apiGroups: | ||
- kubeflow.org | ||
resources: | ||
- clustertrainingruntimes | ||
- trainingruntimes | ||
- trainjobs | ||
verbs: | ||
- get | ||
- list | ||
- watch | ||
- apiGroups: | ||
- kubeflow.org | ||
resources: | ||
- trainjobs/status | ||
verbs: | ||
- get |
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,20 @@ | ||||||||||||||
apiVersion: kustomize.config.k8s.io/v1beta1 | ||||||||||||||
kind: Kustomization | ||||||||||||||
namespace: kubeflow | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. May I ask a question? Since we have defined the namespace for manager and webhook:
Does There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for your reviewing. It seems that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think, we should make our namespace configurable and use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Doris-xm FYI, I think what @andreyvelich means is that we should address these
training-operator/manifests/v2/base/webhook/kustomization.yaml Lines 13 to 14 in 1dfa40c
And specify their namespace in
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for your detailed explanation! I've moved the namespace configuration from |
||||||||||||||
resources: | ||||||||||||||
- ../../base/crds | ||||||||||||||
- ../../base/manager | ||||||||||||||
- ../../base/rbac | ||||||||||||||
- ../../base/webhook | ||||||||||||||
- ../../base/runtimes/pre-training | ||||||||||||||
- kubeflow-training-roles.yaml | ||||||||||||||
# TODO (andreyvelich): JobSet should support kubeflow namespace. | ||||||||||||||
- https://github.com/kubernetes-sigs/jobset/releases/download/v0.6.0/manifests.yaml | ||||||||||||||
images: | ||||||||||||||
- name: kubeflow/training-operator-v2 | ||||||||||||||
newTag: latest | ||||||||||||||
secretGenerator: | ||||||||||||||
- name: training-operator-v2-webhook-cert | ||||||||||||||
namespace: kubeflow | ||||||||||||||
options: | ||||||||||||||
disableNameSuffixHash: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How this is going to work since we don't copy JobSet manifests in the Kubeflow Training ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the JobSet manifest is referred from a remote URL, so the
namespace
configuration in the base or overlays cannot override the JobSet manifests, I think.From the changlog, it seems that the latest release hasn’t included the changes in kubernetes-sigs/jobset/#719. We may need to manually modify the namespace setting for JobSet. I'm not sure if I need to deal with it in this pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich I tried it with:
It does not work. Maybe we should modify the jobset manifests to make namespace configurable.
And just a question, do we really need to install all components in the same namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recently @kannon92 made a PR to support other namespaces in JobSet: kubernetes-sigs/jobset#719.
Probably, we should use
kubeflow
namespace to install the JobSet operator within Kubeflow Platform if @kubeflow/wg-training-leads or @kubeflow/wg-manifests-leads don't have any other suggestions.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, this PR seems not to be included in the current release of jobset:
kubernetes-sigs/jobset@release-6.0...v0.7.2
In the manifest, the namespace is hardcoded into jobset-system:
https://github.com/kubernetes-sigs/jobset/releases/download/v0.7.2/manifests.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. For me, I just don't know how to overlay namespace items outside of the
.metadata
field if we only specify namespace field in kustomization.yaml. Here is the example of which namespace field lies outside of the.metadata
field:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use the var references, similar to this example. We will tell Kustomize how to override the namespace value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. That's what I want to do with the JobSet manifests!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Doris-xm So now we can wait for the next release of JobSet. And make some appropriate changes to the manifest after that:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you all for the insightful discussion! I’ve learned a lot from this. Much appreciated!