Issues: kubeflow/trainer
Issues list
Support Kubernetes v1.32
good first issue
help wanted
kind/feature
#2434
opened Feb 13, 2025 by
astefanutti
Cap nproc_per_node based on the CPU resources of the node for PyTorch TrainJob
#2407
opened Jan 31, 2025 by
astefanutti
[SDK] Use dictionary unpacking to pass trainer function arguments
area/sdk
kind/feature
#2383
opened Jan 9, 2025 by
astefanutti
Permission denied when reading TrainJob function script when run as non-root user
area/sdk
kind/bug
#2372
opened Jan 7, 2025 by
astefanutti
"zero-trust" security / networking for training jobs
kind/feature
lifecycle/needs-triage
#2341
opened Nov 29, 2024 by
astefanutti
KEP-2170: Add AMD ROCm Torch Distributed Training Runtime
area/runtime
kind/feature
#2335
opened Nov 26, 2024 by
astefanutti