GitHub - yhalpha/kubedl: A unified operator for running deep learning/machine learning workloads on Kubernetes

KubeDL enables deep learning workloads to run on Kubernetes more easily and efficiently.

KubeDL is a CNCF sandbox project.

Features

Supports training and inference workloads (TensorFlow, PyTorch, Mars, etc.) in a single unified controller. Features include advanced scheduling, cache-based acceleration, metadata persistence, file sync, and service discovery for training jobs running in the host network.
Automatically tunes container-level configurations to find the best one before an ML model is deployed as an inference service (see Morphling on GitHub).
Model lineage and versioning track the history of a model natively in CRDs: when the model was trained, with which data and which image, each version of the model, which version is currently running, etc.
Stores and versions models as container images. Each model version is stored as its own image and can later be served with a serving framework.

Check the website: https://kubedl.io

The KubeDL-Morphling paper was accepted at ACM SoCC 2021: Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving
