Machine Learning Pipeline in Shifu
Hu Zhanghao edited this page Jul 3, 2019 · 4 revisions
One of Shifu's strengths is its end-to-end machine learning pipeline. Using only configuration settings, you can build a complete pipeline, which makes it much easier to develop models and push them to production.
Shifu is organized around the steps of the pipeline, and each command is named after its step. Configuration lives in ModelConfig.json and ColumnConfig.json.
Shifu provides the following steps to support end-to-end model training:
- 'new': create a model workspace for the model to be trained; the user can set the data path, headers, and data schema
- 'init': check how many columns the data has and generate a template for ColumnConfig.json; categorical columns are set in this step
- 'stats': compute per-column statistics such as mean, standard deviation, KS, IV, and binning
- 'norm': normalize the data (for example z-score, max-min, or WOE transforms) for model training; missing values and exceptional values are also handled in this step
- 'varsel': select variables using statistics such as KS / IV, or using sensitivity analysis
- 'train': train the model with the configured algorithm; LR, NN, RF, and GBT are supported
- 'posttrain': compute model scores by bin
- 'eval': evaluate model performance on one or more evaluation data sets
- 'export': export LR/NN models to PMML format
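Since each command is named after its step, the order of the pipeline can be sketched as a short driver script. The Python below is only an illustration, not part of Shifu: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes the executable is called `shifu`, that `new` takes the model name as its argument, and that the remaining steps run from inside a workspace directory named after the model. By default it only prints the commands (dry run), so it can be read without Shifu installed.

```python
import shlex
import subprocess

# Step names in the order listed above.
PIPELINE_STEPS = [
    "new", "init", "stats", "norm", "varsel",
    "train", "posttrain", "eval", "export",
]


def build_commands(model_name, steps=PIPELINE_STEPS, executable="shifu"):
    """Return one argument list per step, in pipeline order."""
    commands = []
    for step in steps:
        cmd = [executable, step]
        if step == "new":
            # Assumption: 'new' takes the model name as its argument.
            cmd.append(model_name)
        commands.append(cmd)
    return commands


def run_pipeline(model_name, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in build_commands(model_name):
        print(shlex.join(cmd))
        if not dry_run:
            # Assumption: 'new' creates a workspace directory named after
            # the model, and every later step runs from inside it.
            cwd = None if cmd[1] == "new" else model_name
            subprocess.run(cmd, check=True, cwd=cwd)


if __name__ == "__main__":
    run_pipeline("DemoModel")  # dry run: prints the nine commands
```

In practice you would normally pause between steps to edit ModelConfig.json and ColumnConfig.json (for example, to mark categorical columns after 'init'), so running everything unattended is rarely what you want; individual steps may also need extra options that this sketch does not model.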