Machine Learning Pipeline in Shifu

One of Shifu's strengths is its end-to-end machine learning modeling pipeline. Using only configuration settings, you can build a complete pipeline, which makes it much easier to develop models and push them to production.

Pipeline in Shifu

[Figure: Shifu pipeline]

Shifu is organized around the steps of the pipeline, and each command is named after the step it runs. Configuration lives in ModelConfig.json and ColumnConfig.json.
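As a quick way to inspect those two files, the following Python sketch loads them from a model workspace and summarizes their top-level structure. It assumes only that both files are valid JSON and sit in the same directory; the helper names load_configs and summarize are hypothetical and not part of Shifu, and no particular field names are assumed.

```python
import json
from pathlib import Path

# File names taken from the text above.
CONFIG_FILES = ("ModelConfig.json", "ColumnConfig.json")


def load_configs(workspace):
    """Load both config files from a workspace directory (hypothetical helper).

    Assumption: the files live directly inside `workspace`.
    """
    workspace = Path(workspace)
    configs = {}
    for name in CONFIG_FILES:
        with open(workspace / name, encoding="utf-8") as fh:
            configs[name] = json.load(fh)
    return configs


def summarize(config):
    """Describe the top-level shape of a parsed config without assuming its schema."""
    if isinstance(config, dict):
        return sorted(config)  # sorted list of top-level keys
    if isinstance(config, list):
        return f"{len(config)} entries"
    return type(config).__name__
```

Calling summarize on each value returned by load_configs gives a schema-agnostic overview that is handy before editing a setting by hand.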

Together, the following steps support end-to-end machine learning model training:

'new': create a model workspace for the specified model; the user can set the data path, headers and data schema
'init': run a basic check of the columns (including how many there are) and generate a template for ColumnConfig.json; categorical columns are set in this step
'stats': compute statistics for each column, such as mean, standard deviation, KS, IV and binning
'norm': normalize the data with a zscore, maxmin or WOE transform for later model training; missing and exceptional values are also handled in this step
'varsel': select variables using statistics such as KS / IV, or by sensitivity analysis
'train': train a model with the configured algorithm; LR, NN, RF and GBT are supported
'posttrain': compute model scores by bin
'eval': evaluate model performance on multiple evaluation data sets
'export': export LR/NN models to PMML format
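Because each command is named after its step, the full pipeline can be written down as an ordered list of commands. The Python sketch below only builds and prints those commands; it does not run them. It assumes the executable is called shifu and that 'new' takes the model name as its argument; both are assumptions not stated on this page, and build_commands is a hypothetical helper.

```python
import shlex

# Step names in the order they are listed above.
PIPELINE_STEPS = [
    "new", "init", "stats", "norm", "varsel",
    "train", "posttrain", "eval", "export",
]


def build_commands(model_name, steps=PIPELINE_STEPS, executable="shifu"):
    """Return one argv list per pipeline step (hypothetical helper).

    Assumptions: the CLI is invoked as `<executable> <step>`, and only
    'new' needs the model name as an extra argument.
    """
    commands = []
    for step in steps:
        argv = [executable, step]
        if step == "new":
            argv.append(model_name)  # assumed argument for workspace creation
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in build_commands("MyModel"):
        print(shlex.join(argv))
```

Keeping this as a dry run makes the order of steps explicit without depending on a Shifu installation; every step after 'new' would be run from inside the model workspace that 'new' creates.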
