Train Regression Model in Shifu

In most cases, Shifu is used for 0-1 regression, and its data binning, data normalization and variable selection steps are designed for that setting. However, Shifu can also be used for linear regression.

There are two ways to train a regression model in Shifu.

Method 1: Make temporary 0-1 Tag

Create a temporary 0-1 target column from the original target (how you derive it is up to you).
Run shifu stats, shifu norm and shifu varsel as usual.
After ColumnConfig.json is generated and the final variables are selected, change the target column from the temporary one back to the original target column, and remove the tags in posTags and negTags.
Add OutputActivationFunc to ModelConfig.json -> train -> params. Its value can be Linear, ReLU, LeakyReLU or Swish, depending on what you need.
Rerun the shifu norm and shifu train steps to build the model.
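
The first step of Method 1 leaves the tagging rule open. As a minimal sketch, assuming a median split is acceptable for your data, a temporary 0-1 tag could be derived as below. The function name, the column names target and tmp_tag, and the median rule are all illustrative assumptions and are not part of Shifu.

```python
# Hypothetical helper, not part of Shifu: derive a temporary 0-1 tag
# from a continuous target so that stats/norm/varsel can run as usual.
from statistics import median


def add_temp_tag(rows, target="target", tag="tmp_tag"):
    """Return copies of rows with a 0/1 tag: 1 if the target is above the median."""
    # Assumption: a median split; any other rule that yields 0/1 would also do.
    cutoff = median(float(r[target]) for r in rows)
    return [{**r, tag: 1 if float(r[target]) > cutoff else 0} for r in rows]


# Made-up sample data for illustration only.
rows = [{"target": 3.2}, {"target": 10.5}, {"target": 7.1}, {"target": 1.0}]
tagged = add_temp_tag(rows)
```

The original target column stays in each row, so it can be restored as the target once variable selection is done.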

Method 2: Native

Keep posTags and negTags empty in ModelConfig.json. (Attention: "" is not empty; [] is empty.)
Use EqualTotal for binning when running shifu stats.
Use ONEHOT or ZSCALE_ONEHOT for data normalization.
Since IV and KS are all zeros in this case, use SE for variable selection, or run shifu varsel -f <variables.names.file> to select variables manually.
Add OutputActivationFunc to ModelConfig.json -> train -> params. Its value can be Linear, ReLU, LeakyReLU or Swish, depending on what you need.
Rerun the shifu norm and shifu train steps to build the model.
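
The configuration changes in the Method 2 steps above can be sketched as a small script that patches an in-memory copy of ModelConfig.json. This is only an illustration: the helper name is hypothetical, and placing posTags and negTags under a "dataSet" key is an assumption about the file layout that should be checked against your own ModelConfig.json.

```python
# Hypothetical helper, not part of Shifu: apply the Method 2 settings
# to a dict loaded from ModelConfig.json (for example via json.load).
import copy

ALLOWED_ACTIVATIONS = {"Linear", "ReLU", "LeakyReLU", "Swish"}


def to_native_regression(model_config, activation="Linear"):
    """Return a patched copy with empty tag lists and an output activation."""
    if activation not in ALLOWED_ACTIVATIONS:
        raise ValueError("unsupported OutputActivationFunc: " + activation)
    cfg = copy.deepcopy(model_config)
    # Assumption: the tag lists live under "dataSet".
    data_set = cfg.setdefault("dataSet", {})
    data_set["posTags"] = []  # [] is empty; "" would not count as empty
    data_set["negTags"] = []
    # ModelConfig.json -> train -> params, as described in the steps above.
    cfg.setdefault("train", {}).setdefault("params", {})["OutputActivationFunc"] = activation
    return cfg


# Made-up minimal config for illustration only.
example = {"dataSet": {"posTags": ["1"], "negTags": ["0"]}, "train": {"params": {}}}
patched = to_native_regression(example)
```

Working on a deep copy keeps the original configuration untouched, which makes it easy to compare the two before writing anything back to disk.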

GBDT Regression Support

GBDT natively supports regression when impurity is set to variance. Follow the steps above to prepare the data and configuration, then run 'shifu train' with GBDT to train a regression model. In the 'eval' step, one parameter needs to be set to avoid applying sigmoid to the final output:

 "evals" : [ {
    "name" : "Eval1",
    "dataSet" : {
      ...
    },
    "gbtScoreConvertStrategy" : "RAW",
    ...
  } ]
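
To see why the raw score matters for regression, the sketch below applies a standard logistic sigmoid to a few made-up raw scores. It does not reproduce Shifu's actual score conversion; it only illustrates that any sigmoid-style conversion confines the output to the range 0 to 1, which would discard the scale of a continuous target.

```python
# Illustration only: a generic logistic sigmoid, not Shifu's own conversion.
import math


def sigmoid(x):
    """Standard logistic function, mapping any real number into 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


# Made-up raw regression outputs on an arbitrary scale.
raw_scores = [-2.0, 0.0, 3.5, 40.0]
squashed = [sigmoid(s) for s in raw_scores]

for raw, out in zip(raw_scores, squashed):
    # The raw value keeps its scale; the sigmoid value never leaves 0..1.
    print(f"raw={raw:6.1f}  sigmoid={out:.4f}")
```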
