Train Regression Model in Shifu

wu haifeng edited this page Mar 1, 2021 · 4 revisions

Shifu is designed mainly for 0-1 regression, and its pipeline covers data binning, data normalization, and variable selection. It can also be used for linear regression.

There are two ways to train a regression model in Shifu.

Method 1: Make temporary 0-1 Tag

  • Create a temporary 0-1 target column from the original target (how you derive it is up to you).
  • Run shifu stats, shifu norm, and shifu varsel as usual.
  • Once ColumnConfig.json is generated and the final variables are selected, change the target column from the temporary column back to the original target column, and remove the tags in posTags and negTags.
  • Add OutputActivationFunc to ModelConfig.json -> train -> params. Its value can be Linear, ReLU, LeakyReLU, or Swish, depending on what you need.
  • Rerun shifu norm and shifu train to build the model.
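
The first step leaves the choice of temporary tag open. As one possible illustration, the sketch below splits a numeric target at its median. The column names `target` and `tmp_tag` and the median cutoff are assumptions made for this example, not something Shifu prescribes.

```python
from statistics import median


def add_temporary_tag(rows, target="target", tag="tmp_tag"):
    """Return copies of the rows with a 0-1 column derived from the numeric target.

    Assumption: values above the median become "1", all others "0".
    Any other rule that produces two classes would work as well.
    """
    cutoff = median(float(row[target]) for row in rows)
    return [
        {**row, tag: "1" if float(row[target]) > cutoff else "0"}
        for row in rows
    ]


# Hypothetical sample records; real data would come from your training file.
rows = [{"target": "3.5"}, {"target": "10.0"}, {"target": "7.2"}, {"target": "1.1"}]
tagged = add_temporary_tag(rows)
print([row["tmp_tag"] for row in tagged])
```

With a column like this, posTags would hold "1" and negTags would hold "0" for the stats, norm, and varsel steps, before both are cleared for training.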

Method 2: Native

  • Keep posTags and negTags empty in ModelConfig.json. (Attention: "" is not empty; [] is empty.)
  • Use EqualTotal binning when running shifu stats.
  • Use ONEHOT or ZSCALE_ONEHOT for data normalization.
  • Since IV and KS are all zeros, use SE for variable selection, or select variables manually with shifu varsel -f <variables.names.file>.
  • Add OutputActivationFunc to ModelConfig.json -> train -> params. Its value can be Linear, ReLU, LeakyReLU, or Swish, depending on what you need.
  • Rerun shifu norm and shifu train to build the model.
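
The two ModelConfig.json changes in this list can be sketched as a small Python helper working on the parsed JSON. The placement of posTags and negTags under a `dataSet` key is an assumption here, so check it against your own ModelConfig.json. The `train` -> `params` path and the allowed activation names come from the steps above, and the function name is hypothetical.

```python
import copy

# Activation names listed in the steps above.
ALLOWED_ACTIVATIONS = {"Linear", "ReLU", "LeakyReLU", "Swish"}


def prepare_regression_config(config, activation="Linear"):
    """Return a copy of a parsed ModelConfig.json prepared for regression."""
    if activation not in ALLOWED_ACTIVATIONS:
        raise ValueError(f"unsupported OutputActivationFunc: {activation}")
    updated = copy.deepcopy(config)
    # Assumption: posTags and negTags live under "dataSet".
    data_set = updated.setdefault("dataSet", {})
    data_set["posTags"] = []  # an empty list, not the string ""
    data_set["negTags"] = []
    params = updated.setdefault("train", {}).setdefault("params", {})
    params["OutputActivationFunc"] = activation
    return updated


# Hypothetical minimal config; a real file has many more fields.
original = {"dataSet": {"posTags": ["1"], "negTags": ["0"]}, "train": {"params": {}}}
regression = prepare_regression_config(original, "ReLU")
print(regression)
```

In practice the dictionary would be read with `json.load` and written back with `json.dump` before rerunning shifu norm and shifu train.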

GBDT Regression Support

GBDT natively supports regression when impurity is set to variance. Follow the steps above to prepare the data, then run 'shifu train' with GBDT to train a regression model. In the 'eval' step, one parameter needs to be set so that the final output is not passed through a sigmoid:

 "evals" : [ {
    "name" : "Eval1",
    "dataSet" : {
      ...
    },
    "gbtScoreConvertStrategy" : "RAW",
    ...
  } ]
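
If there are several eval sets, the same setting can be applied to each of them programmatically. The following is a minimal sketch that assumes ModelConfig.json has already been parsed into a dictionary with an `evals` list shaped like the fragment above. The helper name is hypothetical.

```python
import copy
import json


def use_raw_gbt_scores(config):
    """Return a copy of the config with every eval set using RAW GBT scores."""
    updated = copy.deepcopy(config)
    for eval_config in updated.get("evals", []):
        eval_config["gbtScoreConvertStrategy"] = "RAW"
    return updated


# Hypothetical config with two eval sets; other fields are omitted.
model_config = {"evals": [{"name": "Eval1"}, {"name": "Eval2"}]}
raw_config = use_raw_gbt_scores(model_config)
# json.dumps writes string values with double quotes, as JSON requires.
print(json.dumps(raw_config, indent=2))
```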