
Grid Search And Random Search Support in Shifu


Grid Search in Shifu

Configure through train#params

If you set parameters in train#params to lists, Shifu treats training as grid search: every combination of the listed values becomes one set of model training parameters.

    "params" : {
      "NumHiddenLayers" : [1, 2],
      "ActivationFunc" :  [ ["tanh"], [ "Sigmoid", "Sigmoid" ] ],
      "NumHiddenNodes" : [ [50], [45, 45 ] ],
      "LearningRate" : [0.1, 0.2],
      "Propagation" : "Q"
    },

The hyper parameters (parameters that have multiple possible values) are sorted in alphabetic order, and the final training parameters are generated by stepping through each hyper parameter's value list by index in numeric order. Taking the params above as an example, the four hyper parameters are sorted as below.

"ActivationFunc" :  [ ["tanh"], [ "Sigmoid", "Sigmoid" ] ]
"LearningRate" : [0.1, 0.2]
"NumHiddenLayers" : [1, 2]
"NumHiddenNodes" : [ [50], [45, 45 ] ]

For the four hyper parameters, the configured value list sizes are 2, 2, 2 and 2. Using the list index to represent each parameter value, the training parameter combinations are 0000, 0001, 0010, 0011, and so on, giving 2 × 2 × 2 × 2 = 16 combinations in total. The final effective training combinations are listed below.

{"ActivationFunc": ["tanh"], "LearningRate": 0.1, "NumHiddenLayers": 1, "NumHiddenNodes": [50], "Propagation" : "Q"}
{"ActivationFunc": ["tanh"], "LearningRate": 0.1, "NumHiddenLayers": 1, "NumHiddenNodes": [45, 45], "Propagation" : "Q"}
{"ActivationFunc": ["tanh"], "LearningRate": 0.1, "NumHiddenLayers": 2, "NumHiddenNodes": [50], "Propagation" : "Q"}
{"ActivationFunc": ["tanh"], "LearningRate": 0.1, "NumHiddenLayers": 2, "NumHiddenNodes": [45, 45], "Propagation" : "Q"}
{"ActivationFunc": ["tanh"], "LearningRate": 0.2, "NumHiddenLayers": 1, "NumHiddenNodes": [50], "Propagation" : "Q"}
{"ActivationFunc": ["tanh"], "LearningRate": 0.2, "NumHiddenLayers": 1, "NumHiddenNodes": [45, 45], "Propagation" : "Q"}
{"ActivationFunc": ["tanh"], "LearningRate": 0.2, "NumHiddenLayers": 2, "NumHiddenNodes": [50], "Propagation" : "Q"}
{"ActivationFunc": ["tanh"], "LearningRate": 0.2, "NumHiddenLayers": 2, "NumHiddenNodes": [45, 45], "Propagation" : "Q"}
{"ActivationFunc": ["Sigmoid", "Sigmoid"], "LearningRate": 0.1, "NumHiddenLayers": 1, "NumHiddenNodes": [50], "Propagation" : "Q"}
{"ActivationFunc": ["Sigmoid", "Sigmoid"], "LearningRate": 0.1, "NumHiddenLayers": 1, "NumHiddenNodes": [45, 45], "Propagation" : "Q"}
{"ActivationFunc": ["Sigmoid", "Sigmoid"], "LearningRate": 0.1, "NumHiddenLayers": 2, "NumHiddenNodes": [50], "Propagation" : "Q"}
{"ActivationFunc": ["Sigmoid", "Sigmoid"], "LearningRate": 0.1, "NumHiddenLayers": 2, "NumHiddenNodes": [45, 45], "Propagation" : "Q"}
{"ActivationFunc": ["Sigmoid", "Sigmoid"], "LearningRate": 0.2, "NumHiddenLayers": 1, "NumHiddenNodes": [50], "Propagation" : "Q"}
{"ActivationFunc": ["Sigmoid", "Sigmoid"], "LearningRate": 0.2, "NumHiddenLayers": 1, "NumHiddenNodes": [45, 45], "Propagation" : "Q"}
{"ActivationFunc": ["Sigmoid", "Sigmoid"], "LearningRate": 0.2, "NumHiddenLayers": 2, "NumHiddenNodes": [50], "Propagation" : "Q"}
{"ActivationFunc": ["Sigmoid", "Sigmoid"], "LearningRate": 0.2, "NumHiddenLayers": 2, "NumHiddenNodes": [45, 45], "Propagation" : "Q"}

Configure through file (since version 0.11.0)

You can also use an external file to set grid search params. In this case, set train#gridConfigFile to a local file path, either relative or absolute. Each line in the configuration file is one parameter combination, written as key:value pairs delimited by ';'. In this mode, only the param combinations you list in the file are used as model training parameters.

FeatureSubsetStrategy:ONETHIRD;MaxDepth:7
FeatureSubsetStrategy:0.5;MaxDepth:6
FeatureSubsetStrategy:0.5;MaxDepth:6;LearningRate:0.04
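
As a rough illustration of the format, here is a Python sketch of how one such line could be parsed into a parameter map; the helper is hypothetical, and Shifu's own parser may differ in details such as type coercion:

    # Illustrative parser for one grid config line; a hypothetical helper,
    # not Shifu's own code. Values are kept as strings here.
    def parse_line(line):
        params = {}
        for pair in line.strip().split(";"):
            key, value = pair.split(":", 1)
            params[key] = value
        return params

    print(parse_line("FeatureSubsetStrategy:0.5;MaxDepth:6;LearningRate:0.04"))
    # {'FeatureSubsetStrategy': '0.5', 'MaxDepth': '6', 'LearningRate': '0.04'}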

After the models are trained, the last validation error of each is used to select the best one. Afterwards, if you check your ModelConfig.json, the best parameter set will have been written into it.

Random Search in Shifu

If there are too many combinations in a Shifu grid search (more than 30), random search is enabled automatically: by default 30 combinations are sampled, models are trained on them, and the best parameters are selected by validation error.

Two parameters in shifuconfig can be tuned for grid search and random search.

    ## how many training jobs can run in each round; e.g., if baggingNum is 10
    ## and this is set to 5, bagging runs in two rounds of 5 guagua jobs each
    shifu.train.bagging.inparallel=5

    ## if the number of hyper parameter combinations exceeds this threshold,
    ## random search will be enabled
    shifu.gridsearch.threshold=30
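
Conceptually, random search just samples a subset of the full grid. Below is a minimal Python sketch under that assumption (the function and variable names are hypothetical, not Shifu's API):

    import random

    GRID_SEARCH_THRESHOLD = 30  # mirrors shifu.gridsearch.threshold

    def select_combinations(all_combinations):
        # Small grids are trained exhaustively; larger grids are randomly
        # sampled down to the threshold (random search).
        if len(all_combinations) <= GRID_SEARCH_THRESHOLD:
            return all_combinations
        return random.sample(all_combinations, GRID_SEARCH_THRESHOLD)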

One Sample for Tree Model Grid Search in Shifu

The same as for NN: just set parameters in train#params to lists, and training will be treated as grid search.

     "params" : {
        "TreeNum":[10, 50, 100],
        "FeatureSubsetStrategy": ["ALL", 'ONETHIRD', "HALF"],
        "MaxDepth": 8,
        "MaxStatsMemoryMB": 256,
        "Impurity":"variance",
        "LearningRate": [0.1, 0.5, 0.05],
        "MinInstancesPerNode": 1,
        "MinInfoGain": 0.0,
        "Loss": "squared"
    },
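
With three values each for TreeNum, FeatureSubsetStrategy and LearningRate, the full grid here has 3 × 3 × 3 = 27 combinations, which is under the default random search threshold of 30, so all 27 would be trained:

    # 3 hyper parameters (TreeNum, FeatureSubsetStrategy, LearningRate)
    # with 3 values each; scalar params such as MaxDepth add no factor.
    n_combinations = 3 * 3 * 3
    print(n_combinations)  # 27 <= 30, so full grid search, no random sampling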

Grid Search Configuration File Support in Shifu

You can also put grid search param combinations in a file, e.g. the grid.conf shown below.

  "train" : {
    "baggingNum" : 1,
    "baggingWithReplacement" : false,
    "baggingSampleRate" : 1.0,
    "validSetRate" : 0.1,
    "numTrainEpochs" : 1500,
    "isContinuous" : false,
    "workerThreadCount" : 4,
    "algorithm" : "GBT",
    "gridConfigFile" : "grid.conf",
    "params" : {
        "MaxDepth": 8,
        "MaxStatsMemoryMB": 256,
        "Impurity":"variance",
        "MinInstancesPerNode": 1,
        "MinInfoGain": 0.0,
        "Loss": "squared"
    },
    "customPaths" : {}
  },

Contents of the grid.conf file can be in this format:

TreeNum:10;FeatureSubsetStrategy:ALL;LearningRate:0.1
TreeNum:10;FeatureSubsetStrategy:ALL;LearningRate:0.5
TreeNum:50;FeatureSubsetStrategy:ONETHIRD;LearningRate:0.5
TreeNum:100;FeatureSubsetStrategy:HALF;LearningRate:0.05

In this case train#params holds the common/default param values, as below. Even if you set grid search lists in train#params, only the combinations in the file take effect. A sketch of how the defaults combine with each file line follows the params block.

    "params" : {
        "MaxDepth": 8,
        "MaxStatsMemoryMB": 256,
        "Impurity":"variance",
        "MinInstancesPerNode": 1,
        "MinInfoGain": 0.0,
        "Loss": "squared"
    },
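
To illustrate how the pieces fit together, here is a minimal Python sketch (an assumption about precedence, not Shifu's documented behavior) in which the common values from train#params are merged with one parsed grid.conf line:

    # Hypothetical merge of the common train#params values with one parsed
    # grid.conf line; assumes the file's values take effect for the
    # parameters it names.
    common_params = {
        "MaxDepth": 8,
        "MaxStatsMemoryMB": 256,
        "Impurity": "variance",
        "MinInstancesPerNode": 1,
        "MinInfoGain": 0.0,
        "Loss": "squared",
    }
    line_params = {"TreeNum": "10", "FeatureSubsetStrategy": "ALL", "LearningRate": "0.1"}

    effective_params = {**common_params, **line_params}
    print(effective_params)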