Native Bagging Modeling Framework
Bagging is a simple model ensemble technique for improving model performance. In Shifu, bagging is supported natively in all algorithms.
"train" : {
    "baggingNum" : 5,
    "baggingWithReplacement" : false,
    "baggingSampleRate" : 1,
    ...
}
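The sampling semantics behind these settings can be sketched as follows. This is an illustrative Python helper, not Shifu's actual implementation (Shifu is written in Java); `bagging_sample` is a hypothetical name:

```python
import random

def bagging_sample(rows, sample_rate=1.0, with_replacement=False, seed=0):
    """Draw one bag of training rows, mirroring the intent of
    baggingSampleRate / baggingWithReplacement (illustrative sketch)."""
    rng = random.Random(seed)
    n = int(len(rows) * sample_rate)
    if with_replacement:
        # Bootstrap sampling: a row may appear multiple times in one bag.
        return [rng.choice(rows) for _ in range(n)]
    # Without replacement: a random subset of the requested size.
    return rng.sample(rows, n)

rows = list(range(100))
# baggingNum = 5 -> five bags, each feeding one parallel training job.
bags = [bagging_sample(rows, sample_rate=1.0, with_replacement=True, seed=i)
        for i in range(5)]
```

Each bag differs from the others, which is what makes the averaged ensemble more robust than any single model.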
If baggingNum is set to a value greater than 1, Shifu trains that many bagging jobs in parallel. With sampling enabled, each model is trained on different data. In the evaluation step, all models score the test data, and the final averaged score is used to measure test performance.
The trained models can be found under /models/, e.g. model0.nn-model5.nn. Such model files can be deployed in production, and ModelRunner supports loading multiple models and averaging their outputs internally to produce a final model score.
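The averaging step can be sketched as below; `ensemble_score` is a hypothetical stand-in for the internal averaging that ModelRunner performs in Java, and the scores are made-up numbers:

```python
def ensemble_score(model_scores):
    """Average per-model scores into one final ensemble score."""
    return sum(model_scores) / len(model_scores)

# Example scores from five bagged NN models for one record (made-up):
scores = [0.71, 0.68, 0.74, 0.70, 0.69]
final = ensemble_score(scores)  # ~0.704
```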
By using the export command, models such as LR and NN are exported to the standard PMML format, which can easily be deployed in production.
In Shifu GBT, if baggingNum is set to 5, five GBT models will be trained and their results averaged for better performance. This is a very useful feature for improving the stability of GBT models: in practice, averaging 5 GBT models shows a 3-5 percent improvement over a single GBT model.
For Random Forest, treeNum sets the number of trees trained per job, and should be chosen according to each job's capacity. Setting baggingNum to a higher value is an easy way to grow more trees in total, and this has proved fast in practice.
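A quick back-of-the-envelope for the overall forest size under this setup, assuming (as the text above suggests) that each of the baggingNum parallel jobs grows treeNum trees; `total_trees` is a hypothetical helper name:

```python
def total_trees(bagging_num, tree_num):
    """Total trees across the whole ensemble when each of baggingNum
    parallel jobs grows treeNum trees (assumption from the text above)."""
    return bagging_num * tree_num

# e.g. 5 parallel jobs with 100 trees each -> 500 trees overall
total = total_trees(5, 100)
```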
The bagging described above is all based on data sampling, but Shifu also supports bagging over different algorithm parameters via grid search. In grid search, users can specify multiple parameter combinations, one model is trained per combination, and the evaluation step then runs over all of the resulting models.
"params" : {
    "NumHiddenLayers" : 1,
    "ActivationFunc" : [ "tanh" ],
    "NumHiddenNodes" : [ [30], [45], [60] ],
    "LearningRate" : 0.1,
    "FeatureSubsetStrategy" : 1,
    "DropoutRate" : 0.1,
    "Propagation" : "Q"
},
Three models, with 30, 45, and 60 hidden nodes respectively, will be trained; you can then evaluate all three models without any extra configuration.
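The expansion from list-valued parameters to trained models can be sketched as a Cartesian product. This is an illustrative reading of the config above (here only NumHiddenNodes varies, so three combinations result), not Shifu's actual grid-search code:

```python
from itertools import product

# List-valued hyperparameters that grid search expands; scalar params
# (LearningRate, DropoutRate, ...) stay fixed across all models.
grid = {
    "ActivationFunc": [["tanh"]],           # one choice
    "NumHiddenNodes": [[30], [45], [60]],   # three choices
}
keys = list(grid)
combos = [dict(zip(keys, values)) for values in product(*grid.values())]
# Three parameter combinations -> three models to train and evaluate.
```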