Multiple Classification Support in Shifu
By default, Shifu uses regression to handle binary classification problems. This is helpful because:
- Regression scores make it easy to tune the decision threshold on the final model score.
- Categorical and numerical feature statistics are computed on the basis of binary classification.
- Multiple classification is easy to implement on top of 0-1 binary classification.
Since Shifu 0.2.8, multiple classification is supported natively.
To enable multiple classification, set negTags and posTags in ModelConfig.json so that one of them is empty and the other contains multiple elements. Those elements are the classes.
"dataSet" : {
  "source" : "HDFS",
  ...
  "posTags" : [ "M", "B", "C" ],
  "negTags" : [ ]
}
Here 'M', 'B' and 'C' are treated as the classes. Please make sure the corresponding settings in the eval part of ModelConfig.json are configured consistently.
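The tag rule above can be checked before a run. The following is a minimal sketch in Python, not part of Shifu: the helper name `multiclass_tags` is hypothetical, and it assumes "multiple elements" means two or more tags.

```python
import json


def multiclass_tags(data_set):
    """Return the class tags if this dataSet section is a multi-class setup, else None.

    Hypothetical helper for illustration only; it is not a Shifu API.
    """
    pos = data_set.get("posTags") or []
    neg = data_set.get("negTags") or []
    # Multi-class setup: one list is empty and the other holds several tags.
    if not neg and len(pos) > 1:
        return list(pos)
    if not pos and len(neg) > 1:
        return list(neg)
    # Both lists populated (binary setup) or not enough tags.
    return None


config = json.loads(
    '{"dataSet": {"source": "HDFS", "posTags": ["M", "B", "C"], "negTags": []}}'
)
print(multiclass_tags(config["dataSet"]))
```

A result of `None` means the configuration would be read as an ordinary binary setup (or is incomplete), so the same check can be applied to the eval part to confirm that both sections agree.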
Only Random Forest and Neural Network support native multiple classification in Shifu. A hidden parameter in the train part of ModelConfig.json selects either the native or the one-vs-all approach:
"train": {
"multiClassifyMethod": "native",
...
}
After the model is trained, evaluation outputs only a confusion matrix as the measure of model performance. To use the one-vs-all approach instead, set:
"train": {
"multiClassifyMethod": "ONEVSALL", // or "ONEVSREST"
...
}
Regardless of the baggingNum setting, this triggers one training job per class, and each job trains a regression model for that class. Evaluation runs all of these models and takes the class whose model gives the highest score as the final classification result. As with native multiple classification, only the confusion matrix is printed to the console at the end of evaluation.
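The selection step and the resulting confusion matrix can be sketched as follows. This is an illustrative Python sketch and not Shifu's implementation: the function names, the scores, and the matrix orientation (rows are actual classes, columns are predicted classes) are all assumptions.

```python
from collections import Counter


def pick_class(scores):
    """Return the class whose one-vs-all model produced the highest score."""
    return max(scores, key=scores.get)


def confusion_matrix(actual, predicted, classes):
    """Count (actual, predicted) pairs; rows are actual, columns are predicted."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


classes = ["M", "B", "C"]
# Hypothetical records: the true tag plus one score per class model.
records = [
    ("M", {"M": 0.81, "B": 0.10, "C": 0.35}),
    ("B", {"M": 0.20, "B": 0.64, "C": 0.15}),
    ("C", {"M": 0.55, "B": 0.05, "C": 0.40}),
]
actual = [tag for tag, _ in records]
predicted = [pick_class(scores) for _, scores in records]

for cls, row in zip(classes, confusion_matrix(actual, predicted, classes)):
    print(cls, row)
```

In this made-up data the third record is a 'C' that the 'M' model scores highest, so it is counted in row 'C', column 'M'. Off-diagonal cells like that one are what the confusion matrix exposes.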