Random Forest Classification

Load an Excel file and run a random forest classification based on the user-selected features and label.

Before executing the script, the user needs to set the `inputfile` variable to the path of the desired Excel spreadsheet. By default, the file is read into a pandas DataFrame with the first row as the header and the first column as the index.
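The default reading mode described above can be sketched with `pandas.read_excel`. This is a minimal illustration, not the script's actual code: the path `data.xlsx` and the helper name `load_table` are assumptions made for the example.

```python
from pathlib import Path

import pandas as pd

# Hypothetical path; in the script this is the `inputfile` variable to edit.
inputfile = "data.xlsx"


def load_table(path):
    """Read a spreadsheet with row 0 as the header and column 0 as the index."""
    return pd.read_excel(path, header=0, index_col=0)


# Only attempt the read if the (assumed) file is actually present.
if Path(inputfile).exists():
    df = load_table(inputfile)
    print(df.head())
```

Reading `.xlsx` files with pandas typically requires an Excel engine such as `openpyxl` to be installed alongside the listed dependencies.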

Once the data is successfully loaded, the script prompts the user to select one or more features for classification, then one column as the class label. The trees of the random forest are grown in parallel, and 80% of the samples are randomly picked for training. Warm start is enabled so that the classifier can be adjusted over 300 iterations. Class weights are adjusted in case the input labels are imbalanced. Once training has reached a stable out-of-bag error, the classifier is applied to all samples for classification.
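The training loop described above might look roughly like the following sketch using scikit-learn's `RandomForestClassifier`. Several details are assumptions rather than the script's confirmed settings: the 80% sampling is modeled with `max_samples=0.8` on bootstrap draws, class weighting uses the `"balanced"` preset, each iteration adds a fixed number of trees (`trees_per_step`), and the data is synthetic. The helper name `fit_with_oob_trace` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def fit_with_oob_trace(X, y, n_iter=300, trees_per_step=10, seed=0):
    """Grow a forest step by step and record out-of-bag accuracy per step."""
    clf = RandomForestClassifier(
        n_estimators=trees_per_step,
        warm_start=True,          # keep existing trees, add new ones on refit
        oob_score=True,           # score each sample with trees that did not see it
        bootstrap=True,
        max_samples=0.8,          # assumption: each tree draws 80% of the samples
        class_weight="balanced",  # assumption: reweight imbalanced labels
        n_jobs=-1,                # build trees in parallel
        random_state=seed,
    )
    oob_accuracy = []
    for step in range(1, n_iter + 1):
        clf.set_params(n_estimators=step * trees_per_step)
        clf.fit(X, y)  # only the newly requested trees are fitted
        oob_accuracy.append(clf.oob_score_)
    return clf, np.array(oob_accuracy)


# Synthetic, imbalanced stand-in for the user-selected features and label.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
clf, oob_accuracy = fit_with_oob_trace(X, y, n_iter=20)  # 20 steps to keep the demo fast
predicted = clf.predict(X)  # apply the trained classifier to all samples
```

scikit-learn may emit warnings here: the `"balanced"` preset is flagged as not recommended in combination with warm start, and with very few trees some samples may not yet have an out-of-bag estimate.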

The output consists of two figures. The first is the confusion matrix of the classification results, annotated with the accuracy. The second shows how the out-of-bag accuracy changes over the iterations.
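A rough sketch of how two such figures could be produced with matplotlib and scikit-learn follows. The helper name `plot_results`, the title and axis wording, and the small label and accuracy arrays are placeholders invented for the illustration, not results from the script.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open windows
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score, confusion_matrix


def plot_results(y_true, y_pred, oob_accuracy):
    """Draw the confusion matrix and the out-of-bag accuracy curve."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    acc = accuracy_score(y_true, y_pred)

    # Figure 1: confusion matrix with the accuracy in the title.
    fig1, ax1 = plt.subplots()
    ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels).plot(ax=ax1)
    ax1.set_title(f"Confusion matrix (accuracy = {acc:.3f})")

    # Figure 2: out-of-bag accuracy per iteration.
    fig2, ax2 = plt.subplots()
    ax2.plot(np.arange(1, len(oob_accuracy) + 1), oob_accuracy)
    ax2.set_xlabel("Iteration")
    ax2.set_ylabel("Out-of-bag accuracy")
    return fig1, fig2, acc


# Placeholder values for illustration only.
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 0])
oob_curve = np.array([0.70, 0.74, 0.76, 0.77, 0.77])
fig1, fig2, acc = plot_results(y_true, y_pred, oob_curve)
```

With an interactive backend, calling `plt.show()` afterwards would display both figures; `fig1.savefig(...)` and `fig2.savefig(...)` would write them to disk instead.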

Dependencies: numpy, pandas, scikit-learn, and matplotlib.