An implementation of the simplified C4.5 algorithm.
It uses the milksets package to load UCI datasets.
```bash
git clone https://github.com/usualwitch/decision-tree.git
cd decision-tree
pip install .
```
```python
from milksets import seeds
from dt import DecisionTree

# Build a tree on the UCI seeds dataset, then post-prune it with
# cost-complexity pruning ('Comp') using the entropy loss.
X, y = seeds.load()
tr = DecisionTree(X, y, prune_method='Comp', loss='entropy')
tr.postprune()
```
For details of the tree building and pruning algorithms, see this post.
The general cost-complexity pruning procedure we employ is to first build a tree without pruning and compute the candidate alphas from that tree, then use cross-validation to select the best alpha for pruning.
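As an illustration of this workflow (using scikit-learn's `DecisionTreeClassifier` and the iris data purely to show the idea, not the `dt` package's own API):

```python
# Sketch of the cost-complexity pruning workflow described above, using
# scikit-learn rather than the dt package: grow a full tree, extract the
# candidate alphas, then pick the alpha with the best cross-validated accuracy.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Grow an unpruned tree and compute the effective alphas of its subtrees.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
ccp_alphas = path.ccp_alphas

# 5-fold stratified cross-validation over the candidate alphas.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0), X, y, cv=cv).mean()
    for a in ccp_alphas
]
best_alpha = ccp_alphas[int(np.argmax(scores))]
print(f"best alpha: {best_alpha:.4f}")
```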
Both algorithms use the cost-complexity pruning strategy. We applied 5-fold stratified cross-validation to find the best alpha. The results are listed in the following table.
| name | tree | alpha | acc |
|------|------|-------|-----|
| german | C4.5 | 0.0 | 0.700 |
| german | CART | 0.0 | 0.675 |
| german | CART | 0.5 | 0.704 |
| iris | C4.5 | 0.0 | 0.430 |
| iris | CART | 0.0 | 0.933 |
| page_blocks | C4.5 | 0.0 | 0.721 |
| page_blocks | CART | 0.0 | 0.963 |
| seeds | C4.5 | 0.0 | 0.774 |
| seeds | CART | 0.0 | 0.936 |
| wine | C4.5 | 0.0 | 0.636 |
| wine | CART | 0.0 | 0.936 |
In general, CART performs better than C4.5. The test error of this simplified C4.5 is quite unstable with respect to the random choice of train and test sets, which needs to be resolved before it is practically applicable.
The experiment results are displayed in the figure below.
Prune methods correspond to the three rows:
- Reduced error pruning (the error can be replaced by another loss function; a minimal sketch appears after this list)
- Pessimistic pruning
- Cost-complexity pruning
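For reference, here is a minimal sketch of reduced error pruning on a toy tree structure; the `Node` class and its fields are hypothetical and are not the `dt` package's internals:

```python
# Minimal sketch of reduced error pruning: bottom-up, collapse a subtree into
# a leaf whenever that does not increase the error on a held-out pruning set.
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Node:
    majority_class: int                 # majority class of training samples at this node
    feature: Optional[int] = None       # None for leaves
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.feature is None


def predict(node: Node, x: np.ndarray) -> int:
    while not node.is_leaf:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.majority_class


def errors(node: Node, X: np.ndarray, y: np.ndarray) -> int:
    return sum(predict(node, x) != label for x, label in zip(X, y))


def reduced_error_prune(node: Node, X_val: np.ndarray, y_val: np.ndarray) -> Node:
    if node.is_leaf:
        return node
    node.left = reduced_error_prune(node.left, X_val, y_val)
    node.right = reduced_error_prune(node.right, X_val, y_val)
    leaf = Node(majority_class=node.majority_class)
    # Keep the collapsed leaf if it does no worse on the pruning set.
    if errors(leaf, X_val, y_val) <= errors(node, X_val, y_val):
        return leaf
    return node
```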
Loss functions correspond to the three columns (a small sketch of each follows the list):
- empirical entropy
- Gini impurity
- 0-1 loss
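For concreteness, the three losses can be computed for the labels reaching a node as follows (plain NumPy, not the `dt` package's implementation):

```python
import numpy as np


def empirical_entropy(y: np.ndarray) -> float:
    p = np.bincount(y) / len(y)
    p = p[p > 0]                       # ignore empty classes to avoid log(0)
    return float(-(p * np.log2(p)).sum())


def gini_impurity(y: np.ndarray) -> float:
    p = np.bincount(y) / len(y)
    return float(1.0 - (p ** 2).sum())


def zero_one_loss(y: np.ndarray) -> float:
    # Fraction misclassified if the node predicts its majority class.
    p = np.bincount(y) / len(y)
    return float(1.0 - p.max())


y = np.array([0, 0, 1, 1, 1, 2])
print(empirical_entropy(y), gini_impurity(y), zero_one_loss(y))
```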
From the figure,
- Reduced error pruning and pessimistic pruning are insensitive to the loss function.
- In C4.5, where entropy is used to build the tree, cost-complexity pruning performs quite poorly with the 0-1 loss. Using entropy as the loss in post-pruning is the more coherent choice.
The average accuracies of the prune methods and losses are listed in the table below.
| prune method | loss | acc |
|--------------|------|-----|
| cost-complexity | 0-1 | 0.517 |
| cost-complexity | entropy | 0.671 |
| cost-complexity | gini | 0.601 |
| pessimistic | 0-1 | 0.489 |
| pessimistic | entropy | 0.469 |
| pessimistic | gini | 0.489 |
| reduce error | 0-1 | 0.654 |
| reduce error | entropy | 0.617 |
| reduce error | gini | 0.620 |
- Quinlan, J. R. C4.5: Programs for Machine Learning (1993). https://dl.acm.org/doi/book/10.5555/152181
- Mingers, J. An Empirical Comparison of Pruning Methods for Decision Tree Induction. Machine Learning, 4, 227–243 (1989). https://doi.org/10.1023/A:1022604100933
- Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. Springer New York Inc., New York, NY, USA (2001).