6.3 Decision trees

Notes

Decision trees are powerful algorithms capable of fitting complex datasets. A decision tree makes predictions by applying a sequence of if/else rules: each rule splits a node into two or more sub-nodes, and the final prediction is read from the leaf that an example ends up in.
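
To make the if/else idea concrete, here is a minimal hand-written sketch of a tiny tree. The feature names (`records`, `job`, `assets`), the threshold, and the labels are hypothetical and chosen only for illustration; a real tree learns its rules from data.

```python
def predict_default(client):
    """Hand-written decision tree: each `if` is a node, each `return` is a leaf.

    The features and the threshold below are hypothetical, for illustration only.
    """
    if client["records"] == "yes":
        # Left subtree: the client has previous records
        if client["job"] == "parttime":
            return "default"
        return "ok"
    # Right subtree: the client has no previous records
    if client["assets"] > 6000:
        return "ok"
    return "default"


client = {"records": "no", "job": "fulltime", "assets": 7000}
print(predict_default(client))
```

Each nested condition corresponds to one level of depth, so this sketch is a tree of depth two.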

This versatility comes at a cost: decision trees are prone to overfitting. One of the main reasons is depth. A deep tree tends to memorize the patterns in the training data but struggles to perform well on unseen data (the validation or test set).

To reduce overfitting, we can lower the complexity of the model by limiting the depth of the tree.
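
The effect of limiting depth can be checked by comparing AUC on the training and validation sets. The sketch below is only an illustration: it uses a synthetic noisy dataset from `make_classification` rather than the course data, and the helper `train_and_score` is a hypothetical name introduced here.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise (flip_y), an assumption made for illustration
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=4, flip_y=0.2, random_state=1
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=1
)


def train_and_score(max_depth):
    """Fit a tree with the given max_depth and return (train AUC, validation AUC)."""
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=1)
    model.fit(X_train, y_train)
    auc_train = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
    auc_val = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    return auc_train, auc_val


# max_depth=None lets the tree grow until every leaf is pure
for depth in [None, 3]:
    auc_train, auc_val = train_and_score(depth)
    print(f"max_depth={depth}: train AUC={auc_train:.3f}, val AUC={auc_val:.3f}")
```

A large gap between the training and validation AUC for the unrestricted tree is the typical sign of overfitting; the shallow tree usually shows a much smaller gap.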

A decision tree with a depth of one is called a decision stump; it has only one split, at the root.
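
As a minimal sketch, a decision stump can be obtained by setting `max_depth=1`. The toy data below (a single `assets`-like feature and a binary target) is hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one feature (assets), target 1 = default, 0 = ok
X = [[0], [2000], [3000], [5000], [8000], [9000]]
y = [1, 1, 1, 0, 0, 0]

stump = DecisionTreeClassifier(max_depth=1, random_state=1)
stump.fit(X, y)

print(stump.get_depth())     # depth of the fitted tree
print(stump.get_n_leaves())  # a single split produces two leaves
print(stump.predict([[1000], [7000]]))
```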

Classes, functions, and methods:

DecisionTreeClassifier: classification model from the sklearn.tree module.
max_depth: hyperparameter that controls the maximum depth of the decision tree.
export_text: function from the sklearn.tree module that returns a text report showing the rules of a decision tree.

Note: we have already covered DictVectorizer in session 3 and roc_auc_score in session 4.

Add notes from the video (PRs are welcome)

⚠️	The notes are written by the community. If you see an error here, please create a PR with a fix.

Navigation

Machine Learning Zoomcamp course
Session 6: Decision Trees and Ensemble Learning
Previous: Data cleaning and preparation
Next: Decision tree learning algorithm
