6.6 Ensemble learning and random forest

Slides

Notes

Ensemble learning is a machine learning paradigm in which multiple models, often called 'weak learners', are combined to solve a single prediction problem. The combined model frequently achieves better predictive performance than any single model on its own.

Random Forest is an example of ensemble learning in which each model is a decision tree and the trees' predictions are aggregated, typically by majority vote for classification or by averaging for regression. The 'randomness' in Random Forest stems from two key aspects:

  • Each tree is potentially trained on a bootstrapped sample of the original data, introducing randomness at the row level.
  • At each node during tree construction, only a random subset of features is considered for splitting. This feature randomness helps decorrelate the trees, which reduces overfitting and improves generalization to unseen data.
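The two sources of randomness, and the aggregation step, can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the dataset from the lesson: `make_classification` and the specific parameter values are assumptions chosen for the example. `bootstrap=True` enables row-level resampling and `max_features="sqrt"` limits the features considered at each split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used here only for illustration (an assumption, not the course dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=1)

rf = RandomForestClassifier(
    n_estimators=10,
    bootstrap=True,       # row-level randomness: each tree sees a bootstrap sample
    max_features="sqrt",  # feature-level randomness: sqrt(n_features) candidates per split
    random_state=1,
)
rf.fit(X, y)

# Aggregation: collect each tree's class probabilities and average them.
per_tree = np.stack([tree.predict_proba(X) for tree in rf.estimators_])
avg_proba = per_tree.mean(axis=0)

# The averaged probabilities should agree with the forest's own output.
print(np.allclose(avg_proba, rf.predict_proba(X)))
```

Note that scikit-learn's classifier averages the trees' predicted probabilities rather than counting hard votes; the two usually agree on the predicted class, but they are not identical procedures.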

Bootstrapping is a resampling technique where numerous subsets of the data are created by sampling the original data with replacement. This means that some data points may appear multiple times in a single bootstrap sample, while others may be excluded. In Random Forest, each decision tree is trained on a distinct bootstrap sample, further contributing to the diversity and robustness of the ensemble.
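To make sampling with replacement concrete, here is a small NumPy sketch. The row count and seed are arbitrary values picked for illustration. It draws one bootstrap sample of row indices and checks which rows were left out.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducibility only

n_rows = 1000                 # hypothetical dataset size
row_ids = np.arange(n_rows)

# One bootstrap sample: draw n_rows indices WITH replacement.
sample = rng.choice(row_ids, size=n_rows, replace=True)

# Some rows repeat, so the number of distinct rows is smaller than n_rows.
unique_fraction = np.unique(sample).size / n_rows

# Rows never drawn are "out of bag" for the tree trained on this sample.
out_of_bag = np.setdiff1d(row_ids, sample)

print(f"distinct rows in sample: {unique_fraction:.1%}")
print(f"rows left out: {out_of_bag.size}")
```

In expectation a bootstrap sample of the same size as the data contains about 63% of the distinct rows (1 - 1/e), so each tree sees a noticeably different slice of the data.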

Parameter tuning is crucial for optimizing the performance of a Random Forest model. Two critical parameters are max_depth, which controls the maximum depth of each decision tree, and n_estimators, which determines the number of trees in the forest. Increasing max_depth allows for more complex trees, which can lead to overfitting. A larger n_estimators generally improves model accuracy, although the gains level off after a certain number of trees while the computational cost keeps growing.
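A simple way to explore these two parameters is to loop over a small grid and score each model on a validation set. The sketch below assumes synthetic data, an 80/20 split, and accuracy as the metric; the grid values are arbitrary and none of these choices come from the lesson itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data and split, assumed for illustration only.
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=1
)

scores = []  # (max_depth, n_estimators, validation accuracy)
for max_depth in [5, 10, 15]:
    for n_estimators in [10, 50, 100]:
        rf = RandomForestClassifier(
            n_estimators=n_estimators,
            max_depth=max_depth,
            random_state=1,  # fixed so runs are comparable
        )
        rf.fit(X_train, y_train)
        accuracy = rf.score(X_val, y_val)  # mean accuracy on the validation set
        scores.append((max_depth, n_estimators, accuracy))

# Pick the combination with the highest validation accuracy.
best_depth, best_n, best_acc = max(scores, key=lambda s: s[2])
print(f"best: max_depth={best_depth}, n_estimators={best_n}, accuracy={best_acc:.3f}")
```

The collected `scores` can then be plotted, for example one line per `max_depth` with `n_estimators` on the x-axis, to see where adding more trees stops helping.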

In random forests, the decision trees are trained independently of each other, so training can be parallelized (in scikit-learn, via the n_jobs parameter).

Classes, functions, and methods:

  • from sklearn.ensemble import RandomForestClassifier: random forest classifier from the sklearn.ensemble module.
  • plt.plot(x, y): draw a line plot of the y values against the x values.

Add notes from the video (PRs are welcome)

⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.
