Confusion matrix: 2x2 table of predicted vs. actual classes; its cells are the counts TP (true positive), FP (false positive), TN (true negative), FN (false negative)
Sensitivity (Recall / True Positive Rate): Proportion of actual positives correctly identified as positive
- TP / (TP + FN)
- When you want to avoid false negatives
- % of people with a disease that are identified as having the disease
- "Recall all the positives in the dataset, how many did you get right?"
Specificity (True Negative Rate): Proportion of actual negatives correctly identified as negative
- TN / (TN + FP)
- When you want to avoid false positives
- % of people without a disease that are identified as not having the disease
- "Recall all the negatives in the dataset, how many did you get right?"
Precision (Positive Predictive Value): Proportion of predicted positives that are actually positive
- TP / (TP + FP)
- How trustworthy are your positive predictions?
- Out of all the things you said were positive, how many actually were?
Accuracy: Proportion of all predictions correctly identified
- (TP + TN) / Total
- Correct / total = Accuracy
- How correct overall am I? (see the sketch below for all four metrics)
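A minimal sketch of the four metrics above, assuming scikit-learn is installed; the toy labels are invented for illustration:

```python
# Compute sensitivity, specificity, precision, and accuracy
# from a confusion matrix. Toy labels, invented for illustration.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # actual classes
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # model predictions

# For binary labels, ravel() flattens the 2x2 matrix into (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)                # recall / true positive rate
specificity = tn / (tn + fp)                # true negative rate
precision = tp / (tp + fp)                  # positive predictive value
accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct / total

print(f"sensitivity={sensitivity:.2f}  specificity={specificity:.2f}  "
      f"precision={precision:.2f}  accuracy={accuracy:.2f}")
```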
ROC / AUC: Visualise the trade-off between sensitivity (TPR) and the false positive rate (FPR = 1 - specificity) of a binary classifier across different cutoff points for the classification
https://www.youtube.com/watch?v=4jRBRDbJemM
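A minimal sketch of the ROC idea using scikit-learn's roc_curve and roc_auc_score; the predicted probabilities below are invented:

```python
# Each ROC point is the (FPR, TPR) you get at one classification cutoff;
# AUC summarises the whole curve as a single number.
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0]                    # actual classes
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.5]  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one point per cutoff
auc = roc_auc_score(y_true, y_score)

for f, t, thr in zip(fpr, tpr, thresholds):
    print(f"cutoff={thr:.2f}: TPR={t:.2f}, FPR={f:.2f}")
print(f"AUC = {auc:.2f}")
```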
Gini Impurity: Measure of node impurity used when deciding which feature to split on in a decision tree. The feature whose split gives the lowest weighted Gini impurity is chosen; the best split at the top of the tree becomes the root node.
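A minimal hand-rolled sketch of Gini impurity (1 - sum of squared class proportions) and the weighted impurity of a candidate split; the labels are invented:

```python
# Gini impurity of a node, and the weighted impurity of one candidate split.
from collections import Counter

def gini(labels):
    """Gini impurity = 1 - sum(p_i^2) over class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Weighted average of the two child-node impurities."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical labels landing in each child node after a candidate split.
left = ["yes", "yes", "yes", "no"]
right = ["no", "no", "yes"]
print(f"left={gini(left):.3f}  right={gini(right):.3f}  "
      f"split={split_gini(left, right):.3f}")
```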
F1 Score: Harmonic mean of recall and precision; accounts for FP and FN more directly than accuracy does.
- 2 * (Recall * Precision) / (Recall + Precision)
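A quick worked sketch of F1, reusing the invented confusion-matrix counts from the first sketch above (tp=4, fp=1, fn=1):

```python
# F1 as the harmonic mean of precision and recall.
tp, fp, fn = 4, 1, 1  # invented counts, matching the toy example above

precision = tp / (tp + fp)  # 0.80
recall = tp / (tp + fn)     # 0.80
f1 = 2 * (recall * precision) / (recall + precision)

print(f"precision={precision:.2f}  recall={recall:.2f}  F1={f1:.2f}")
```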