### Context & Description

When we run k-fold cross validation, we use `n_splits=3` (set in `morphoclass/dvc/training/configs/splitter-stratifKFold.yaml`, line 4 in `021c632`). But for layer `L4` and layer `L6` of the dataset `interneurons`, we have the classes `L4_BP` and `L6_DBC` with only 2 < 3 samples. This situation generates the following Python warning when iterating over `StratifiedKFold.split(X, y)`:
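This is presumably scikit-learn's usual least-populated-class warning; the exact wording varies across versions:

```
UserWarning: The least populated class in y has only 2 members, which is less than n_splits=3.
```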
### What happens?

If a class has fewer members than `n_splits`, then for some splits of `StratifiedKFold` we will have no representatives of that class in the validation set or in the training set. For instance, splitting `[0] * 7 + [1] * 2` with `StratifiedKFold(n_splits=3)` yields training and validation sets where one of the splits has no sample of class `1` in the validation set at all (see the sketch below)!
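A minimal sketch to inspect the fold composition (the exact assignment of samples to folds may differ across scikit-learn versions):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy dataset with a 2-member minority class, mirroring the
# L4_BP / L6_DBC situation described above.
y = np.array([0] * 7 + [1] * 2)
X = np.zeros((len(y), 1))  # features are irrelevant to the split itself

# Iterating also emits the UserWarning quoted above.
for i, (train_idx, val_idx) in enumerate(
    StratifiedKFold(n_splits=3).split(X, y), start=1
):
    print(f"split {i}: train y = {y[train_idx]}, val y = {y[val_idx]}")
# One of the validation sets contains no sample of class 1 at all.
```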
### Why may this be an issue?

**Some metrics may not be computed and/or return wrong values.**

Metrics such as `precision_score`, `recall_score`, and `f1_score` cannot be computed if there is no sample for a given class. For instance, in the example above, using the validation set of the split that lacks class `1` to compute `f1_score` for `y_pred = [0, 0, 0]` will return `0.0` (despite `y_pred` matching `y_true` perfectly!) and raise an `UndefinedMetricWarning`, as shown in the sketch below.

However, in our case we do not compute metrics per split and then average across all splits; instead, we take all the out-of-sample predictions (generated during the various splits) and compute the metric once using all samples. Therefore, no class can ever have `0` samples during evaluation.
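A sketch of both behaviours, with a simple majority-class predictor standing in for our actual models:

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 7 + [1] * 2)
X = np.zeros((len(y), 1))

# Per-split scoring on the fold whose validation set contains only class 0.
y_true = np.array([0, 0, 0])
y_pred = np.array([0, 0, 0])  # a perfect prediction
# Emits UndefinedMetricWarning ("F-score is ill-defined and being set to
# 0.0 ...") because there is no class-1 sample to score against.
print(f1_score(y_true, y_pred))  # 0.0, despite y_pred == y_true

# Pooled scoring, as in our pipeline: gather the out-of-sample predictions
# from every split, then compute the metric once over all samples.
oos_pred = np.empty_like(y)
for train_idx, val_idx in StratifiedKFold(n_splits=3).split(X, y):
    # Stand-in for the real model: predict the training fold's majority class.
    oos_pred[val_idx] = np.bincount(y[train_idx]).argmax()
# Both classes keep all of their samples in this single evaluation.
print(f1_score(y, oos_pred, average="weighted", zero_division=0))
```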
**A class in the validation set may not be present in the training set.**

This would be dramatic, because after training the model would not even be aware of the existence of a class that is nevertheless present in the validation set. So it is guaranteed that the model will never predict that class. However, I have never observed this happening on our data, and I am not even sure it is possible.
**Evaluating on classes with few samples may not be very meaningful.**

Does it really make sense to take into account the model's performance on a class that has only 1 member in the training set or in the validation set? However, as long as we look at `micro` or `weighted` averages, the impact of (potentially awful) performance on classes with 1 or 2 samples is limited. But this could be bad if we want to look at `macro` averages (see the sketch below):

https://github.com/scikit-learn/scikit-learn/blob/baf828ca126bcb2c0ad813226963621cafe38adb/sklearn/metrics/_classification.py#L1049-L1062
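A toy illustration with hypothetical numbers (not our dataset) of how much more sensitive `macro` averaging is to a missed rare class:

```python
import numpy as np
from sklearn.metrics import f1_score

# The model is good on the large class (0) but misses the single
# sample of the rare class (1) entirely.
y_true = np.array([0] * 99 + [1])
y_pred = np.array([0] * 100)

# micro/weighted: the rare class carries weight 1/100, so the damage is small.
print(f1_score(y_true, y_pred, average="micro"))                      # 0.99
print(f1_score(y_true, y_pred, average="weighted", zero_division=0))  # ~0.98
# macro: every class counts equally, so one missed rare class halves the score.
print(f1_score(y_true, y_pred, average="macro", zero_division=0))     # ~0.50
```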
### How do we solve this?

We could remove or merge classes with fewer than 3 samples, e.g. with a helper like the sketch below. But this should be discussed with the scientists: maybe those classes with few samples are very important and well defined, and must be kept anyway?
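One possible sketch of the "remove" option; `drop_rare_classes` is a hypothetical helper, not existing morphoclass code:

```python
import numpy as np

def drop_rare_classes(X, y, min_samples=3):
    """Drop all samples whose class has fewer than min_samples members."""
    classes, counts = np.unique(y, return_counts=True)
    keep = np.isin(y, classes[counts >= min_samples])
    return X[keep], y[keep]

# Example with hypothetical labels: the two L4_BP samples are dropped
# before the data ever reaches StratifiedKFold.
y = np.array(["L4_BP"] * 2 + ["L4_LBC"] * 7)
X = np.zeros((len(y), 1))
X_kept, y_kept = drop_rare_classes(X, y)
print(np.unique(y_kept))  # ['L4_LBC']
```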