Hi @ubc-tea @AnushreeBannadabhavi,
Thank you for sharing the code. I found an inconsistency in the number of subjects assigned to the validation and test splits.
For 1009 subjects with 100% data usage, the split should work out as follows:

- Total subjects = 1009.
- Data usage = 100% (i.e., all 1009 subjects are used).

Applying `train_length` and `val_length`:

- Train length = 70% of 1009 = 1009 × 0.7 = 706.3, which rounds to 706 subjects.
- Validation length = 10% of 1009 = 1009 × 0.1 = 100.9, which rounds to 101 subjects.
- Test length = remaining subjects = 1009 − 706 − 101 = 202 subjects.

Final split:

- Train set: 706 subjects.
- Validation set: 101 subjects.
- Test set: 202 subjects.
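A minimal Python sketch of the expected arithmetic. It assumes the lengths are rounded to the nearest integer and that the test set takes whatever is left; the actual rounding in `dataloader.py` may differ, and `split_sizes` is a hypothetical helper, not a function from the repository.

```python
def split_sizes(n_subjects: int, train_ratio: float = 0.7, val_ratio: float = 0.1):
    """Return (train, val, test) sizes; the test set gets the remainder.

    Hypothetical helper: assumes lengths are rounded to the nearest integer.
    """
    train_length = round(n_subjects * train_ratio)  # 706.3 -> 706
    val_length = round(n_subjects * val_ratio)      # 100.9 -> 101
    test_length = n_subjects - train_length - val_length
    return train_length, val_length, test_length


print(split_sizes(1009))  # (706, 101, 202)
```

If the code truncates with `int()` instead of rounding, validation would be 100 and test 203. Either way, the test set should be roughly twice the size of the validation set, not half.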
However, given the piece of code in dataloader.py file for
The test set ends up with 101 subjects and the validation set with 202. I believe these are swapped, so the test set is built from the wrong indices.
Please correct me if I am mistaken. I think the cause is the loop header `for test_index, valid_index in split2.split`, which should be `for valid_index, test_index in split2.split`.
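To illustrate why the unpacking order matters: scikit-learn splitters such as `ShuffleSplit` yield `(train_index, test_index)` pairs, in that order. Assuming `split2` is such a splitter, built with `test_size` equal to the test length (202) and applied to the 303 subjects left after the training split, the first array in each pair has 101 entries and the second has 202. The stdlib-only sketch below mimics that convention with a hypothetical `shuffle_split` function (not sklearn itself) so the two loop headers can be compared:

```python
import random


def shuffle_split(indices, test_size, seed=0):
    """Hypothetical stand-in for sklearn's ShuffleSplit(n_splits=1).split():
    yields a single (train_index, test_index) pair, train part first."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_train = len(shuffled) - test_size
    yield shuffled[:n_train], shuffled[n_train:]


remaining = range(303)  # subjects left after the 706-subject training split
test_length = 202       # assumed value passed as test_size to split2

# Current order: the first (101-element) array is bound to test_index.
for test_index, valid_index in shuffle_split(remaining, test_size=test_length):
    print("current:", len(valid_index), len(test_index))  # current: 202 101

# Proposed order: the first array becomes the validation set.
for valid_index, test_index in shuffle_split(remaining, test_size=test_length):
    print("proposed:", len(valid_index), len(test_index))  # proposed: 101 202
```

Under this assumption, the alternative fix would be to keep the loop as is but construct `split2` with `test_size` equal to the validation length instead.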
Thanks a lot. If there is a reason for this ordering, I would appreciate a more detailed explanation; you can email me at [email protected]