Dataset "usability" for AI #29
Comments
Images that seem to be white or black still have data in them. Just normalize to [0, 1], multiply by 255, and plot or save it.
This comment is the answer to Q1 in: BIMCV-COVID19+/FAQ.md
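The normalization described above can be sketched as follows (a minimal example using NumPy; the function name and sample values are illustrative, not from the dataset code):

```python
import numpy as np

def to_displayable(img):
    """Min-max normalize a raw (e.g. 16-bit) X-ray array to [0, 1],
    then scale to 8-bit [0, 255] so it can be plotted or saved."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # Constant image: nothing to stretch, return all zeros
        return np.zeros(img.shape, dtype=np.uint8)
    norm = (img - lo) / (hi - lo)        # normalize to [0, 1]
    return (norm * 255).astype(np.uint8)  # scale to [0, 255]

# A 16-bit image that renders as nearly all black still has usable data:
raw = np.array([[100, 200], [300, 65000]], dtype=np.uint16)
display = to_displayable(raw)
```

Without this stretch, a viewer that assumes an 8-bit range will clip high-bit-depth pixel values and render the image as uniformly white or black.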
My bad, I successfully applied normalization on BIMCV-COVID19+ and assumed it would translate to the PadChest dataset too. Thanks for the insight @stbnps
I performed the following experiment:
Achieving the following results:
Specificity:
Sensitivity:
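For reference, the two metrics reported above can be computed from binary labels and predictions like this (a generic sketch, not the original evaluation code):

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN), the recall on positive cases.
    Specificity = TN / (TN + FP), the recall on negative cases."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity

# Example: 3 true positives among labels, one missed, one false alarm
sens, spec = sensitivity_specificity([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Reporting both matters here: a classifier trained on noisy NLP-derived labels can keep high sensitivity while its specificity collapses, or vice versa.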
The issue
The network seems to perform very well on dataset [3], where each image was manually reviewed by radiologists [4]. However, it performs significantly worse on dataset [1], where most labels were extracted using NLP and the images were not manually reviewed (even leading to the inclusion of completely white or completely black images [5]).
Do you think the quality of the images and annotations may be a limiting factor for the performance of the network?
References
[1] http://ceib.bioinfo.cipf.es/covid19/resized_padchest_neumo.tar.gz
[2] https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
[3] https://www.kaggle.com/c/rsna-pneumonia-detection-challenge
[4] https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/overview/acknowledgements
[5] https://github.com/BIMCV-CSUSP/BIMCV-COVID-19/tree/master/padchest-covid#iti---proposal-for-datasets