Releases: deepakri201/DICOMScanClassification
Ground truth assignment for series
This CSV file contains the ground truth assignment for each series used to train, validate, and test the AI models. The ground truth labels were assigned by parsing the SeriesDescription with a set of regular expressions.
Note: these labels have not been verified by a radiologist or clinician.
The ground truth label is stored in the gt column.
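This release does not include the regular expressions themselves. The sketch below only illustrates rule-based labeling from SeriesDescription in general. The class names (T2W, ADC, DWI, DCE), the patterns, and the assign_label helper are all hypothetical and are not the rules used to produce gt.

```python
import re

# Hypothetical rules as (label, compiled pattern) pairs, checked in order.
# The first match wins, so ADC is tested before the broader DWI rule
# (a description such as "ep2d_diff_ADC" would also match the DWI pattern).
# (?<![a-z]) / (?![a-z]) act as word boundaries that still allow "_" next to a token.
RULES = [
    ("ADC", re.compile(r"(?<![a-z])adc(?![a-z])|apparent", re.IGNORECASE)),
    ("DWI", re.compile(r"(?<![a-z])dwi(?![a-z])|diff", re.IGNORECASE)),
    ("DCE", re.compile(r"(?<![a-z])dce(?![a-z])|dynamic|perfusion", re.IGNORECASE)),
    ("T2W", re.compile(r"(?<![a-z])t2(?!\*)", re.IGNORECASE)),
]


def assign_label(series_description):
    """Return the label of the first matching rule, or None if nothing matches."""
    if not series_description:
        return None
    for label, pattern in RULES:
        if pattern.search(series_description):
            return label
    return None
```

Under these illustrative rules, assign_label("ep2d_diff_ADC") returns "ADC" and assign_label("localizer") returns None. Rule order matters whenever descriptions overlap.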
The CSV includes the following identifiers:
- collection_id
- PatientID
- StudyInstanceUID
- SeriesInstanceUID
- SOPInstanceUID
The CSV also holds the following metadata extracted from the DICOM files:
- RepetitionTime
- EchoTime
- FlipAngle
- InversionTime
- EchoTrainLength
- TriggerTime
- IPP_2 = z coordinate of ImagePositionPatient (IPP)
- PixelSpacing_x and PixelSpacing_y
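The sketch below shows one way to read these per-slice values from a single DICOM header. The extract_metadata function is hypothetical. It accepts either a plain dict keyed by DICOM keyword or a pydicom Dataset, because Dataset.get also accepts keywords. The CSV does not state how PixelSpacing_x and PixelSpacing_y map onto the two PixelSpacing values. This sketch assumes x is the column spacing and y is the row spacing.

```python
def _to_float(value):
    """Convert a DICOM numeric value to float, keeping missing values as None."""
    return None if value is None or value == "" else float(value)


def extract_metadata(header):
    """Collect the per-slice fields listed above from one DICOM header.

    `header` is a dict keyed by DICOM keyword, or a pydicom Dataset read with
    pydicom.dcmread(path, stop_before_pixels=True). Missing attributes become None.
    """
    ipp = header.get("ImagePositionPatient")   # [x, y, z] in mm
    spacing = header.get("PixelSpacing")       # [row spacing, column spacing] in mm
    return {
        "RepetitionTime": _to_float(header.get("RepetitionTime")),
        "EchoTime": _to_float(header.get("EchoTime")),
        "FlipAngle": _to_float(header.get("FlipAngle")),
        "InversionTime": _to_float(header.get("InversionTime")),
        "EchoTrainLength": _to_float(header.get("EchoTrainLength")),
        "TriggerTime": _to_float(header.get("TriggerTime")),
        "IPP_2": _to_float(ipp[2]) if ipp else None,
        # Assumed mapping: x = column spacing, y = row spacing.
        "PixelSpacing_x": _to_float(spacing[1]) if spacing else None,
        "PixelSpacing_y": _to_float(spacing[0]) if spacing else None,
    }
```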
The following fields were generated to determine whether a series is a 3D or 4D volume:
- NumSlices = number of slices in the series
- MedianIndex = index of the middle slice, used by the AI models
- NumVolumes = number of 3D volumes in the series, calculated as the number of times an ImagePositionPatient (IPP) value appears in the series
- NumSeriesInStudyWithGt = number of series in the same study that share this series' ground truth class
- is_4D = TRUE if we assigned the series as 4D, FALSE if we assigned it as 3D
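The sketch below shows how the 3D/4D fields could be computed, assuming the per-instance IPP_2 values are available for each series. Several details are assumptions rather than the authors' method:

- the rounding tolerance
- counting NumVolumes as the most-repeated position
- the rule is_4D = NumVolumes > 1
- the z-sorted ordering behind MedianIndex

Both function names are hypothetical.

```python
from collections import Counter


def volume_fields(z_positions, decimals=3):
    """Derive NumSlices, MedianIndex, NumVolumes and is_4D for one series.

    z_positions: IPP_2 values of every instance in the series, assumed sorted by z.
    Positions are rounded to `decimals` so near-identical values count as one position.
    """
    counts = Counter(round(float(z), decimals) for z in z_positions)
    num_slices = len(z_positions)
    num_volumes = max(counts.values()) if counts else 0
    return {
        "NumSlices": num_slices,
        "MedianIndex": num_slices // 2,   # index of the middle slice
        "NumVolumes": num_volumes,        # how often the most frequent position repeats
        "is_4D": num_volumes > 1,
    }


def series_in_study_with_gt(rows):
    """Compute NumSeriesInStudyWithGt from (StudyInstanceUID, SeriesInstanceUID, gt) tuples."""
    series_per_key = {}
    for study, series, gt in rows:
        series_per_key.setdefault((study, gt), set()).add(series)
    return {series: len(series_per_key[(study, gt)]) for study, series, gt in rows}
```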
The following fields are derived from the DICOM metadata:
- has_contrast = TRUE if contrast was used, FALSE otherwise
- has_multiple_orientations = TRUE if the series has more than one unique ImageOrientationPatient value, FALSE otherwise
- has_scanningSequence_SE = TRUE if ScanningSequence contains SE, FALSE otherwise
- has_scanningSequence_EP = TRUE if ScanningSequence contains EP, FALSE otherwise
- has_scanningSequence_GR = TRUE if ScanningSequence contains GR, FALSE otherwise
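The sketch below shows how the derived flags could be computed. As before, it assumes each header is a dict or pydicom Dataset keyed by DICOM keyword. This release does not state how has_contrast was determined. The hypothetical derived_flags function treats a non-empty ContrastBolusAgent in any header as evidence of contrast.

```python
def derived_flags(headers):
    """Compute the boolean fields above from all headers of one series."""
    orientations = set()
    sequences = set()
    has_contrast = False
    for h in headers:
        iop = h.get("ImageOrientationPatient")
        if iop:
            # Round to tolerate tiny floating-point differences between slices.
            orientations.add(tuple(round(float(v), 4) for v in iop))
        seq = h.get("ScanningSequence")
        if seq:
            # ScanningSequence is multi-valued: a list like ["SE", "IR"] or raw text "SE\\IR".
            values = seq.split("\\") if isinstance(seq, str) else seq
            sequences.update(str(v).strip().upper() for v in values)
        if h.get("ContrastBolusAgent"):  # assumption: non-empty agent => contrast used
            has_contrast = True
    return {
        "has_contrast": has_contrast,
        "has_multiple_orientations": len(orientations) > 1,
        "has_scanningSequence_SE": "SE" in sequences,
        "has_scanningSequence_EP": "EP" in sequences,
        "has_scanningSequence_GR": "GR" in sequences,
    }
```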
Other columns of interest:
- gcs_url = Google Cloud Storage location of the data, used for downloading
- viewer_url = OHIF Viewer URL for quickly viewing the series
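To download only part of the data, you can collect the distinct gcs_url values from the CSV and pass them to a copy tool. The hypothetical unique_gcs_urls helper below uses only the Python standard library. Its optional filter on gt is included for illustration only.

```python
import csv


def unique_gcs_urls(lines, gt_label=None):
    """Return the distinct gcs_url values from the ground truth CSV, in file order.

    lines: an open text file or any iterable of CSV lines (header first).
    gt_label: if given, keep only rows whose gt column equals this value.
    """
    seen = set()
    urls = []
    for row in csv.DictReader(lines):
        if gt_label is not None and row.get("gt") != gt_label:
            continue
        url = row.get("gcs_url")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

For example, you could write the returned URLs to a file, one per line, and pipe that file to gsutil -m cp -I <destination>, which reads the list of objects to copy from standard input. Before copying, check in the CSV whether each gcs_url points to a single file or to a whole series.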
Note: not all of the fields listed above were used to develop the AI models.
Please refer to the paper for further details:
Krishnaswamy D, Kovács B, Denner S, Pieper S, Clunie D, Bridge CP, Kapur T, Maier-Hein KH, Fedorov A. Automatic classification of prostate MR series type using image content and metadata. arXiv preprint arXiv:2404.10892. 2024 Apr 16.
https://arxiv.org/pdf/2404.10892
Data and metadata required for model development and testing
This release contains the files needed to develop and train the models for the MIDL 2024 submission:
- pretrained_models/ - folder holding the three pretrained models: metadata only (random forest classifier), images only (CNN-based), and images+metadata (CNN-based)
  - metadata_only_model.pkl
  - images_only_model/
    - images_only_model_fold0/
    - images_only_model_fold1/
    - images_only_model_fold2/
    - images_only_model_fold3/
  - images_and_metadata_model/
    - images_and_metadata_fold0/
    - images_and_metadata_fold1/
    - images_and_metadata_fold2/
    - images_and_metadata_fold3/
- scaling_factors.csv - CSV file used to scale the metadata
- npy_files/ - preprocessed middle 2D slices used for training, validation, and testing; each file is named by its SOPInstanceUID
- df_gt_results.csv - CSV holding the data needed for each included slice, e.g., PatientID, SeriesInstanceUID, and other DICOM metadata used for classification
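The sketch below pairs each preprocessed slice with its label. It rests on two assumptions: files in npy_files/ are named <SOPInstanceUID>.npy, and df_gt_results.csv has SOPInstanceUID and gt columns. Both should be checked against the actual files. slice_label_pairs is a hypothetical helper.

```python
import csv
import os


def slice_label_pairs(csv_path, npy_dir, label_column="gt"):
    """Pair each preprocessed slice in npy_dir with its label from the CSV.

    Assumes files are named "<SOPInstanceUID>.npy"; rows without a matching file are skipped.
    """
    pairs = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            path = os.path.join(npy_dir, row["SOPInstanceUID"] + ".npy")
            if os.path.exists(path):
                pairs.append((path, row[label_column]))
    return pairs
```

You can then load each returned path with numpy.load(path) before any model-specific preprocessing.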