Releases · deepakri201/DICOMScanClassification

This csv includes the ground truth assignment for each series used to train, validate, and test the AI models. The ground truth labels were assigned by parsing the SeriesDescription with a set of regular expressions.
Note -- the labels have not been verified by a radiologist or clinician.

The ground truth column is named gt.

The csv includes identifying information, including the:

collection_id
PatientID
StudyInstanceUID
SeriesInstanceUID
SOPInstanceUID.
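
As a minimal sketch of how the identifier columns and the gt column might be used together, the snippet below counts series per collection and label. It assumes one row per series; the inline sample rows, the collection names, and the label values (label_x, label_y) are invented stand-ins, not contents of the actual csv.

```python
import csv
import io
from collections import Counter

# Toy stand-in for the release csv. All rows and label values are invented.
SAMPLE_CSV = """collection_id,PatientID,StudyInstanceUID,SeriesInstanceUID,SOPInstanceUID,gt
coll_a,p1,1.1,1.1.1,1.1.1.1,label_x
coll_a,p1,1.1,1.1.2,1.1.2.1,label_y
coll_b,p2,2.1,2.1.1,2.1.1.1,label_x
"""


def count_labels(csv_file):
    """Count rows per (collection_id, gt) pair, assuming one row per series."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[(row["collection_id"], row["gt"])] += 1
    return counts


label_counts = count_labels(io.StringIO(SAMPLE_CSV))
for (collection, label), n in sorted(label_counts.items()):
    print(f"{collection}\t{label}\t{n}")
```

To run this against the real file, pass an opened file handle (opened with `newline=""`) instead of the `io.StringIO` sample.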

The csv also holds metadata extracted from the DICOM files, including the:

RepetitionTime
EchoTime
FlipAngle
InversionTime
EchoTrainLength
TriggerTime
IPP_2 = z coordinate of ImagePositionPatient
PixelSpacing_x and PixelSpacing_y

The following fields were generated to determine if the volume was 3D or 4D:

NumSlices = number of slices in the series
MedianIndex = index of the middle slice, which is the slice used by the AI models
NumVolumes = number of 3D volumes in the series, calculated by counting how many times an ImagePositionPatient (IPP) value appears
NumSeriesInStudyWithGt = the number of series in the study that had the same ground truth class
is_4D = our assignment of whether the series is 4D (TRUE) or 3D (FALSE)
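
The IPP counting idea behind NumVolumes can be sketched as follows. This is an illustration under stated assumptions, not the code used to build the csv: the function name `summarize_series`, the rounding tolerance, taking the maximum repeat count as NumVolumes, and the rule `is_4D = NumVolumes > 1` are all assumptions.

```python
from collections import Counter


def summarize_series(positions):
    """positions: one (x, y, z) ImagePositionPatient tuple per slice in the series."""
    if not positions:
        raise ValueError("series has no slices")
    # Round coordinates so tiny floating point differences do not split
    # one physical position into several keys (tolerance is an assumption).
    counts = Counter(tuple(round(float(c), 3) for c in p) for p in positions)
    # Assumption: a position repeated k times means k volumes were acquired.
    num_volumes = max(counts.values())
    return {
        "NumSlices": len(positions),
        "NumVolumes": num_volumes,
        "is_4D": num_volumes > 1,  # assumed rule, not confirmed by the source
    }


# Hypothetical series: 4 slice positions, acquired once vs. three times.
ipps_3d = [(0.0, 0.0, float(z)) for z in range(4)]
ipps_4d = ipps_3d * 3

summary_3d = summarize_series(ipps_3d)
summary_4d = summarize_series(ipps_4d)
print(summary_3d)
print(summary_4d)
```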

The following are derived from the DICOM metadata:

has_contrast = set to TRUE if contrast is used, FALSE otherwise
has_multiple_orientations = set to TRUE if the series contains multiple orientations, FALSE otherwise. Determined by the number of unique ImageOrientationPatient values.
has_scanningSequence_SE = set to TRUE if ScanningSequence contains SE, FALSE otherwise
has_scanningSequence_EP = set to TRUE if ScanningSequence contains EP, FALSE otherwise
has_scanningSequence_GR = set to TRUE if ScanningSequence contains GR, FALSE otherwise
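
A rough sketch of how the ScanningSequence and orientation flags could be derived is shown below. The helper `derive_flags` and its inputs are hypothetical, and the rounding of orientation values is an assumption; has_contrast is left out because the source does not say how it was determined.

```python
def derive_flags(scanning_sequence, orientations):
    """Derive boolean flags for one series.

    scanning_sequence: a backslash-separated string of codes or a list of codes.
    orientations: one 6-value ImageOrientationPatient tuple per slice.
    """
    if isinstance(scanning_sequence, str):
        codes = set(scanning_sequence.split("\\"))
    else:
        codes = set(scanning_sequence)

    flags = {
        f"has_scanningSequence_{code}": code in codes
        for code in ("SE", "EP", "GR")
    }
    # Round before comparing so float noise is not counted as a new orientation.
    unique = {tuple(round(float(v), 4) for v in o) for o in orientations}
    flags["has_multiple_orientations"] = len(unique) > 1
    return flags


# Hypothetical inputs for illustration only.
AXIAL = (1, 0, 0, 0, 1, 0)
SAGITTAL = (0, 1, 0, 0, 0, -1)

single = derive_flags("SE", [AXIAL, AXIAL])
mixed = derive_flags(["GR", "EP"], [AXIAL, SAGITTAL])
print(single)
print(mixed)
```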

Other columns of interest:

gcs_url = Google Cloud storage location, used for downloading
viewer_url = OHIF viewer url to quickly view the series

Note -- Not all of the fields listed above were used for the development of the AI models.

Please refer to the paper for further details:
Krishnaswamy D, Kovács B, Denner S, Pieper S, Clunie D, Bridge CP, Kapur T, Maier-Hein KH, Fedorov A. Automatic classification of prostate MR series type using image content and metadata. arXiv preprint arXiv:2404.10892. 2024 Apr 16.
https://arxiv.org/pdf/2404.10892

Here we attach the files needed for model development and training for the MIDL 2024 submission. We include:

pretrained_models - this folder holds the three pre-trained models: metadata only (random forest classifier), images only (CNN-based), and images + metadata (CNN-based).
- metadata_only_model.pkl
- images_only_model/
  - images_only_model_fold0/
  - images_only_model_fold1/
  - images_only_model_fold2/
  - images_only_model_fold3/
- images_and_metadata_model/
  - images_and_metadata_fold0/
  - images_and_metadata_fold1/
  - images_and_metadata_fold2/
  - images_and_metadata_fold3/
scaling_factors.csv - csv file used for scaling the metadata
npy_files/ - this holds the preprocessed middle 2D slices used for training/validation/testing. The files are named by SOPInstanceUID.
df_gt_results.csv - csv that holds the necessary data for each included slice, for instance PatientID, SeriesInstanceUID, and other DICOM metadata used for classification
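
Since the slices in npy_files/ are keyed by SOPInstanceUID, the rows of df_gt_results.csv can be matched to their slice files roughly as sketched below. The exact file naming (`<SOPInstanceUID>.npy`) is an assumption, `index_slices` is a hypothetical helper, and the demo builds a throwaway directory with invented UIDs so the snippet runs without the release files.

```python
import csv
import tempfile
from pathlib import Path


def index_slices(csv_path, npy_dir):
    """Map each SOPInstanceUID in the csv to its slice file, if present."""
    found, missing = {}, []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            sop = row["SOPInstanceUID"]
            path = Path(npy_dir) / f"{sop}.npy"  # assumed naming scheme
            if path.is_file():
                found[sop] = path
            else:
                missing.append(sop)
    return found, missing


# Demo with a temporary directory and invented UIDs.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    csv_path = tmp / "df_gt_results.csv"
    csv_path.write_text("SOPInstanceUID,gt\n1.2.3,label_x\n4.5.6,label_y\n")
    npy_dir = tmp / "npy_files"
    npy_dir.mkdir()
    (npy_dir / "1.2.3.npy").touch()  # placeholder file, not a real array

    found, missing = index_slices(csv_path, npy_dir)

print(sorted(found), missing)
```

With the real release files, each path in `found` could then be passed to `numpy.load` to read the slice array.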

Ground truth assignment for series

Data and metadata required for model development and testing