
Releases: deepakri201/DICOMScanClassification

Ground truth assignment for series

31 Jul 15:49
807ebc8

This CSV file contains the ground-truth class assigned to each series used for training, validating, and testing the AI models. The labels were assigned by parsing the SeriesDescription with a set of regular expressions developed for this purpose.
Note: these labels have not been verified by a radiologist or clinician.

The ground-truth labels are stored in the gt column.

The CSV includes the following identifiers:

  • collection_id
  • PatientID
  • StudyInstanceUID
  • SeriesInstanceUID
  • SOPInstanceUID
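As a minimal sketch, assuming the CSV is read with Python's standard csv module, the gt column can be mapped to each SeriesInstanceUID as shown here. The file name is not given in these notes, so the sample rows below are made-up placeholders (including the example labels T2w and ADC), and `load_series_labels` is a hypothetical helper, not part of the release.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration only; the real file has many more columns.
SAMPLE_CSV = """collection_id,PatientID,StudyInstanceUID,SeriesInstanceUID,SOPInstanceUID,gt
demo,P1,1.2.3,1.2.3.1,1.2.3.1.7,T2w
demo,P1,1.2.3,1.2.3.2,1.2.3.2.9,ADC
demo,P2,1.2.4,1.2.4.1,1.2.4.1.5,T2w
"""

def load_series_labels(handle):
    """Map each SeriesInstanceUID to its ground-truth label from the gt column."""
    labels = {}
    for row in csv.DictReader(handle):
        # setdefault keeps the first label seen if a series appears on several rows
        labels.setdefault(row["SeriesInstanceUID"], row["gt"])
    return labels

labels = load_series_labels(io.StringIO(SAMPLE_CSV))
class_counts = Counter(labels.values())  # number of series per ground-truth class
print(labels["1.2.3.2"])    # ADC
print(class_counts["T2w"])  # 2
```

For the real file, pass an open handle instead, e.g. `with open(path, newline="") as f: labels = load_series_labels(f)`, where `path` is wherever you saved the downloaded CSV.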

The CSV also contains metadata extracted from the DICOM files:

  • RepetitionTime
  • EchoTime
  • FlipAngle
  • InversionTime
  • EchoTrainLength
  • TriggerTime
  • IPP_2 = z coordinate of ImagePositionPatient (IPP)
  • PixelSpacing_x and PixelSpacing_y

The following fields were generated to determine whether each series was 3D or 4D:

  • NumSlices = number of slices in the series
  • MedianIndex = index of the middle slice, which is the slice used by the AI models
  • NumVolumes = number of 3D volumes in the series, computed from the number of times a given IPP value occurs in the series
  • NumSeriesInStudyWithGt = number of series in the same study that share this series' ground-truth class
  • is_4D = TRUE if we classified the series as 4D, FALSE if we classified it as 3D
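The notes do not spell out the exact rules, so the following is only a sketch under stated assumptions: every instance counts as a slice, NumVolumes is the largest number of times any single IPP z position (IPP_2) occurs, MedianIndex is NumSlices // 2, and a series is flagged 4D when NumVolumes > 1. `volume_fields` and `series_in_study_with_gt` are hypothetical helper names, and the example positions and labels are made up.

```python
from collections import Counter

def volume_fields(ipp_z, decimals=3):
    """Derive NumSlices, MedianIndex, NumVolumes and is_4D from per-instance z positions.

    Assumed rules (not confirmed by the release notes): every instance is a slice,
    NumVolumes = max occurrences of any single z position, MedianIndex = NumSlices // 2,
    and is_4D = NumVolumes > 1.
    """
    # Round to suppress floating-point noise before counting repeated positions
    counts = Counter(round(z, decimals) for z in ipp_z)
    num_slices = len(ipp_z)
    num_volumes = max(counts.values()) if counts else 0
    return {
        "NumSlices": num_slices,
        "MedianIndex": num_slices // 2,
        "NumVolumes": num_volumes,
        "is_4D": num_volumes > 1,
    }

def series_in_study_with_gt(rows):
    """Count, for each series, how many series in its study share its gt class."""
    per_series = {r["SeriesInstanceUID"]: (r["StudyInstanceUID"], r["gt"]) for r in rows}
    counts = Counter(per_series.values())
    return {uid: counts[key] for uid, key in per_series.items()}

# A 3D series: 5 distinct positions, each seen once
print(volume_fields([0.0, 3.0, 6.0, 9.0, 12.0]))
# {'NumSlices': 5, 'MedianIndex': 2, 'NumVolumes': 1, 'is_4D': False}

# A 4D series: 3 positions acquired at 4 time points
print(volume_fields([0.0, 3.0, 6.0] * 4))
# {'NumSlices': 12, 'MedianIndex': 6, 'NumVolumes': 4, 'is_4D': True}

rows = [
    {"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.1", "gt": "T2w"},
    {"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.2", "gt": "T2w"},
    {"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.3", "gt": "ADC"},
]
print(series_in_study_with_gt(rows))
# {'1.2.3.1': 2, '1.2.3.2': 2, '1.2.3.3': 1}
```

Counting repeated positions rather than distinct ones is what separates the two cases: a 3D stack visits each position once, while a 4D series revisits the same positions at every time point.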

The following fields are derived from the DICOM metadata:

  • has_contrast = TRUE if contrast was used, FALSE otherwise
  • has_multiple_orientations = TRUE if the series has more than one unique ImageOrientationPatient value, FALSE otherwise
  • has_scanningSequence_SE = TRUE if ScanningSequence contains SE, FALSE otherwise
  • has_scanningSequence_EP = TRUE if ScanningSequence contains EP, FALSE otherwise
  • has_scanningSequence_GR = TRUE if ScanningSequence contains GR, FALSE otherwise
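A minimal sketch of the orientation and scanning-sequence flags, assuming ScanningSequence arrives either as a list of codes or as a raw backslash-separated DICOM multi-value string, and that orientations are compared after rounding. `derived_flags` is a hypothetical helper; has_contrast is left out because the notes do not say which attribute it was derived from.

```python
def derived_flags(scanning_sequence, orientations, decimals=4):
    """Compute has_multiple_orientations and has_scanningSequence_* for one series.

    scanning_sequence: ScanningSequence codes, as a list (e.g. ["SE", "IR"])
        or as a raw multi-valued string (e.g. "SE\\IR").
    orientations: one ImageOrientationPatient value (6 direction cosines) per instance.
    """
    if isinstance(scanning_sequence, str):
        # Multi-valued DICOM strings use a backslash between values
        scanning_sequence = scanning_sequence.split("\\")
    codes = {s.strip().upper() for s in scanning_sequence}
    # Round the direction cosines so tiny floating-point differences do not count
    unique_iop = {tuple(round(float(v), decimals) for v in iop) for iop in orientations}
    return {
        "has_multiple_orientations": len(unique_iop) > 1,
        "has_scanningSequence_SE": "SE" in codes,
        "has_scanningSequence_EP": "EP" in codes,
        "has_scanningSequence_GR": "GR" in codes,
    }

axial = [1, 0, 0, 0, 1, 0]
sagittal = [0, 1, 0, 0, 0, -1]
print(derived_flags("SE\\EP", [axial, axial]))
print(derived_flags(["GR"], [axial, sagittal]))
```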

Other columns of interest:

  • gcs_url = Google Cloud Storage location of the series, used for downloading
  • viewer_url = OHIF Viewer URL for quickly viewing the series

Note: not all of the fields listed above were used to develop the AI models.

Please refer to the paper for further details:
Krishnaswamy D, Kovács B, Denner S, Pieper S, Clunie D, Bridge CP, Kapur T, Maier-Hein KH, Fedorov A. Automatic classification of prostate MR series type using image content and metadata. arXiv preprint arXiv:2404.10892. 2024 Apr 16.
https://arxiv.org/pdf/2404.10892

Data and metadata required for model development and testing

22 Jun 22:02
807ebc8

This release contains the files needed for model development and training for the MIDL 2024 submission:

  • pretrained_models - the three pretrained models: metadata only (random forest classifier), images only (CNN), and images + metadata (CNN)
    • metadata_only_model.pkl
    • images_only_model/
      • images_only_model_fold0/
      • images_only_model_fold1/
      • images_only_model_fold2/
      • images_only_model_fold3/
    • images_and_metadata_model/
      • images_and_metadata_fold0/
      • images_and_metadata_fold1/
      • images_and_metadata_fold2/
      • images_and_metadata_fold3/
  • scaling_factors.csv - CSV file used for scaling the metadata
  • npy_files/ - the preprocessed middle 2D slices used for training, validation, and testing; each file is named by its SOPInstanceUID
  • df_gt_results.csv - CSV holding the data needed for each included slice, such as PatientID, SeriesInstanceUID, and the other DICOM metadata used for classification
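Each CNN-based model ships as four fold folders (fold0 to fold3). The notes do not say how the folds are combined at inference time; one common choice, assumed here purely for illustration, is to average the per-class probabilities from the four fold models and take the highest-scoring class. `ensemble_folds` is a hypothetical helper that works on probabilities already obtained from each fold model; it does not load the model folders, and the probabilities in the example are made up.

```python
def ensemble_folds(fold_probs):
    """Average per-class probabilities across fold models and pick the top class.

    fold_probs: one list of class probabilities per fold model (e.g. fold0..fold3),
    all in the same class order. Averaging is an assumed combination rule.
    Returns (index of the best class, list of mean probabilities).
    """
    if not fold_probs:
        raise ValueError("need at least one fold")
    n_classes = len(fold_probs[0])
    if any(len(p) != n_classes for p in fold_probs):
        raise ValueError("all folds must report the same number of classes")
    n_folds = len(fold_probs)
    mean = [sum(p[c] for p in fold_probs) / n_folds for c in range(n_classes)]
    best = max(range(n_classes), key=mean.__getitem__)
    return best, mean

fold_outputs = [
    [0.7, 0.2, 0.1],  # fold0
    [0.6, 0.3, 0.1],  # fold1
    [0.2, 0.7, 0.1],  # fold2 disagrees with the others
    [0.5, 0.4, 0.1],  # fold3
]
best, mean = ensemble_folds(fold_outputs)
print(best)  # 0
```

Averaging probabilities lets a single dissenting fold (fold2 here) be outvoted without discarding its confidence entirely; majority voting over each fold's top class would be an equally plausible alternative.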