Assertions and other #144
Run with the --dbg flag so you end up in pdb upon failure and can check the details of what is going on?
Can you share the command you are using and how your data is set up? We may be able to optimize this and make the conversion much faster and more efficient.
Knock knock @mfalkiewicz -- without further info we wouldn't be able to reproduce or really figure out why this is happening, and would just need to close this issue unresolved.
My apologies guys, I will provide the details in the coming days.
Hello, I think the assert studyUID == file_studyUID is simply too much. I mean, the check is valid and makes sense, but in my use case what doesn't make sense is that it breaks the pipeline, i.e. heudiconv exits with an AssertionError. Having to restart the processing of 1000 subjects because of that error is really annoying. After removing that assertion (and replacing it with a warning), heudiconv nicely traverses the entire dataset, no worries. Running this with --dbg indeed shows that studyUID and file_studyUID don't match, but I'm pretty sure the data comes from the same protocol.

Regarding the crashes due to encountering folders instead of files, here is a more detailed problem description. I have datasets in which DICOM folders are structured differently. Some DICOMs have a .DCM extension, some don't have an extension at all (for example, when data comes from different scanners). Thus, in order to match the extension-less files I need to use a wildcard *. The second layer of mess is that the subfolder structure of the DICOM directories also varies across subjects.

One idea to solve the latter problem is to use something that replicates dcm2niix behavior when dealing with folders. It would be much more convenient if I could provide a template 'sub*/dicom' and all subdirectories would be traversed in search of DICOMs (see the sketch below). I hope that clarifies these issues.
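[Editorial note: for illustration, a minimal sketch of that kind of recursive traversal in plain Python -- the directory layout and the find_dicoms helper are hypothetical, not heudiconv's actual API:]

import glob
import os

import pydicom

def find_dicoms(template):
    """Collect DICOM files under every directory matching `template`
    (e.g. a hypothetical 'sub*/dicom'), descending into arbitrarily
    nested subfolders and ignoring extensions entirely."""
    dicoms = []
    for top in glob.glob(template):
        for root, _dirs, files in os.walk(top):
            for name in files:
                path = os.path.join(root, name)
                # accept files regardless of extension; let pydicom decide
                try:
                    pydicom.dcmread(path, stop_before_pixels=True)
                except Exception:
                    continue  # not a readable DICOM -- skip, don't crash
                dicoms.append(path)
    return dicoms

# e.g. find_dicoms('sub*/dicom')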
I love certainty and confidence, but I really prefer facts and "better be safe than sorry"!
Is there a chance you could share what those values were? (Anyhow close?) Maybe there is a chance to share (privately if necessary) a minimal set of sample files, or a dcmdump of them, to figure out the reason?
Re:
Well -- we kinda already have it, but I guess it might be triggered (only?) if no template is provided. You could try
I also am finding that the studyUID check feels very strict. I have an old dataset from 2006 on a Siemens Trio that I'd like to update and publish/reanalyze. I know that the data come from the same subject and session, but for some reason the DICOM files have an empty string for Accession Number and a different Study Instance UID for every scan. Perhaps something was misconfigured on the machine at that time. However, as a result I can't move forward with heudiconv, despite this seeming like a non-fatal issue in the larger scheme of things (I know the structure of the files based on their filenames).

Is the best practice to manually rewrite the headers of my DICOM files, or to comment out this assertion in heudiconv? It feels like there should be a third way that just warns about this rather than stopping all processing. Does studyUID need to be consistent for later processing steps, or is this just a check in case two sets of files were somehow mixed up?
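[Editorial note: as a quick way to confirm this kind of inconsistency before converting, one could survey the headers directly -- a minimal sketch using pydicom; the directory layout is an assumption:]

import glob

import pydicom

# Survey StudyInstanceUID / AccessionNumber across a session's files
# to see how (in)consistent they actually are.
seen = {}
for path in glob.glob('sub-01/dicom/**/*', recursive=True):
    try:
        ds = pydicom.dcmread(path, stop_before_pixels=True)
    except Exception:
        continue  # skip non-DICOM files instead of crashing
    key = (getattr(ds, 'StudyInstanceUID', None),
           getattr(ds, 'AccessionNumber', None))
    seen.setdefault(key, []).append(path)

for (study_uid, accession), files in seen.items():
    print(study_uid, repr(accession), len(files))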
@gureckis there was work on disabling the grouping in #200, and I'm all for revisiting it. This check was added to ensure conflicting studies would not be mixed together. However, as we see in #126 (comment) - we would want to give users the power to allow it under certain conditions.
@nicholst - Just sent an email about a grouping where the series ID is the same but should be split into separate files. So, another related problem that may require custom grouping.
Hi, I'm working on the dataset that @nicholst emailed about and wanted to provide some more details. We have some datasets where there are 2 scans (T2 and PD) with the same series UID that need to be kept as two separate scans within a session. When I run heudiconv, it groups these 2 scans together and outputs them as 2 runs of the same scan instead of 2 separate scans. I believe that heudiconv is grouping these together because of the shared series UID, and I cannot find a way to separate them. Grouping by accession ID is not an option, as this field is either empty or the same across all scans within a session.

Custom grouping would be ideal for resolving this, as there are a number of fields in the DICOM headers that would allow the scans to be grouped as needed. Is there currently any functionality to do this? Also, in the dicominfo tsv file, I know that some of the fields are distinct for the 2 scans (e.g. dcm_dir_name), and I have tried building a heuristic file using this field, but this did not work -- I assume because the grouping has already been performed by that stage. Let me know if you need any more details. Any advice would be greatly appreciated!
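[Editorial note: to make the idea concrete, custom grouping could look something like this sketch -- it assumes a plain list of (path, header-dict) pairs, a stand-in for whatever heudiconv tracks internally:]

from collections import defaultdict

def group_files(file_headers, group_by='SeriesInstanceUID'):
    """Group DICOM files by an arbitrary header field instead of the
    hard-wired series UID."""
    groups = defaultdict(list)
    for path, header in file_headers:
        groups[header.get(group_by)].append(path)
    return dict(groups)

# e.g. split a shared-series PD+T2 pair apart by echo time instead:
# group_files(file_headers, group_by='EchoTime')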
I wanted to give some more information on the problem with PD+T2 image pairs. The distressing thing is that we can't stop it from creating invalid BIDS files, because it is appending an integer after our modality label. In particular, our two images are coming out as:
which of course isn't valid. Our heuristic file is below; you can see the string we use for the PDT2 key. Any suggestions will be gratefully appreciated! (w/ @DanielDelbarre)
import os

def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if template is None or not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes

def infotodict(seqinfo):
    """Heuristic evaluator for determining which runs belong where

    allowed template fields - follow python string module:
    item: index within category
    subject: participant id
    seqitem: run number during scanning
    subindex: sub index within group
    """
    PDT2 = create_key('sub-{subject}/{session}/anat/sub-{subject}_{session}_PDT2_')
    T1w = create_key('sub-{subject}/{session}/anat/sub-{subject}_{session}_T1w')
    t1ce = create_key('sub-{subject}/{session}/anat/sub-{subject}_{session}_ce-Gd_T1w')
    T1w_3d = create_key('sub-{subject}/{session}/anat/sub-{subject}_{session}_acq-3d_T1w')
    T2w = create_key('sub-{subject}/{session}/anat/sub-{subject}_{session}_T2w')
    PD = create_key('sub-{subject}/{session}/anat/sub-{subject}_{session}_PD')
    info = {PDT2: [], T1w: [], t1ce: [], T1w_3d: [], T2w: [], PD: []}
    for s in seqinfo:
        # the large (dim3 > 60) 'pd' and 't2' series form the combined PD+T2 pair
        if s.dcm_dir_name in ('t2', 'pd') and s.dim3 > 60:
            info[PDT2].append(s.series_id)
        elif s.dcm_dir_name == 't1n':
            info[T1w].append(s.series_id)
        elif s.dcm_dir_name == 't1ce':
            info[t1ce].append(s.series_id)
        elif s.dcm_dir_name == '3d_t1n':
            info[T1w_3d].append(s.series_id)
        elif s.dcm_dir_name == 't2' and s.dim3 < 60:
            info[T2w].append(s.series_id)
        elif s.dcm_dir_name == 'pd' and s.dim3 < 60:
            info[PD].append(s.series_id)
    return info
FTR: Woohoo -- I have managed to find the sample phantom PD+T2w acquisition I had done while troubleshooting another issue (multi-echo): http://datasets.datalad.org/?dir=/dicoms/dartmouth-phantoms/bids_test6-PD+T2w/ So I am just confirming the undesired behavior:
@DanielDelbarre @nicholst -- if you take your dicoms and convert using dcm2niix manually with
Thank you for replicating this problem @yarikoptic! @DanielDelbarre will have to confirm on Monday, but I'm pretty sure that is what we find as well -- the .json files are identical except for
I believe adding the PDT2 suffix to the list at heudiconv/heudiconv/convert.py (line 575 in 47efa78) will solve the problem (though hard-coding all the valid multi-echo acquisitions shouldn't be our long-term solution). I'm hoping to change this behavior in a future release by separating the
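[Editorial note: purely as an illustration, a guard of that sort might look like the following sketch -- the names and the suffix list are hypothetical, not heudiconv's actual internals:]

# hypothetical: suffixes for acquisitions that legitimately yield
# multiple images per series, so the integer-suffix check is skipped
MULTI_IMAGE_SUFFIXES = ('ME', 'PDT2')

def allows_multiple_images(template):
    # e.g. '..._PDT2_' -> True; '..._T1w' -> False
    return template.rstrip('_').endswith(MULTI_IMAGE_SUFFIXES)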
I wonder if in the long(er) run, e.g. in the context of the reproin heuristic, we should allow for having multiple modalities listed, joined by '+' (as in PD+T2w).
Someone with more MR physics knowledge than me should weigh in on this. As far as I know, we're talking about the very special case of one excitation generating multiple images (via multiple echoes). The only use cases I know of are PD+T2 and multi-echo sequences used to support parametric mapping (or other corrections that generate maps after some modelling/transformation process... this is in contrast to PD+T2, which is just two raw images with arbitrary units). Hence, unless we see another use case, I'd be against adding a generic joined-modalities syntax.
Well, I also had only the PD+T2w combination. If I get it right, such "syntax handling" is largely heudiconv (and then reproin) specific, not BIDS per se -- just a way to accomplish the mission generically (as opposed to custom hard-coded handling). I asked our physics guy; he mentioned GE scanners being able to produce multiple contrasts from a single acquisition... Google leads e.g. to this ad...
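[Editorial note: if such a joined-modalities syntax were adopted, handling it could be as simple as this sketch -- the '+' separator is the proposal above, not an agreed convention:]

def split_modalities(label):
    """Split a hypothetical combined modality label such as 'PD+T2w'
    into its component suffixes."""
    return label.split('+')

# split_modalities('PD+T2w') -> ['PD', 'T2w']
# a converter could then emit one file per component instead of
# appending a run index to a single combined label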
Dear heudiconv developers,
thank you for coming up with this awesome project. I am trying to use heudiconv to convert massive datasets, often with several thousand subjects. As you know, such datasets are often messy. Running a pass through the data with heudiconv often takes several hours. Unfortunately, heudiconv seems to crash on almost every inconsistency it encounters. For instance:
This error popped up after 3 hours of waiting and the processing of 17 subjects (the actual processing only started after 3 hours - the data cover ~1500 subjects). I think the default behavior in this setting should be to dump the information about the failed subject to a text file and continue with the other subjects (see the sketch below).
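[Editorial note: a minimal sketch of that continue-on-error behavior, assuming a hypothetical per-subject convert_subject callable rather than heudiconv's real entry point:]

import traceback

def convert_all(subjects, convert_subject, failed_log='failed_subjects.txt'):
    """Run conversion per subject; record failures instead of aborting."""
    for subject in subjects:
        try:
            convert_subject(subject)
        except Exception:
            # dump the failure details and keep going with the rest
            with open(failed_log, 'a') as f:
                f.write('%s\n%s\n' % (subject, traceback.format_exc()))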
The other problem that I encountered is with DICOM folder templates: heudiconv crashes when it encounters a folder instead of a DICOM file. I think in this situation it should simply continue the execution.
Unfortunately, I do not have time to dig deeply into the logic of the above-mentioned assertion, nor into the exact mechanism of scanning for DICOM files. I would be very grateful if you could fix these issues. This software is extremely useful!