-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/bruker data #275
Feature/bruker data #275
Conversation
Sonatype Lift is retiringSonatype Lift will be retiring on Sep 12, 2023, with its analysis stopping on Aug 12, 2023. We understand that this news may come as a disappointment, and Sonatype is committed to helping you transition off it seamlessly. If you’d like to retain your data, please export your issues from the web console. |
@jspaezp this PR from nf-core templates nf-core#103 would be good to merged into your branch. While it creates some unnecessary work, it is better to integrate in an early stage than later to avoid the need to change more files when we go for release. |
that sounds good! the only issue here is that the current base docker image I am using for it does not have zip built-in. solutions: A: build and host a new image with zip, B add to the documentation the note saying that if This would be pretty much the dockerfile for reference.
EDIT: https://github.com/jspaezp/miniconda-alpine-zip/pkgs/container/miniconda-alpine-zip I created the image here and will be use it for now, we can figure out any alternatives later This is great, thanks a lot! |
* changed files to paths in final diann analysis * updated decompression to support zip * added exception when the decompressed file already matches the required pattern
I am happy to let you know that I think this is ready to have a final review/merge. Since the last update I:
I am testing it using the following data: Input sdrf:
Nextflow config
|
@@ -156,6 +156,9 @@ params { | |||
add_triqler_output = false | |||
quantify_decoys = false | |||
|
|||
// Bruker data | |||
convert_dotd = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nice, so with this we could still force conversion to mzml if ever needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed that is how it would work! the current implementation uses tdf2mzml to generate the mzML from .d files and that gets passed to DIA-NN. I have not really tested that it gives good results (In my experience it doesnt ... ) but it would be a possibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! Yes it is perfectly fine how it is, as a last resort/fallback.
Out of curiosity: According to your experience, what goes wrong when using converted mzmls? Is it just runtime/storage or also quality of the results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well ... I have not tried in a while, but If my memory is not failing me it was all of the former to different degrees depending on how the mzml was generaged.
- If the mzml was generated with arguments/sotware/modes that collapsed the mobility dimension ("centroided over it"), results tend to be poor, I believe this happens becuase the noise that can be easily identified as noise by resolving on the mobility dimension gets over-represented, which leads to "virtually poor" scan quality.
- In the case of software that does not collapse the mobility dimension, the first thing is that files end up being absoluteley massive. Since each "scan" along the mobility dimension ends up being hundreds of scans in the mzml, the resulting file is the equivalent of having an instrument scanning at +1_000 Hz. Which makes any software reading it very slow and disk usage absolutely horrendous.
I have not tried in a while and we have been exploring different ways to have a good intermediate file format ... the search remains :P
logger = getLogger(__name__) | ||
|
||
SECOND_RESOLUTION = 5 | ||
MQC_YML = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
What is the difference to the multiqc yml file in the repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This adds as part of the qc information on the chromatogram and total ions that is extracted directly from the .d files in a distributed way.
This specific yml "file" is not used unless multiqc is run in a standalone way (and it lets me test it in a much faster way than having it in a sharded way as part of the pipeline)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I guess that's fine. Might just be awkward if the two versions get out-of-sync at some point. Not sure how likely that is.
I didn't check, but is this documented? Maybe add a comment about its "standalone usage" to that variable or the function that writes out its contents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am adding a couple of comments and more information on the module docstring for the standalone usage (outside of the quantms pipeline).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on f1b0bf7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jspaezp check some of my comments, solutions for which containers must be used to be able to deploy using conda/docker/singularity.
Let me know what do you think.
Co-authored-by: Yasset Perez-Riverol <[email protected]>
Co-authored-by: Yasset Perez-Riverol <[email protected]>
Hello @ypriverol, @jpfeuffer, @benpullman and @WangHong007 Thank you very much for the review and suggestions! I believe I have incorporated all the suggestions and notes. I am happy to take into account any other suggestions you might have and to work together in the future! Kindest wishes, |
TODO
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
). Note locally this does not run anything.docs/usage.md
is updated. Depends on Important! Template update for nf-core/tools v2.9 nf-core/quantms#103docs/output.md
is updated. Depends on Important! Template update for nf-core/tools v2.9 nf-core/quantms#103CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).