(feat) initial centroid implementation for tdf #156
base: master
@jspaezp This is amazing, I really appreciate it. It would be really nice to figure this out, since many people are using the Bruker platform. If we can contribute more RAW files, let me know. I have hundreds of them, from HeLa to single-species to community (SiHuMi_X) to stool files, so we can run proper tests. Please keep me in the loop; I'm happy to test at any time.
I think we can start using some public data (https://www.ebi.ac.uk/pride/archive/projects/PXD028735 + https://pmc.ncbi.nlm.nih.gov/articles/PMC8967878/) For DDA my future reference ...
diaPASEF
I feel like MS2 quant is out of scope for sage ... so I will not be implementing that anytime soon.
The reason for it not being there is that it is a somewhat strange construct for TIMS data, because you throw away all IM information. While there might be use cases where it works just fine, I think centroiding without this information will produce relatively poor quantifications.
This indeed is probably incorrect. We should be able to parse this out of timsrust and propagate this correctly.
If this is a highly requested community feature, doesn't it make more sense to implement an "ms1_spectrum_reader" directly in TimsRust ...?
I mean ... yes but that is already what we are doing for the "DDA on DIA data" here, not 100% sure why that is dramatically different.
I would argue there is some demand ... not sure if you want to commit to a specific implementation of the centroiding in the crate. Having said that ... this draft PR is definitely a patch ... and a way to propagate the ims information from the detection to the peptide idx (PrecursorRange, more accurately). Lines 218 to 252 in 888afad
which we would still need to centroid, since we would need something that returns spectra with peaks in the hundreds (sage usually retains ~150/250, depending on params) and definitely not the 200,000 that are common in an MS1 frame.
FWIW, all MS1 peaks are retained, but yes, centroiding to some degree will probably be necessary if a single frame has 200k peaks.
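To make the peak-count gap concrete, here is a minimal sketch of a top-N intensity filter, the kind of reduction that takes a spectrum from hundreds of thousands of points down to a few hundred. The function name `retain_top_n` and the `(mz, intensity)` tuple representation are hypothetical and chosen only for illustration; this is not sage's or timsrust's actual code.

```rust
/// Keep the `n` most intense peaks and return them in ascending m/z order.
/// Peaks are `(mz, intensity)` tuples (a hypothetical representation).
pub fn retain_top_n(mut peaks: Vec<(f64, f32)>, n: usize) -> Vec<(f64, f32)> {
    if n == 0 {
        return Vec::new();
    }
    if peaks.len() > n {
        // Partition so that the n most intense peaks occupy the first n slots.
        // This is O(len) on average, cheaper than a full sort of ~200k peaks.
        peaks.select_nth_unstable_by(n - 1, |a, b| b.1.total_cmp(&a.1));
        peaks.truncate(n);
    }
    // Restore m/z order for downstream consumers.
    peaks.sort_by(|a, b| a.0.total_cmp(&b.0));
    peaks
}
```

A filter like this only discards low-intensity points; it does not merge neighbouring points that belong to the same ion, so on its own it is not a substitute for centroiding.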
The more you know ... I didn't realize the processing was different for MS1/MS2.
What is this?
This basically implements a very simple centroiding strategy for TDF (Bruker) data, which should enable LFQ on it.
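For readers unfamiliar with the idea, the following is a minimal sketch of one simple centroiding strategy, not necessarily the one implemented in this PR. It assumes the ion-mobility axis of a frame has already been flattened into a list of (m/z, intensity) points, then greedily merges points within a ppm tolerance around the most intense unclaimed point. The `Peak` struct, the `centroid` function, and the `tol_ppm` parameter are hypothetical names.

```rust
/// One point of a flattened frame (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Peak {
    pub mz: f64,
    pub intensity: f64,
}

/// Greedy centroiding: take the most intense unclaimed point as a seed,
/// merge every unclaimed point within `tol_ppm` of it, and emit one peak
/// with the summed intensity and the intensity-weighted mean m/z.
pub fn centroid(peaks: &[Peak], tol_ppm: f64) -> Vec<Peak> {
    // Work on a copy sorted by m/z so neighbours are adjacent.
    let mut sorted: Vec<Peak> = peaks.to_vec();
    sorted.sort_by(|a, b| a.mz.total_cmp(&b.mz));

    // Visit seeds from most to least intense.
    let mut order: Vec<usize> = (0..sorted.len()).collect();
    order.sort_by(|&a, &b| sorted[b].intensity.total_cmp(&sorted[a].intensity));

    let mut used = vec![false; sorted.len()];
    let mut out = Vec::new();

    for &seed in &order {
        if used[seed] {
            continue;
        }
        let tol = sorted[seed].mz * tol_ppm * 1e-6;
        let (lo, hi) = (sorted[seed].mz - tol, sorted[seed].mz + tol);

        // Expand the window left and right while points stay in tolerance.
        let mut start = seed;
        while start > 0 && sorted[start - 1].mz >= lo {
            start -= 1;
        }
        let mut end = seed;
        while end + 1 < sorted.len() && sorted[end + 1].mz <= hi {
            end += 1;
        }

        // Merge all still-unclaimed points inside the window.
        let (mut sum_int, mut sum_mz) = (0.0, 0.0);
        for i in start..=end {
            if used[i] {
                continue;
            }
            used[i] = true;
            sum_int += sorted[i].intensity;
            sum_mz += sorted[i].mz * sorted[i].intensity;
        }
        if sum_int > 0.0 {
            out.push(Peak { mz: sum_mz / sum_int, intensity: sum_int });
        }
    }

    out.sort_by(|a, b| a.mz.total_cmp(&b.mz));
    out
}
```

Note that this sketch discards ion mobility entirely: two ions with the same m/z but different mobilities would be merged into one peak.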
Why is it needed?
Right now the spectrum reader in timsrust does not export MS1 spectra ... @sander-willems-bruker might have a better idea as to why.
What is still missing
frame index
which means that we might have collisions in indices (which ... I don't think should be a problem ...).
FYI @treitpeter
LMK what you think! I will wait a bit to get feedback on API design+thoughts before I do a final "ready to review" PR.