Handle outputs from classical peak integration pipelines #32

Closed
Adafede opened this issue Oct 21, 2024 · 1 comment · Fixed by #36

Comments


Adafede commented Oct 21, 2024

As part of my review for openjournals/joss-reviews#7313.

This is more of a discussion rather than a hard requirement...

I work with LC-MS data processing software on a daily basis, and to really get your repository out of its current "niche", I believe that directly handling some of the classical outputs from other software would help a lot!

While the ".xlsx" way to do things might help some users, if others really want to integrate your pipeline into larger ones, they will need some convenience scripts to generate these from their usually used softwares, do you think handling some examples might be doable?

Y0dler (Collaborator) commented Oct 24, 2024

That is a very legit point.

Since the vendors – at least as far as I am aware – all have their own proprietary, closed data formats, it is unfortunately impossible to simply start from the raw data as it comes out of the measurement device, which would obviously be the ideal case. (For example, I worked with a Sciex device, and its *.wiff and *.wiff.scan formats fall into the binary/closed category.) Instead, one has to at the very least take a detour via ProteoWizard and convert the raw data to open formats like *.mzML. With the current implementation in PeakPerformance, one would then have to parse the converted files (this should be possible with packages like pyOpenMS or pymzML) and save the contents as *.npy files, from which the pre-manufactured data pipeline shown in example notebooks 1 and 3 starts.
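Just to sketch what that detour could look like (this is only an illustration with pyOpenMS; the target m/z is made up, and the exact *.npy layout PeakPerformance expects may differ from what is written here):

```python
import numpy as np
import pyopenms as oms

# Load an mzML file previously converted from vendor raw data with ProteoWizard.
exp = oms.MSExperiment()
oms.MzMLFile().load("sample.mzML", exp)

# Hypothetical target: extract an ion chromatogram for a single m/z window.
target_mz, tol = 445.12, 0.01
times, intensities = [], []
for spec in exp:
    if spec.getMSLevel() != 1:
        continue
    mz, inten = spec.get_peaks()
    mask = np.abs(mz - target_mz) <= tol
    times.append(spec.getRT() / 60.0)  # pyOpenMS reports RT in seconds
    intensities.append(inten[mask].sum() if mask.any() else 0.0)

# Save time/intensity as *.npy, the starting point of example notebooks 1 and 3.
np.save("timeseries.npy", np.vstack([times, intensities]))
```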

That is admittedly a very roundabout way to go about things ^^ It would definitely be advantageous to provide a function that starts from *.mzML files. In the case of QqTOF analyses, this would still require users to specify which experiment (meaning e.g. one product ion scan) pertains to which substance, which TOF ranges to extract from it, etc. In the case of QqQ instruments in multiple reaction monitoring mode, the mass transitions have to be specified before the LC-MS/MS analysis, so that information would already be contained in the raw data file and would merely have to be parsed. So it is not a trivial function to implement, but a worthwhile one nonetheless.
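For the MRM case, the transitions usually end up as chromatograms in the *.mzML file, so the parsing could look roughly like this (again only a sketch with pyOpenMS; whether a given converter actually writes the transitions as chromatograms is an assumption):

```python
import numpy as np
import pyopenms as oms

exp = oms.MSExperiment()
oms.MzMLFile().load("mrm_run.mzML", exp)

# MRM transitions are typically stored as chromatograms, so the Q1/Q3 pair
# identifying each transition can be read directly from the file.
for chrom in exp.getChromatograms():
    q1 = chrom.getPrecursor().getMZ()
    q3 = chrom.getProduct().getMZ()
    rt, inten = chrom.get_peaks()
    np.save(f"transition_{q1:.1f}_{q3:.1f}.npy", np.vstack([rt, inten]))
```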

The reason that this does not exist (or at least not yet) is simply that an orthogonal development by someone else at our institute provided us with convenient access to the peak data (I am not sure how much I am allowed to say about this, though, sorry about that). This is also how we start the parallelized PeakPerformance pipeline on the computation cluster, which I believe I mentioned in the JOSS review thread. Therefore, this connection to *.mzML was not a high priority at the time. That is also why we don't have a straightforward example from an original raw data file to PeakPerformance at hand: we use a different route which is unfortunately specific to the infrastructure at our institute...
