Handle outputs from classical peak integration pipelines #32

Closed
Adafede opened this issue Oct 21, 2024 · 1 comment · Fixed by #36

Comments


Adafede commented Oct 21, 2024

As part of my review for openjournals/joss-reviews#7313.

This is more of a discussion rather than a hard requirement...

I work with LC-MS data processing software on a daily basis, and to really get your repository out of its current "niche", I believe that directly handling some of the classical outputs from other software would help a lot!

While the ".xlsx" way to do things might help some users, if others really want to integrate your pipeline into larger ones, they will need some convenience scripts to generate these from their usually used softwares, do you think handling some examples might be doable?

Y0dler (Collaborator) commented Oct 24, 2024

That is a very legit point.

Since the vendors – at least as far as I am aware – all have their own proprietary, closed data formats, it is unfortunately impossible to simply start from the raw data as it comes out of the measurement device, which would obviously be the ideal case. (For example, I worked with a Sciex device, and its *.wiff and *.wiff.scan formats fall into the binary/closed category.) Instead, one has to at the very least take a detour via ProteoWizard and convert the raw data to open formats like *.mzML. With the current implementation in PeakPerformance, one would then have to parse the converted files (this should be possible with packages like pyOpenMS or pymzML) and save the contents as *.npy files, from which the pre-manufactured data pipeline shown in example notebooks 1 and 3 starts.
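Just to sketch what that detour could look like (this is only an illustration with pyOpenMS; the target m/z is made up, and the exact *.npy layout PeakPerformance expects may differ from what is written here):

```python
import numpy as np
import pyopenms as oms

# Load an mzML file previously converted from vendor raw data with ProteoWizard.
exp = oms.MSExperiment()
oms.MzMLFile().load("sample.mzML", exp)

# Hypothetical target: extract an ion chromatogram for a single m/z window.
target_mz, tol = 445.12, 0.01
times, intensities = [], []
for spec in exp:
    if spec.getMSLevel() != 1:
        continue
    mz, inten = spec.get_peaks()
    mask = np.abs(mz - target_mz) <= tol
    times.append(spec.getRT() / 60.0)  # pyOpenMS reports RT in seconds
    intensities.append(inten[mask].sum() if mask.any() else 0.0)

# Save time/intensity as *.npy, the starting point of example notebooks 1 and 3.
np.save("timeseries.npy", np.vstack([times, intensities]))
```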

That is admittedly a very roundabout way to go about things ^^ It would definitely be advantageous to provide a function that starts from *.mzML files. In the case of QqTOF analyses, this would still require users to specify which experiment (meaning e.g. one product ion scan) pertains to which substance, which TOF ranges to extract from it, etc. In the case of QqQ instruments in multiple reaction monitoring mode, the mass transitions have to be specified before the LC-MS/MS analysis, so that information would already be contained in the raw data file and would merely have to be parsed. So it is not a trivial function to implement, but a worthwhile one nonetheless.
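For the MRM case, the transitions usually end up as chromatograms in the *.mzML file, so the parsing could look roughly like this (again only a sketch with pyOpenMS; whether a given converter actually writes the transitions as chromatograms is an assumption):

```python
import numpy as np
import pyopenms as oms

exp = oms.MSExperiment()
oms.MzMLFile().load("mrm_run.mzML", exp)

# MRM transitions are typically stored as chromatograms, so the Q1/Q3 pair
# identifying each transition can be read directly from the file.
for chrom in exp.getChromatograms():
    q1 = chrom.getPrecursor().getMZ()
    q3 = chrom.getProduct().getMZ()
    rt, inten = chrom.get_peaks()
    np.save(f"transition_{q1:.1f}_{q3:.1f}.npy", np.vstack([rt, inten]))
```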

The reason that this does not exist (or at least not yet) is simply that an orthogonal development by someone else at our institute provided us with convenient access to the peak data (I am not sure how much I am allowed to say about this, though, sorry about that). This is also how we start the parallelized PeakPerformance pipeline on the computation cluster, which I believe I mentioned in the JOSS review thread. Therefore, this connection to *.mzML was not a high priority at the time. That is also why we don't have a straightforward example from an original raw data file to PeakPerformance at hand: we use a different route which is unfortunately specific to the infrastructure at our institute...
