Unexpected low sensitivity for asp-n digested samples #137

JB91451 · 2022-05-09T13:26:48Z

Describe the question or problem
Is there anything known about sensitivity issues for HCD / Asp-N workflows?

Details
Dear MSGF+ developers,

I am currently analysing a batch of samples digested with either Lys-C or Asp-N. All samples were measured on a Q Exactive and are searched against a database derived from a six-frame genome translation, containing peptides generated by the corresponding enzyme. Because the sample files are searched with Comet, MSFragger and MS-GF+, the post-processing uses a PeptideProphet and iProphet pipeline, which requires converting mzIdentML to pepXML (using CLevel=2).

However, while for Lys-C MS-GF+ consistently identifies 10 to 15% more spectra at 0.1% FDR than Comet (the MSFragger searches have not run yet, but the range should be the same), MS-GF+'s sensitivity drops sharply for the Asp-N digested samples: ~3000 vs. 700 identified spectra; 15000 vs. 4200; 12000 vs. 2600. The samples are different fractions, not replicates, so the difference between them is expected.

The only differences between the parameter files for the Asp-N and Lys-C searches are the FASTA file and the enzyme selection. I did not choose the no-cleavage setting, in order to keep the number of missed cleavage sites.

In the 2014 publication I saw that the HCD model for a standard workflow was trained on tryptic peptides using the Freeze-2011 dataset (blue line in figure 1), while the models for non-tryptic peptides were trained directly on CID and ETD data only (red lines in figure 1). Could this be the reason?

Best regards,
Juergen

Useful extras

parameter files used to run MS-GF+
MSGFPlus_Params_QE1_AspN.txt
MSGFPlus_Params_QE1_LysC.txt

alchemistmatt · 2022-05-10T14:59:19Z

This is an interesting observation, and I agree with your theory that the training data is likely the source of the differences in identification rates. MS-GF+ is not under active development, so you'll just have to work with the results that it produces for your Asp-N searches. This just goes to show that:
a) MS/MS peptide identification is not easy (thus a plethora of identification tool options)
b) Different MS/MS identification tools have their strengths and weaknesses

sangtaekim · 2022-05-10T16:15:02Z

MS-GF+ includes two parameter files for AspN, both trained from ion-trap data ("Low-res"). A quick fix is to use "InstrumentID=0" to force MS-GF+ to use the AspN param set. If you have enough spectra (e.g. >50K), a better solution is to run a search with "InstrumentID=0" and create a new param set using https://msgfplus.github.io/msgfplus/ScoringParamGen.html.
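
For anyone who wants to script the quick fix, here is a minimal Python sketch that switches a `key=value` parameter file to "InstrumentID=0". It assumes the parameter file holds one setting per line in plain `key=value` form. The helper name `set_param` and the example file contents (including the `EnzymeID` value) are made up for illustration and are not taken from the attached parameter files.

```python
import re


def set_param(text: str, key: str, value: str) -> str:
    """Return `text` with `key=value` set.

    Replaces the first uncommented `key=...` line if one exists,
    otherwise appends a new line. Anything after the `=` on the
    replaced line (including a trailing comment) is dropped.
    """
    pattern = re.compile(
        rf"^[ \t]*{re.escape(key)}[ \t]*=[^\r\n]*", re.MULTILINE
    )
    new_line = f"{key}={value}"
    if pattern.search(text):
        return pattern.sub(lambda m: new_line, text, count=1)
    sep = "" if text.endswith("\n") or not text else "\n"
    return f"{text}{sep}{new_line}\n"


# Hypothetical parameter file contents, for illustration only.
example = "#Instrument ID\nInstrumentID=3\nEnzymeID=7\n"
updated = set_param(example, "InstrumentID", "0")
print(updated)
```

To apply it to a real file, read the file's text, pass it through `set_param`, and write the result to a new file, so the original parameter file stays available for comparison.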

JB91451 · 2022-05-11T16:06:27Z

Thank you both for your answers. I will try to generate a new param set. Does it matter for this purpose whether I use the very same files that I want to analyse? Or should I look for files from unrelated projects, e.g. from PRIDE?

sangtaekim · 2022-05-11T16:11:09Z

@JB91451 It will be fine to use the same files.

JB91451 added the question label May 9, 2022
