Describe the question or problem
Is there anything known about sensitivity issues for HCD / Asp-N workflows?
Details
Dear MSGF+ developers,
I am currently analysing a batch of samples digested with either Lys-C or Asp-N. All samples were measured on a Q Exactive and are searched against a database derived from a six-frame genome translation, containing peptides generated by the corresponding enzyme. Because the sample files are searched with Comet, MS-Fragger and MSGF+, the post-processing uses a PeptideProphet and iProphet pipeline, which requires converting mzIdentML to pepXML (using CLevel=2).
However, while for Lys-C MSGF+ consistently identifies between 10 and 15% more spectra at 0.1% FDR than Comet (the MS-Fragger searches have not run yet, but the range should be the same), there is an extreme drop in MSGF+'s sensitivity for the Asp-N digested samples: ~3000 vs. 700 identified spectra; 15000 vs. 4200; 12000 vs. 2600. The samples are different fractions, not replicates, so the difference between them is expected.
The only differences in the parameter files between the Asp-N and Lys-C searches are the FASTA file and the enzyme selection. I deliberately did not choose no-cleavage, in order to keep the limit on the number of missed cleavage sites.
In the 2014 publication I saw that the HCD model for a standard workflow was trained for tryptic peptides using the Freeze-2011 dataset (blue line in figure 1), while the non-tryptic peptides were trained directly on CID and ETD data (red lines in figure 1) only. Could this be the reason?
This is an interesting observation, and I agree with your theory that the training data is likely the source of the differences in identification rates. MS-GF+ is not under active development, so you'll just have to work with the results that it produces for your Asp-N searches. This just goes to show that:
a) MS/MS peptide identification is not easy (thus a plethora of identification tool options)
b) Different MS/MS identification tools have their strengths and weaknesses
MS-GF+ includes two parameter files for AspN, both trained from ion trap data ("Low-res"). A quick fix is to use "InstrumentID=0" to force MS-GF+ to use the AspN param set. If you have enough spectra (e.g. >50K), a better solution is to run a search with "InstrumentID=0" and create a new param set using https://msgfplus.github.io/msgfplus/ScoringParamGen.html.
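For anyone scripting the quick fix, here is a minimal Python sketch that overrides one setting in the text of a parameter file. It assumes the file uses a plain `Key=Value` layout with optional `#` comments, which may not match every parameter file; `set_param` is a hypothetical helper, not part of MS-GF+, and only the `InstrumentID=0` setting itself comes from the suggestion above.

```python
def set_param(text: str, key: str, value: str) -> str:
    """Return parameter-file text with every active `key=...` line set to `value`.

    Assumption: one `Key=Value` pair per line, `#` starts a comment.
    Commented-out assignments are left untouched. If no active assignment
    exists, a new `key=value` line is appended at the end.
    """
    out = []
    found = False
    for line in text.splitlines():
        # Everything before the first '#' is the active part of the line.
        body, _, _ = line.partition("#")
        name, eq, _ = body.partition("=")
        if eq and name.strip().lower() == key.lower():
            # Drop the old trailing comment, since it described the old value.
            line = f"{key}={value}"
            found = True
        out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Illustrative input only; the real file contents are not shown in this thread.
example = "EnzymeID=7\nInstrumentID=3  # previous instrument setting\n"
print(set_param(example, "InstrumentID", "0"))
```

To apply it to a real file, read the file into a string, pass it through `set_param`, and write the result to a new file so the original parameters stay available for comparison.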
Thank you both for your answers. I will try to generate a new param set. Does it matter for this purpose whether I use the very same files that I want to analyse? Or should I look for some unrelated projects, e.g. from PRIDE?
Best regards,
Juergen
Useful extras
MSGFPlus_Params_QE1_AspN.txt
MSGFPlus_Params_QE1_LysC.txt