
Unexpected low sensitivity for Asp-N digested samples #137

Open
JB91451 opened this issue May 9, 2022 · 4 comments

JB91451 commented May 9, 2022

Describe the question or problem
Is there anything known about sensitivity issues for HCD / Asp-N workflows?

Details
Dear MSGF+ developers,

I am currently analysing a batch of samples, digested with either Lys-C or Asp-N. All samples were measured on a Q Exactive and are searched against a database derived from a six-frame genome translation, containing the peptides generated by the corresponding enzyme. As the sample files are searched with Comet, MSFragger, and MS-GF+, the post-processing involves a PeptideProphet and iProphet pipeline and thus the conversion of mzIdentML to pepXML (using CLevel=2).

However, while for Lys-C, MS-GF+ consistently identifies 10 to 15% more spectra at 0.1% FDR than Comet (the MSFragger searches have not finished yet, but the range should be similar), there is an extreme drop in MS-GF+'s sensitivity for the Asp-N digested samples: ~3000 vs. 700 identified spectra; 15000 vs. 4200; 12000 vs. 2600. The samples are different fractions, not replicates, so the differences between them are expected.

The only differences in the parameter files between the Asp-N and Lys-C searches are the fasta file and the enzyme selection. I deliberately did not choose the no-cleavage option, in order to retain control over the number of missed cleavage sites. A sketch of the two invocations follows below.
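
For concreteness, here is roughly what the two searches look like on the command line. The file names are placeholders, and the IDs follow the enzyme/instrument/fragmentation lists in MS-GF+'s command-line help (-e 3 = Lys-C, -e 7 = Asp-N, -inst 3 = Q-Exactive, -m 3 = HCD):

```bash
# Lys-C search (placeholder file names)
java -Xmx8g -jar MSGFPlus.jar \
  -s sample_lysc.mzML -d sixframe_lysc.fasta \
  -inst 3 -m 3 -e 3 -tda 1 -o sample_lysc.mzid

# Asp-N search: identical apart from the fasta file and the enzyme ID
java -Xmx8g -jar MSGFPlus.jar \
  -s sample_aspn.mzML -d sixframe_aspn.fasta \
  -inst 3 -m 3 -e 7 -tda 1 -o sample_aspn.mzid
```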

In the 2014 publication I saw that the HCD model for the standard workflow was trained on tryptic peptides from the Freeze-2011 dataset (blue line in Figure 1), while the models for non-tryptic peptides were trained on CID and ETD data only (red lines in Figure 1). Could this be the reason?

Best regards,
Juergen


alchemistmatt (Collaborator) commented

This is an interesting observation, and I agree with your theory that the training data is likely the source of the differences in identification rates. MS-GF+ is not under active development, so you'll have to work with the results it produces for your Asp-N searches. This goes to show that:
a) MS/MS peptide identification is not easy (hence the plethora of identification tools), and
b) different MS/MS identification tools have their strengths and weaknesses.

sangtaekim (Collaborator) commented

MS-GF+ includes two parameter files for AspN, both trained on ion-trap data ("Low-res"). A quick fix is to use "InstrumentID=0" to force MS-GF+ to use the AspN param set. If you have enough spectra (e.g. >50K), a better solution is to run a search with "InstrumentID=0" and create a new param set using https://msgfplus.github.io/msgfplus/ScoringParamGen.html.
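
A rough sketch of both options, assuming the flags from the MS-GF+ and ScoringParamGen documentation; the file names are placeholders, and the ScoringParamGen arguments should be checked against the page linked above:

```bash
# Option 1 (quick fix): force the low-res AspN scoring model, either via
# InstrumentID=0 in the config file or -inst 0 on the command line
java -Xmx8g -jar MSGFPlus.jar -s sample_aspn.mzML -d sixframe_aspn.fasta \
  -inst 0 -m 3 -e 7 -tda 1 -o sample_aspn_lowres.mzid

# Option 2 (with enough spectra, e.g. >50K): train a custom
# HCD / Q-Exactive / Asp-N scoring model from the option-1 results
java -Xmx8g -cp MSGFPlus.jar edu.ucsd.msjava.ui.ScoringParamGen \
  -i sample_aspn_lowres.mzid -d spectrum_dir/ -m 3 -inst 3 -e 7
```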

JB91451 (Author) commented May 11, 2022

Thank you both for your answers. I will try to generate a new param set. Does it matter for this purpose whether I use the very same files that I want to analyse, or should I look for some unrelated projects, e.g. from PRIDE?

sangtaekim (Collaborator) commented

@JB91451 It will be fine to use the same files.
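
For completeness, a sketch of the final step. My understanding of the ScoringParamGen page is that the generated .param file is read from a params/ folder under the MS-GF+ working directory, but the generated file name below is hypothetical and the placement should be verified against the docs:

```bash
# Put the generated scoring model where MS-GF+ looks for custom param files
# (a params/ folder under the working directory, per the ScoringParamGen docs),
# then re-run the search with the real instrument setting.
mkdir -p params
cp HCD_QExactive_AspN.param params/   # hypothetical generated file name
java -Xmx8g -jar MSGFPlus.jar -s sample_aspn.mzML -d sixframe_aspn.fasta \
  -inst 3 -m 3 -e 7 -tda 1 -o sample_aspn_custom.mzid
```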
