Questions about Comet hits and features #72

wfondrie · 2022-10-10T17:28:30Z

wfondrie
Oct 10, 2022
Maintainer

A question from a Twitter DM, with minor edits:

I specified reporting 12 matches per scan and the PIN file has 12 lines for each scan (Comet default is 5 matches). [In olden times, the first match did not have a deltaCN value. You had to go to the second match to get the deltaCN. I compute an alternative deltaCN from the top hit to the average of xcorrs for hits 4 to 12 for my pipeline discriminant function. Jimmy has since changed deltaCN to be the difference between dissimilar sequences, so I can probably ditch my alternative deltaCN.]

Does mokapot ignore everything except for the “top” hit? If so, how do you define the top hit? There can be matches that are different peptide sequences but have the same xcorr values (top hit ties to the precision Jimmy outputs [4 decimal places]). How many output lines would you usually set in the params file? Just one?

Do you know what units the dM column values are in? The experimental masses and calculated masses seem to be MH+ values in daltons. The differences between those values do not equal the numbers in the dM column. The numbers in the dM column are much smaller. The sign seems to agree (the delta is exp-calc). Does mokapot use delta masses in daltons or in PPM (or does it not matter)?

The charge states look like a boolean grid with columns from 1 to 6. Are all 6 needed? I usually just accept 2+, 3+, and 4+ peptides on Orbi with Comet. Would 3 columns work or are all 6 needed? One last question. When doing a PPM delta mass, what mass is usually used in the denominator (the experimental mass, the theoretical mass, or an average of the two values)? Apologies for so many questions. Thanks.

wfondrie · 2022-10-10T17:49:34Z

wfondrie
Oct 10, 2022
Maintainer Author

Does mokapot ignore everything except for the “top” hit?

Mokapot uses all of the hits during the semi-supervised model training, but only the top-hit is retained after target-decoy competition (TDC). This is necessary to maintain the theoretical guarantees that it can provide. Notably, the alternative "mix-max" procedure can allow for multiple PSMs per spectrum, but it is not implemented in mokapot; however, it is available with Percolator.

If so, how do you define the top hit?

The top-hit for each spectrum is the one that has the highest score from the model that mokapot has learned. Ties, although extremely rare in this case, are broken randomly.
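To make the selection rule above concrete, here is a minimal pandas sketch. It is not mokapot's actual implementation; the column names `ScanNr` and `score` and the function `top_hit_per_spectrum` are assumptions made up for this illustration.

```python
import numpy as np
import pandas as pd


def top_hit_per_spectrum(psms, spectrum_col="ScanNr", score_col="score", seed=0):
    """Keep the best-scoring PSM per spectrum, breaking ties randomly.

    `spectrum_col` and `score_col` are hypothetical column names.
    """
    rng = np.random.default_rng(seed)
    # Shuffle the rows first so that tied scores end up in random order.
    shuffled = psms.iloc[rng.permutation(len(psms))]
    # A stable sort keeps the shuffled order among equal scores.
    ranked = shuffled.sort_values(score_col, ascending=False, kind="stable")
    # The first row seen for each spectrum is its top hit.
    top = ranked.drop_duplicates(spectrum_col, keep="first")
    return top.sort_values(spectrum_col).reset_index(drop=True)


psms = pd.DataFrame(
    {
        "ScanNr": [1, 1, 2, 2],
        "Peptide": ["A", "B", "C", "D"],
        "score": [2.0, 1.0, 3.0, 3.0],  # scan 2 has a tie
    }
)
top_hits = top_hit_per_spectrum(psms)
```

For scan 2 in this toy table, either peptide may be kept, depending on the seed.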

How many output lines would you usually set in the params file? Just one?

Because mokapot does use all of the PSMs in training, it can sometimes be useful to provide more than one hit per spectrum. I typically use the top 5.

Do you know what units the dM column values are in?

I'm not entirely sure for Comet, since they are calculated internally. If mokapot is used to read the PepXML from Comet or another search engine, a mass_diff feature is created that is the difference between the precursor_neutral_mass and calc_neutral_pep_mass fields provided by the PepXML.

Does mokapot use delta masses in daltons or in PPM (or does it not matter)?

For mokapot it doesn't matter. Each is considered a feature that mokapot can learn from, so in some cases, including both may be helpful.

The charge states look like a boolean grid with columns from 1 to 6. Are all 6 needed? I usually just accept 2+, 3+, and 4+ peptides on Orbi with Comet. Would 3 columns work or are all 6 needed?

I would filter out PSMs that do not match the charge states you would accept, then drop their corresponding columns, before analyzing the data with mokapot. If we filter PSMs after computing the FDR, it invalidates the FDR estimate.
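As a rough sketch of that pre-filtering step, assuming the PSMs are already loaded into a pandas DataFrame with one-hot charge columns named `Charge1` through `Charge6` (check the header of your own PIN file, since the names here are an assumption), something like the following could work. The helper `keep_charges` is hypothetical and not part of mokapot.

```python
import pandas as pd


def keep_charges(pin, accepted=(2, 3, 4), prefix="Charge"):
    """Keep PSMs with an accepted charge and drop the unused charge columns.

    Assumes one-hot charge columns named like "Charge2" (an assumption).
    """
    charge_cols = [
        c for c in pin.columns
        if c.startswith(prefix) and c[len(prefix):].isdigit()
    ]
    keep_cols = [f"{prefix}{z}" for z in accepted if f"{prefix}{z}" in pin.columns]
    # A PSM is retained if any of its accepted charge flags is set.
    mask = pin[keep_cols].sum(axis=1) > 0
    drop_cols = [c for c in charge_cols if c not in keep_cols]
    return pin.loc[mask].drop(columns=drop_cols).reset_index(drop=True)


pin = pd.DataFrame(
    {
        "ScanNr": [10, 11, 12],
        "Charge1": [1, 0, 0],
        "Charge2": [0, 1, 0],
        "Charge3": [0, 0, 0],
        "Charge4": [0, 0, 0],
        "Charge5": [0, 0, 1],
        "Charge6": [0, 0, 0],
    }
)
filtered = keep_charges(pin)
```

The filtered table would then be written back out or passed to mokapot, so that the FDR is estimated only on the PSMs you intend to keep.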

When doing a PPM delta mass, what mass is usually used in the denominator (the experimental mass, the theoretical mass, or an average of the two values)?

Mokapot doesn't calculate PPM delta mass (it is provided by search engines). That being said, I normally calculate it using the theoretical mass in the denominator. My reason is that we're essentially saying, "if we assume spectrum X was generated by peptide Y, then the mass error would be..." I imagine folks calculate it in many different ways.
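For reference, the convention I described (theoretical mass in the denominator, delta taken as experimental minus calculated) can be written as a small helper. The function names are made up for this example, and this is not necessarily how Comet computes its dM column.

```python
def mass_diff_da(exp_mass, calc_mass):
    """Mass error in daltons: experimental minus calculated."""
    return exp_mass - calc_mass


def mass_diff_ppm(exp_mass, calc_mass):
    """Mass error in parts per million, relative to the calculated mass."""
    return mass_diff_da(exp_mass, calc_mass) / calc_mass * 1e6


# A 0.001 Da error on a 1000 Da peptide is about 1 ppm.
error_ppm = mass_diff_ppm(1000.001, 1000.0)
```

Both masses need to be of the same kind (both neutral, or both MH+) for the difference to be meaningful.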
