To obtain truly valid precursor- and peptide-level FDR estimates for data-independent acquisition (DIA) results, we need to add support for the peptide competition step described in section 2.4 of the DIAmeter paper.
Roughly, this involves pairing each target peptide with the decoy that was generated from it, so that the two can compete against each other (that is, we retain only the one with the higher score). On the surface this appears easy enough: we could sort the sequences of all of the target and decoy hits, then randomly pair those with matching amino acid compositions. However, this simplistic strategy will be overly conservative, because in practice we fail to observe many possible decoy peptides.
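For concreteness, here is a minimal sketch of the simplistic strategy described above. It assumes that hits arrive as plain dictionaries mapping peptide sequence to best score; the function names `pair_by_composition` and `compete`, the tie rule (target wins), and the handling of unpaired peptides (kept unopposed) are illustrative assumptions, not anything taken from the DIAmeter paper or an existing code base.

```python
import random
from collections import defaultdict


def pair_by_composition(targets, decoys, seed=0):
    """Randomly pair observed targets and decoys with equal composition.

    targets, decoys: dict mapping peptide sequence -> best score
    (hypothetical input format). Returns a list of (target, decoy)
    tuples where either member may be None if no partner was observed.
    """
    rng = random.Random(seed)

    # Group decoys by composition: the sorted residues act as the key.
    by_comp = defaultdict(list)
    for seq in sorted(decoys):
        by_comp["".join(sorted(seq))].append(seq)
    for group in by_comp.values():
        rng.shuffle(group)

    pairs = []
    for seq in sorted(targets):
        group = by_comp.get("".join(sorted(seq)))
        pairs.append((seq, group.pop() if group else None))

    # Assumption: decoys left without a partner are kept unopposed.
    for group in by_comp.values():
        pairs.extend((None, seq) for seq in group)
    return pairs


def compete(pairs, targets, decoys):
    """Keep one winner per pair as (sequence, score, is_target)."""
    winners = []
    for target, decoy in pairs:
        if decoy is None:
            winners.append((target, targets[target], True))
        elif target is None:
            winners.append((decoy, decoys[decoy], False))
        elif targets[target] >= decoys[decoy]:  # assumption: ties go to the target
            winners.append((target, targets[target], True))
        else:
            winners.append((decoy, decoys[decoy], False))
    return winners
```

Note that the pairing only sees peptides that were actually observed, which is exactly the weakness raised above: a target whose true decoy was never reported is either left unpaired or matched to an unrelated decoy that happens to share its composition.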
An alternative approach would be to require a FASTA (or other protein/peptide database) that explicitly defines all of the possible target and decoy peptides in the search space. We could then either follow an approach similar to the one above, or extract from the corresponding decoy protein the same amino acid indices that the target peptide occupies in its own protein. Unfortunately, either of these approaches could be problematic. In the former, we assume that we have some knowledge of how the decoys were generated (such as whether terminal amino acids were excluded from shuffling). In the latter, we assume that the user did not shuffle the full protein sequence, but rather the peptides within the protein (note that shuffling whole protein sequences is statistically invalid, but still occurs).
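The index-based variant could look something like the sketch below. It assumes that each target protein has a decoy protein of equal length and that the target peptide occurs once in its protein; `decoy_by_position` is a hypothetical helper name. The composition check is one possible way to detect the failure case mentioned above, where the whole protein was shuffled rather than its peptides.

```python
def decoy_by_position(target_peptide, target_protein, decoy_protein):
    """Return the decoy peptide at the target peptide's protein indices.

    Returns None when the peptide is not found, the proteins differ in
    length, or the extracted decoy does not share the target's amino acid
    composition (which suggests the decoy was not shuffled per peptide).
    """
    if len(target_protein) != len(decoy_protein):
        return None

    # Assumption: the first occurrence is the one we want.
    start = target_protein.find(target_peptide)
    if start == -1:
        return None

    decoy_peptide = decoy_protein[start:start + len(target_peptide)]
    if sorted(decoy_peptide) != sorted(target_peptide):
        return None
    return decoy_peptide


# Peptides shuffled in place (terminal residue kept): a pair is recovered.
print(decoy_by_position("AAPEPTIDEK", "MKAAPEPTIDEKGGR", "MKPEAITDPAEKGGR"))
# Whole protein reversed: the indices no longer line up, so no pair.
print(decoy_by_position("AAPEPTIDEK", "MKAAPEPTIDEKGGR", "RGGKEDITPEPAAKM"))
```

Returning `None` rather than raising leaves it to the caller to decide whether an unpairable target should be dropped, kept unopposed, or reported as an error.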
In conclusion, we need to think about the most robust way to implement this.