Gene level parsimony option in protein association dialog #2764

chambm · 2023-10-31T17:57:27Z

added gene level parsimony option in protein association dialog

bumped schema to 23.11 for gene level parsimony
updated audit logs
fixed Skyline build system to copy vendor DLLs to both Debug and Release directories so vendor reader tests don't fail in Debug config
removed MinPeptidesPerProtein test from TestHugeAssociateProteins because it's now hidden in the Refine -> Associate Proteins dialog

nickshulman · 2023-10-31T18:02:32Z

pwiz_tools/Skyline/CommandArgs.cs

@@ -1276,6 +1276,8 @@ public bool ImportingSearch

        public static readonly Argument ARG_AP_GROUP_PROTEINS = new Argument(@"associate-proteins-group-proteins",
            (c, p) => c.AssociateProteinsGroupProteins = p.IsNameOnly || bool.Parse(p.Value));
+        public static readonly Argument ARG_AP_GENE_LEVEL = new Argument(@"associate-proteins-gene-level-parsimony",


Instead of having a name-only argument, do you think we should have an argument called "associate-proteins-parsimony-level" where the valid values are "protein" (default) or "gene"?

I considered that in concert with using a combobox in the Associate Proteins dialog rather than 2 checkboxes. It was easier to implement as just an additional checkbox, and I didn't see a significant reason to prefer a combobox, but if you see one then I'm happy to consider it.

pwiz_tools/Skyline/CommandArgs.cs

chambm · 2023-11-01T16:48:28Z

pwiz_tools/Skyline/Model/Proteome/ProteinAssociation.cs

+
+                    return new FastaRecord(longestProtein.RecordIndex, 0,
+                        new FastaSequence(longestProtein.Sequence.Name, longestProtein.Sequence.Description, null,
+                            string.Join("", proteinToPeptides.Keys.Select(o => o.Sequence.Sequence))), longestProtein.Metadata);


@brendanx67 This is what I came up with to pick/make a protein sequence that contains all the member peptides. It will use the longest sequence of the group if it contains all the peptides, and if it doesn't then it'll concatenate all the protein sequences together. Some alternatives might be: concatenate all the peptides instead of the proteins, or only concatenate enough proteins to contain all the peptides (i.e. a parsimonious approach to concatenating).

Or throw this hacky workaround away and go back to the drawing board, making gene group a new FastaSequenceGroup that allows its peptides to map to only some of the parent's FastaSequenceList. I honestly don't know how much work that would be though.

* bumped schema to 23.12 for gene level parsimony * fixed Skyline build system to copy vendor DLLs to both Debug and Release directories so vendor reader tests don't fail in Debug config * removed MinPeptidesPerProtein test from TestHugeAssociateProteins because it's now hidden in the Refine -> Associate Proteins dialog

chambm · 2024-01-03T20:57:15Z

@brendanx67 Good to go for this PR?

…ptions * greatly improved speed by using Nick's ReferenceValue for PeptideDocNodes * fixed issues with cancellation in some parsimony calculations

…ocNode" because reference equality was not actually being used. Check for cancellation in loop which digests every protein sequence.

…k/20231025_gene_level_parsimony

nickshulman

Looks very nice.

Here are some comments.
The only comments that I actually care about are the two where "Distinct" is being called on a list of PeptideDocNode objects.
Feel free to ignore all my other comments.

pwiz_tools/Skyline/EditUI/AssociateProteinsDlg.cs

pwiz_tools/Skyline/Model/Proteome/ProteinAssociation.cs

nickshulman · 2024-01-16T21:05:05Z

pwiz_tools/Skyline/Model/Proteome/ProteinAssociation.cs

@@ -561,11 +610,65 @@ public void ApplyParsimonyOptions(bool groupProteins, bool findMinimalProteinLis
            _finalResults.FinalSharedPeptideCount = sharedPeptidesRemaining.Values.Sum();
        }

-        private Dictionary<IProteinRecord, PeptideAssociationGroup> CalculateProteinGroups(MappingResultsInternal results, ILongWaitBroker broker)
+        public class GeneLevelEqualityComparer : EqualityComparer<IProteinRecord>


Instead of writing an EqualityComparer, I probably would have written a new class called "GeneOrProteinSequence" which I used as the Key in the dictionary.
The reason is that it's easier to understand what is going on if the dictionary has a completely different key type, compared to if it has a special EqualityComparer.

But, this way is fine, too.

After I wrote this, I realized I could have just used the gene name (or protein name if gene isn't set) as the key. Does that seem sensible to you as well?

pwiz_tools/Skyline/Model/Proteome/ProteinAssociation.cs

nickshulman · 2024-01-16T21:21:34Z

pwiz_tools/Skyline/Model/Proteome/ProteinAssociation.cs

+                    if (proteinToPeptides.Count == 1)
+                        return proteinToPeptides.Keys.First();
+
+                    var longestProtein = proteinToPeptides.OrderByDescending(kvp2 => kvp2.Key.Sequence.Sequence.Length).First().Key;


This function is checking whether the longest protein sequence contains all of the peptide sequences.
It's true that the longest protein sequence is the one that is most likely to contain all of the peptide sequences, but it is theoretically possible that a shorter protein sequence in the group might happen to contain all the peptide sequences.
(This probably doesn't matter).

chambm requested review from nickshulman and brendanx67 October 31, 2023 17:57

nickshulman reviewed Oct 31, 2023

View reviewed changes

chambm commented Nov 1, 2023

View reviewed changes

chambm mentioned this pull request Nov 1, 2023

Skyline: .wiff tests failing - is this expected? #2761

Closed

chambm linked an issue Nov 1, 2023 that may be closed by this pull request

Skyline: .wiff tests failing - is this expected? #2761

Closed

chambm added 2 commits January 3, 2024 12:57

* resolved resx split conflicts

e725032

chambm force-pushed the Skyline/work/20231025_gene_level_parsimony branch from 712319c to e725032 Compare January 3, 2024 18:02

chambm and others added 3 commits January 9, 2024 16:55

* fixed duplicate calls to parsimony calculation when changing some o…

0d71585

…ptions * greatly improved speed by using Nick's ReferenceValue for PeptideDocNodes * fixed issues with cancellation in some parsimony calculations

Change a couple of "ReferenceValue<PeptideDocNode>" back to "PeptideD…

d50fc21

…ocNode" because reference equality was not actually being used. Check for cancellation in loop which digests every protein sequence.

Merge remote-tracking branch 'remotes/origin/master' into Skyline/wor…

4fcb148

…k/20231025_gene_level_parsimony

nickshulman reviewed Jan 16, 2024

View reviewed changes

chambm added 2 commits January 17, 2024 16:51

* incorporated Nick's review recommendations

1bd5be0

Merge branch 'master' into Skyline/work/20231025_gene_level_parsimony

ad5d9ab

chambm merged commit 072abd4 into master Jan 18, 2024
11 checks passed

chambm deleted the Skyline/work/20231025_gene_level_parsimony branch January 18, 2024 20:54

nickshulman mentioned this pull request Jan 19, 2024

Speed up "BiblioSpecLiteLibrary.FindSource" by caching results in a Dictionary #2822

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Gene level parsimony option in protein association dialog #2764

Gene level parsimony option in protein association dialog #2764

chambm commented Oct 31, 2023

nickshulman Oct 31, 2023

chambm Oct 31, 2023

chambm Nov 1, 2023

chambm commented Jan 3, 2024

nickshulman left a comment

nickshulman Jan 16, 2024

chambm Jan 17, 2024 •

edited

Loading

nickshulman Jan 16, 2024

Gene level parsimony option in protein association dialog #2764

Gene level parsimony option in protein association dialog #2764

Conversation

chambm commented Oct 31, 2023

nickshulman Oct 31, 2023

Choose a reason for hiding this comment

chambm Oct 31, 2023

Choose a reason for hiding this comment

chambm Nov 1, 2023

Choose a reason for hiding this comment

chambm commented Jan 3, 2024

nickshulman left a comment

Choose a reason for hiding this comment

nickshulman Jan 16, 2024

Choose a reason for hiding this comment

chambm Jan 17, 2024 • edited Loading

Choose a reason for hiding this comment

nickshulman Jan 16, 2024

Choose a reason for hiding this comment

chambm Jan 17, 2024 •

edited

Loading