Improve pVACvector graph building algorithm #1163

susannasiebert · 2024-12-13T15:18:09Z

Previously, pVACvector would clip “problematic” peptides (i.e. peptides without incoming or outgoing good junctions in the graph, where a good junction is a connection to another peptide that creates no novel junctional neoantigens). It would then attempt to build a whole new graph with the updated set of peptides. As a result, not all possible combinations of clipped and non-clipped peptides were tested. Additionally, building a whole new graph after clipping is non-ideal since valid junctions that were previously discovered are ignored.

This PR updates the algorithm to work roughly as follows:

The graph is built iteratively and made up of combinations of clipped and non-clipped peptides, with or without spacers
After each attempt (i.e. initial attempt, adding spacers, clipping), all good junctions (no novel junctional neoantigens) are added to the graph and the annealing is run
If the annealing fails to find a path, all missing junctions in the graph are identified, i.e. all peptide combinations that created novel junctional neoantigens during all previous attempts. These are then reprocessed by adding a spacer and/or clipping
When attempting clipping, all variations of clipping are tried. I.e. on the first iteration of clipping (clipping one amino acid) we test clipping just the left peptide, just the right peptide, and clipping both. When clipping two amino acids, we test clipping 0-2, 2-0, 1-2, 2-1, and 2-2, and so on.
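
The clipping variations described above can be enumerated with a short helper. This is a minimal sketch rather than pVACvector's actual code: `clip_variations` and `clip_junction` are hypothetical names, and the sketch assumes that clipping removes residues from the ends adjacent to the junction (the end of the left peptide and the start of the right peptide).

```python
from itertools import product


def clip_variations(clip_length):
    """Return the (left_clip, right_clip) pairs that are new at this clip length.

    Pairs where both values are smaller than clip_length were already tested
    in earlier iterations, so only pairs whose larger value equals
    clip_length are returned.
    """
    if clip_length < 1:
        raise ValueError("clip_length must be at least 1")
    return [
        (left, right)
        for left, right in product(range(clip_length + 1), repeat=2)
        if max(left, right) == clip_length
    ]


def clip_junction(left_peptide, right_peptide, left_clip, right_clip):
    """Clip the junction-facing ends of two peptides (assumed behavior)."""
    # Drop left_clip residues from the end of the left peptide.
    clipped_left = left_peptide[: len(left_peptide) - left_clip]
    # Drop right_clip residues from the start of the right peptide.
    clipped_right = right_peptide[right_clip:]
    return clipped_left, clipped_right
```

With this helper, a clip length of one yields three variations and a clip length of two yields five, matching the counts in the description.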

As a result, pVACvector should have a higher likelihood of finding a result sooner or finding a result at all.
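
The iterative procedure outlined above can be illustrated with a toy loop. This is a sketch under stated assumptions, not the code in this PR: `build_vector`, `find_path`, and the `is_good` predicate are hypothetical names, the attempt labels are made up, and a brute-force search over orderings stands in for the annealing step so that the example stays self-contained.

```python
from itertools import permutations


def find_path(peptides, good_junctions):
    """Brute-force stand-in for annealing: find an ordering of all peptides
    in which every adjacent pair is a known good junction."""
    for order in permutations(peptides):
        if all(pair in good_junctions for pair in zip(order, order[1:])):
            return list(order)
    return None


def build_vector(peptides, attempts, is_good):
    """Grow the junction graph attempt by attempt.

    attempts: ordered labels for junction treatments (hypothetical),
              e.g. "none", "spacer", "clip".
    is_good(left, right, attempt): hypothetical predicate that is True when
              the junction creates no novel junctional neoantigens.
    """
    good_junctions = {}  # (left, right) -> attempt that made it good
    missing = {(a, b) for a in peptides for b in peptides if a != b}
    for attempt in attempts:
        # Only junctions that failed in every previous attempt are retested;
        # good junctions found earlier stay in the graph.
        for left, right in sorted(missing):
            if is_good(left, right, attempt):
                good_junctions[(left, right)] = attempt
        missing.difference_update(good_junctions)
        path = find_path(peptides, good_junctions)
        if path is not None:
            return path, [good_junctions[j] for j in zip(path, path[1:])]
    return None


# Toy data: A-B is good as-is, B-C only becomes good with a spacer.
tested = []


def toy_good(left, right, attempt):
    tested.append((left, right, attempt))
    return (left, right, attempt) in {("A", "B", "none"), ("B", "C", "spacer")}


result = build_vector(["A", "B", "C"], ["none", "spacer"], toy_good)
```

In the toy run, the A-B junction found in the first attempt is kept and is not retested when spacers are tried, which is the behavior the description contrasts with rebuilding a whole new graph.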

Closes #1087


susannasiebert · 2024-12-13T20:44:36Z

TODO:

Support percentile cutoff

chrisamiller

I like the approach. Searching for all problematic junctions avoids getting caught in local minima, where we've got a bunch of good junctions, but get stuck on a handful that just don't like to be connected, for whatever reason. I suspect that this will do more work than is necessary in some cases (if there is only one problematic junction, and a simple spacer or clip fixes it, you wouldn't need to check the rest of the graph). That said, trying to fix them one by one and backtracking if you hit a dead end means the traversal algorithm gets complex quickly. As long as this runs in a reasonable amount of time, I think it's the right approach.
When clipping, are we somehow ensuring that we don't clip out key parts of the core epitope? (e.g. if the mutation is at amino acid 2 of the gene, we probably can't be clipping from the left). It's a rare case, and maybe just a todo item - look into handling that in the future. I'm reasonably sure it would require some additional information being passed into pVACvector that isn't there now.
I also wonder if we should go back and run it on a few real cases to see whether this successfully improves the speed and number of successful vectors. I think it should!

Looks good

susannasiebert · 2024-12-16T20:53:02Z

It's difficult to determine which junction would've been the one to complete the graph without actually running it :) so I think this is probably the optimal solution unless we can figure out an algorithm to determine the junction or set of junctions that are currently "blocking" the graph from being completed.
We don't currently do this since we don't have that info in the input fasta, but it shouldn't be too complicated to accomplish. I imagine that the pvacseq generate_protein_fasta command - when run in combination with an aggregated tsv of accepted neoantigens - could mark their position in the fasta and pVACvector could use that information to not clip into the epitope. But that would be a future feature.
I ran this update on one past patient where we didn't previously find a solution. It found a solution and also ran faster and needed fewer iterations of clipping. Happy to have it tested on additional cases, but I wonder whether we should have this block the next release (I'm really hoping to get that out by the end of the week).
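
The epitope-aware clipping idea mentioned in this comment is a possible future feature, not something this PR implements. A minimal sketch of the check, assuming the epitope position within each peptide were available as a 0-based, half-open interval (`max_safe_clips` and its parameters are hypothetical):

```python
def max_safe_clips(peptide_length, epitope_start, epitope_end):
    """Return how many residues could be clipped from the start and from the
    end of a peptide without cutting into the epitope.

    epitope_start is inclusive and epitope_end is exclusive (0-based).
    """
    if not 0 <= epitope_start < epitope_end <= peptide_length:
        raise ValueError("epitope interval must lie within the peptide")
    max_start_clip = epitope_start
    max_end_clip = peptide_length - epitope_end
    return max_start_clip, max_end_clip
```

For a peptide whose epitope begins at its first residue, this returns a start clip of zero, which covers the case raised in the review where the mutation sits near the start of the gene.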

susannasiebert added 3 commits October 1, 2024 08:52

Refactor pVACvector to build graph iteratively and only test missing …

bd74a8b

…junctions

Merge remote-tracking branch 'origin/staging' into pvacvector

829db12

Update pVACvector test data

cd5a5f1

susannasiebert linked an issue Dec 13, 2024 that may be closed by this pull request

Fix various pVACvector issues #1087

Open

susannasiebert added 2 commits December 13, 2024 10:12

Update pvacvector fasta creation test

206751e

Update output file documentation

4e704ce

Add support for the percentile threshold cutoff to pVACvector

35eee1e

chrisamiller approved these changes Dec 16, 2024
