Improve pVACvector graph building algorithm #1163

Open: wants to merge 6 commits into base: staging

susannasiebert
Contributor

Previously, pVACvector would clip “problematic” peptides (i.e. peptides without incoming or outgoing good junctions in the graph, where a good junction is a connection to another peptide that creates no novel junctional neoantigens). It would then attempt to build a whole new graph with the updated set of peptides. As a result, not all possible combinations of clipped and non-clipped peptides were tested. Additionally, building a whole new graph after clipping is not ideal, since valid junctions that were previously discovered are ignored.
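To make the notion of a “problematic” peptide concrete, here is a minimal sketch, not the actual pVACvector code. It assumes good junctions are stored as directed `(left, right)` pairs and reads "without incoming or outgoing" as "missing either one"; the function name `problematic_peptides` is hypothetical.

```python
def problematic_peptides(peptides, good_junctions):
    """Return peptides that lack an incoming or an outgoing good junction.

    Assumption: good_junctions is a set of directed (left, right) pairs,
    each meaning "left followed by right creates no novel junctional
    neoantigens".
    """
    has_outgoing = {left for left, _ in good_junctions}
    has_incoming = {right for _, right in good_junctions}
    return [p for p in peptides if p not in has_outgoing or p not in has_incoming]


peptides = ["A", "B", "C"]
good = {("A", "B"), ("B", "A"), ("C", "A")}
print(problematic_peptides(peptides, good))  # ['C'] (nothing leads into C)
```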

This PR updates the algorithm to work roughly as follows (biggest updates bolded):

  • The graph is built iteratively and made up of combinations of clipped and non-clipped peptides, with or without spacers
  • After each attempt (i.e. initial attempt, adding spacers, clipping), all good junctions (no novel junctional neoantigens) are added to the graph and the annealing is run
  • If the annealing fails to find a path, all missing junctions in the graph are identified, i.e. all peptide combinations that created novel junctional neoantigens during all previous attempts. These are then reprocessed by adding a spacer and/or clipping
  • When attempting clipping, all variations of clipping are tried. For example, on the first iteration of clipping (clipping one amino acid) we test clipping just the left peptide, just the right peptide, and both. When clipping two amino acids, we test clipping 0-2, 2-0, 1-2, 2-1, and 2-2, and so on.
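The clipping variations in the last bullet can be enumerated as every `(left_clip, right_clip)` pair whose larger value equals the current iteration. The sketch below illustrates that enumeration only; the helper names are hypothetical, and trimming the residues nearest the junction is an assumption rather than something confirmed by the PR.

```python
def clip_variations(n):
    """Return the (left_clip, right_clip) pairs that are new at clipping iteration n.

    A pair is new at iteration n when its larger value equals n; pairs with
    both values below n were already covered by earlier iterations.
    """
    pairs = [(left, n) for left in range(n)]      # right peptide clipped by n
    pairs += [(n, right) for right in range(n)]   # left peptide clipped by n
    pairs.append((n, n))                          # both peptides clipped by n
    return pairs


def clip_junction(left_peptide, right_peptide, left_clip, right_clip):
    """Assumption: clipping trims the residues nearest the junction."""
    kept_left = left_peptide[:len(left_peptide) - left_clip]
    kept_right = right_peptide[right_clip:]
    return kept_left, kept_right


print(clip_variations(1))  # [(0, 1), (1, 0), (1, 1)]
print(clip_variations(2))  # [(0, 2), (1, 2), (2, 0), (2, 1), (2, 2)]
```

Each iteration `n` adds `2n + 1` new pairs per missing junction, so the number of junction predictions grows with every round of clipping.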

As a result, pVACvector should have a higher likelihood of finding a result sooner or finding a result at all.

Closes #1087

@susannasiebert susannasiebert linked an issue Dec 13, 2024 that may be closed by this pull request
@susannasiebert
Contributor Author

TODO:

  • Support percentile cutoff

Contributor

@chrisamiller chrisamiller left a comment

  • I like the approach. Searching for all problematic junctions avoids getting caught in local minima, where we've got a bunch of good junctions but get stuck on a handful that just don't like to be connected, for whatever reason. I suspect that this will do more work than is necessary in some cases (if there is only one problematic junction, and a simple spacer or clip fixes it, you wouldn't need to check the rest of the graph). That said, trying to fix them one by one and backtracking if you hit a dead end means the traversal algorithm gets complex quickly. As long as this runs in a reasonable amount of time, I think it's the right approach.

  • When clipping, are we somehow ensuring that we don't clip out key parts of the core epitope? (e.g. if the mutation is at amino acid 2 of the gene, we probably can't be clipping from the left). It's a rare case, and maybe just a todo item - look into handling that in the future. I'm reasonably sure it would require some additional information being passed into pVACvector that isn't there now.

  • I also wonder if we should go back and run it on a few real cases to see whether this successfully improves the speed and number of successful vectors. I think it should!

Looks good

@susannasiebert
Contributor Author

  1. It's difficult to determine which junction would've been the one to complete the graph without actually running it :) so I think this is probably the optimal solution unless we can figure out an algorithm to determine the junction or set of junctions that are currently "blocking" the graph from being completed.
  2. We don't currently do this since we don't have that info in the input fasta, but it shouldn't be too complicated to accomplish. I imagine that the pvacseq generate_protein_fasta command, when run in combination with an aggregated tsv of accepted neoantigens, could mark their position in the fasta, and pVACvector could use that information to not clip into the epitope. But that would be a future feature.
  3. I ran this update on one past patient where we didn't previously find a solution. It found a solution and also ran faster and needed fewer iterations of clipping. Happy to have it tested on additional cases, but I wonder if we should have this block the next release (I'm really hoping to get that out by the end of the week).
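As a rough illustration of the future feature described in point 2, the sketch below shows how a known epitope position could bound the allowed clipping. Everything here is hypothetical: the input fasta does not currently carry epitope positions, and the function name and 0-based half-open coordinates are assumptions made for the example.

```python
def max_clips(peptide_length, epitope_start, epitope_end):
    """Return (max_start_clip, max_end_clip) that leave the epitope intact.

    Assumption: the epitope occupies the 0-based half-open range
    [epitope_start, epitope_end) within the peptide.
    """
    if not 0 <= epitope_start < epitope_end <= peptide_length:
        raise ValueError("epitope must lie within the peptide")
    return epitope_start, peptide_length - epitope_end


# A 25-residue peptide whose epitope starts at the second residue:
# at most one residue may be clipped from the start.
print(max_clips(25, 1, 10))  # (1, 15)
```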

Development

Successfully merging this pull request may close these issues.

Fix various pVACvector issues
2 participants