Publication ready version (#23)
* renamed scores to penalties, renamed --n_threads to --threads and some changes to the graph

* renamed overshadowing names

* renaming

* some further minor code reformatting

* updated docs

* updated output.md

* updated workflow.png

* updated readme

* changed how dinuc repeats and polyx are counted to make it more intuitive

* tsv output now contains a more intuitive primer name and primers are given in fasta per pool

* fasta output for sanger/qpcr mode
jonas-fuchs authored Dec 1, 2023
1 parent 7523ffb commit 75c474f
Showing 20 changed files with 320 additions and 272 deletions.
1 change: 0 additions & 1 deletion README.md
@@ -41,7 +41,6 @@ This program is currently being developed and in an alpha state. You are welcome
* [How it works](https://github.com/jonas-fuchs/varVAMP/blob/master/docs/how_varvamp_works.md)
* [FAQ](https://github.com/jonas-fuchs/varVAMP/blob/master/docs/FAQ.md)

-<a href="https://www.buymeacoffee.com/jofox" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Buy Me A Coffee" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;" ></a>
---

**Important disclaimer:**
2 changes: 1 addition & 1 deletion docs/FAQ.md
@@ -18,7 +18,7 @@ In your case varVAMP could not find suitable replacement primers in the TILED mo

4. **I have multiple hits after SANGER/QPCR mode. Which should I use?**

-varVAMP sorts all amplicons and qpcr designs by score and always assigns the lowest number to the best one of non-overlapping amplicons/qpcr schemes. If you are not interested in a specific gene region, amplicon_0 or qpcr_scheme_0 are your best candidates!
+varVAMP sorts all amplicons and qpcr designs by their penalty and always assigns the lowest number to the one with the lowest penalty of the non-overlapping amplicons/qpcr schemes. If you are not interested in a specific gene region, amplicon_0 or qpcr_scheme_0 are your best candidates!

5. **What is deltaG reported for the QPCR mode?**

26 changes: 13 additions & 13 deletions docs/how_varvamp_works.md
@@ -23,28 +23,28 @@ varVAMP searches for potential primer regions as defined by a user-defined numbe
varVAMP uses [`primer3-py`](https://pypi.org/project/primer3-py/) to search for potential primers. Some of the evaluation process, determining if primers match certain criteria, was adapted from [`primalscheme`](www.github.com/aresti/primalscheme). The primer search contains multiple steps:
1. Digest the primer regions into kmers with the min and max length of primers. This is performed on a consensus sequence that does not contain ambiguous characters but is just the majority consensus of the alignment. Therefore, primer parameters will be later calculated for the best fitting primer.
2. Evaluate if these kmers are potential primers independent of their orientation (temperature, GC, size, poly-x repeats and poly dinucleotide repeats) and dependent on their orientation (secondary structure, GC clamp, number of GCs in the last 5 bases of the 3' end and min 3' nucleotides without an ambiguous base). Filter for kmers that satisfy all constraints and calculate their penalties (explained in the last section).
-3. Sanger and tiled mode: Find lowest scoring primer. varVAMP sorts the primers by their score and always takes the best scoring if middle third of the primer has not been covered by a better primer. This greatly reduces the complexity of the later amplicon search while only retaining the best scoring primer of a set of overlapping primers.
+3. Sanger and tiled mode: Find primer with the lowest penalty. varVAMP sorts the primers by their penalty and always takes one with the lowest penalty if middle third of the primer has not been covered by a primer with a lower penalty. This greatly reduces the complexity of the later amplicon search while only retaining the best primer of a set of overlapping primers.

### Amplicon search

#### Amplicon-tiling
-To search for the best scoring amplicon, varVAMP uses a graph based approach.
+To search for the best amplicon, varVAMP uses a graph based approach.
1. Create all possible amplicons with the given length constraints and ensure that primer pairs are not forming dimers.
-2. Create a graph containing all amplicons and their potential neighboring amplicons. To design a good scheme, the next primer has to lie within the second half of the current primer and satisfy the overlap constraint. The cost to go to a neighboring amplicon is determined by the amplicon score.
+2. Create a graph containing all amplicons and their potential neighboring amplicons. To design a good scheme, the next primer has to lie within the second half of the current primer and satisfy the overlap constraint. The cost to go to a neighboring amplicon is determined by the amplicon penalty.
3. Use the [Dijkstra algorithm](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm) to find the path with the lowest costs from a given start node.
4. Determine potential stop nodes as amplicons with the furthest stop in the alignment.
-5. Determine shortest paths between the start and all potential stop nodes. Get the lowest scoring.
-6. Repeat steps 3-5 for each start node until the best coverage is reached. Voila! We have the best scoring amplicon scheme!
-7. Lastly, the best scoring scheme is evaluated for primer dimers in their respective pools. If a primer dimer pair is found, varVAMP evaluates for each primer their overlapping and previously not considered primers (primer search step 2) and again minimizes the score. The scheme and all primers are updated. If no alternative primers can be found, varVAMP issues a warning and reports the unsolvable primer dimers.
+5. Determine shortest paths between the start and all potential stop nodes. Get the lowest cost.
+6. Repeat steps 3-5 for each start node until the best coverage is reached. Voila! We have the best amplicon scheme with the lowest cumulative primer penalties in respect to the amplicon length!
+7. Lastly, the best scheme is evaluated for primer dimers in their respective pools. If a primer dimer pair is found, varVAMP evaluates for each primer their overlapping and previously not considered primers (primer search step 2) and again minimizes the penalty. The scheme and all primers are updated. If no alternative primers can be found, varVAMP issues a warning and reports the unsolvable primer dimers.
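
The graph search in steps 2-5 can be illustrated with a minimal sketch. It is not varVAMP's implementation: the function names, the toy graph and its penalties are assumptions made for illustration, and the start amplicon's own penalty is left out for brevity. Each edge cost stands for the penalty of the neighboring amplicon.

```python
import heapq


def dijkstra(graph, start):
    """Return (costs, predecessors) for the cheapest paths from start."""
    costs = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, penalty in graph.get(node, {}).items():
            new_cost = cost + penalty
            if new_cost < costs.get(neighbor, float("inf")):
                costs[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    return costs, previous


def cheapest_path(graph, start, stop_nodes):
    """Pick the reachable stop node with the lowest cost and rebuild its path."""
    costs, previous = dijkstra(graph, start)
    reachable = [node for node in stop_nodes if node in costs]
    if not reachable:
        return None, []
    best_stop = min(reachable, key=costs.get)
    path = [best_stop]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return costs[best_stop], path[::-1]


# hypothetical graph: amplicon -> {neighboring amplicon: its penalty}
toy_graph = {
    "amp_0": {"amp_1": 2.0, "amp_2": 5.0},
    "amp_1": {"amp_3": 4.0},
    "amp_2": {"amp_3": 0.5, "amp_4": 3.0},
    "amp_3": {},
    "amp_4": {},
}
best_cost, best_path = cheapest_path(toy_graph, "amp_0", ["amp_3", "amp_4"])
```

In this toy case the route through `amp_2` wins even though its first edge is more expensive, because the cumulative cost is what is minimized.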

#### Sanger sequencing
-1. varVAMP sorts all amplicons by their score and takes the non-overlapping amplicon with the lowest score!
-2. As varVAMP gives a size penalty to amplicons, varVAMP automatically finds amplicons with low primer scores close to your optimal length (if possible).
+1. varVAMP sorts all amplicons by their penalties and takes the non-overlapping amplicon with the lowest penalty!
+2. As varVAMP gives a size penalty to amplicons, varVAMP automatically finds amplicons with low primer penalties close to your optimal length (if possible).
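
The sort-then-pick logic described for Sanger mode can be sketched as a greedy selection. This is an illustration under assumptions, not varVAMP's code: the tuple layout `(start, stop, penalty)`, the half-open intervals and the function name are hypothetical.

```python
def select_non_overlapping(amplicons):
    """Greedily keep the lowest-penalty amplicons that do not overlap.

    amplicons: list of (start, stop, penalty) tuples, intervals assumed half-open.
    """
    selected = []
    for start, stop, penalty in sorted(amplicons, key=lambda amp: amp[2]):
        # keep the candidate only if it is clear of every amplicon kept so far
        if all(stop <= s or start >= e for s, e, _ in selected):
            selected.append((start, stop, penalty))
    return selected


# hypothetical candidates
candidates = [(0, 100, 3.2), (50, 150, 1.1), (160, 260, 2.0), (240, 340, 0.9)]
ranked = select_non_overlapping(candidates)
# the lowest number goes to the lowest penalty, mirroring the amplicon_0 naming
named = {f"amplicon_{i}": amp for i, amp in enumerate(ranked)}
```

Because candidates are visited in order of increasing penalty, the index of each kept amplicon doubles as its rank.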

#### primer BLAST module
1. varVAMP generates a fasta query and searches for possible hits with the same settings as [primer blast](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-134).
2. For each amplicon varVAMP searches for off-targets, defined as hits in the db that are the maximum amplicons length multiplied by `BLAST_SIZE_MULTI` apart from each other.
-3. varVAMP appends a high penalty score to the amplicons score if these producing off-targets. This ensures that all other available amplicons are preferentially used.
+3. varVAMP appends a high penalty to the amplicons if these produce off-targets. This ensures that all other available amplicons are preferentially used.
4. Reports if amplicons with off-targets are in the final scheme.

#### qPCR
@@ -62,24 +62,24 @@ To search for the best scoring amplicon, varVAMP uses a graph based approach.
```python
PRIMER_MAX_BASE_PENALTY
```
-Each primer is scored for its deviation from the optimal size, GC content and temperature. Base penalty is the sum of these deviations. Primer base penalties higher than the max base penalty are excluded.
+Each primer is penalized for its deviation from the optimal size, GC content and temperature. Base penalty is the sum of these deviations. Primer base penalties higher than the max base penalty are excluded.

```python
PRIMER_3_PENALTY
```
-Each position in the primer is scored for mismatches in all sequences. If a 3' penalty is given the first in the tuple is multiplied with the frequency mismatch at the very 3' end. The next is multiplied with the -1 freq and so on. Increase penalty if you want to shift amplicons towards best 3' matching. Set the `PRIMER_3_PENALTY` to 0 if you do not care about 3' mismatches.
+Each position in the primer is penalized for mismatches in all sequences. If a 3' penalty is given the first in the tuple is multiplied with the frequency mismatch at the very 3' end. The next is multiplied with the -1 freq and so on. Increase penalty if you want to shift amplicons towards best 3' matching. Set the `PRIMER_3_PENALTY` to 0 if you do not care about 3' mismatches.
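
One possible reading of the 3' weighting, as a sketch: the penalty tuple is paired with the mismatch frequencies starting at the 3' end and walking inwards. The function name, the tuple `(32, 16, 8)` and the frequencies are illustrative assumptions, not varVAMP defaults.

```python
def three_prime_penalty(mismatch_freqs, penalties):
    """Weight mismatch frequencies at the 3' end with a penalty tuple.

    mismatch_freqs: per-position mismatch frequency in 5'->3' order.
    penalties: first entry applies to the very 3' base, the next to the
    base before it, and so on.
    """
    # reversed() puts the 3' end first; zip() stops when the tuple is exhausted
    return sum(p * f for p, f in zip(penalties, reversed(mismatch_freqs)))


# hypothetical primer with rising mismatch frequency towards the 3' end
freqs = [0.0, 0.0, 0.1, 0.25, 0.5]
penalty = three_prime_penalty(freqs, (32, 16, 8))  # 32*0.5 + 16*0.25 + 8*0.1
no_penalty = three_prime_penalty(freqs, (0,))      # 3' mismatches ignored
```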

```python3
PRIMER_PERMUTATION_PENALTY
```
The number of permutations of a primer is multiplied by the penalty. For example 24 permutations and a penalty of 0.1 will yield a penalty of 2.4. Set `PRIMER_PERMUTATION_PENALTY` to 0 if you do not care about the number of permutations.

-All scores of a primer are summed up and yield a final score. The score for each amplicon is then the score of its LEFT + RIGHT primers multiplied by the fold increase of the amplicon length compared to the optional length. This insures that in the final scheme not only large amplicons are used.
+All penalties of a primer are summed up and yield a final penalty. The penalty for each amplicon is then the penalty of its LEFT + RIGHT primers multiplied by the fold increase of the amplicon length compared to the optional length. This insures that in the final scheme not only large amplicons are used.

```python3
BLAST_PENALTY
```

-If the `-db` argument is used, varVAMP will perform a BLAST search and evaluate off-targets against this database for each amplicon. If an off-target effect is predicted varVAMP will add this penalty to the amplicon score. This insures that this amplicon is only considered if no other amplicons are in this alignment region.
+If the `-db` argument is used, varVAMP will perform a BLAST search and evaluate off-targets against this database for each amplicon. If an off-target effect is predicted varVAMP will add this penalty to the amplicon penalty. This insures that this amplicon is only considered if no other amplicons are in this alignment region.
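
The arithmetic described above can be put together in a short sketch. The function names and the example numbers are assumptions for illustration; only the permutation example (24 permutations at 0.1) and the `BLAST_PENALTY` value of 50 shown in docs/usage.md are taken from the documentation.

```python
def primer_penalty(base_penalty, three_prime, n_permutations, permutation_penalty):
    """Sum the individual penalties of one primer."""
    return base_penalty + three_prime + n_permutations * permutation_penalty


def amplicon_penalty(left, right, length, optimal_length,
                     off_target=False, blast_penalty=50):
    """Scale LEFT + RIGHT primer penalties by the fold change in amplicon length."""
    penalty = (left + right) * (length / optimal_length)
    if off_target:
        # a predicted off-target adds the BLAST penalty on top
        penalty += blast_penalty
    return penalty


# hypothetical primers: 24 permutations at 0.1 contribute 2.4 to the left primer
left = primer_penalty(1.0, 0.5, 24, 0.1)
right = primer_penalty(2.0, 0.0, 1, 0.1)
clean = amplicon_penalty(left, right, length=1200, optimal_length=1000)
flagged = amplicon_penalty(left, right, length=1200, optimal_length=1000,
                           off_target=True)
```

With these numbers the off-target amplicon ends up exactly 50 above the clean one, so it would only be chosen when nothing else covers the region.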

#### [Previous: Wet lab protocol](./wet_lab_protocol.md)&emsp;&emsp;[Next: FAQ](./FAQ.md)
31 changes: 17 additions & 14 deletions docs/output.md
@@ -3,22 +3,25 @@
varVAMP produces multiple main output files:


-| Mode | Output | Description |
-| --- | --- | --- |
-| ALL | ambiguous_consensus.fasta | The consensus sequence containing ambiguous nucleotides. |
-| ALL | amplicon_plot.pdf | A nice overview for your final amplicon design. |
-| ALL| amplicons.bed | A bed file showing the amplicon location compared to the consensus sequence. |
-| ALL| per_base_mismatches.pdf | Barplot of the percent mismatches at each nucleotide position of the primer. |
-| ALL | primers.bed | A bed file with the primer locations. Includes the primer score. The lower, the better. |
-| ALL | varvamp_log.txt | Log file. |
-| TILED | unsolvable_primer_dimers.tsv | Only produced if there are primer dimers without replacements. Tells which primers form dimers and at which temperature.
-| TILED/SANGER | primer_to_amplicon_assignments.tabular | Simple tab separated file, which primers belong together. Useful for bioinformatic workflows that include primer trimming |
-| TILED/SANGER | primer.tsv | A tab separated file with important parameters for the primers including the sequence with ambiguous nucleotides (already in the right strand) and the gc and temperature of the best fitting primer as well as for the mean for all permutations of the primer. |
-| QPCR | qpcr_design.tsv | A tab separated file with important parameters for the qPCR amplicon including the deltaG. |
-| QPCR | qpcr_primers.tsv | A tab separated file with important parameters for the primers and probes including the sequence with ambiguous nucleotides (already in the right strand) and the gc and temperature of the best fitting primer and probe as well as for the mean for all permutations. |
+| Mode | Output | Description |
+|--------------|----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| ALL | ambiguous_consensus.fasta | The consensus sequence containing ambiguous nucleotides. |
+| ALL | amplicon_plot.pdf | A nice overview for your final amplicon design. |
+| ALL | amplicons.bed | A bed file showing the amplicon location compared to the consensus sequence. |
+| ALL | per_base_mismatches.pdf | Barplot of the percent mismatches at each nucleotide position of the primer. |
+| ALL | primers.bed | A bed file with the primer locations. Includes the primer penalty. The lower, the better. |
+| ALL | varvamp_log.txt | Log file. |
+| TILED | unsolvable_primer_dimers.tsv | Only produced if there are primer dimers without replacements. Tells which primers form dimers and at which temperature. |
+| TILED | primers_pool_0/1.fasta | Primer sequences per pool in fasta format. |
+| SANGER | primers.fasta | Primer sequences in fasta format. |
+| TILED/SANGER | primer_to_amplicon_assignments.tabular | Simple tab separated file, which primers belong together. Useful for bioinformatic workflows that include primer trimming |
+| TILED/SANGER | primer.tsv | A tab separated file with important parameters for the primers including the sequence with ambiguous nucleotides (already in the right strand) and the gc and temperature of the best fitting primer as well as for the mean for all permutations of the primer. |
+| QPCR | qpcr_design.tsv | A tab separated file with important parameters for the qPCR amplicon including the deltaG. |
+| QPCR | qpcr_primers.tsv | A tab separated file with important parameters for the primers and probes including the sequence with ambiguous nucleotides (already in the right strand) and the gc and temperature of the best fitting primer and probe as well as for the mean for all permutations. |
+| QPCR | oligos.fasta | Oligo sequences in fasta format. |


-It also produces some secondary output files [*data/*]:
+It also produces some secondary output files in `data` :

| Mode | Output | Description |
| --- | --- | --- |
2 changes: 1 addition & 1 deletion docs/usage.md
@@ -149,7 +149,7 @@ BLAST_SETTINGS = { # blast settings for query search
}
BLAST_MAX_DIFF = 0.8 # allowed % differences between primer and BLAST hit
BLAST_SIZE_MULTI = 2 # multiplier for the max_amp size of off targets (in relation to max amp size)
-BLAST_PENALTY = 50 # amplicon score increase -> considered only if no other possibilities
+BLAST_PENALTY = 50 # amplicon penalty increase -> considered only if no other possibilities
```
To apply these new settings just repeat the installation procedure in the varVAMP dir:
```shell
2 changes: 1 addition & 1 deletion docs/wet_lab_protocol.md
@@ -10,7 +10,7 @@ The QIAmp Viral RNA Kit was used to isolate RNA out of viral supernatant followi

• Elute in 60 µl Buffer AVE

-Thanks to Mathias Schmmerer for providing HepE infected cell cultures.
+Thanks to Mathias Schemmerer for providing HepE infected cell cultures.

## Amplicon Generation by SuperScript-IV One-Step-PCR
The RNA amplification was performed with the following approach and PCR protocol for every single amplicon.
Binary file modified docs/workflow.png
2 changes: 1 addition & 1 deletion varvamp/__init__.py
@@ -1,3 +1,3 @@
"""Tool to design amplicons for highly variable virusgenomes"""
_program = "varvamp"
-__version__ = "0.9.5"
+__version__ = "1.0"