changed sanger to single (#25)
jonas-fuchs authored Dec 14, 2023
1 parent 75c474f commit 3cc8e75
Showing 14 changed files with 59 additions and 59 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,9 @@ For a lot of virus genera it is difficult to design pan-specific primers. varVAM

<img src="https://github.com/jonas-fuchs/varVAMP/blob/master/docs/varvamp.png" alt="varVAMP logo" />

**SANGER**: varVAMP searches for the very best primers and reports back non-overlapping amplicons which can be used for PCR-based screening approaches.
**SINGLE**: varVAMP searches for the very best primers and reports back non-overlapping amplicons which can be used for PCR-based screening approaches.

<img src="https://github.com/jonas-fuchs/varVAMP/blob/master/docs/sanger.png" alt="sanger" />
<img src="https://github.com/jonas-fuchs/varVAMP/blob/master/docs/single.png" alt="single" />

**TILED**: varVAMP uses a graph based approach to design overlapping amplicons that tile the entire viral genome. This designs amplicons that are suitable for Oxford Nanopore or Illumina based full-genome sequencing.

4 changes: 2 additions & 2 deletions docs/FAQ.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ The most easy way is to set the number of ambiguous characters you can tolerate

In your case varVAMP could not find suitable replacement primers in the TILED mode. You can either rerun varVAMP with different settings or set up a third pool that contains the amplicon with one of the conflicting dimers. Notably, varVAMP also reports the dimer melting temperature. If it is still reasonably low, using a hot-start polymerase might still lead to successful PCR amplification.

4. **I have multiple hits after SANGER/QPCR mode. Which should I use?**
4. **I have multiple hits after SINGLE/QPCR mode. Which should I use?**

varVAMP sorts all amplicons and qpcr designs by their penalty and always assigns the lowest number to the one with the lowest penalty of the non-overlapping amplicons/qpcr schemes. If you are not interested in a specific gene region, amplicon_0 or qpcr_scheme_0 are your best candidates!

Expand All @@ -38,7 +38,7 @@ The coverage is estimated on an alignment that still has gaps. If there are a lo

9. **How fast is varVAMP?**

varVAMP is pretty fast given the complexity of the problem. Running time depends on the alignment length, the number of sequences and the running mode. While the TILED mode is rather slow, qPCR and SANGER can be faster. An alignment with a few hundred sequences and a genome size of 10 kb will likely run in under a minute for the TILED mode. For large genomes, e.g. DNA viruses (200 kb), it takes considerably longer but should still finish in minutes. Running time optimizations are planned.
varVAMP is pretty fast given the complexity of the problem. Running time depends on the alignment length, the number of sequences and the running mode. While the TILED mode is rather slow, qPCR and SINGLE can be faster. An alignment with a few hundred sequences and a genome size of 10 kb will likely run in under a minute for the TILED mode. For large genomes, e.g. DNA viruses (200 kb), it takes considerably longer but should still finish in minutes. Running time optimizations are planned.

10. **Can I contribute?**

4 changes: 2 additions & 2 deletions docs/how_varvamp_works.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ varVAMP searches for potential primer regions as defined by a user-defined numbe
varVAMP uses [`primer3-py`](https://pypi.org/project/primer3-py/) to search for potential primers. Some of the evaluation process, determining if primers match certain criteria, was adapted from [`primalscheme`](www.github.com/aresti/primalscheme). The primer search contains multiple steps:
1. Digest the primer regions into kmers with the min and max length of primers. This is performed on a consensus sequence that does not contain ambiguous characters but is just the majority consensus of the alignment. Therefore, primer parameters will be later calculated for the best fitting primer.
2. Evaluate if these kmers are potential primers independent of their orientation (temperature, GC, size, poly-x repeats and poly dinucleotide repeats) and dependent on their orientation (secondary structure, GC clamp, number of GCs in the last 5 bases of the 3' end and min 3' nucleotides without an ambiguous base). Filter for kmers that satisfy all constraints and calculate their penalties (explained in the last section).
3. Sanger and tiled mode: Find the primers with the lowest penalty. varVAMP sorts the primers by their penalty and always takes the one with the lowest penalty if the middle third of the primer has not been covered by a primer with a lower penalty. This greatly reduces the complexity of the later amplicon search while retaining only the best primer of a set of overlapping primers.
3. Single and tiled mode: Find the primers with the lowest penalty. varVAMP sorts the primers by their penalty and always takes the one with the lowest penalty if the middle third of the primer has not been covered by a primer with a lower penalty. This greatly reduces the complexity of the later amplicon search while retaining only the best primer of a set of overlapping primers.
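
The selection rule in step 3 can be sketched roughly as follows. This is a minimal illustration, not varVAMP's actual code: the function names, the `(name, start, stop, penalty)` tuple layout and the reading of "covered" as "any position of the middle third overlaps an already selected primer" are all assumptions made for the example.

```python
def middle_third(start, stop):
    """Return the half-open interval spanning the middle third of a primer."""
    third = (stop - start) // 3
    return start + third, stop - third


def select_best_primers(primers):
    """
    Greedy sketch: walk the primers from lowest to highest penalty and keep a
    primer only if its middle third is not yet covered by a better primer.

    primers: list of (name, start, stop, penalty) with stop exclusive
             (hypothetical layout, chosen for this example only).
    """
    covered = set()   # alignment positions covered by already selected primers
    selected = []
    for name, start, stop, penalty in sorted(primers, key=lambda p: p[3]):
        mid_start, mid_stop = middle_third(start, stop)
        # assumption: any overlap with the middle third counts as "covered"
        if covered.isdisjoint(range(mid_start, mid_stop)):
            selected.append((name, start, stop, penalty))
            covered.update(range(start, stop))
    return selected


candidates = [
    ("a", 0, 21, 1.0),
    ("b", 5, 26, 2.0),   # overlaps "a" and has a higher penalty
    ("c", 30, 51, 0.5),
]
best = select_best_primers(candidates)
```

In this toy input, primer `b` is dropped because its middle third lies inside the better-scoring primer `a`, while `c` does not overlap either of them and is kept.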

### Amplicon search

Expand All @@ -37,7 +37,7 @@ To search for the best amplicon, varVAMP uses a graph based approach.
6. Repeat steps 3-5 for each start node until the best coverage is reached. Voila! We have the best amplicon scheme with the lowest cumulative primer penalties in respect to the amplicon length!
7. Lastly, the best scheme is evaluated for primer dimers in their respective pools. If a primer dimer pair is found, varVAMP evaluates for each primer their overlapping and previously not considered primers (primer search step 2) and again minimizes the penalty. The scheme and all primers are updated. If no alternative primers can be found, varVAMP issues a warning and reports the unsolvable primer dimers.

#### Sanger sequencing
#### Single amplicons
1. varVAMP sorts all amplicons by their penalties and takes the non-overlapping amplicon with the lowest penalty!
2. As varVAMP gives a size penalty to amplicons, varVAMP automatically finds amplicons with low primer penalties close to your optimal length (if possible).
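
As a rough sketch of step 1, the non-overlapping selection could look like the code below. It is an assumption-laden illustration rather than the `find_single_amplicons` implementation referenced elsewhere in this commit: the function name, the `name -> (start, stop, penalty)` dictionary layout and the way `report_n` caps the result are hypothetical.

```python
def pick_non_overlapping(amplicons, report_n=None):
    """
    Greedy sketch: sort amplicons by penalty and keep each one that does not
    overlap an amplicon that was already kept.

    amplicons: dict mapping name -> (start, stop, penalty), stop exclusive
               (hypothetical layout for this example).
    report_n:  optional cap on the number of reported amplicons.
    """
    chosen = []
    for name, (start, stop, penalty) in sorted(
        amplicons.items(), key=lambda item: item[1][2]
    ):
        if report_n is not None and len(chosen) >= report_n:
            break
        # two half-open intervals overlap if each starts before the other ends
        overlaps = any(
            start < kept_stop and kept_start < stop
            for _, kept_start, kept_stop, _ in chosen
        )
        if not overlaps:
            chosen.append((name, start, stop, penalty))
    return chosen


example = {
    "x": (0, 500, 3.0),      # overlaps "y" and has a higher penalty
    "y": (400, 900, 1.0),
    "z": (1000, 1500, 2.0),
}
scheme = pick_non_overlapping(example)
top_hit = pick_non_overlapping(example, report_n=1)
```

Because the list is walked in penalty order, the first entry of the result corresponds to the best candidate, which matches the FAQ's advice to start with the lowest-numbered amplicon.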

20 changes: 10 additions & 10 deletions docs/output.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,22 +13,22 @@ varVAMP produces multiple main output files:
| ALL | varvamp_log.txt | Log file. |
| TILED | unsolvable_primer_dimers.tsv | Only produced if there are primer dimers without replacements. Tells which primers form dimers and at which temperature. |
| TILED | primers_pool_0/1.fasta | Primer sequences per pool in fasta format. |
| SANGER | primers.fasta | Primer sequences in fasta format. |
| TILED/SANGER | primer_to_amplicon_assignments.tabular | Simple tab separated file, which primers belong together. Useful for bioinformatic workflows that include primer trimming |
| TILED/SANGER | primer.tsv | A tab separated file with important parameters for the primers including the sequence with ambiguous nucleotides (already in the right strand) and the gc and temperature of the best fitting primer as well as for the mean for all permutations of the primer. |
| SINGLE | primers.fasta | Primer sequences in fasta format. |
| TILED/SINGLE | primer_to_amplicon_assignments.tabular | Simple tab separated file, which primers belong together. Useful for bioinformatic workflows that include primer trimming |
| TILED/SINGLE | primer.tsv | A tab separated file with important parameters for the primers including the sequence with ambiguous nucleotides (already in the right strand) and the gc and temperature of the best fitting primer as well as for the mean for all permutations of the primer. |
| QPCR | qpcr_design.tsv | A tab separated file with important parameters for the qPCR amplicon including the deltaG. |
| QPCR | qpcr_primers.tsv | A tab separated file with important parameters for the primers and probes including the sequence with ambiguous nucleotides (already in the right strand) and the gc and temperature of the best fitting primer and probe as well as for the mean for all permutations. |
| QPCR | oligos.fasta | Oligo sequences in fasta format. |


It also produces some secondary output files in `data` :

| Mode | Output | Description |
| --- | --- | --- |
| ALL | alignment_cleaned | The preprocessed alignment. |
| ALL | majority_consensus.fasta | Consensus sequence that does not have ambiguous characters but instead has the most prevalent nucleotide at each position. |
| ALL | primer_regions.bed | A bed file showing the location of the potential regions of the consensus sequence that were evaluated for primers. |
| TILED/SANGER | all_primers.bed | A bed file with all high scoring primers that varVAMP found. |
| qPCR | probe_regions.bed | A bed file showing the location of the potential regions of the consensus sequence that were evaluated for probes. |
| Mode | Output | Description |
|--------------| --- | --- |
| ALL | alignment_cleaned | The preprocessed alignment. |
| ALL | majority_consensus.fasta | Consensus sequence that does not have ambiguous characters but instead has the most prevalent nucleotide at each position. |
| ALL | primer_regions.bed | A bed file showing the location of the potential regions of the consensus sequence that were evaluated for primers. |
| TILED/SINGLE | all_primers.bed | A bed file with all high scoring primers that varVAMP found. |
| qPCR | probe_regions.bed | A bed file showing the location of the potential regions of the consensus sequence that were evaluated for probes. |

#### [Previous: Usage](./usage.md)&emsp;&emsp;[Next: Wet lab protocol](./wet_lab_protocol.md)
File renamed without changes
8 changes: 4 additions & 4 deletions docs/usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,15 +26,15 @@ optional arguments:
-v, --version show program's version number and exit
varvamp mode:
{sanger,tiled,qpcr}
sanger design primers for sanger sequencing
{single,tiled,qpcr}
single design primers for single amplicons
tiled design primers for whole genome sequencing
qpcr design qPCR primers
```
**sanger** mode:
**single** mode:
```shell
usage: varvamp sanger [optional arguments] <alignment> <output dir>
usage: varvamp single [optional arguments] <alignment> <output dir>
```
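
For orientation, the renamed subcommand can be reproduced with a small `argparse` sketch. It mirrors the `add_parser` call shown for `varvamp/command.py` in this commit, but it is only an approximation: the `build_parser` helper and the positional name `input` are hypothetical, and the default of `--report-n` is left unset because it is not visible in the diff.

```python
import argparse


def build_parser():
    """Build a stripped-down parser with only the renamed 'single' mode."""
    parser = argparse.ArgumentParser(prog="varvamp")
    mode_parser = parser.add_subparsers(title="varvamp mode", dest="mode")
    single_parser = mode_parser.add_parser(
        "single",
        help="design primers for single amplicons",
        usage="varvamp single [optional arguments] <alignment> <output dir>",
    )
    # "input" is a hypothetical name for the two positional arguments
    single_parser.add_argument(
        "input",
        nargs=2,
        help="alignment file and dir to write results",
    )
    # the real default is elided in the diff, so none is set here
    single_parser.add_argument("-n", "--report-n", type=int)
    return parser


args = build_parser().parse_args(["single", "alignment.fasta", "results", "-n", "5"])
```

With this sketch, `args.mode` is `"single"`, and the former `sanger` keyword would be rejected as an invalid choice, which is the user-visible effect of the rename.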
```
optional arguments:
Binary file modified docs/varvamp.png
100644 → 100755
Binary file modified docs/workflow.png
100644 → 100755
2 changes: 1 addition & 1 deletion varvamp/__init__.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
"""Tool to design amplicons for highly variable virusgenomes"""
_program = "varvamp"
__version__ = "1.0"
__version__ = "1.0.1"
36 changes: 18 additions & 18 deletions varvamp/command.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,10 +36,10 @@ def get_args(sysargs):
title="varvamp mode",
dest="mode",
)
SANGER_parser = mode_parser.add_parser(
"sanger",
help="design primers for sanger sequencing",
usage="varvamp sanger [optional arguments] <alignment> <output dir>"
SINGLE_parser = mode_parser.add_parser(
"single",
help="design primers for single amplicons",
usage="varvamp single [optional arguments] <alignment> <output dir>"
)
TILED_parser = mode_parser.add_parser(
"tiled",
Expand All @@ -56,7 +56,7 @@ def get_args(sysargs):
nargs=2,
help="alignment file and dir to write results"
)
for par in (SANGER_parser, TILED_parser, QPCR_parser):
for par in (SINGLE_parser, TILED_parser, QPCR_parser):
par.add_argument(
"-t",
"--threshold",
Expand Down Expand Up @@ -89,7 +89,7 @@ def get_args(sysargs):
type=int,
default=1
)
for par in (SANGER_parser, TILED_parser):
for par in (SINGLE_parser, TILED_parser):
par.add_argument(
"-ol",
"--opt-length",
Expand All @@ -114,7 +114,7 @@ def get_args(sysargs):
default=100,
help="min overlap of the amplicons"
)
SANGER_parser.add_argument(
SINGLE_parser.add_argument(
"-n",
"--report-n",
type=int,
Expand Down Expand Up @@ -275,9 +275,9 @@ def shared_workflow(args, log_file):
return alignment_cleaned, majority_consensus, ambiguous_consensus, primer_regions, left_primer_candidates, right_primer_candidates


def sanger_and_tiled_shared_workflow(args, left_primer_candidates, right_primer_candidates, data_dir, log_file):
def single_and_tiled_shared_workflow(args, left_primer_candidates, right_primer_candidates, data_dir, log_file):
"""
part of the workflow shared by the sanger and tiled mode
part of the workflow shared by the single and tiled mode
"""

# find best primers and create primer dict
Expand Down Expand Up @@ -321,20 +321,20 @@ def sanger_and_tiled_shared_workflow(args, left_primer_candidates, right_primer_
args.max_length,
args.threads,
log_file,
mode="sanger_tiled"
mode="single_tiled"
)
else:
off_target_amplicons = []

return all_primers, amplicons, off_target_amplicons


def sanger_workflow(args, amplicons, all_primers, log_file):
def single_workflow(args, amplicons, all_primers, log_file):
"""
workflow part specific for sanger mode
workflow part specific for single mode
"""

amplicon_scheme = scheme.find_sanger_amplicons(amplicons, all_primers, args.report_n)
amplicon_scheme = scheme.find_single_amplicons(amplicons, all_primers, args.report_n)
logging.varvamp_progress(
log_file,
progress=0.9,
Expand Down Expand Up @@ -503,17 +503,17 @@ def main(sysargs=sys.argv[1:]):
reporting.write_fasta(data_dir, "majority_consensus", majority_consensus)
reporting.write_fasta(results_dir, "ambiguous_consensus", ambiguous_consensus)

# SANGER/TILED mode
if args.mode == "tiled" or args.mode == "sanger":
all_primers, amplicons, off_target_amplicons = sanger_and_tiled_shared_workflow(
# SINGLE/TILED mode
if args.mode == "tiled" or args.mode == "single":
all_primers, amplicons, off_target_amplicons = single_and_tiled_shared_workflow(
args,
left_primer_candidates,
right_primer_candidates,
data_dir,
log_file
)
if args.mode == "sanger":
amplicon_scheme = sanger_workflow(
if args.mode == "single":
amplicon_scheme = single_workflow(
args,
amplicons,
all_primers,
8 changes: 4 additions & 4 deletions varvamp/scripts/blast.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ def check_BLAST_installation(log_file):

def create_BLAST_query(all_primers, amplicons, data_dir):
"""
create a query for the BLAST search (tiled, sanger mode)
create a query for the BLAST search (tiled, single mode)
"""
already_written = []

Expand Down Expand Up @@ -168,7 +168,7 @@ def predict_non_specific_amplicons_worker(amp, blast_df, max_length, mode):
"""
name, data = amp
# get correct primers
if mode == "sanger_tiled":
if mode == "single_tiled":
primers = [data[2], data[3]]
elif mode == "qpcr":
primers = []
Expand Down Expand Up @@ -198,7 +198,7 @@ def predict_non_specific_amplicons(amplicons, blast_df, max_length, mode, n_thre
if off_target is None:
continue
off_targets.append(off_target)
if mode == "sanger_tiled":
if mode == "single_tiled":
amplicons[off_target][5] = amplicons[off_target][5] + config.BLAST_PENALTY
elif mode == "qpcr":
amplicons[off_target]["penalty"][0] = amplicons[off_target]["penalty"][0] + config.BLAST_PENALTY
Expand All @@ -208,7 +208,7 @@ def predict_non_specific_amplicons(amplicons, blast_df, max_length, mode, n_thre

def primer_blast(data_dir, db, query_path, amplicons, max_length, n_threads, log_file, mode):
"""
performs the blast search for the sanger or tiled workflow
performs the blast search for the single or tiled workflow
"""
print("\n#### Starting varVAMP primerBLAST. ####\n")
print("Running BLASTN...")
14 changes: 7 additions & 7 deletions varvamp/scripts/logging.py
Original file line number Diff line number Diff line change
Expand Up @@ -113,7 +113,7 @@ def raise_arg_errors(args, log_file):
log_file,
exit=True
)
if args.mode in ("tiled", "sanger"):
if args.mode in ("tiled", "single"):
if args.opt_length > args.max_length:
raise_error(
"optimal length can not be higher than the maximum amplicon length.",
Expand All @@ -126,8 +126,8 @@ def raise_arg_errors(args, log_file):
log_file,
exit=True
)
# SANGER specific warnings
if args.mode == "sanger":
# SINGLE specific warnings
if args.mode == "single":
if args.report_n < 1:
raise_error(
"number of reported amplicons cannot be below 1.",
Expand Down Expand Up @@ -252,7 +252,7 @@ def confirm_config(args, log_file):

# check if all variables exists
all_vars = [
# arg independent TILED, SANGER mode
# arg independent TILED, SINGLE mode
(
"PRIMER_TMP",
"PRIMER_GC_RANGE",
Expand Down Expand Up @@ -512,7 +512,7 @@ def confirm_config(args, log_file):
sep="\n",
file=f
)
if args.mode in ("tiled", "sanger"):
if args.mode in ("tiled", "single"):
print(
f"AMPLICON_OPT_LENGTH = {args.opt_length}",
f"AMPLICON_MAX_LENGTH = {args.max_length}",
Expand All @@ -532,7 +532,7 @@ def confirm_config(args, log_file):
sep="\n",
file=f
)
if args.mode == "sanger":
if args.mode == "single":
print(
f"REPORT_N_AMPLICONS = {args.report_n}",
sep="\n",
Expand All @@ -553,7 +553,7 @@ def confirm_config(args, log_file):
)
for var in all_vars[0]:
print(f"{var} = {var_dic[var]}", file=f)
if args.mode in ("tiled", "sanger"):
if args.mode in ("tiled", "single"):
if args.database is not None:
for var in all_vars[2]:
print(f"{var} = {var_dic[var]}", file=f)