changed sanger to single (#25)

jonas-fuchs · Dec 14, 2023 · 3cc8e75
1 parent 75c474f
Showing 14 changed files with 59 additions and 59 deletions.
diff --git a/README.md b/README.md
@@ -17,9 +17,9 @@ For a lot of virus genera it is difficult to design pan-specific primers. varVAM
 
 <img src="https://github.com/jonas-fuchs/varVAMP/blob/master/docs/varvamp.png" alt="varVAMP logo" />
 
-**SANGER**: varVAMP searches for the very best primers and reports back non-overlapping amplicons which can be used for PCR-based screening approaches.
+**SINGLE**: varVAMP searches for the very best primers and reports back non-overlapping amplicons which can be used for PCR-based screening approaches.
 
-<img src="https://github.com/jonas-fuchs/varVAMP/blob/master/docs/sanger.png" alt="sanger" />
+<img src="https://github.com/jonas-fuchs/varVAMP/blob/master/docs/single.png" alt="single" />
 
 **TILED**: varVAMP uses a graph based approach to design overlapping amplicons that tile the entire viral genome. This designs amplicons that are suitable for Oxford Nanopore or Illumina based full-genome sequencing.
 

diff --git a/docs/FAQ.md b/docs/FAQ.md
@@ -16,7 +16,7 @@ The most easy way is to set the number of ambiguous characters you can tolerate
 
 In your case varVAMP could not find suitable replacement primers in the TILED mode. You can either rerun varVAMP and try different settings or you can perform a third pool that contains a amplicon that has one of the conflicting dimers. Notably, varVAMP also reports the dimer melting temperature. If it is still reasonable low, using a hot start polymerase might still lead to successful PCR amplification.
 
-4. **I have multiple hits after SANGER/QPCR mode. Which should I use?**
+4. **I have multiple hits after SINGLE/QPCR mode. Which should I use?**
 
 varVAMP sorts all amplicons and qpcr designs by their penalty and always assigns the lowest number to the one with the lowest penalty of the non-overlapping amplicons/qpcr schemes. If you are not interested in a specific gene region, amplicon_0 or qpcr_scheme_0  are your best candidates!
 
@@ -38,7 +38,7 @@ The coverage is estimated on an alignment that still has gaps. If there are a lo
 
 9. **How fast is varVAMP?**
 
-varVAMP is pretty fast given the complexity of the problem. Running time is depended on the alignment length, number of sequences and the running mode. While the TILED is rather slow, qPCR and SANGER can be faster. An alignment with a few hundred sequences and with a genome size of 10 kb will likely run in under a minute for the TILED mode. For large e.g. DNA viruses (200 kb) it takes considerably longer, but should still finish in minutes. Running time optimizations are planned.
+varVAMP is pretty fast given the complexity of the problem. Running time is depended on the alignment length, number of sequences and the running mode. While the TILED is rather slow, qPCR and SINGLE can be faster. An alignment with a few hundred sequences and with a genome size of 10 kb will likely run in under a minute for the TILED mode. For large e.g. DNA viruses (200 kb) it takes considerably longer, but should still finish in minutes. Running time optimizations are planned.
 
 10. **Can I contribute?**
 

diff --git a/docs/how_varvamp_works.md b/docs/how_varvamp_works.md
@@ -23,7 +23,7 @@ varVAMP searches for potential primer regions as defined by a user-defined numbe
 varVAMP uses [`primer3-py`](https://pypi.org/project/primer3-py/) to search for potential primers. Some of the evaluation process, determining if primers match certain criteria, was adapted from [`primalscheme`](www.github.com/aresti/primalscheme). The primer search contains multiple steps:
 1. Digest the primer regions into kmers with the min and max length of primers. This is performed on a consensus sequence that does not contain ambiguous characters but is just the majority consensus of the alignment. Therefore, primer parameters will be later calculated for the best fitting primer.
 2. Evaluate if these kmers are potential primers independent of their orientation (temperature, GC, size, poly-x repeats and poly dinucleotide repeats) and dependent on their orientation (secondary structure, GC clamp, number of GCs in the last 5 bases of the 3' end and min 3' nucleotides without an ambiguous base). Filter for kmers that satisfy all constraints and calculate their penalties (explained in the last section).
-3. Sanger and tiled mode: Find primer with the lowest penalty. varVAMP sorts the primers by their penalty and always takes one with the lowest penalty if middle third of the primer has not been covered by a primer with a lower penalty. This greatly reduces the complexity of the later amplicon search while only retaining the best primer of a set of overlapping primers.
+3. Single and tiled mode: Find primer with the lowest penalty. varVAMP sorts the primers by their penalty and always takes one with the lowest penalty if middle third of the primer has not been covered by a primer with a lower penalty. This greatly reduces the complexity of the later amplicon search while only retaining the best primer of a set of overlapping primers.
 
 ### Amplicon search
 
@@ -37,7 +37,7 @@ To search for the best amplicon, varVAMP uses a graph based approach.
 6. Repeat steps 3-5 for each start node until the best coverage is reached. Voila! We have the best amplicon scheme with the lowest cumulative primer penalties in respect to the amplicon length!
 7. Lastly, the best scheme is evaluated for primer dimers in their respective pools. If a primer dimer pair is found, varVAMP evaluates for each primer their overlapping and previously not considered primers (primer search step 2) and again minimizes the penalty. The scheme and all primers are updated. If no alternative primers can be found, varVAMP issues a warning and reports the unsolvable primer dimers.
 
-#### Sanger sequencing
+#### Single amplicons
 1. varVAMP sorts all amplicons by their penalties and takes the non-overlapping amplicon with the lowest penalty!
 2. As varVAMP gives a size penalty to amplicons, varVAMP automatically finds amplicons with low primer penalties close to your optimal length (if possible).
 

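Note on the renamed "Single amplicons" section: the selection it describes (sort by penalty, keep the lowest-penalty amplicons that do not overlap) can be sketched as a greedy pass. This is only an illustration under stated assumptions, not the code behind `scheme.find_single_amplicons`. The `Amplicon` tuple, its field names, the half-open coordinates and `pick_non_overlapping` are hypothetical; only the idea of a penalty-sorted, non-overlapping pick and the `report_n` limit come from the diff.

```python
from typing import NamedTuple


class Amplicon(NamedTuple):
    """Hypothetical record; varVAMP's internal amplicon layout differs."""
    name: str
    start: int      # inclusive
    stop: int       # exclusive
    penalty: float


def pick_non_overlapping(amplicons, report_n):
    """Greedily keep the lowest-penalty amplicons that do not overlap."""
    chosen = []
    for amp in sorted(amplicons, key=lambda a: a.penalty):
        # accept only if it is disjoint from everything already chosen
        if all(amp.stop <= c.start or amp.start >= c.stop for c in chosen):
            chosen.append(amp)
        if len(chosen) == report_n:
            break
    return chosen


candidates = [
    Amplicon("a", 0, 500, 3.2),
    Amplicon("b", 400, 900, 1.1),
    Amplicon("c", 950, 1400, 2.0),
    Amplicon("d", 100, 450, 0.9),
]
best = pick_non_overlapping(candidates, report_n=3)
print([amp.name for amp in best])
```

Because candidates are visited in penalty order, the first accepted amplicon is always the overall best one, which matches the FAQ's statement that the lowest number goes to the lowest penalty.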
diff --git a/docs/output.md b/docs/output.md
@@ -13,22 +13,22 @@ varVAMP produces multiple main output files:
 | ALL          | varvamp_log.txt                        | Log file.                                                                                                                                                                                                                                                                |
 | TILED        | unsolvable_primer_dimers.tsv           | Only produced if there are primer dimers without replacements. Tells which primers form dimers and at which temperature.                                                                                                                                                 |
 | TILED        | primers_pool_0/1.fasta                 | Primer sequences per pool in fasta format.                                                                                                                                                                                                                               |
-| SANGER       | primers.fasta                          | Primer sequences in fasta format.                                                                                                                                                                                                                                        |
-| TILED/SANGER | primer_to_amplicon_assignments.tabular | Simple tab separated file, which primers belong together. Useful for bioinformatic workflows that include primer trimming                                                                                                                                                |
-| TILED/SANGER | primer.tsv                             | A tab separated file with important parameters for the primers including the sequence with ambiguous nucleotides (already in the right strand) and the gc and temperature of the best fitting primer as well as for the mean for all permutations of the primer.         |
+| SINGLE       | primers.fasta                          | Primer sequences in fasta format.                                                                                                                                                                                                                                        |
+| TILED/SINGLE | primer_to_amplicon_assignments.tabular | Simple tab separated file, which primers belong together. Useful for bioinformatic workflows that include primer trimming                                                                                                                                                |
+| TILED/SINGLE | primer.tsv                             | A tab separated file with important parameters for the primers including the sequence with ambiguous nucleotides (already in the right strand) and the gc and temperature of the best fitting primer as well as for the mean for all permutations of the primer.         |
 | QPCR         | qpcr_design.tsv                        | A tab separated file with important parameters for the qPCR amplicon including the deltaG.                                                                                                                                                                               |
 | QPCR         | qpcr_primers.tsv                       | A tab separated file with important parameters for the primers  and probes including the sequence with ambiguous nucleotides (already in the right strand) and the gc and temperature of the best fitting primer and probe as well as for the mean for all permutations. |
 | QPCR         | oligos.fasta                           | Oligo sequences in fasta format.                                                                                                                                                                                                                                         |
 
 
 It also produces some secondary output files in `data` :
 
-| Mode | Output | Description |
-| --- | --- | --- |
-| ALL | alignment_cleaned | The preprocessed alignment. |
-| ALL | majority_consensus.fasta | Consensus sequence that does not have ambiguous characters but instead has the most prevalent nucleotide at each position. |
-| ALL | primer_regions.bed | A bed file showing the location of the potential regions of the consensus sequence that were evaluated for primers. |
-| TILED/SANGER | all_primers.bed | A bed file with all high scoring primers that varVAMP found. |
-| qPCR | probe_regions.bed | A bed file showing the location of the potential regions of the consensus sequence that were evaluated for probes. |
+| Mode         | Output | Description |
+|--------------| --- | --- |
+| ALL          | alignment_cleaned | The preprocessed alignment. |
+| ALL          | majority_consensus.fasta | Consensus sequence that does not have ambiguous characters but instead has the most prevalent nucleotide at each position. |
+| ALL          | primer_regions.bed | A bed file showing the location of the potential regions of the consensus sequence that were evaluated for primers. |
+| TILED/SINGLE | all_primers.bed | A bed file with all high scoring primers that varVAMP found. |
+| qPCR         | probe_regions.bed | A bed file showing the location of the potential regions of the consensus sequence that were evaluated for probes. |
 
 #### [Previous: Usage](./usage.md)&emsp;&emsp;[Next: Wet lab protocol](./wet_lab_protocol.md)
diff --git a/docs/sanger.png b/docs/single.png
diff --git a/docs/usage.md b/docs/usage.md
@@ -26,15 +26,15 @@ optional arguments:
   -v, --version         show program's version number and exit
 
 varvamp mode:
-  {sanger,tiled,qpcr}
-    sanger              design primers for sanger sequencing
+  {single,tiled,qpcr}
+    single              design primers for single amplicons
     tiled               design primers for whole genome sequencing
     qpcr                design qPCR primers
 
 ```
-**sanger** mode:
+**single** mode:
 ```shell
-usage: varvamp sanger [optional arguments] <alignment> <output dir>
+usage: varvamp single [optional arguments] <alignment> <output dir>
 ```
 ```
 optional arguments:

diff --git a/docs/varvamp.png b/docs/varvamp.png
diff --git a/docs/workflow.png b/docs/workflow.png
diff --git a/varvamp/__init__.py b/varvamp/__init__.py
@@ -1,3 +1,3 @@
 """Tool to design amplicons for highly variable virusgenomes"""
 _program = "varvamp"
-__version__ = "1.0"
+__version__ = "1.0.1"
diff --git a/varvamp/command.py b/varvamp/command.py
@@ -36,10 +36,10 @@ def get_args(sysargs):
         title="varvamp mode",
         dest="mode",
     )
-    SANGER_parser = mode_parser.add_parser(
-        "sanger",
-        help="design primers for sanger sequencing",
-        usage="varvamp sanger [optional arguments] <alignment> <output dir>"
+    SINGLE_parser = mode_parser.add_parser(
+        "single",
+        help="design primers for single amplicons",
+        usage="varvamp single [optional arguments] <alignment> <output dir>"
     )
     TILED_parser = mode_parser.add_parser(
         "tiled",
@@ -56,7 +56,7 @@ def get_args(sysargs):
         nargs=2,
         help="alignment file and dir to write results"
     )
-    for par in (SANGER_parser, TILED_parser, QPCR_parser):
+    for par in (SINGLE_parser, TILED_parser, QPCR_parser):
         par.add_argument(
             "-t",
             "--threshold",
@@ -89,7 +89,7 @@ def get_args(sysargs):
             type=int,
             default=1
         )
-    for par in (SANGER_parser, TILED_parser):
+    for par in (SINGLE_parser, TILED_parser):
         par.add_argument(
             "-ol",
             "--opt-length",
@@ -114,7 +114,7 @@ def get_args(sysargs):
         default=100,
         help="min overlap of the amplicons"
     )
-    SANGER_parser.add_argument(
+    SINGLE_parser.add_argument(
         "-n",
         "--report-n",
         type=int,
@@ -275,9 +275,9 @@ def shared_workflow(args, log_file):
     return alignment_cleaned, majority_consensus, ambiguous_consensus, primer_regions, left_primer_candidates, right_primer_candidates
 
 
-def sanger_and_tiled_shared_workflow(args, left_primer_candidates, right_primer_candidates, data_dir, log_file):
+def single_and_tiled_shared_workflow(args, left_primer_candidates, right_primer_candidates, data_dir, log_file):
     """
-    part of the workflow shared by the sanger and tiled mode
+    part of the workflow shared by the single and tiled mode
     """
 
     # find best primers and create primer dict
@@ -321,20 +321,20 @@ def sanger_and_tiled_shared_workflow(args, left_primer_candidates, right_primer_
             args.max_length,
             args.threads,
             log_file,
-            mode="sanger_tiled"
+            mode="single_tiled"
         )
     else:
         off_target_amplicons = []
 
     return all_primers, amplicons, off_target_amplicons
 
 
-def sanger_workflow(args, amplicons, all_primers, log_file):
+def single_workflow(args, amplicons, all_primers, log_file):
     """
-    workflow part specific for sanger mode
+    workflow part specific for single mode
     """
 
-    amplicon_scheme = scheme.find_sanger_amplicons(amplicons, all_primers, args.report_n)
+    amplicon_scheme = scheme.find_single_amplicons(amplicons, all_primers, args.report_n)
     logging.varvamp_progress(
         log_file,
         progress=0.9,
@@ -503,17 +503,17 @@ def main(sysargs=sys.argv[1:]):
     reporting.write_fasta(data_dir, "majority_consensus", majority_consensus)
     reporting.write_fasta(results_dir, "ambiguous_consensus", ambiguous_consensus)
 
-    # SANGER/TILED mode
-    if args.mode == "tiled" or args.mode == "sanger":
-        all_primers, amplicons, off_target_amplicons = sanger_and_tiled_shared_workflow(
+    # SINGLE/TILED mode
+    if args.mode == "tiled" or args.mode == "single":
+        all_primers, amplicons, off_target_amplicons = single_and_tiled_shared_workflow(
             args,
             left_primer_candidates,
             right_primer_candidates,
             data_dir,
             log_file
         )
-        if args.mode == "sanger":
-            amplicon_scheme = sanger_workflow(
+        if args.mode == "single":
+            amplicon_scheme = single_workflow(
                 args,
                 amplicons,
                 all_primers,

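Note on the `command.py` changes: the rename touches the subcommand string as well as the parser variable, so the value stored in `args.mode` changes from `"sanger"` to `"single"`. A reduced sketch of that wiring follows, assuming a positional named `input` and a default of 5 for `--report-n` (neither is visible in the hunks above). The `build_parser` helper is hypothetical and omits the `qpcr` mode and most options.

```python
import argparse


def build_parser():
    """Reduced, hypothetical version of the mode parser shown in the diff."""
    parser = argparse.ArgumentParser(prog="varvamp")
    mode_parser = parser.add_subparsers(title="varvamp mode", dest="mode")
    single = mode_parser.add_parser(
        "single", help="design primers for single amplicons"
    )
    tiled = mode_parser.add_parser(
        "tiled", help="design primers for whole genome sequencing"
    )
    for par in (single, tiled):
        # positional name "input" is an assumption; the hunk only shows nargs/help
        par.add_argument(
            "input", nargs=2, help="alignment file and dir to write results"
        )
    # the default value here is assumed, not taken from the diff
    single.add_argument("-n", "--report-n", type=int, default=5)
    return parser


args = build_parser().parse_args(["single", "-n", "3", "aln.fasta", "out"])
print(args.mode, args.report_n, args.input)
```

As far as these hunks show, no alias for the old name is registered, so an invocation using `sanger` would be rejected by `argparse` as an invalid choice. Scripts that call the old subcommand would need updating.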
diff --git a/varvamp/scripts/blast.py b/varvamp/scripts/blast.py
@@ -31,7 +31,7 @@ def check_BLAST_installation(log_file):
 
 def create_BLAST_query(all_primers, amplicons, data_dir):
     """
-    create a query for the BLAST search (tiled, sanger mode)
+    create a query for the BLAST search (tiled, single mode)
     """
     already_written = []
 
@@ -168,7 +168,7 @@ def predict_non_specific_amplicons_worker(amp, blast_df, max_length, mode):
     """
     name, data = amp
     # get correct primers
-    if mode == "sanger_tiled":
+    if mode == "single_tiled":
         primers = [data[2], data[3]]
     elif mode == "qpcr":
         primers = []
@@ -198,7 +198,7 @@ def predict_non_specific_amplicons(amplicons, blast_df, max_length, mode, n_thre
         if off_target is None:
             continue
         off_targets.append(off_target)
-        if mode == "sanger_tiled":
+        if mode == "single_tiled":
             amplicons[off_target][5] = amplicons[off_target][5] + config.BLAST_PENALTY
         elif mode == "qpcr":
             amplicons[off_target]["penalty"][0] = amplicons[off_target]["penalty"][0] + config.BLAST_PENALTY
@@ -208,7 +208,7 @@ def predict_non_specific_amplicons(amplicons, blast_df, max_length, mode, n_thre
 
 def primer_blast(data_dir, db, query_path, amplicons, max_length, n_threads, log_file, mode):
     """
-    performs the blast search for the sanger or tiled workflow
+    performs the blast search for the single or tiled workflow
     """
     print("\n#### Starting varVAMP primerBLAST. ####\n")
     print("Running BLASTN...")

diff --git a/varvamp/scripts/logging.py b/varvamp/scripts/logging.py
@@ -113,7 +113,7 @@ def raise_arg_errors(args, log_file):
                 log_file,
                 exit=True
             )
-    if args.mode in ("tiled", "sanger"):
+    if args.mode in ("tiled", "single"):
         if args.opt_length > args.max_length:
             raise_error(
                 "optimal length can not be higher than the maximum amplicon length.",
@@ -126,8 +126,8 @@ def raise_arg_errors(args, log_file):
                 log_file,
                 exit=True
             )
-    # SANGER specific warnings
-    if args.mode == "sanger":
+    # SINGLE specific warnings
+    if args.mode == "single":
         if args.report_n < 1:
             raise_error(
                 "number of reported amplicons cannot be below 1.",
@@ -252,7 +252,7 @@ def confirm_config(args, log_file):
 
     # check if all variables exists
     all_vars = [
-        # arg independent TILED, SANGER mode
+        # arg independent TILED, SINGLE mode
         (
             "PRIMER_TMP",
             "PRIMER_GC_RANGE",
@@ -512,7 +512,7 @@ def confirm_config(args, log_file):
             sep="\n",
             file=f
         )
-        if args.mode in ("tiled", "sanger"):
+        if args.mode in ("tiled", "single"):
             print(
                 f"AMPLICON_OPT_LENGTH = {args.opt_length}",
                 f"AMPLICON_MAX_LENGTH = {args.max_length}",
@@ -532,7 +532,7 @@ def confirm_config(args, log_file):
                 sep="\n",
                 file=f
             )
-        if args.mode == "sanger":
+        if args.mode == "single":
             print(
                 f"REPORT_N_AMPLICONS = {args.report_n}",
                 sep="\n",
@@ -553,7 +553,7 @@ def confirm_config(args, log_file):
         )
         for var in all_vars[0]:
             print(f"{var} = {var_dic[var]}", file=f)
-        if args.mode in ("tiled", "sanger"):
+        if args.mode in ("tiled", "single"):
             if args.database is not None:
                 for var in all_vars[2]:
                     print(f"{var} = {var_dic[var]}", file=f)