Commit

Merge pull request #15 from d-j-e/v0.4.2dev
V0.4.2dev
d-j-e authored May 14, 2020
2 parents f0e4c34 + 1c64a3e commit 4818d11
Showing 3 changed files with 20 additions and 15 deletions.
26 changes: 15 additions & 11 deletions README.md
@@ -9,7 +9,7 @@ SNPPar is designed to find homoplasic SNPs based on a user-defined phylogenetic

By default, SNPPar uses TreeTime for ancestral state reconstruction (ASR), but using FastML for ASR is also available if FastML is installed (though much, much slower)

Current Version: V0.4.1dev
Current Version: V0.4.2dev

# Home:

@@ -88,7 +88,7 @@ Note: If any gene is split in the reference (including across the origin of the
[-t TREE] [-g GENBANK] [-E SORTING] [-M MUTATION_EVENTS]
[-d DIRECTORY] [-p PREFIX] [-P] [-S] [-C] [-R] [-A] [-a] [-n]
[-e] [-u] [-f] [-x FASTML_EXECUTE]
SNPPar: Parallel/homoplasic SNP Finder V0.4.1dev
SNPPar: Parallel/homoplasic SNP Finder V0.4.2dev
optional arguments:
-h, --help show this help message and exit
-s SNPTABLE, --snptable SNPTABLE
@@ -151,16 +151,20 @@ Note: If any gene is split in the reference (including across the origin of the

# SNPPar sorting
Three versions of the SNP sorting are available when using TreeTime for ASR
Filtered out from ASR
complex singletons and monophyletic SNPs
(tested against tree)
intermediate (default) same as complex except SNPs with
missing calls sent to ASR (not singletons)
simple singletons only


Filtered out before ASR
complex singletons and monophyletic SNPs
(tested against tree)
intermediate (default) same as complex except non-singleton SNPs
with missing calls sent to ASR ()
simple singletons only

Complex sorting is the most memory efficient of the three, with simple being about twice as costly (estimate!); intermediate sits somewhere in between (though closer to complex).

Run time is more dependent on missing calls; complex and intermediate sorting are quicker than simple sorting when there are no missing calls. When missing calls are present, complex sorting can be much slower than either simple or intermediate sorting. Intermediate sorting can be faster than simple... (still testing atm)
Run time is more dependent on missing calls; complex and intermediate sorting are quicker than simple sorting when there are no missing calls. When missing calls are present, complex sorting can be much slower than either simple or intermediate sorting. Intermediate sorting is typically faster than simple.

Complex sorting may be useful when memory is a problem; simple sorting can be used if you would prefer all the internal SNPs (i.e. non-singletons) to be mapped using ASR.
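
The three sorting modes described above can be read as a decision rule for which SNPs are passed to ASR. The following is a minimal sketch of that reading, not SNPPar's actual code: the function name `sent_to_asr` and its boolean inputs are hypothetical, and the intermediate rule is an interpretation of the table.

```python
def sent_to_asr(mode, is_singleton, is_monophyletic, has_missing_calls):
    """Return True if a SNP would be passed to ASR under the given sorting mode.

    Hypothetical helper that only illustrates the README's description.
    """
    if mode not in ("complex", "intermediate", "simple"):
        raise ValueError(f"unknown sorting mode: {mode}")

    # Singletons are filtered out before ASR in every mode.
    if is_singleton:
        return False

    if mode == "simple":
        # Only singletons are filtered; every internal SNP goes to ASR.
        return True

    if mode == "intermediate" and has_missing_calls:
        # Assumed reading: non-singleton SNPs with missing calls go to ASR.
        return True

    # complex, or intermediate without missing calls: monophyletic SNPs
    # are resolved against the tree and skip ASR.
    return not is_monophyletic
```

Under this reading, a monophyletic SNP with missing calls skips ASR in complex mode but is sent to ASR in intermediate and simple modes.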

@@ -234,6 +238,6 @@ Then to run SNPPar:
</p>

# Important Note
SNPPar is very accurate (evidence in SNPPar_test very soon!), BUT calls where the ancestor is the root node ('N1') are arbitrarily assigned - As such, the output trees have no homoplasic events (parallel, convergent, or revertant) mapped to root node, though the total number of SNPs on each branch is estimated using the ratio of the distance to the child nodes of 'N1'.
SNPPar is very accurate, BUT calls where the ancestor is the root node ('N1') are arbitrarily assigned. As such, the output trees have no homoplasic events (parallel, convergent, or revertant) mapped to root node, though the total number of SNPs on each branch is estimated using the ratio of the distance to the child nodes of 'N1'.

When a homoplasic event does occur at the root node and is removed, if there is only one other mutation event at the same SNP position, that mutation event is *not* removed from the tree. Keep this in mind when interpreting the tree output.
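
The branch-length ratio mentioned in the note can be sketched as follows. This is an assumed reading (SNPs spanning the root are apportioned to the two child branches of 'N1' in proportion to their branch lengths), not SNPPar's implementation; `split_root_snps` and its arguments are hypothetical names.

```python
def split_root_snps(total_snps, dist_child_a, dist_child_b):
    """Apportion SNPs across the root between the two child branches of 'N1'.

    Hypothetical helper: each branch receives a share proportional to its
    length (distance from the root to that child node).
    """
    total_dist = dist_child_a + dist_child_b
    if total_dist <= 0:
        raise ValueError("combined branch length must be positive")

    share_a = total_snps * dist_child_a / total_dist
    # The remainder goes to the other branch so the shares sum to the total.
    return share_a, total_snps - share_a
```

For example, with 10 SNPs and branch lengths of 1.0 and 3.0, the shorter branch would be assigned a quarter of the SNPs and the longer branch the rest.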
7 changes: 4 additions & 3 deletions scripts/snppar.py
@@ -15,7 +15,7 @@
snppar -s snps.csv -g genbank.gb -t tree.tre
'''
#
# Last modified - 2/1/2020
# Last modified - 11/5/2020
# Recent Changes: changed default reporting to homoplasic, not parallel
# change of some input commands as a result
# added user command to log output
@@ -24,6 +24,7 @@
# fix for fastml_execute
# simplified and intermediate and complex sorting for TreeTime
# further fixing (and testing) of fastml_execute
# removed 'cpickle' option for tree.copy(), 'deepcopy' option instead
# To add: mapping using tree and snp table only (i.e. no reference)
#

@@ -42,7 +43,7 @@
from datetime import datetime

# Constants declaration
version = 'V0.4.1dev'
version = 'V0.4.2dev'
genefeatures = 'CDS'
excludefeatures = 'gene,misc_feature,repeat_region,mobile_element'
nt = ['A','C','G','T']
Expand Down Expand Up @@ -444,7 +445,7 @@ def addToSNPPatterns(snp,snp_pattern,snp_set,alt_set,na_set,tree,snps_to_map,mon

def getNANodes(tree, na_set, node_names):
removed_nodes = []
test_tree = tree.copy()
test_tree = tree.copy("newick")
if na_set:
for isolate in na_set:
if test_tree.search_nodes(name=isolate):
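
The hunk above changes how `getNANodes` copies the tree before searching it for isolates with missing calls. The general pattern (work on a copy so the caller's tree is left untouched) can be sketched with the standard library alone. The `Node` class and helper functions below are hypothetical stand-ins, not the tree library SNPPar uses, and `copy.deepcopy` stands in for whichever copy method the tree object provides.

```python
import copy


class Node:
    """Toy tree node (hypothetical; not the tree class used by SNPPar)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def leaf_names(self):
        # A node with no children is treated as a leaf.
        if not self.children:
            return [self.name]
        names = []
        for child in self.children:
            names.extend(child.leaf_names())
        return names


def prune_leaves(tree, to_remove):
    """Remove, in place, every leaf whose name is in to_remove.

    Internal nodes left without children are not collapsed in this toy.
    """
    tree.children = [
        child for child in tree.children
        if child.children or child.name not in to_remove
    ]
    for child in tree.children:
        prune_leaves(child, to_remove)


def leaves_without_missing(tree, na_set):
    # Work on an independent copy so the original tree is not modified.
    test_tree = copy.deepcopy(tree)
    prune_leaves(test_tree, na_set)
    return test_tree.leaf_names()
```

Because pruning happens on `test_tree`, the tree passed in still contains every isolate after the call returns.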
2 changes: 1 addition & 1 deletion setup.py
@@ -4,7 +4,7 @@

setup(
name='snppar',
version='0.4.1dev',
version='0.4.2dev',
author='David Edwards',
author_email='[email protected]',
packages=['snppar'],