Currently, treesapp create relies on the package ODSeq for finding outliers in a multiple sequence alignment. This is optionally performed in treesapp create by invoking the --outdet_align flag.
The only issue with ODSeq is that it is no longer supported, and it's uncertain how long it will continue to be hosted at its current location (http://www.bioinf.ucd.ie/download/od-seq.tar.gz). A Bioconductor package exists for it on conda, but I'd rather not use it or create a conda recipe for the binary, so users who want ODSeq (probably not many) currently have to download and compile it themselves.
An added benefit of ditching ODSeq is that it would be one less dependency.
An alternative method should be implemented to replace this. I propose two options:
1. Provide a 'subtraction' fasta file to treesapp create and it will cluster the regular input sequences with the 'subtraction' set. Any sequences that are clustered with those from 'subtraction' will be removed.
2. treesapp create will build a profile HMM from the input sequences (probably cluster them first). The input sequences will then be aligned to the HMM and those that align poorly will be removed. Something like this was used by GraftM.
These should complement, rather than duplicate, the existing method for filtering off-target reference sequences, which uses a profile HMM supplied through the treesapp create argument '--profile'.
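To make the two proposals concrete, here is a rough sketch of the filtering step each one would end with. It is not TreeSAPP code: the function names, the input shapes (a cluster-to-members mapping and a name-to-bit-score mapping) and the `min_fraction` cutoff are all assumptions for illustration, and the clustering and HMM search themselves are assumed to have been run beforehand by whatever tool is chosen.

```python
from statistics import median


def remove_subtracted(clusters, subtraction_ids):
    """Option 1: drop input sequences that cluster with a 'subtraction' sequence.

    clusters: dict mapping a cluster ID to the list of sequence names in it
              (hypothetical shape; depends on the clustering tool's output).
    subtraction_ids: set of names that came from the 'subtraction' fasta file.
    Returns the sorted names of the input sequences to keep.
    """
    kept = []
    for members in clusters.values():
        if any(name in subtraction_ids for name in members):
            continue  # cluster contains a subtraction sequence: treat it all as off-target
        kept.extend(members)
    return sorted(kept)


def remove_poor_alignments(bit_scores, min_fraction=0.5):
    """Option 2: drop sequences that score poorly against the profile HMM.

    bit_scores: dict mapping a sequence name to its bit score against the HMM.
    min_fraction: assumed cutoff, as a fraction of the median score; it only
                  makes sense when the median score is positive.
    Returns the sorted names of the sequences to keep.
    """
    cutoff = min_fraction * median(bit_scores.values())
    return sorted(name for name, score in bit_scores.items() if score >= cutoff)


if __name__ == "__main__":
    clusters = {"c1": ["seqA", "seqB"], "c2": ["seqC", "sub1"], "c3": ["seqD"]}
    print(remove_subtracted(clusters, {"sub1"}))  # ['seqA', 'seqB', 'seqD']

    scores = {"seqA": 310.2, "seqB": 295.0, "seqD": 41.7}
    print(remove_poor_alignments(scores))  # ['seqA', 'seqB']
```

A median-relative cutoff is just one way to define "align poorly"; a fixed bit-score or E-value threshold would work equally well in this sketch.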