Home
Robust comparative analysis and contamination removal for metagenomic data
BioRxiv (2017) doi: https://doi.org/10.1101/190934
Welcome to the Recentrifuge wiki!
- Requirements: Python 3.6 or later is required. Recentrifuge currently uses only modules from the Python Standard Library.
- Getting the code: just clone the repo.
- Getting the databases: in the cloning dir, execute retaxdump.py. It will download and unzip the required local databases from NCBI servers under the subdirectory taxdump.
Python versions below 3.6 are not supported, because Recentrifuge uses syntax introduced in Python 3.6, such as variable annotations (PEP 526) and formatted string literals (PEP 498). Type hints were introduced in Python 3.5 (PEP 484), but the syntax for annotating variables only arrived with Python 3.6. Powerful tools for static type analysis in Python have evolved along with these standards, and the development of Recentrifuge includes checks with pylint and mypy. Code that aims to perform robust comparative metagenomic analysis is a very good candidate for robust coding.
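As a brief illustration of the two Python 3.6 features mentioned above, the following sketch combines a variable annotation (PEP 526) with a formatted string literal (PEP 498). The names mintaxa, samples and summary are made up for this example and are not taken from the Recentrifuge source.

```python
from typing import List

# Variable annotations (PEP 526): the type is declared next to the name,
# so a static checker such as mypy can verify later uses.
mintaxa: int = 10
samples: List[str] = ['S1.out', 'S2.out', 'S3.out']

# Formatted string literal (PEP 498): expressions inside {} are evaluated
# and interpolated when the string is built.
summary: str = f'{len(samples)} samples, mintaxa={mintaxa}'
print(summary)
```

Running this file under Python 3.5 or earlier fails with a SyntaxError before any line executes, which is why older interpreters cannot be supported.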
Let's suppose you have cloned the repo in ~/recentrifuge and you would like to analyse and compare the Centrifuge output from samples S1, S2 and S3, for instance. The command would be:
~/recentrifuge/recentrifuge.py -f S1.out -f S2.out -f S3.out
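When the list of samples is long or generated by a pipeline, the same command can be assembled programmatically. The following is only a sketch: the helper build_command is hypothetical (it is not part of Recentrifuge), and the default script path assumes the repo was cloned in ~/recentrifuge as above.

```python
import os
from typing import List


def build_command(samples: List[str],
                  script: str = '~/recentrifuge/recentrifuge.py') -> List[str]:
    """Return an argument list with one -f option per Centrifuge output file."""
    cmd: List[str] = [os.path.expanduser(script)]
    for sample in samples:
        cmd.extend(['-f', sample])
    return cmd


command = build_command(['S1.out', 'S2.out', 'S3.out'])
print(' '.join(command))
# To actually launch Recentrifuge (requires the cloned repo), one could use:
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list, rather than as a single shell string, avoids quoting problems with file names that contain spaces.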
The layout of the Recentrifuge command is:
usage: recentrifuge.py [-h] [-V] (-f FILE | -r FILE) [-v] [-n PATH] [-m INT]
                       [-k] [-o FILE] [-i TAXID] [-x TAXID] [-s SCORING]
                       [--sequential] [-a] [-c]
with the following optional arguments:
optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -f FILE, --file FILE  Centrifuge output files (multiple -f is available to
                        include several samples in plot) (default: None)
  -r FILE, --report FILE
                        Centrifuge/Kraken report files (multiple -r is
                        available to include several samples in plot)
                        (default: None)
  -v, --verbose         increase output verbosity (default: 0)
  -n PATH, --nodespath PATH
                        path for the nodes information files (nodes.dmp and
                        names.dmp from NCBI) (default: ./taxdump)
  -m INT, --mintaxa INT
                        minimum taxa to avoid collapsing one level to the
                        parent one (default: 10)
  -k, --nokollapse      show the "cellular organisms" taxon (default: False)
  -o FILE, --outhtml FILE
                        HTML output file (if not given the filename will be
                        inferred from input files) (default: None)
  -i TAXID, --include TAXID
                        NCBI taxid code to include a taxon and all underneath
                        (multiple -i is available to include several taxid).
                        By default all the taxa is considered for inclusion.
                        (default: [])
  -x TAXID, --exclude TAXID
                        NCBI taxid code to exclude a taxon and all underneath
                        (multiple -x is available to exclude several taxid)
                        (default: [])
  -s SCORING, --scoring SCORING
                        type of scoring to be applied, and can be one of
                        ['SHEL', 'LENGTH', 'LOGLENGTH', 'NORMA'] (default:
                        SHEL)
  --sequential          deactivate parallel processing (default: False)
  -a, --avoidcross      avoid cross analysis (default: False)
  -c, --control         take the first sample as negative control (default:
                        False)
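The usage line shows that -f and -r are alternatives, that one of them is mandatory, and that each can be repeated. A minimal argparse sketch that reproduces this behaviour for a subset of the options is given below. It is an illustration written from the help text only, not the actual Recentrifuge source; make_parser is a hypothetical name, and storing taxids as plain strings is an assumption.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build a parser mimicking a subset of the options listed above."""
    parser = argparse.ArgumentParser(
        prog='recentrifuge.py',
        # This formatter appends "(default: ...)" to every help string
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    # -f and -r cannot be mixed, and at least one of them must be present
    inputs = parser.add_mutually_exclusive_group(required=True)
    inputs.add_argument('-f', '--file', action='append', metavar='FILE',
                        help='Centrifuge output files')
    inputs.add_argument('-r', '--report', action='append', metavar='FILE',
                        help='Centrifuge/Kraken report files')
    # action='append' collects every repetition of the option in a list
    parser.add_argument('-x', '--exclude', action='append', metavar='TAXID',
                        default=[], help='NCBI taxid code to exclude')
    parser.add_argument('-c', '--control', action='store_true',
                        help='take the first sample as negative control')
    return parser


args = make_parser().parse_args(
    ['-c', '-f', 'CTRL.out', '-f', 'X1.out', '-x', '7711'])
print(args.file, args.exclude, args.control)
```

With this layout, mixing -f and -r in one call makes argparse stop with an error message, while repeating -f simply extends the list of samples.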
For example, to analyse the Centrifuge output:
- from samples X1 (file X1.nt_mhl30_k1_cf.out), X2 (file X2.nt_mhl30_k1_cf.out) and X3 (file X3.nt_mhl30_k1_cf.out),
- with negative control CTRL (file CTRL.nt_mhl30_k1_cf.out),
- but excluding taxa assigned to Chordata (taxid 7711), unclassified sequences (taxid 12908) and other sequences (taxid 28384),
- with the taxonomy files downloaded to /my/tax/dir,
- and saving the output to the file Xsamples.rcf.html,
the command would be:
~/recentrifuge/recentrifuge.py -n /my/tax/dir -c -f CTRL.nt_mhl30_k1_cf.out -f X1.nt_mhl30_k1_cf.out -f X2.nt_mhl30_k1_cf.out -f X3.nt_mhl30_k1_cf.out -x 7711 -x 12908 -x 28384 -o Xsamples.rcf.html
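Each -x option in this command excludes the given taxon together with everything underneath it in the NCBI taxonomy. The sketch below shows one way such a subtree can be collected from nodes.dmp, whose first two fields (separated by a tab, a vertical bar and a tab) are the taxid and the parent taxid. It is not the Recentrifuge implementation: the function names read_children and subtree are hypothetical, and the toy taxids are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def read_children(lines: Iterable[str]) -> Dict[int, List[int]]:
    """Map each parent taxid to its children from nodes.dmp-style lines."""
    children: Dict[int, List[int]] = defaultdict(list)
    for line in lines:
        fields = line.split('\t|\t')
        taxid, parent = int(fields[0]), int(fields[1])
        if taxid != parent:  # the root node lists itself as its parent
            children[parent].append(taxid)
    return children


def subtree(root: int, children: Dict[int, List[int]]) -> Set[int]:
    """Return root plus every taxid underneath it (iterative traversal)."""
    found: Set[int] = set()
    stack: List[int] = [root]
    while stack:
        taxid = stack.pop()
        if taxid not in found:
            found.add(taxid)
            stack.extend(children.get(taxid, []))
    return found


# Toy taxonomy in nodes.dmp layout (these taxids are invented)
toy = ['1\t|\t1\t|\tno rank\t|\n',
       '10\t|\t1\t|\tphylum\t|\n',
       '11\t|\t10\t|\tclass\t|\n',
       '20\t|\t1\t|\tphylum\t|\n']
excluded = subtree(10, read_children(toy))
print(sorted(excluded))  # [10, 11]
```

With the real file, the lines could come from open('/my/tax/dir/nodes.dmp'), and the union of the subtrees of 7711, 12908 and 28384 would give the full set of taxids to drop.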
Sorry, documentation in preparation. Expect this page to change often to accommodate new material.
If you find the code useful and use Recentrifuge in your research, please consider citing the paper below (the pre-print is available at https://doi.org/10.1101/190934). Thanks!
Martí JM (2019) Recentrifuge: Robust comparative analysis and contamination removal for metagenomics. PLOS Computational Biology 15(4): e1006967. https://doi.org/10.1371/journal.pcbi.1006967