Home
Welcome to the kraken_metaphlan_comparison wiki!
This repository contains information on all analyses carried out for Wright, Comeau and Langille (2023), "From defaults to databases: parameter and database choice dramatically impact the performance of metagenomic taxonomic classification tools".
Please note that the largest Kraken database (the one that we recommend) can now be downloaded from AWS instead of through the Dropbox link. See details below.
Some of this will probably change over time as the databases are updated, but the version that was current at the time the paper was accepted for publication is the 2022-09 (September 19th 2022) archive.
Please get in touch with Robyn Wright with any questions.
There are R markdown documents describing all of the analyses run in r_markdown. Following the numbered .Rmd documents in order will run through all database building, classification of samples, processing and analysis of classifications, and figure generation. These are also all available as HTML documents here (in the R_code_markdown folder). You can also find scripts and instructions for comparing a new database with our best-performing database (NCBI RefSeq Complete V205).
All databases that we used and created, as well as the samples used and the intermediate files created during the analysis, can be downloaded through AWS (see below) or OSF. Please follow the README files within these folders for more details on these files. (Note that the files are also on Dropbox, but Dropbox keeps temporarily blocking the links for exceeding their bandwidth limits.)
We have created an Amazon Machine Image (AMI) (ID: ami-04ae7dc734c4934ec). We have a tutorial for using this with the largest database (i.e., the database that we found to perform the best) and running this with your own samples. Note that there will be a cost associated with this (for the computing time used through AWS - payable directly to them, and not to us), but we have provided a rough guide as to how much this is expected to be.
We were having some issues with our "unlimited" Dropbox account: it kept running out of bandwidth when people downloaded the largest Kraken2 database. This database is now hosted on AWS, and it can be downloaded using the AWS CLI (this can be installed by following the instructions here) as follows:
aws s3 cp --recursive --no-sign-request s3://kraken2-ncbi-refseq-complete-v205/Kraken2_RefSeqCompleteV205 .
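Because this database is large, it may help to check the size of the download first and to use a transfer that can be re-run if it is interrupted. The following is a minimal sketch, not part of the original instructions: it swaps in `aws s3 sync` (which skips files that are already present locally) in place of `aws s3 cp --recursive`, and the destination directory `DEST` and the function names `list_db` and `download_db` are hypothetical names chosen for illustration.

```shell
# Bucket and prefix are taken from the download command above.
BUCKET="kraken2-ncbi-refseq-complete-v205"
PREFIX="Kraken2_RefSeqCompleteV205"

# Assumption: DEST is a hypothetical local directory; change it to suit your system.
DEST="${DEST:-./Kraken2_RefSeqCompleteV205}"

# List the files under the prefix, with human-readable sizes and a total,
# so that you can confirm there is enough free disk space before downloading.
list_db() {
    aws s3 ls --no-sign-request --recursive --human-readable --summarize \
        "s3://${BUCKET}/${PREFIX}/"
}

# Download with `aws s3 sync` rather than `aws s3 cp --recursive`.
# Re-running this after an interruption skips files that are already in DEST.
download_db() {
    mkdir -p "${DEST}"
    aws s3 sync --no-sign-request "s3://${BUCKET}/${PREFIX}" "${DEST}"
}
```

After sourcing these definitions, running `list_db` and then `download_db` should place the database files in `DEST`; compare the total reported by `list_db` with the output of `df -h .` before starting the download.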
Details on downloading the other databases are here.