This repository contains the scripts used to analyze and visualize the results of metagenomic classifiers, described by our paper.

TakacsBertalan/NABAS_paper_scripts

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Code for benchmarking our novel metagenomic classifier, NABAS+

This repository contains code for the project "Accurate and highly efficient alignment-based classification of metagenomic sequences with NABAS+". It holds the source code used for statistical analysis and data visualization.

How to use NABAS+

Setting up NABAS+

Prerequisites: Java, samtools, bwa

git clone https://github.com/TakacsBertalan/NABAS_paper_scripts.git

cd NABAS_paper_scripts/NABASStandAlone

mkdir test/Taxonomy

Download the contents of the following Zenodo repository into the test/Taxonomy folder: https://zenodo.org/records/14016582
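Before moving on to the test run, it can help to confirm that the prerequisites are reachable and that the Taxonomy folder is no longer empty. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the executable names java, samtools and bwa are assumed from the prerequisites list.

```python
import shutil
from pathlib import Path

# Executable names assumed from the "Prerequisites" line of this README.
REQUIRED_TOOLS = ("java", "samtools", "bwa")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def taxonomy_ready(taxonomy_dir="test/Taxonomy"):
    """Return True if the folder exists and holds at least one entry.

    This only checks that something was downloaded; it does not verify
    that the Zenodo contents are complete.
    """
    path = Path(taxonomy_dir)
    return path.is_dir() and any(path.iterdir())
```

Run from the NABASStandAlone folder, `missing_tools()` should return an empty list and `taxonomy_ready()` should return True once the setup steps are done.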

Test run

java -jar dist/NABAS.jar -d test/ -o test/ -r1 test/small_test_sample_in_silico_casava_S0_L001_R1_001.fastq.gz -r2 test/small_test_sample_in_silico_casava_S0_L001_R2_001.fastq.gz

Running the above command produces the file small_test_sample_in_silico_casava_S0_L001_R1_001.ShotgunResult.xlsx in the NABASStandAlone/test folder.
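The test run can also be scripted. The sketch below assembles the same command and derives the report path to look for afterwards. It uses only the flags shown in the test run (-d, -o, -r1, -r2). The naming rule (R1 file name without its FASTQ extension, plus .ShotgunResult.xlsx) is an assumption inferred from the single example above, and both function names are hypothetical.

```python
from pathlib import Path


def build_command(db_dir, out_dir, r1, r2, jar="dist/NABAS.jar"):
    """Assemble the java invocation with the flags used in the test run."""
    return ["java", "-jar", jar,
            "-d", str(db_dir), "-o", str(out_dir),
            "-r1", str(r1), "-r2", str(r2)]


def expected_report(out_dir, r1):
    """Guess the report path from the R1 file name.

    Assumption: NABAS+ names the report after the R1 file, as in the
    test run above; this is not confirmed for other inputs.
    """
    name = Path(r1).name
    for suffix in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return Path(out_dir) / f"{name}.ShotgunResult.xlsx"
```

A caller could pass the list from `build_command` to `subprocess.run(..., check=True)` and then test `expected_report(...).exists()` to confirm that the run wrote its report.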

General use

java -jar dist/NABAS.jar [-d ] [-h] [-maxMismatch ] [-measureBin ] [-minNotNullBin ] [-o ] [-r1 ] [-r1Mask ] [-r2 ] [-r2Mask ] [-t ] [-taxonomy ]

For detailed instructions, see the README.md in the NABASStandAlone folder.

Scripts in the repository

This repository contains a stand-alone, executable version of NABAS+ in the NABASStandAlone folder. NABASStandAlone/test contains input and output files for testing the software.

For de-interleaving CAMI samples and adding a CASAVA-style header, see the FixFastqHeaders project.
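To illustrate what de-interleaving with header rewriting involves, here is a minimal sketch. It is not the FixFastqHeaders implementation. It assumes the input alternates R1 and R2 records of four lines each, and it writes headers in the CASAVA 1.8 layout (instrument:run:flowcell:lane:tile:x:y read:filtered:control:index). The placeholder field values (SIM, FC, and the pair index reused as x and y) are hypothetical.

```python
from itertools import islice


def casava_header(index, read_number, instrument="SIM", run=1,
                  flowcell="FC", lane=1, tile=1):
    """Build a CASAVA 1.8-style header; all field values are placeholders."""
    return (f"@{instrument}:{run}:{flowcell}:{lane}:{tile}:{index}:{index} "
            f"{read_number}:N:0:0")


def read_records(handle):
    """Yield FASTQ records as lists of four lines without newlines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield [line.rstrip("\n") for line in record]


def deinterleave(handle, r1_out, r2_out):
    """Split interleaved pairs into two outputs and return the pair count."""
    pairs = 0
    records = read_records(handle)
    for first in records:
        second = next(records, None)
        if second is None:
            raise ValueError("odd number of records; input is not paired")
        pairs += 1
        for read_number, record, out in ((1, first, r1_out), (2, second, r2_out)):
            header = casava_header(pairs, read_number)
            # Keep sequence and qualities; drop the original header and "+" text.
            out.write("\n".join([header, record[1], "+", record[3]]) + "\n")
    return pairs
```

For gzip-compressed samples, the three handles can be opened with `gzip.open(path, "rt")` and `gzip.open(path, "wt")`. Both mates of a pair receive the same coordinates and differ only in the read number, which is the convention paired-end tools generally rely on.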

The reference database was generated using the NABASCreateDatabase project.

Scripts for comparing the classifier outputs can be found in the NABASCompare folder.

Data Availability

(a) The human gastrooral dataset retrieved from the 2nd CAMI Challenge can be downloaded from the following link: https://frl.publisso.de/data/frl:6425518.

(b) The newly generated sample19 is available on Zenodo, along with the list of reference genomes: https://zenodo.org/uploads/13828312

(c) Illumina sequencing results of the Zymo microbial standards are available from the European Nucleotide Archive under accession IDs ERR2984773 and ERR2935805.

Contact information

If you have any questions, please do not hesitate to contact Bertalan Takács at [email protected].
