Please note that these scripts are a work in progress and should not presently be used.

Scripts for testing several mapping tools for metagenomic assemblies
The intent of the two scripts 'ContigDepthAnalysis.smk' and 'pyContigDepthAnalysis.py' is the same:
- Generate simulated reads for reference genomes (Fasta format) at random depths
- Merge all the reads and assemble contigs
- Use multiple mapping tools (BWA mem, Minimap2, Kallisto) to map the merged reads to the contigs
- Determine the mapped depth of coverage
- Assemble metagenomic bins which should represent the original genomes
- Confirm which reference genome each MAG (metagenome-assembled genome) originates from, then map reads to each MAG to compare the simulated read depth with the aligned read depth
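
The depth bookkeeping behind the steps above can be sketched as follows. This is a minimal illustration, not code taken from either script: the function names, the default read length of 150 and the depth range are all assumptions. It uses the standard relationship between mean depth, genome length and paired-end read length.

```python
import math
import random


def read_pairs_for_depth(genome_length, depth, read_length=150):
    """Number of read pairs needed to cover a genome at the given mean depth."""
    # Each pair contributes 2 * read_length bases to the total coverage.
    return math.ceil(depth * genome_length / (2 * read_length))


def mean_depth(mapped_bases, sequence_length):
    """Mean depth of coverage: mapped bases divided by sequence length."""
    return mapped_bases / sequence_length


def assign_random_depths(genome_lengths, min_depth=5, max_depth=50, seed=0):
    """Pick a random integer depth per genome and the read pairs it implies.

    genome_lengths maps a (hypothetical) genome name to its length in bases.
    """
    rng = random.Random(seed)  # seeded so a simulation plan is reproducible
    plan = {}
    for name, length in genome_lengths.items():
        depth = rng.randint(min_depth, max_depth)
        plan[name] = {
            "depth": depth,
            "read_pairs": read_pairs_for_depth(length, depth),
        }
    return plan
```

The `read_pairs` value is the kind of number that could be passed to wgsim's `-N` option (number of read pairs), and `mean_depth` is the quantity to compare against the simulated depth once reads have been mapped back.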
This tool strings together a number of Unix tools for read simulation, metagenomic assembly, mapping to reference, metagenomic binning, and data assessment. These include:
- wgsim
- samtools
- megahit
- bwa
- minimap2
- kallisto
- metabat2
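
Before running anything, it may help to confirm that these tools are on your `PATH`. The check below is a small sketch and not part of the scripts themselves; it assumes each executable is named exactly as in the list above, which may differ depending on how a tool was installed.

```python
import shutil

# Assumption: executable names match the tool names listed in this README.
REQUIRED_TOOLS = [
    "wgsim", "samtools", "megahit", "bwa", "minimap2", "kallisto", "metabat2",
]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```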
Python 3 modules required include:
- Biopython
- pandas
- numpy
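
A similar sketch can check that the Python modules are importable. The helper name is hypothetical; note that Biopython is imported under the name `Bio`.

```python
import importlib.util

# Maps the package name used in this README to the name used in `import`.
REQUIRED_MODULES = {"Biopython": "Bio", "pandas": "pandas", "numpy": "numpy"}


def missing_modules(modules=REQUIRED_MODULES):
    """Return the package names whose import name cannot be resolved."""
    return [
        package
        for package, import_name in modules.items()
        if importlib.util.find_spec(import_name) is None
    ]
```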
The snakemake pipeline is a work in progress and may need to be split into two separate pipelines to allow for the merging of simulated reads and the generation of an unknown number of metagenomic bins. The benefit of the snakemake pipeline is the capacity to run multiple alignments simultaneously, up to the capacity of your computer/HPC.
The basic python script takes in arguments and runs through the steps outlined above in serial.
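
As a rough illustration of what running the mapping steps in serial might look like, the sketch below builds the commands for two of the mappers and executes them one after another. It is not the code in 'pyContigDepthAnalysis.py': the function names, file names and output naming are assumptions, and the `-o` output option assumes reasonably recent versions of bwa and minimap2.

```python
import shlex
import subprocess


def mapping_commands(contigs, reads_1, reads_2, out_prefix):
    """Build the mapping commands as argument lists, in the order to run them."""
    return [
        # bwa needs an index of the contigs before mapping.
        ["bwa", "index", contigs],
        ["bwa", "mem", "-o", f"{out_prefix}.bwa.sam", contigs, reads_1, reads_2],
        # "-ax sr" is minimap2's preset for short reads with SAM output.
        ["minimap2", "-ax", "sr", "-o", f"{out_prefix}.minimap2.sam",
         contigs, reads_1, reads_2],
    ]


def run_serial(commands, dry_run=True):
    """Run each command in turn; with dry_run=True only print them."""
    executed = []
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True stops the run at the first failing step.
            subprocess.run(command, check=True)
        executed.append(command)
    return executed
```

Calling `run_serial(mapping_commands("contigs.fa", "reads_1.fq", "reads_2.fq", "sample"))` prints the three commands without executing them; pass `dry_run=False` to actually run them.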