-
Notifications
You must be signed in to change notification settings - Fork 4
script for fixing assembly scaffolding errors
License
christophertbrown/fix_assembly_errors
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This repository contains scripts for microbial genome sequence curation. The key scripts are re_assemble_errors.py, ra2.py, and mapped.py. The scripts require python3 and the following programs: velvet Bowtie2 shrinksam (https://github.com/bcthomas/shrinksam) The basic method for identifying and resolving assembly errors is described in: C. T. Brown, L. A. Hug, B. C. Thomas, I. Sharon, and C. J. Castelle, “Unusual biology across a group comprising more than 15% of domain Bacteria,” Nature, 2015. The general strategy is to identify assembly errors as regions with zero coverage by stringently mapped reads, and then attempt to corrected them by re-assembling the region of the genome with the error using velvet. Reads (and their pairs) that mapped to the region containing the error are used in the re-assembly. The re-assembled fragments are then checked for errors and incorporated into the scaffold if possible. See docs/ra2_assembly_curation.pdf for more information. The original version of the software (re_assemble_errors.py) would break scaffolds at errors if they could not be corrected. The updated version (ra2.py) will only break scaffolds at errors when 1) the error could not be corrected and 2) the region is not spanned by paired read sequences. ra2.py output will be in <genome>.curated directory: * re_assembled.fa: curated genome * re_assembled.report.txt: report (e = extended, n = error that was not fixed, f = error that was fixed, b = scaffold broken at error, removed = scaffold removed due to having no coverage) See genome_curation_using_ra2.pdf for more information and example usage in the context of genome curation. mapped.py is a script for filtering SAM files based on the number of mismatches in the read mapping. Note: These scripts have been tested on Mac and Linux systems, but may not work with Windows. Chris Brown [email protected]
About
script for fixing assembly scaffolding errors
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published