HiFi assembly tools

Scripts for assembling the genome of the kakī (black stilt; Himantopus novaezelandiae). Input data are PacBio HiFi, Hi-C, and Illumina HiSeq reads.

01-pacbio-processing contains scripts that were not used in the final workflow, but that can convert raw PacBio data to HiFi reads and carry out other processing steps.
02-assembly contains scripts used to produce draft assemblies from HiFi data with the hifiasm, HiCanu, MaSuRCA, and Flye assemblers. We had the most success with hifiasm and MaSuRCA.
03-purge-dups implements the purge_dups pipeline to remove haplotigs and contig overlaps from the draft assembly prior to scaffolding. It is based on the pipeline implemented by Sarah Bailey (UoA), with modifications.
04-scaffolding uses the Arima mapping pipeline to prepare Hi-C data for scaffolding the purged assembly. Scaffolding is then performed with YaHS.
05-polishing provides the option to polish the scaffolded assembly. Polishing may not be necessary given the high accuracy of PacBio HiFi data, but this is yet to be determined. Meanwhile, manual curation of the draft assembly is in progress.
QC provides scripts for assessing the quality of the draft assemblies, including short-read alignment, whole-genome comparisons, and the Merqury pipeline.
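
As a quick sanity check between the steps above (for example before and after purging or scaffolding), a contiguity summary such as N50 can be computed directly from an assembly FASTA. The sketch below is a minimal illustration and is not one of the scripts in this repository; the function names `read_fasta_lengths` and `n50` are hypothetical, and it assumes an uncompressed, plain-text FASTA file.

```python
from typing import Iterable, Iterator, TextIO


def read_fasta_lengths(handle: TextIO) -> Iterator[int]:
    """Yield the length of each sequence in a plain-text FASTA stream."""
    length = None  # None until the first header line is seen
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length of the shortest sequence among the
    longest sequences that together cover at least half of the total."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for seq_len in ordered:
        running += seq_len
        if running * 2 >= total:
            return seq_len
    return 0  # no sequences, or all sequences empty
```

To use it, open the assembly (the path here is a placeholder) and pass the lengths through: `with open("assembly.fa") as fh: print(n50(read_fasta_lengths(fh)))`. N50 only describes contiguity, so it complements rather than replaces the alignment-based and Merqury checks in QC.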
