Scripts for assembling the genome of the kakī (black stilt; Himantopus novaezelandiae). Input data are PacBio HiFi, Hi-C, and Illumina HiSeq reads.
-
01-pacbio-processing contains scripts that were not used in the end, but that could be used to convert raw PacBio data to HiFi reads and to run other processing steps.
-
02-assembly contains scripts used to produce draft assemblies from HiFi data with the hifiasm, HiCanu, MaSuRCA, and Flye assemblers. We had the most success with hifiasm and MaSuRCA.
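
hifiasm writes its contigs as GFA graphs rather than FASTA, so a conversion step is usually needed before the downstream tools can read the draft assembly. The following is a minimal sketch of that conversion, not a script from this repository; the function name `gfa_to_fasta`, the line width, and the example contig names are assumptions made for illustration.

```python
import io


def gfa_to_fasta(gfa_lines, width=60):
    """Yield FASTA lines for every segment (S) record in a GFA stream.

    Hypothetical helper: GFA segment records are tab-separated, with the
    name in column 2 and the sequence in column 3.
    """
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # skip headers, links, and any other record types
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        if seq == "*":
            continue  # sequence not stored in the GFA, nothing to write
        yield f">{name}"
        # Wrap the sequence to a fixed line width.
        for start in range(0, len(seq), width):
            yield seq[start:start + width]


# Tiny made-up GFA used only to show the call pattern.
example_gfa = io.StringIO(
    "H\tVN:Z:1.0\n"
    "S\tptg000001l\tACGTACGTAC\tLN:i:10\n"
    "S\tptg000002l\tGGCC\tLN:i:4\n"
    "L\tptg000001l\t+\tptg000002l\t+\t0M\n"
)
fasta_lines = list(gfa_to_fasta(example_gfa, width=8))
print("\n".join(fasta_lines))
```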
-
03-purge-dups implements the purge_dups pipeline to remove haplotigs and contig overlaps from the draft assembly prior to scaffolding. This pipeline is based on the one implemented by Sarah Bailey (UoA), with modifications.
-
04-scaffolding uses the Arima mapping pipeline to prepare Hi-C data for scaffolding the purged assembly. Scaffolding is then performed with YaHS.
-
05-polishing provides the option to polish the scaffolded assembly. Polishing may not be necessary given the high-quality PacBio HiFi data, but this is yet to be determined. Meanwhile, manual curation of the draft assembly is in progress.
-
QC provides scripts for assessing the quality of the draft assemblies, including short-read alignment, whole-genome comparisons, and the Merqury pipeline.
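
Alongside the checks listed above, contiguity statistics such as N50 are a common quick comparison between draft assemblies. The sketch below is not one of the QC scripts in this repository; it is a small illustration under the assumption that each assembly is available as FASTA text. The helper names `fasta_lengths` and `nx` are hypothetical.

```python
import io


def fasta_lengths(lines):
    """Return the length of each sequence in a FASTA stream."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths, fraction=0.5):
    """Return the Nx value: the length of the shortest sequence in the
    smallest set of longest sequences covering `fraction` of all bases
    (fraction=0.5 gives N50)."""
    target = sum(lengths) * fraction
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0  # empty input


# Made-up sequences, only to show the call pattern.
example_fasta = io.StringIO(">a\nACGTA\n>b\nACGT\n>c\nACG\n>d\nAC\n>e\nA\n")
example_lengths = fasta_lengths(example_fasta)
print("N50:", nx(example_lengths), "N90:", nx(example_lengths, 0.9))
```

Running the same function over each draft assembly gives a rough like-for-like comparison, but it says nothing about correctness, which is what the alignment and Merqury checks address.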