Canu v2.0

@brianwalenz brianwalenz released this 18 Mar 08:20
· 3 commits to v2.0-maintenance since this release

These are release notes for Canu version 2.0, which was released on March 18th, 2020. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.

This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS and are the recommended way to install Canu. The source code distribution contains everything you need to create a binary distribution for your own specific OS.

Citation

Minimum Requirements

  • 8GB minimum memory; 16GB strongly suggested
  • GCC 4.5 (for compilation only); GCC 7 or newer strongly recommended
  • Perl 5.12.0, or File::Path 2.08
  • Java SE 8
  • macOS 10.10 Yosemite (for macOS/Darwin binaries only)
  • gnuplot 5.2 (optional, for generating diagnostic graphs)

Installation

Users can download Canu as source code or as pre-compiled binaries. The binary distribution is the recommended install method, assuming it is available for your platform. The source code package needs to be compiled and installed before it can be used.

To install from a binary distribution (recommended installation method):

tar -xJf canu-2.0.*.tar.xz

To install from source code (the file can be named either canu-v2.0.tar.gz or just v2.0.tar.gz, depending on how it is downloaded):

gunzip -dc canu-v2.0.tar.gz | tar -xf -
cd canu-2.0/src
make -j 8
cd ..

In both cases, canu is installed in directory canu-2.0/<OS>-<architecture>, for example, canu-2.0/Linux-amd64. You can run the assembler with:

canu-2.0/*/bin/canu
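
If you would rather not rely on the glob, the platform directory can be worked out explicitly. The following is a minimal sketch, not part of Canu: the helper name canu_platform_dir is hypothetical, and it assumes the directory is named from `uname -s` and `uname -m`, with x86_64 reported as amd64 (inferred from the Linux-amd64 example above).

```shell
#!/bin/sh
# Hypothetical helper: build the platform directory name, e.g. Linux-amd64.
# Assumption: the name is "<uname -s>-<uname -m>", with x86_64 shown as amd64.
# Optional arguments override the detected OS and architecture.
canu_platform_dir() {
    os=${1:-$(uname -s)}
    arch=${2:-$(uname -m)}
    case "$arch" in
        x86_64) arch=amd64 ;;
    esac
    printf '%s-%s\n' "$os" "$arch"
}

# Path where the canu executable is expected after installation.
canu_bin="canu-2.0/$(canu_platform_dir)/bin/canu"
echo "expected canu path: $canu_bin"
```

If the printed path does not exist on your system, fall back to the glob form shown above.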

Changes

This release introduces support for PacBio HiFi assembly and includes several major bug fixes.

Canu v2.0 IS NOT compatible with assemblies started with any previous version.

  • Support for HiFi data using option '-pacbio-hifi'. Full details in the preprint HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.
  • Numerous improvements to contig construction that produce longer, more correct contigs:
    ** Detect bubbles during contig construction and prevent them from shattering heterozygous genomes.
    ** Detect and remove short branches during contig construction.
    ** Detect reads that are not fully covered by overlaps and exclude them from contigs.
  • Option 'stopOnReadQuality' is enabled by default, but no longer aborts if there are too many short reads.
  • Option 'minInputCoverage' will stop the assembly if the input read coverage is below this value, default 10. This supplements 'stopOnLowCoverage', which stops if read coverage is below some value after input, after correction or after trimming.
  • Option 'maxInputCoverage', default 200, will randomly down-sample input reads to this coverage. It replaces option 'readSamplingCoverage' ('readSamplingBias' still exists).
  • Write intermediate Mhap outputs to the stageDirectory if it is set.
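
To see in advance how the 'minInputCoverage' and 'maxInputCoverage' defaults would treat a read set, coverage can be estimated as total read bases divided by genome size. This is a rough sketch under stated assumptions, not Canu's own calculation: the function names are hypothetical, the FASTQ input is assumed to be uncompressed with four lines per record, and the genome size is given in bases.

```shell
#!/bin/sh
# Sum the sequence lengths in one or more FASTQ files.
# Assumption: strict 4-line records, so the sequence is line 2 of each record.
fastq_bases() {
    awk 'NR % 4 == 2 { total += length($0) } END { printf "%.0f\n", total }' "$@"
}

# Integer coverage (rounded down): total bases / genome size in bases.
coverage() {
    echo $(( $1 / $2 ))
}

# Compare a coverage value with the thresholds described above
# (defaults: minInputCoverage=10, maxInputCoverage=200).
check_coverage() {
    cov=$1; min=${2:-10}; max=${3:-200}
    if [ "$cov" -lt "$min" ]; then
        echo "below minInputCoverage"   # assembly would stop
    elif [ "$cov" -gt "$max" ]; then
        echo "above maxInputCoverage"   # reads would be down-sampled
    else
        echo "ok"
    fi
}

# Example (reads.fastq and the 4.8 Mbp genome size are placeholders):
#   check_coverage "$(coverage "$(fastq_bases reads.fastq)" 4800000)"
```

Both thresholds can also be set explicitly on the canu command line in the usual option=value form, for example minInputCoverage=10 maxInputCoverage=200.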

Bug Fixes

  • Multiple fixes to read positioning during contig construction (Assertion 'cnt > 0' failed.)
  • Possibly fix a weird error reading overlapper output that resulted in out of memory errors (terminate called after throwing an instance of 'std::bad_alloc').
  • A variety of bug fixes that nobody will really care about (unless your assembly crashed, in which case you already know it's fixed) and will be tedious to list, so they aren't listed.

Known Issues

See the issues page for up-to-date open issues, or to report a problem.

  • Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, but may produce slightly less contiguous assemblies on genomes larger than 1 Gbp in size. It is recommended for Nanopore genomes smaller than 1 Gbp.
  • No support for trio binning of HiFi data. As a workaround, specify the HiFi data as -pacbio-raw and run only the haplotyping step (-haplotype) followed by assembly of the partitioned reads.

See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.

Legal

Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.