Skip to content

ALLMAPS: How to split chimeric contigs

Haibao Tang edited this page Sep 6, 2016 · 2 revisions

Occasionally, there are chimeric contigs that map to multiple linkage groups or chromosomes. Such contigs are problematic and should be split prior to running ALLMAPS. There is no standard way of splitting chimeric contigs. Several questions exist:

  • where do we make the splits?
  • which map do we trust for making the splits?

Let's assume that we have a map.bed that we rely on to make the splits. First we want to identify the breakpoints, using the split command in ALLMAPS. Initially, we have the raw breakpoints coordinates in breakpoints.bed. The parameter --chunk=4 tells ALLMAPS to cut contigs where there are significant matches (at least 4 markers) to multiple linkage groups.

$ python -m jcvi.assembly.allmaps split map.bed --chunk=4 > breakpoints.bed

These coordinates are very rough, of course. We could do better by intersecting with the gaps in the assembly, where most misjoins occur. The first command computes where the gaps are, the second command refines the breakpoint by finding the largest gap in the breakpoint region.

$ python -m jcvi.formats.fasta gaps genome.fasta
$ python -m jcvi.assembly.patch refine breakpoints.bed genome.gaps.bed

Once this is done, we create a new AGP file that specifies how to split the breakpoints exactly. Final command to rebuild the genome into genome.SPLIT.fasta.

$ python -m jcvi.formats.sizes agp genome.fasta
$ python -m jcvi.formats.agp mask genome.fasta.agp breakpoints.genome.refined.bed --splitobject --splitsingle
$ python -m jcvi.formats.agp build genome.fasta.masked.agp genome.fasta genome.SPLIT.fasta

Once we have obtained the new genome with chimeric contigs split (new contig identifiers will be generated, e.g. scaffold_1.1, scaffold_1.2).

Now re-run your BLAST/marker mapping, so your new BED file contain the new identifiers after the split, and proceed with ALLMAPS.