
A gap-closing software tool that uses error-prone long reads generated by third-generation sequencing (TGS) technologies (PacBio, Oxford Nanopore, etc.) or preassembled contigs to fill N-gaps in a genome assembly.

  • Both raw reads and pre-error-corrected reads are acceptable as input.

  • If only raw long reads are provided, it polishes raw TGS reads by calling Racon.

  • If additional NGS short reads are available, it polishes raw TGS reads by calling Pilon.

Note: TGS reads are accepted in FASTA format only.
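
Because only FASTA input is accepted, a quick pre-flight check on the reads file can save a failed run. The sketch below is a hypothetical helper, not part of TGS-GapCloser; it assumes an uncompressed text file and only inspects the first non-empty line.

```shell
# Hypothetical helper (not shipped with TGS-GapCloser).
# Succeeds only if the first non-empty line starts with '>', as a FASTA header does.
looks_like_fasta() {
    # Take the first line that has at least one field; empty if the file has none.
    _first=$(awk 'NF { print; exit }' "$1")
    case "$_first" in
        ">"*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

For example, `looks_like_fasta tgs-reads-path/tgs.reads.fasta || echo "not FASTA"` prints a message when the file starts with a FASTQ `@` header instead.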

Citing TGS-GapCloser

If you use TGS-GapCloser in your work, please cite:

TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads. Mengyang Xu, Lidong Guo, Shengqiang Gu, Ou Wang, Rui Zhang, Brock A Peters, Guangyi Fan, Xin Liu, Xun Xu, Li Deng, Yongwei Zhang. GigaScience, Volume 9, Issue 9, 1 September 2020, giaa094.


  1. gcc 4.4+
  2. make 3.8+
  3. minimap2





configure minimap2

  • If minimap2 is already installed, link it here:
rm -rf YOUR-INSTALL-DIR/minimap2
  • Otherwise, download it with:
git submodule init
git submodule update

compile main src


Conda install

not available

If you install via conda, please install minimap2 first and make sure that it is available in your environment.
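
One way to confirm that minimap2 is available in the current environment is to look it up on `PATH` before running the pipeline. This is a minimal sketch; `require_tool` is a hypothetical helper name, not something TGS-GapCloser provides.

```shell
# Hypothetical helper: report whether a tool can be found on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Example: stop a setup script early when minimap2 is not available.
# require_tool minimap2 || exit 1
```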


      tgsgapcloser2 --scaff SCAFF_FILE --reads TGS_READS_FILE --output OUT_PREFIX [options...]
          --scaff     <draft scaffolds>    input draft scaffolds.
          --reads     <TGS reads>          input TGS reads.
          --output    <output prefix>      output prefix.
     ## error correction module
          --ne                             do not correct errors. by default.
          --racon     <racon>              installed racon path. Can be installed following
          --pilon     <pilon>              pilon jar package. Can be downloaded from
          --java      <java>               installed java path.
          --ngs       <ngs_reads>          input NGS reads used for pilon.
          --samtools  <samtools>           installed samtools path.

          --minmap_arg <minimap2 args>     for example, --minmap_arg ' -x ava-ont'
                                           arg must be wrapped by ' '
          --tgstype   <pb/ont/hifi>        TGS type. ont by default.
          --min_idy   <float>              minimum identity for filtering candidate TGS sequences.
                                           0.3 for ont by default.
                                           0.2 for pb/hifi by default.
          --min_match <int>                minimum matched length for filtering candidate TGS sequences.
                                           300 for ont by default.
                                           200 for pb/hifi by default.
          --thread    <int>                number of threads used. 16 by default.
          --pilon_mem <int>                memory used for pilon, passed to -Xmx. can use "m" or "M" for MB, or "g" or "G" for GB. 300G by default.
          --chunk     <int>                split candidates into # of chunks to separately correct errors. 3 by default.
          --p_round   <int>                iteration # of error correction by pilon. 3 by default.
          --r_round   <int>                iteration # of error correction by racon. 3 by default.
          --g_check                        gap size diff check. off by default.
          --min_nread <int>                minimum number of reads that can bridge this gap. 1 by default.
          --max_nread <int>                maximum number of reads that can bridge this gap. -1 by default.
          --max_candidate <int>            maximum number of candidate alignments used for error correction and gap filling. 200 by default.

WARNING: only FASTA-format TGS reads are supported; FASTQ input will crash the program!
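
If your reads are in FASTQ, they need to be converted first. The sketch below is one possible way to do this with awk; `fastq_to_fasta` is a hypothetical helper, and it assumes the common layout of exactly four lines per FASTQ record (no wrapped sequence lines).

```shell
# Hypothetical helper: convert 4-line-per-record FASTQ on stdin to FASTA on stdout.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header line: swap leading "@" for ">"
         NR % 4 == 2 { print }'                     # sequence line; "+" and quality lines are dropped
}
```

A typical call would be `gzip -dc reads.fastq.gz | fastq_to_fasta > reads.fasta`, where both file names are placeholders.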


An example of pre-corrected TGS reads without error correction

YOUR-INSTALL-DIR/tgsgapcloser  \
        --scaff  scaffold-path/scaffold.fasta \
        --reads  tgs-reads-path/tgs.reads.fasta \
        --output test_ne  \
        --ne \
        >pipe.log 2>pipe.err

An example of raw ONT reads with error correction using long reads only

YOUR-INSTALL-DIR/tgsgapcloser  \
        --scaff  scaffold-path/scaffold.fasta \
        --reads  tgs-reads-path/tgs.reads.fasta \
        --output test_racon \
        --racon  racon-path/bin/racon \
        >pipe.log 2>pipe.err

An example of raw ONT reads with error correction using NGS reads

YOUR-INSTALL-DIR/tgsgapcloser  \
        --scaff  scaffold-path/scaffold.fasta \
        --reads  tgs-reads-path/tgs.reads.fasta \
        --output test_pilon \
        --pilon  pilon-path/pilon-1.23.jar  \
        --ngs    ngs-reads-path/ngs.reads.fastq.gz  \
        --samtools samtools-path/bin/samtools  \
        --java    java-path/bin/java \
        >pipe.log 2>pipe.err

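The Pilon example above relies on the default `--pilon_mem` of 300G listed in the usage, so on a smaller machine you may want to pass a different value. The sketch below checks a value before it is handed to the pipeline; `valid_pilon_mem` is a hypothetical helper, and accepting a bare integer without a unit letter is an assumption rather than documented behaviour.

```shell
# Hypothetical helper: accept digits optionally followed by one of m, M, g, G.
# Assumption: a bare integer is allowed; the usage text only documents the unit letters.
valid_pilon_mem() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+[mMgG]?$'
}

# Example: valid_pilon_mem 64G && echo "ok"
```
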
Using PacBio reads

  • The default TGS type is ONT; use --tgstype to change it:
    --tgstype ont
    --tgstype pb
    --tgstype hifi
  • An example of raw PacBio reads with error correction using long reads only:
YOUR-INSTALL-DIR/tgsgapcloser  \
        --scaff  scaffold-path/scaffold.fasta \
        --reads  tgs-reads-path/tgs.reads.fasta \
        --output test_racon \
        --racon  racon-path/bin/racon \
        --tgstype pb \
        >pipe.log 2>pipe.err

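Changing `--tgstype` also changes the default filters listed in the usage (`--min_idy` and `--min_match`). As a quick reference, the sketch below prints those documented defaults for a given type; `tgs_defaults` is a hypothetical helper and does not call TGS-GapCloser itself.

```shell
# Hypothetical helper: print the documented default --min_idy and --min_match
# for a given --tgstype value (values copied from the usage text).
tgs_defaults() {
    case "$1" in
        ont)     echo "min_idy=0.3 min_match=300" ;;
        pb|hifi) echo "min_idy=0.2 min_match=200" ;;
        *)       echo "unknown TGS type: $1" >&2; return 1 ;;
    esac
}
```
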
When you need to specify custom minimap2 parameters

Use --minmap_arg ' your-own minimap2 args'

This is useful when you want to avoid a huge paf file.

For example, if you use HiFi reads, you may try --minmap_arg '-x asm20':

YOUR-INSTALL-DIR/tgsgapcloser  \
        --scaff  scaffold-path/scaffold.fasta \
        --reads  tgs-reads-path/tgs.reads.fasta \
        --output test_racon \
        --minmap_arg '-x asm20' \
        --racon  racon-path/bin/racon \
        --tgstype pb \
        >pipe.log 2>pipe.err


  • your-prefix.scaff_seq
    • this is the final assembly after gap filling
  • your-prefix.gap_fill_details
    • details about how the final assembly was assembled
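
To see how much gap sequence was closed, one option is to compare the number of N bases in the input scaffolds with the number in the final assembly. The sketch below assumes that both files are plain-text FASTA; `count_n_bases` is a hypothetical helper.

```shell
# Hypothetical helper: count N/n bases in a FASTA file, ignoring header lines.
count_n_bases() {
    grep -v '^>' "$1" | tr -cd 'Nn' | wc -c | tr -d ' '
}

# Example: compare before and after gap filling.
# count_n_bases scaffold-path/scaffold.fasta
# count_n_bases your-prefix.scaff_seq
```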

format of your-prefix.gap_fill_details

an example:

1	1000	S	1000	2000
1001	1010	N
1011	1100	S	2201	2290
1101	1110	F
1111	1200	S	2301	2390

detailed information

  1. each scaffold name is followed by its data lines.
  2. a data line consists of 3 or 5 columns and describes the source of each segment in the final sequence:
    • column 1 is the segment's first bp position in the final sequence.
    • column 2 is the segment's last bp position in the final sequence.
    • column 3 is the segment's type, 'S', 'N', or 'F'.
      • 'S' means this segment comes from the input sequence; the line then has two additional columns:
        • column 4 is the segment's first bp position in the input sequence.
        • column 5 is the segment's last bp position in the input sequence.
      • 'N' means this segment is an N area.
      • 'F' means this segment is a filled sequence from TGS reads.
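
The column layout above can be summarised with a short awk script. The sketch below totals the bp per segment type using columns 1 to 3 only; `summarise_gap_fill` is a hypothetical helper, and it assumes that any line whose third column is not 'S', 'N', or 'F' (such as a scaffold name line) can be skipped.

```shell
# Hypothetical helper: total bp per segment type in a gap_fill_details file.
# Lines whose third column is not S, N, or F (e.g. scaffold names) are ignored.
summarise_gap_fill() {
    awk '$3 == "S" || $3 == "N" || $3 == "F" { bp[$3] += $2 - $1 + 1 }
         END { printf "S\t%d\nN\t%d\nF\t%d\n", bp["S"], bp["N"], bp["F"] }' "$1"
}
```

For the five example lines shown earlier, this reports 1180 bp for S, 10 bp for N, and 10 bp for F.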


If you have any questions, please feel free to contact [email protected] or [email protected]
