Command-line interface of TRGT

Commands:

  • genotype
  • plot
  • merge
  • validate

Options:

  • -v, --verbose Specify multiple times to increase verbosity level (e.g., -vv for more verbosity)
  • -h, --help Print help
  • -V, --version Print version

Genotype command-line

Options:

  • -g, --genome <GENOME> Path to the FASTA file containing the reference genome. This must be the same reference genome as the one used for read alignment.
  • -r, --reads <READS> BAM file with alignments of HiFi reads.
  • -b, --repeats <REPEATS> BED file with reference coordinates and structure of tandem repeats.
  • -o, --output-prefix <OUTPUT_PREFIX> Prefix for output files. TRGT generates an unsorted VCF file (<OUTPUT_PREFIX>.vcf.gz) and unsorted BAM file with pieces of HiFi reads overlapping the repeats (<OUTPUT_PREFIX>.spanning.bam).
  • -k, --karyotype <KARYOTYPE> Sample karyotype (XX or XY) [default: XX].
  • -t, --threads <THREADS> Number of threads [default: 1].
  • --preset <PRESET> Parameter preset (wgs or targeted).

Advanced:

  • --sample-name <SAMPLE_NAME> Sample name. If it is not provided, the sample name is extracted from the input BAM or file stem.
  • --genotyper <GENOTYPER> Genotyping algorithm (size or cluster) [default: size].
  • --flank-len <FLANK_LEN> Minimum length of the flanking sequence [default: 250].
  • --output-flank-len <FLANK_LEN> Length of flanking sequence to report on output [default: 50].
  • --fixed-flanks Keep flank length fixed.
  • --disable-bam-output Disable BAM output.
  • --max-depth <MAX_DEPTH> Maximum locus depth [default: 250].
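
As a rough illustration of how the genotype options above fit together, here is a minimal Python sketch that only assembles an argument list and prints it; nothing is executed. It assumes the executable is named `trgt`, and all file names are hypothetical placeholders. The output file names follow the `<OUTPUT_PREFIX>.vcf.gz` and `<OUTPUT_PREFIX>.spanning.bam` pattern described for `--output-prefix`.

```python
import shlex


def build_genotype_cmd(genome, reads, repeats, output_prefix,
                       karyotype="XX", threads=1):
    """Assemble an argument list for the genotype command (nothing is run)."""
    # The documentation lists XX and XY as the accepted karyotypes.
    if karyotype not in ("XX", "XY"):
        raise ValueError("karyotype must be 'XX' or 'XY'")
    return [
        "trgt", "genotype",  # assumption: the executable is named `trgt`
        "--genome", genome,
        "--reads", reads,
        "--repeats", repeats,
        "--output-prefix", output_prefix,
        "--karyotype", karyotype,
        "--threads", str(threads),
    ]


def expected_outputs(output_prefix):
    """File names derived from the prefix, per the --output-prefix description."""
    return [f"{output_prefix}.vcf.gz", f"{output_prefix}.spanning.bam"]


# Hypothetical file names, for illustration only.
cmd = build_genotype_cmd("reference.fasta", "sample.bam", "repeats.bed",
                         "sample", karyotype="XY", threads=4)
print(shlex.join(cmd))
print(expected_outputs("sample"))
```

Since the documentation states that both outputs are unsorted, a downstream sorting step is likely needed before tools that require sorted input can use them.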

Plot command-line

Options:

  • -g, --genome <GENOME> Path to the FASTA file containing reference genome.
  • -b, --repeats <REPEATS> BED file with repeat coordinates and structure.
  • -r, --spanning-reads <SPANNING_READS> BAM file with spanning reads generated by TRGT.
  • -v, --vcf <VCF> VCF file generated by TRGT.
  • -t, --repeat-id <REPEAT_ID> ID of the repeat to visualize.
  • -o, --image <OUTPUT_PATH> Output image path; it must end with the extension .pdf, .svg, or .png.

Plotting:

  • --plot-type <plot type> Two types of plots can be generated: allele plots and waterfall plots. Allele plots show alignments of reads to each repeat allele. Waterfall plots display unaligned repeat sequences without aligning them to the (consensus) allele. Waterfall plots are especially useful for QC of repeat calls and for visualization of mosaic expansions [default: allele].
  • --show <show> Whether motifs (motifs) or methylation (meth) is visualized [default: motifs].

Advanced:

  • --flank-len <FLANK_LEN> Length of flanking regions [default: 50].
  • --max-allele-reads <MAX_READS> Maximum number of reads per allele to plot.
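
The plot options above can be combined as in the following minimal Python sketch, which checks the documented constraints (image extension, `--show` values) before assembling the argument list. It assumes the executable is named `trgt` and that the `--plot-type` values are spelled `allele` and `waterfall`; the documentation confirms only `allele` as the default, so `waterfall` is an inferred spelling. File names and the repeat ID are hypothetical.

```python
import shlex

IMAGE_EXTENSIONS = (".pdf", ".svg", ".png")  # extensions named for --image
PLOT_TYPES = ("allele", "waterfall")         # assumption: "waterfall" spelling
SHOW_MODES = ("motifs", "meth")              # values named for --show


def build_plot_cmd(genome, repeats, spanning_reads, vcf, repeat_id, image,
                   plot_type="allele", show="motifs"):
    """Assemble an argument list for the plot command (nothing is run)."""
    if not image.endswith(IMAGE_EXTENSIONS):
        raise ValueError("image path must end with .pdf, .svg, or .png")
    if plot_type not in PLOT_TYPES:
        raise ValueError(f"plot_type must be one of {PLOT_TYPES}")
    if show not in SHOW_MODES:
        raise ValueError(f"show must be one of {SHOW_MODES}")
    return [
        "trgt", "plot",  # assumption: the executable is named `trgt`
        "--genome", genome,
        "--repeats", repeats,
        "--spanning-reads", spanning_reads,
        "--vcf", vcf,
        "--repeat-id", repeat_id,
        "--image", image,
        "--plot-type", plot_type,
        "--show", show,
    ]


# Hypothetical inputs: outputs of a genotype run and a made-up repeat ID.
plot_cmd = build_plot_cmd("reference.fasta", "repeats.bed",
                          "sample.spanning.bam", "sample.vcf.gz",
                          "EXAMPLE_REPEAT", "EXAMPLE_REPEAT.svg",
                          plot_type="waterfall", show="meth")
print(shlex.join(plot_cmd))
```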

Merge command-line

Options:

  • -v, --vcf <VCF> VCF files to merge.
  • -g, --genome <FASTA> Path to the FASTA file containing reference genome.
  • -o, --output <FILE> Write output to a file [default: standard output].

Advanced:

  • -O, --output-type <OUTPUT_TYPE> Output type: u|b|v|z (u: uncompressed BCF, b: compressed BCF, v: uncompressed VCF, z: compressed VCF).
  • --skip-n <SKIP_N> Skip the first N records.
  • --process-n <PROCESS_N> Only process N records.
  • --print-header Print only the merged header and exit.
  • --force-single Run even if there is only one file on input.
  • --no-version Do not append version and command line to the header.
  • --quit-on-errors Quit immediately on errors during merging.
  • --contig <CONTIG> Process only the specified contigs (comma-separated list).
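
To make the merge options more concrete, the sketch below assembles a merge argument list in Python without running anything. It rests on several assumptions: the executable is named `trgt`, multiple VCF files are passed as consecutive values after a single `--vcf`, and the file and contig names are hypothetical. The `OUTPUT_TYPES` table simply spells out the u|b|v|z codes listed above.

```python
import shlex

# Meaning of each --output-type code, as listed in the option description.
OUTPUT_TYPES = {
    "u": "uncompressed BCF",
    "b": "compressed BCF",
    "v": "uncompressed VCF",
    "z": "compressed VCF",
}


def build_merge_cmd(vcfs, genome, output=None, output_type=None,
                    contigs=None, force_single=False):
    """Assemble an argument list for the merge command (nothing is run)."""
    if not vcfs:
        raise ValueError("at least one VCF file is required")
    if len(vcfs) == 1 and not force_single:
        raise ValueError("a single input file requires force_single=True")
    if output_type is not None and output_type not in OUTPUT_TYPES:
        raise ValueError(f"output_type must be one of {sorted(OUTPUT_TYPES)}")

    # Assumption: all VCF paths follow a single --vcf flag.
    cmd = ["trgt", "merge", "--vcf", *vcfs, "--genome", genome]
    if output is not None:
        cmd += ["--output", output]  # omitted: output goes to standard output
    if output_type is not None:
        cmd += ["--output-type", output_type]
    if contigs:
        cmd += ["--contig", ",".join(contigs)]  # comma-separated list
    if force_single:
        cmd.append("--force-single")
    return cmd


# Hypothetical file and contig names, for illustration only.
merge_cmd = build_merge_cmd(["sampleA.vcf.gz", "sampleB.vcf.gz"],
                            "reference.fasta", output="merged.vcf.gz",
                            output_type="z", contigs=["chr1", "chr2"])
print(shlex.join(merge_cmd))
```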

Validate command-line

Options:

  • -g, --genome <FASTA> Path to the FASTA file containing reference genome.
  • -b, --repeats <REPEATS> BED file with repeat coordinates and structure.

Advanced:

  • --flank-len <FLANK_LEN> Length of flanking regions [default: 50].