Release v3.0.0: SNP calling with GATK 4.1 includes Slurm compatibility

@ChaochihL ChaochihL released this 02 Jun 19:27
· 15 commits to master since this release

This release includes the following changes.

The Slurm workload manager is now supported for all handlers.

GATK v4.1.2 on the Slurm queueing system is supported for the following handlers:

  • Haplotype_Caller
  • Genomics_DB_Import (new handler; combines GVCF files before the Genotype_GVCFs handler runs)
  • Genotype_GVCFs
  • Create_HC_Subset (preparation steps for GATK Variant Recalibrator)
  • Variant_Recalibrator

GATK v4.1.2 on PBS queueing systems is supported for the following handlers:

  • Haplotype_Caller
  • Genotype_GVCFs
  • Variant_Filtering

Additional changes:

  • VCF annotation visualization has been added to assist with filtering.
  • A Jupyter Notebook template for exploring VCF files before the variant recalibration/filtering steps is now available in the HelperScripts directory.
  • The Realigner_Target_Creator and Indel_Realigner handlers have been separated from the main pipeline. Their functionality is only available in GATK 3 or earlier, but indel realignment is still needed for other downstream tools. Please fill out Config_Indel_Realign for the indel realignment steps.
  • The main Config file has been updated to match the handler changes, and a few new variables have been added.
  • Haplotype_Caller, Genomics_DB_Import, and Genotype_GVCFs now handle parallelizing across regions using job arrays.
  • Specific job array indices can now be re-run with the optional -t custom_array_indices command-line argument, so you no longer need to re-create your sample list for failed or aborted jobs. For example:
./sequence_handling SAM_Processing /path/to/config -t 1-5,10,12

Without the -t flag, all samples in your list are run by default, so you can still run sequence_handling like this:
./sequence_handling SAM_Processing /path/to/config
This works for any handler that uses job arrays.
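
To make the index syntax concrete, here is a minimal sketch of how a specification such as 1-5,10,12 expands into individual array indices. The expand_indices function is a hypothetical helper written for illustration only; it is not part of sequence_handling, and the handlers may simply pass the string through to the scheduler (Slurm's sbatch --array, for instance, accepts the same comma-and-range syntax).

```shell
# Hypothetical helper (not part of sequence_handling): expand an index
# specification such as "1-5,10,12" into a space-separated list of indices.
expand_indices() {
    result=""
    # Split the specification on commas and handle each piece in turn.
    for part in $(printf '%s' "$1" | tr ',' ' '); do
        case "$part" in
            # Reject anything that is not a plain number or a single N-M range.
            *[!0-9-]* | -* | *- | *-*-*)
                echo "Invalid array index: $part" >&2
                return 1 ;;
            # A range: emit every index from start to end, inclusive.
            *-*)
                start=${part%-*}
                end=${part#*-}
                i=$start
                while [ "$i" -le "$end" ]; do
                    result="$result $i"
                    i=$((i + 1))
                done ;;
            # A single index.
            *)
                result="$result $part" ;;
        esac
    done
    # Drop the leading space before printing.
    printf '%s\n' "${result# }"
}

expand_indices "1-5,10,12"   # prints: 1 2 3 4 5 10 12
```

In this example, -t 1-5,10,12 would re-run array tasks 1 through 5 plus tasks 10 and 12, and leave every other entry in the sample list untouched.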

  • Create_HC_Subset can now handle very large VCF files (>1 TB) in a reasonable manner.
  • Variant_Recalibrator now has additional features:
    • A recalibration "mode" can be specified to recalibrate both indels and SNPs, indels only, or SNPs only
    • A custom set of annotations can be specified in the config file
    • Additional options/flags can be specified
    • Resource datasets can be set as known, training, or truth sets with finer control
    • The raw VCF file and resource files are indexed automatically if they are not already indexed