This repository has been archived by the owner on May 13, 2020. It is now read-only.

Merge pull request #1 from gatk-workflows/wdlupdate
Wdlupdate
Added workflow and Json
bshifaw authored Oct 4, 2018
2 parents 6ab4198 + 9315743 commit 3a3df7d
# gatk4-cnn-variant-filter

### Purpose :
This repo provides workflows that take advantage of GATK's CNN tool, which uses a deep
learning approach based on convolutional neural networks to filter variants.

Please read the following discussion to learn more about the CNN tool: [Deep Learning in GATK4](https://gatkforums.broadinstitute.org/gatk/discussion/10996/deep-learning-in-gatk4).

### cram2filtered.wdl
This workflow takes an input CRAM/BAM, calls variants with HaplotypeCaller,
and then filters the calls with the CNNVariant neural net tool using the specified filtering model.

The site-level scores are added to the `INFO` field of the VCF. The `info_key` and
`tensor_type` arguments MUST be in agreement (e.g. 2D models must have a `tensor_type` of
`read_tensor` and an `info_key` of `CNN_2D`; 1D models must have a `tensor_type` of
`reference` and an `info_key` of `CNN_1D`). The `INFO` field key will be `CNN_1D` or `CNN_2D`
depending on the neural net architecture used for inference. The architecture arguments
specify pre-trained networks; new networks can be trained with the GATK tools
CNNVariantWriteTensors and CNNVariantTrain. The input CRAM can be generated by the
[single-sample pipeline](https://github.com/gatk-workflows/gatk4-data-processing/blob/master/processing-for-variant-discovery-gatk4.wdl).
If you would like to test the workflow on a more representative example file, use the following
CRAM file as input and change the scatter count from 4 to 200:
`gs://gatk-best-practices/cnn-h38/NA12878_NA12878_IntraRun_1_SM-G947Y_v1.cram`.
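
To illustrate the pairing rule above, a 2D configuration in the workflow's inputs JSON
might look like the following sketch. The workflow and input names here are hypothetical;
check the JSON file shipped with this repo for the actual keys.

```json
{
  "Cram2FilteredVcf.info_key": "CNN_2D",
  "Cram2FilteredVcf.tensor_type": "read_tensor",
  "Cram2FilteredVcf.scatter_count": 4
}
```

Swapping in a 1D model would mean setting `info_key` to `CNN_1D` and `tensor_type` to `reference` together.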

#### Requirements/expectations :
- CRAM/BAM
- BAM Index (if input is BAM)

#### Output :
- Filtered VCF and its index.

### cram2model.wdl
This optional workflow is for advanced users who would like to train a CNN model for filtering variants.

#### Requirements/expectations :
- CRAM
- Truth VCF and its index
- Truth Confidence Interval Bed

#### Output :
- Model HD5
- Model JSON
- Model Plots PNG
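
A minimal inputs JSON for this workflow might look like the sketch below. All names and
paths are hypothetical placeholders; consult the JSON file in this repo for the real keys.

```json
{
  "Cram2Model.input_cram": "gs://my-bucket/sample.cram",
  "Cram2Model.truth_vcf": "gs://my-bucket/truth.vcf.gz",
  "Cram2Model.truth_vcf_index": "gs://my-bucket/truth.vcf.gz.tbi",
  "Cram2Model.truth_bed": "gs://my-bucket/confidence_regions.bed"
}
```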

### run_happy.wdl
This optional evaluation and plotting workflow runs a filtering model against truth data (e.g. [NIST Genomes in a Bottle](https://github.com/genome-in-a-bottle/giab_latest_release), [Synthetic Diploid Truth Set](https://github.com/lh3/CHM-eval/releases)) and plots the accuracy.

#### Requirements/expectations :
- File of VCF Files
- Truth VCF and its index
- Truth Confidence Interval Bed
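
The "File of VCF Files" is presumably a plain-text file listing one VCF path per line,
for example (hypothetical paths):

```
gs://my-bucket/sample1_filtered.vcf.gz
gs://my-bucket/sample2_filtered.vcf.gz
```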

#### Output :
- Evaluation summary
- Plots

### Important Note :
- Runtime parameters are optimized for Broad's Google Cloud Platform implementation.
- For help running workflows on the Google Cloud Platform or locally please
view the following tutorial [(How to) Execute Workflows from the gatk-workflows Git Organization](https://software.broadinstitute.org/gatk/documentation/article?id=12521).
- Please post questions to the [GATK forum](https://gatkforums.broadinstitute.org/gatk/categories/ask-the-team).
- Please visit our [User Guide](https://software.broadinstitute.org/gatk/documentation/) site for further documentation on workflows and tools.

### LICENSING :
This script is released under the WDL source code license (BSD-3) (see LICENSE in
https://github.com/broadinstitute/wdl). Note however that the programs it calls may
be subject to different licenses. Users are responsible for checking that they are
authorized to run all programs before running this script. Please see the docker
page at https://hub.docker.com/r/broadinstitute/genomes-in-the-cloud/ for detailed
licensing information pertaining to the included programs.
