cnnGWAS

This is the repository for cnnGWAS, which is a method that captures complex biological patterns shared by GWAS risk variants. By utilizing convolutional neural networks (CNN), the method prioritizes functional variants among all candidate variants in GWAS loci.

Environment setting

Installing miniconda:

We recommend using miniconda as a package manager when executing cnnGWAS. Miniconda can be installed from the following link: https://docs.conda.io/en/latest/miniconda.html#

Environment setting:

All software tests were done under the following environment setting, and therefore we strongly recommend to set the miniconda environment via following commands:

conda create --name cnnGWAS python=2.7
conda activate cnnGWAS
conda install theano=1.0.3 numpy=1.15.4 matplotlib=2.2.3 pandas=0.24.2 bedtools=2.28.0 bedops=2.4.36 mkl=2019.4

Required R packages (R version 3.5.2):

library(ggplot2)
library(gplots)
library(grid)
library(gridExtra)
library(plyr)
library(reshape2)

Required tools:

awk (GNU Awk 3.1.7)
wget (GNU Wget 1.12 built on linux-gnu)

Tools that will be installed:

impG-summary (http://bogdan.bioinformatics.ucla.edu/software/impg/)

Cloning the repository:

When environment setting is complete, clone or download the repository. Through the following steps, feature DB will be downloaded and training will take place. Input text file (GWAS summary statistics) is required.

Feature DB preparation

Required data can be downloaded by executing the following commands:

cd DB
bash Download_DB.sh 
cd ..

Input file preparation

Input file format:

Input summary statistics file should be located under directory "00_RawDatas/". Example input file "AUTsummary.hg19.txt" is provided.
Input file must be a tab delimited text file with 6 columns.
Each of the columns should have a header row named as follows:

snpid  chr     bp      a1      a2      zscore
rs3131972       chr1    752721  A       G       0.618384984132729
rs3131969       chr1    754182  A       G       1.20540840742781

All coordinates are based on hg19 (GRCh37).
Raw GWAS summary statistics files can include different information; in most cases, z-score can be calculated from the given information.

Input file name:

We recommend simple file prefix, preferably disease name (e.g. SLE), always followed by ".txt" extension.
Underbar "_" must not be included inside a file prefix (e.g. PSY_AUT_dis.txt; not allowed), since it will cause string processing error.
Example file names: "SLE.txt", "MDD2019.txt"

1. Imputation of association p-value using ImpG-summary

Following command executes the imputation process. Multiple summary statistics files can be given as an input (e.g. bash RUN.sh FILE1.txt FILE2.txt FILE3.txt).

cd 01_RunImpG
bash RUN.sh AUTsummary.hg19.txt   
cd ..

2. Make input pickled files for CNN model training

In this stage, the input files (*.pkl.gz) for training are produced.

cd 02_PreProc
bash RUN.sh AUTsummary.hg19.txt 
cd ..

3. Model training

In this stage, the training itself takes place. As an example, a sample input file is provided (02_PreProc/Data/AUTsummary.hg19_AE_30SNP.pkl.gz).

cd 03_RunCNN
bash RUN.sh AUTsummary.hg19.txt 
cd ..

4. Evaluation of the model performance

The performance of the model is saved as a pdf file. Also, the scores of candidate SNPs are calculated and given as a text file. SNPs with causal score > 0.5 are classified as positive, and are considered as candidate causal variants.

cd 04_Performance
bash RUN.sh AUTsummary.hg19.txt
cd ..

Notes

Each folder contains "NOTE" or "RUN_NOTE.txt". Please read the contents before executing the software.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cnnGWAS

Environment setting

Installing miniconda:

Environment setting:

Required R packages (R version 3.5.2):

Required tools:

Tools that will be installed:

Cloning the repository:

Feature DB preparation

Input file preparation

Input file format:

Input file name:

1. Imputation of association p-value using ImpG-summary

2. Make input pickled files for CNN model training

3. Model training

4. Evaluation of the model performance

Notes

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
00_RawDatas		00_RawDatas
01_RunImpG		01_RunImpG
02_PreProc		02_PreProc
03_RunCNN		03_RunCNN
04_Performance		04_Performance
DB		DB
README.md		README.md

kaistomics/cnnGWAS

Folders and files

Latest commit

History

Repository files navigation

cnnGWAS

Environment setting

Installing miniconda:

Environment setting:

Required R packages (R version 3.5.2):

Required tools:

Tools that will be installed:

Cloning the repository:

Feature DB preparation

Input file preparation

Input file format:

Input file name:

1. Imputation of association p-value using ImpG-summary

2. Make input pickled files for CNN model training

3. Model training

4. Evaluation of the model performance

Notes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages