Association Analysis - Nextflow

A Nextflow pipeline for performing rare variant association analysis using rvtests.

Introduction

This experimental Nextflow pipeline conducts rare variant association analysis for binary outcome measures using rvtests. The workflow also conducts automated QC filtering and visualisation.

Quick Start

Install Nextflow (>=22.10.1)
Install Docker for full pipeline reproducibility.

Download the pipeline and the test data.

git clone https://github.com/StephenRicher/AssociationAnalysis

Test the pipeline using the example data with a single command:
```
nextflow run main.nf -profile docker
```
- The pipeline includes an example dataset and a pre-configured ./nextflow.config with reasonable default parameters for testing.
- The pipeline includes a config profile called docker, which instructs the pipeline to run its processes inside containers.

Pipeline Summary

The pipeline comprises four key sub-workflows:

summarise()

The summarise() workflow generates summary statistics for the input data.

Missingness per SNP and individual
Minor Allele Frequency (MAF)
Hardy-Weinberg equilibrium (HWE)
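
To make these three statistics concrete, here is a minimal Python sketch computed on a toy genotype matrix. It is an illustration only and not the pipeline's implementation. The function names, the genotype coding (0/1/2 allele counts with None for a missing call) and the data are all assumptions. The HWE check uses a simple chi-square approximation with one degree of freedom, which may differ from the test used by the underlying tools.

```python
import math

# Toy data (hypothetical): rows are individuals, columns are SNPs.
# Each call is the count of one allele (0, 1 or 2); None is a missing call.
genotypes = [
    [0, 1, None],
    [1, 1, 0],
    [2, None, 0],
    [1, 0, 0],
]


def snp_missingness(matrix, snp):
    """Fraction of individuals with a missing call at one SNP."""
    calls = [row[snp] for row in matrix]
    return sum(call is None for call in calls) / len(calls)


def individual_missingness(matrix, individual):
    """Fraction of SNPs with a missing call for one individual."""
    row = matrix[individual]
    return sum(call is None for call in row) / len(row)


def minor_allele_frequency(matrix, snp):
    """Frequency of the rarer allele among the non-missing calls."""
    calls = [row[snp] for row in matrix if row[snp] is not None]
    freq = sum(calls) / (2 * len(calls))
    return min(freq, 1 - freq)


def hwe_chi2_pvalue(matrix, snp):
    """Chi-square (1 d.f.) p-value for deviation from HWE at one SNP."""
    calls = [row[snp] for row in matrix if row[snp] is not None]
    n = len(calls)
    p = sum(calls) / (2 * n)  # frequency of the counted allele
    q = 1 - p
    observed = [calls.count(0), calls.count(1), calls.count(2)]
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2))


for snp in range(3):
    print(
        snp,
        snp_missingness(genotypes, snp),
        minor_allele_frequency(genotypes, snp),
        hwe_chi2_pvalue(genotypes, snp),
    )
```

In this toy matrix the first SNP has no missing calls and genotype counts that match HWE expectations exactly, so its p-value is 1.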

filter()

The filter() workflow conducts typical QC filtering. All filtering statistics, including an aggregated report, are written to ./results/summary/.

Filter SNPs with high missingness
Filter individuals with high missingness
Filter SNPs with low Minor Allele Frequency (MAF)
Filter SNPs not in Hardy-Weinberg equilibrium (HWE)
- Cases and controls are filtered independently.
Filter SNPs with low MAF
Filter individuals with highly deviating heterozygosity
Filter individuals by relatedness (pi-hat threshold)
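
As an illustration of the heterozygosity step, the following Python sketch flags individuals whose heterozygosity rate lies more than a given number of standard deviations from the sample mean. It is a sketch under stated assumptions, not the pipeline's code: the function name, the sample IDs and the rates are hypothetical, and the het_sd argument is only assumed to play the same role as the het_sd parameter in the configuration.

```python
import statistics


def heterozygosity_outliers(het_rates, het_sd=3):
    """Return sorted IDs whose rate is more than het_sd SDs from the mean.

    het_rates maps a sample ID to its heterozygosity rate (hypothetical input).
    """
    rates = list(het_rates.values())
    mean = statistics.mean(rates)
    sd = statistics.stdev(rates)  # sample standard deviation
    return sorted(
        sample
        for sample, rate in het_rates.items()
        if abs(rate - mean) > het_sd * sd
    )


# Hypothetical cohort: 19 typical individuals and one with excess heterozygosity.
het_rates = {f"IND{i:02d}": 0.30 for i in range(1, 20)}
het_rates["IND20"] = 0.60

print(heterozygosity_outliers(het_rates, het_sd=3))
```

Note that in very small cohorts a single extreme individual inflates the standard deviation enough that no one can exceed a 3 SD threshold, so a filter of this kind is only meaningful with a reasonable sample size.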

plot_stats()

The plot_stats() workflow generates visual summaries of the input data. Plots for the example data are saved to ./results/summary/plots/.

association()

The association() workflow conducts a CMC burden test and a SKAT test, both with optional covariates. A setFile is automatically generated from the user-provided GFF3 file. Grouping units are defined by genes (exonic regions only).
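
The setFile generation can be pictured with the following Python sketch, which collects exon features from GFF3 lines and groups their coordinates by a gene attribute. This is not the pipeline's own code. The function names and example records are hypothetical, the key argument is only assumed to mirror the key parameter in the configuration, and the output layout (a set name followed by comma-separated chr:start-end ranges) is an assumption about what rvtests expects. Unlike a production implementation, it does not merge overlapping exons.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GFF3 attribute column ('k1=v1;k2=v2') into a dict."""
    attributes = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            name, value = pair.split("=", 1)
            attributes[name.strip()] = value.strip()
    return attributes


def gff3_to_setfile_lines(gff3_lines, key="gene_id"):
    """Group exon ranges by the chosen attribute, one output line per gene."""
    regions = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != "exon":
            continue  # keep exonic regions only
        label = parse_attributes(columns[8]).get(key)
        if label is None:
            continue  # exon lacks the grouping attribute
        regions[label].append(f"{columns[0]}:{columns[3]}-{columns[4]}")
    return [f"{gene} {','.join(ranges)}" for gene, ranges in regions.items()]


# Hypothetical GFF3 records for two genes.
example = [
    "##gff-version 3",
    "1\tsrc\tgene\t100\t500\t.\t+\t.\tID=g1;gene_id=GENE1",
    "1\tsrc\texon\t100\t200\t.\t+\t.\tParent=t1;gene_id=GENE1",
    "1\tsrc\texon\t250\t300\t.\t+\t.\tParent=t1;gene_id=GENE1",
    "2\tsrc\texon\t50\t90\t.\t-\t.\tParent=t2;gene_id=GENE2",
]

for set_line in gff3_to_setfile_lines(example):
    print(set_line)
```

Each printed line then defines one grouping unit, so variants falling in any exon of a gene are tested together.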

Outputs from both rvtests models for the example data are written to ./results/rvtest/.

Configuration

See below and ./conf/defaults.config for default parameters with descriptions. Users should set parameters in ./nextflow.config.

// Default parameters - mandatory parameters are set to null

params {
    plink          = null          // Path prefix of the PLINK input files
    binary_format  = null          // Set true if input is in PLINK binary format
    outdir         = './results/'  // Path to output directory.

    // QC Thresholds
    missing_geno = 0.2   // Exclude SNPs with a missing genotype rate above this threshold
    missing_indi = 0.2   // Exclude individuals with a missing genotype rate above this threshold
    maf          = 0.05  // Exclude SNPs below minor allele frequency threshold
    hwe_control  = 1e-6  // Exclude markers deviating from Hardy–Weinberg equilibrium (control sample)
    hwe_case     = 1e-10 // Exclude markers deviating from Hardy–Weinberg equilibrium (case sample)
    related      = 0.2   // Exclude individuals by relatedness (pi-hat threshold)
    het_sd       = 3     // Remove individuals whose heterozygosity rate deviates from the mean by more than this many SDs

    // Genes of Interest
    gff3         = null      // Genes of interest in .gff3 format
    key          = 'gene_id' // Gene attribute key to label sets

    // Covariate
    covar_file   = 'NO_FILE' // Path to covar file
    covar_name   = ''        // List of column names indicating covariates (e.g. ['covar1', 'covar2'])

    // Phenotype
    pheno_file   = ''        // Path to pheno file
    pheno_name   = ''        // Column name of phenotype
    zero_one     = false     // Set true if phenotype is encoded as '0' = control and '1' = case
    allow_no_sex = true      // Prevent phenotypes being set to missing if sex is ambiguous.
}

References

Marees, A.T., de Kluiver, H., Stringer, S., Vorspan, F., Curis, E., Marie-Claire, C. and Derks, E.M., 2018. A tutorial on conducting genome-wide association studies: Quality control and statistical analysis. International Journal of Methods in Psychiatric Research [Online], 27(2). Available from: https://doi.org/10.1002/mpr.1608.
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A.R., Bender, D., Maller, J., Sklar, P., De Bakker, P.I.W., Daly, M.J. and Sham, P.C., 2007. PLINK: A tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics [Online], 81(3). Available from: https://doi.org/10.1086/519795.
Zhan, X., Hu, Y., Li, B., Abecasis, G.R. and Liu, D.J., 2016. RVTESTS: An efficient and comprehensive tool for rare variant association analysis using sequence data. Bioinformatics [Online], 32(9). Available from: https://doi.org/10.1093/bioinformatics/btw079.
