easyQTLseq: A R Package for QTLseq Analysis

EasyQTLseq is a R package for QTL-seq analysis.

Installation

EasyQTLseq can be installed from GitHub via devtools:

# install devtools
install.packages("devtools")
# install easyQTLseq
devtools::install_github("laowang1992/easyQTLseq")

Input Data

Use GATK VariantsToTable

The VCF file contained parent(s) and two bulks was generated from GATK best practices pipeline. Utilizing VariantsToTable function of GATK to extract GT, AD and GQ information for improving reading and processing speed.

java -Xmx30g -jar ${GATK} \
     -R ${genome} -T VariantsToTable \
     -F CHROM -F POS -F REF -F ALT -GF GT -GF AD -GF GQ \
     -V ${BSA}.filter.SNPs.vcf.gz -o ${BSA}.filter.SNPs.table

Use vcf2table

If you only have a vcf file, and you don’t install GATK on your computer. Then you can read the vcf file using vcfR package, and use vcf2table() function in easyQTLseq to convert the vcf object to a table, which is similar the output of GATK’s VariantsToTable. The output of vcf2table can be used for select_sample_and_SNP() directly.

library(vcfR)
library(easyQTLseq)
file_path <- system.file("extdata", "A07.SNPs.vcf.gz", package = "easyQTLseq")
x <- read.vcfR(file = file_path)
data <- vcf2table(x = x)

Usage

Library and import data

A sample file is included with this package. There are four samples in this sample file, R3 is the high phenotype parent sample, qY is the low phenotype parent sample, R is the high phenotype bulk sample, and Y is the low phenotype bulk sample.

library(easyQTLseq)
# Example with sample data from a GATK table.
file_path <- system.file("extdata", "subset.table.gz", package = "easyQTLseq")
# readr::read_tsv() has a faster speed than read.table() when reading a file.
data <- readr::read_tsv(file = file_path)

## Rows: 515669 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (11): CHROM, REF, ALT, R.GT, R.AD, R3.GT, R3.AD, Y.GT, Y.AD, qY.GT, qY.AD
## dbl  (5): POS, R.GQ, R3.GQ, Y.GQ, qY.GQ
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Select samples and SNPs

Some basic information should be assigned use select_sample_and_SNP(). This function will return a QTLseq S3 object.

x <- select_sample_and_SNP(data = data, highP = "qY", lowP = "R3", highB = "Y", lowB = "R", popType = "F2", bulkSize = c(30, 30))
x

## $data
## # A tibble: 128,751 × 12
##    CHROM          POS REF   ALT   HP.DP LP.DP HB.HP.AD HB.LP.AD HB.DP LB.HP.AD
##    <chr>        <dbl> <chr> <chr> <int> <int>    <int>    <int> <int>    <int>
##  1 scaffoldA01 275959 A     C        17    24        6        6    12       16
##  2 scaffoldA01 320766 G     A        19    11        6        6    12       10
##  3 scaffoldA01 361230 A     T        18     6        6        5    11        8
##  4 scaffoldA01 361486 G     T        26     9       13        5    18        8
##  5 scaffoldA01 361875 A     C        19     5        8        0     8        4
##  6 scaffoldA01 361884 T     G        19     5        8        0     8        6
##  7 scaffoldA01 362128 A     C        19     6        7        5    12        4
##  8 scaffoldA01 362153 T     G        20     5        8        5    13        6
##  9 scaffoldA01 364170 C     T        17    14       12        9    21       14
## 10 scaffoldA01 365454 T     C        18    10       11        5    16       15
## # ℹ 128,741 more rows
## # ℹ 2 more variables: LB.LP.AD <int>, LB.DP <int>
## 
## $highP
## [1] "qY"
## 
## $lowP
## [1] "R3"
## 
## $highB
## [1] "Y"
## 
## $lowB
## [1] "R"
## 
## $popType
## [1] "F2"
## 
## $bulkSize
## [1] 30 30
## 
## $slidwin
## data frame with 0 columns and 0 rows
## 
## $chrLen
## # A tibble: 3 × 2
##   CHROM            Len
##   <chr>          <dbl>
## 1 scaffoldA01 37932493
## 2 scaffoldA07 32276921
## 3 scaffoldA09 65861795
## 
## attr(,"class")
## [1] "QTLseq"     "WithParent" "BothParent"

This function can handle different situation, such as both parents of the segregation population are present, only high parent or low parent is present, and no parent is present, or one of these present has a reference genome used for SNP calling.

# If only one parent is present, e.g. high parent.
x_onlyHP <- select_sample_and_SNP(data = data, highP = "qY", highB = "Y", lowB = "R", popType = "F2", bulkSize = c(30, 30))
# If no parent is present.
x_noParent <- select_sample_and_SNP(data = data, highB = "Y", lowB = "R", popType = "F2", bulkSize = c(30, 30))
# If no parent is present, but high parent has a reference genome, this reference genome is used for SNP calling. Then the `highP` parameter should be "REF".
x_HPisREF <- select_sample_and_SNP(data = data, highP = "REF", highB = "Y", lowB = "R", popType = "F2", bulkSize = c(30, 30))

Depth statistics

After select samples and SNPs, the coverage depth distribution of every samples is calculated, then a density distribution figure is drawn, named <outPrefix>.depth_density.pdf|png

depth_statistics(x = x, outPrefix = "outprefix")

outprefix.depth_density.png

Filter SNP according depth distribution

For low coverage depth SNP may have low reliability and accuracy, and extremely high coverage depth may be derived from repetitive sequence. These SNP should be omited.

# default minimum coverage depth is 6, default maximum coverage depth is `average+3*sd`.
x_filter <- filterDP(x = x)

SNP distribution

SNPs may be unevenly distributed on chromosomes. This step will show the distribution of SNP along the chromosome and generate <outprefix>.SNP_number_per_chr.txt|csv and <outprefix>.SNP_distribution_histogram.pdf|png.

SNP_distribution(x = x_filter, outPrefix = "outprefix", 
                 targetChr = c("scaffoldA01", "scaffoldA07", "scaffoldA09"), 
                 chrLabel = c("A01", "A07", "A09"))

outprefix.SNP_distribution_histogram.png

Export depth coverage information

If you want analyze QTL-seq using other method or software, you can export allele depths information using export_dp(). This information will be export to <outprefix>.Depth_information.txt|csv in work directory.

export_dp(x = x_filter, outPrefix = "outprefix")

Calculate using sliding window

To reduce noise in QTL-seq analsis, a sliding window method is adopted to calculate SNP index, delta SNP index, and Euclidean distance (ED).

If parent is present in the data or parent has a reference genome which is used for SNP calling, both delta SNP index and DE are calculated, if no parent is present in the data, only ED is calculated.

x_filter <- calc_index_etc(x = x_filter, outPrefix = "outprefix", winSize = 2000000, winStep = 200000)

Export figures

After calculating delta SNP index and DE, the result can be show along the chromosome. This function will export the figures.

export_figure(x = x_filter, 
              outPrefix = "outprefix", 
              targetChr = c("scaffoldA01", "scaffoldA07", "scaffoldA09"), # Target chromosome to be drawn in figures, default is all chromosomes in the data.
              chrLabel = c("A01", "A07", "A09"),                          # The label for chromosome shown in figures, default is chromosome names in the data.
              minN = 20,                                                  # Too few SNPs in a window will result in noise, the windows containing SNPs less than minN will be omitted in figures.
              width = 6, height = 3)

outprefix.delta_SNP_index.99CI.line.png

outprefix.ED4.line.png

Get significant QTL region

If parent is present in the data or parent has a reference genome which is used for SNP calling, delta SNP index confidence intervals for different read depths under the null hypothesis (no QTL) as obtained by simulation test (10,000 replications for each read depth). The chromosome regions exceed 95% or 99% confidence intervals are considered as significant QTL region.

getQTL_and_exportFigure(x = x_filter, outPrefix = "outprefix", minN = 20)

outprefix.scaffoldA07.99CI.png

outprefix.scaffoldA07.ED4.png

Citation

If this package is used in your research, you should cite this package in the method section, like:

The QTL-seq analysis was performed using R package easyQTLseq (https://github.com/laowang1992/easyQTLseq.git).

This paper is also recommended to cite:

Takagi H, Abe A, Yoshida K, et al. QTL-seq: rapid mapping of quantitative trait loci in rice by whole genome resequencing of DNA from two bulked populations. Plant J. 2013;74(1):174-183. doi:10.1111/tpj.12105

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
R		R
docs		docs
inst/extdata		inst/extdata
man		man
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
easyQTLseq.Rproj		easyQTLseq.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

easyQTLseq: A R Package for QTLseq Analysis

Installation

Input Data

Use GATK VariantsToTable

Use vcf2table

Usage

Library and import data

Select samples and SNPs

Depth statistics

Filter SNP according depth distribution

SNP distribution

Export depth coverage information

Calculate using sliding window

Export figures

Get significant QTL region

Citation

About

Licenses found

Releases 1

Packages

Languages

License

Licenses found

laowang1992/easyQTLseq

Folders and files

Latest commit

History

Repository files navigation

easyQTLseq: A R Package for QTLseq Analysis

Installation

Input Data

Use GATK VariantsToTable

Use vcf2table

Usage

Library and import data

Select samples and SNPs

Depth statistics

Filter SNP according depth distribution

SNP distribution

Export depth coverage information

Calculate using sliding window

Export figures

Get significant QTL region

Citation

About

Topics

Resources

License

Licenses found

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages