
Commit

Update repeat expansion database to provide separate files for hg19 and grch37 and add new loci

Former-commit-id: 4c16f47
mfbennett committed Jun 30, 2019
1 parent 308fd90 commit e4be33d
Showing 16 changed files with 78 additions and 39 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Package: exSTRa
Type: Package
Title: Expanded STR algorithm: detecting expansions in Illumina sequencing data
Version: 0.88.6
Date: 2019-01-21
Version: 0.89.0
Date: 2019-06-26
Author: Rick Tankard
Maintainer: Rick Tankard <[email protected]>
Description: Detecting expansions with paired-end Illumina sequencing data.
2 changes: 1 addition & 1 deletion R/read_exstra_db.R
@@ -14,7 +14,7 @@
#' @seealso \code{\link{read_score}}
#'
#' @examples
#' read_exstra_db(system.file("extdata", "repeat_expansion_disorders.txt", package = "exSTRa"))
#' read_exstra_db(system.file("extdata", "repeat_expansion_disorders_hg19.txt", package = "exSTRa"))
#'
#' @export
#' @include read_exstra_db_xlsx.R
6 changes: 3 additions & 3 deletions R/read_score.R
@@ -26,7 +26,7 @@
#' @examples
#' str_score <- read_score (
#' file = system.file("extdata", "HiSeqXTen_WGS_PCR_2.txt", package = "exSTRa"),
#' database = system.file("extdata", "repeat_expansion_disorders.txt", package = "exSTRa"),
#' database = system.file("extdata", "repeat_expansion_disorders_hg19.txt", package = "exSTRa"),
#' groups.regex = c(control = "^WGSrpt_0[24]$", case = ""),
#' filter.low.counts = TRUE
#' )
@@ -40,7 +40,7 @@
#' # Defining cases by sample name directly:
#' str_score_HD_cases <- read_score (
#' file = system.file("extdata", "HiSeqXTen_WGS_PCR_2.txt", package = "exSTRa"),
#' database = system.file("extdata", "repeat_expansion_disorders.txt", package = "exSTRa"),
#' database = system.file("extdata", "repeat_expansion_disorders_hg19.txt", package = "exSTRa"),
#' groups.samples = list(case = c("WGSrpt_10", "WGSrpt_12")),
#' filter.low.counts = TRUE
#' )
@@ -51,7 +51,7 @@
#'
#' # for greater control, use object from read_exstra_db() instead
#' str_db <- read_exstra_db(
#' system.file("extdata", "repeat_expansion_disorders.txt", package = "exSTRa")
#' system.file("extdata", "repeat_expansion_disorders_hg19.txt", package = "exSTRa")
#' )
#' str_score <- read_score (
#' file = system.file("extdata", "HiSeqXTen_WGS_PCR_2.txt", package = "exSTRa"),
6 changes: 3 additions & 3 deletions README.md
@@ -24,9 +24,9 @@ At present, the pipeline requires:
- Sorting
- PCR duplicate marking (recommended)

A database of repeats is required, with known disorder loci included.
An example script to generate a database of all STRs genome-wide, or only those in genes expressed in the brain, is provided in `inst/tools/prepare_exSTRa_input_db.R`.
These input database files can also be [downloaded from FigShare](https://figshare.com/s/0bf679a187d5f3cc2b2c).
A database of repeats is required, with files for the known disorder loci included for hg19 or GRCh37 in the `inst/extdata` directory.
A database of all STRs genome-wide is available to [download from FigShare](https://figshare.com/s/bb1e6358781bb3ca12c2).
An example script to generate this database of all STRs genome-wide, or only those in genes expressed in the brain, is provided in `inst/tools/prepare_exSTRa_input_db.R`.

Use the Perl scripts and modules from https://github.com/bahlolab/Bio-STR-exSTRa to analyse reads in BAM files. This generates STR counts.
In the future this functionality may be included within the R exSTRa package.
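The README now points to separate known-loci files for hg19 and GRCh37 under `inst/extdata`. As an illustration only, the hypothetical helper `exstra_known_db_name()` below (it is not part of exSTRa) maps a build name to the two file names used in this commit, so the build choice lives in one place.

```r
# Hypothetical helper, not part of the exSTRa package: map a genome build
# name to the bundled known-loci database file name used in this commit.
exstra_known_db_name <- function(build) {
  build <- tolower(build)
  if (length(build) != 1 || !build %in% c("hg19", "grch37")) {
    stop("build must be \"hg19\" or \"GRCh37\"")
  }
  paste0("repeat_expansion_disorders_", build, ".txt")
}

exstra_known_db_name("GRCh37")
# "repeat_expansion_disorders_grch37.txt"
```

With exSTRa installed, the result could be passed through `system.file()` as in the updated examples, e.g. `read_exstra_db(system.file("extdata", exstra_known_db_name("hg19"), package = "exSTRa"))`.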
2 changes: 1 addition & 1 deletion data-raw/exstra_known.R
@@ -1,2 +1,2 @@
# Read in the known repeat expansion disorder loci dataset
exstra_known <- read_exstra_db(system.file("extdata", "repeat_expansion_disorders.txt", package = "exSTRa"))
exstra_known <- read_exstra_db(system.file("extdata", "repeat_expansion_disorders_hg19.txt", package = "exSTRa"))
2 changes: 1 addition & 1 deletion data-raw/exstra_wgs_pcr_2.R
@@ -1,7 +1,7 @@
# Read in the WGS_PCR_2 data
exstra_wgs_pcr_2 <- read_score (
system.file("extdata", "HiSeqXTen_WGS_PCR_2.txt", package = "exSTRa"), # doesn't work before first install
database = system.file("extdata", "repeat_expansion_disorders.txt", package = "exSTRa"),
database = system.file("extdata", "repeat_expansion_disorders_hg19.txt", package = "exSTRa"),
groups.regex = c(control = "^WGSrpt_0[24]$", case = ""), # here, matches on successive patterns override previous matches # (TODO: maybe should be reversed?)
filter.low.counts = TRUE
)
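The comment above says later patterns override earlier matches, while the comment in `examples/exSTRa_score_analysis.R` in this same commit says the group is the first pattern to match. Which rule `read_score()` applies is not shown in this diff. The base-R sketch below does not call exSTRa; it only shows what the two `groups.regex` patterns match for the sample IDs used in these examples, and what each reading would assign.

```r
# Sample IDs that appear in the read_score() examples in this commit
samples <- c("WGSrpt_02", "WGSrpt_04", "WGSrpt_10", "WGSrpt_12")
groups_regex <- c(control = "^WGSrpt_0[24]$", case = "")

# Logical matrix: one row per sample, one column per group pattern.
# The empty "case" pattern matches every sample.
matches <- sapply(groups_regex, grepl, x = samples)
rownames(matches) <- samples

# Reading 1: the first pattern that matches decides the group
first_match <- apply(matches, 1, function(m) head(names(m)[m], 1))
# Reading 2: later matching patterns override earlier ones
last_match <- apply(matches, 1, function(m) tail(names(m)[m], 1))

first_match  # WGSrpt_02/04 -> control; WGSrpt_10/12 -> case
last_match   # every sample -> case
```

Under the second reading the empty `case` pattern would capture every sample, including the intended controls.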
2 changes: 1 addition & 1 deletion doc/exSTRa.R
@@ -15,7 +15,7 @@ library(exSTRa)
## ------------------------------------------------------------------------
str_score <- read_score (
file = system.file("extdata", "HiSeqXTen_WGS_PCR_2.txt", package = "exSTRa"),
database = system.file("extdata", "repeat_expansion_disorders.txt", package = "exSTRa"),
database = system.file("extdata", "repeat_expansion_disorders_hg19.txt", package = "exSTRa"),
groups.regex = c(control = "^WGSrpt_0[24]$", case = "")
)

2 changes: 1 addition & 1 deletion doc/exSTRa.Rmd
@@ -118,7 +118,7 @@ This results in an `exstra_score` object.
```{r}
str_score <- read_score (
file = system.file("extdata", "HiSeqXTen_WGS_PCR_2.txt", package = "exSTRa"),
database = system.file("extdata", "repeat_expansion_disorders.txt", package = "exSTRa"),
database = system.file("extdata", "repeat_expansion_disorders_hg19.txt", package = "exSTRa"),
groups.regex = c(control = "^WGSrpt_0[24]$", case = "")
)
2 changes: 1 addition & 1 deletion doc/exSTRa.html.REMOVED.git-id
@@ -1 +1 @@
27ff3057312913216fda5bdde43474a65a46189b
b0661c7cbbe131121ec309c4ca25b71c0fd32624
2 changes: 1 addition & 1 deletion examples/exSTRa_score_analysis.R
@@ -10,7 +10,7 @@ knitr::opts_chunk$set(fig.width=11, fig.height=11)
# Read score data and file with loci information
str_score <- read_score (
file = system.file("extdata", "HiSeqXTen_WGS_PCR_2.txt", package = "exSTRa"),
database = system.file("extdata", "repeat_expansion_disorders.txt", package = "exSTRa"), # for greater control, use object from read_exstra_db() instead
database = system.file("extdata", "repeat_expansion_disorders_hg19.txt", package = "exSTRa"), # for greater control, use object from read_exstra_db() instead
groups.regex = c(control = "^WGSrpt_0[24]$", case = ""), # the group is the first regular expression (regex) to match
filter.low.counts = TRUE
)
31 changes: 31 additions & 0 deletions inst/extdata/repeat_expansion_disorders_grch37.txt
@@ -0,0 +1,31 @@
### exSTRa repeat expansion disorder GRCh37 database ###
# Last updated 26th June 2019.
# Most fields are for informational purposes and not used by exSTRa.
# Requires: exSTRa 0.8
locus long_name OMIM inheritance gene location gene_region motif norm_low norm_up aff_low aff_up aff_more strand chrom hg19_start hg19_end copyNum perMatch perIndel STR_size_bp score_size strcat
DM1 Myotonic dystrophy 1 160900 AD DMPK 19q13 3'UTR CTG 5 37 50 10000 FALSE - 19 46273463 46273524 20.7 100 0 62 NA http://strcat.teamerlich.org/chart/chr19/46273463/46273524
DM2 Myotonic dystrophy 2 602668 AD ZNF9/CNBP 3q21.3 intron CCTG 10 26 75 11000 FALSE - 3 128891420 128891502 20.8 92 0 83 NA http://strcat.teamerlich.org/chart/chr3/128891420/128891502
DRPLA Dentatorubral-pallidoluysian atrophy 125370 AD DRPLA/ATN1 12p13.31 coding CAG 7 34 49 88 FALSE + 12 7045880 7045938 19.7 92 0 59 NA http://strcat.teamerlich.org/chart/chr12/7045880/7045938
EPM1A Myoclonic epilepsy of Unverricht and Lundborg 254800 AR CSTB 21q22.3 promotor CCCCGCCCCGCG 2 3 40 80 FALSE - 21 45196324 45196360 3.1 100 0 37 NA http://strcat.teamerlich.org/chart/chr21/45196324/45196360
FRAXA Fragile-X site A 309550 X FMR1 Xq27.3 5'UTR CGG 6 54 200 1000 TRUE + X 146993555 146993629 25 90 5 75 NA http://strcat.teamerlich.org/chart/chrX/146993555/146993629
FRAXE Fragile-X site E 309548 X FMR2 Xq28 5'UTR CCG 4 39 200 900 FALSE + X 147582159 147582204 15.3 100 0 46 NA http://strcat.teamerlich.org/chart/chrX/147582125/147582273
FRDA Friedreich ataxia 229300 AR FXN 9q13 intron GAA 6 32 200 1700 FALSE + 9 71652201 71652220 6.7 100 0 20 NA http://strcat.teamerlich.org/chart/chr9/71652201/71652220
FTDALS1 Amyotrophic lateral sclerosis-frontotemporal dementia 105550 AD C9orf72 9p21 intron GGGGCC 2 19 250 1600 FALSE - 9 27573483 27573544 10.8 74 8 62 NA http://strcat.teamerlich.org/chart/chr9/27573483/27573544
HD Huntington disease 143100 AD HTT 4p16.3 coding CAG 6 34 36 100 TRUE + 4 3076604 3076667 21.3 96 0 64 NA http://strcat.teamerlich.org/chart/chr4/3076604/3076667
HDL2 Huntington disease-like 2 606438 AD JPH3 16q24.3 exon CTG 7 28 66 78 FALSE + 16 87637889 87637935 15.3 95 4 47 NA http://strcat.teamerlich.org/chart/chr16/87637889/87637935
SBMA Kennedy disease 313200 X AR Xq12 coding CAG 9 35 38 62 FALSE + X 66765159 66765261 33.3 86 9 103 NA http://strcat.teamerlich.org/chart/chrX/66765159/66765261
SCA1 Spinocerebellar ataxia 1 164400 AD ATXN1 6p23 coding CAG 6 38 39 82 FALSE - 6 16327865 16327955 30.3 95 0 91 NA http://strcat.teamerlich.org/chart/chr6/16327865/16327955
SCA2 Spinocerebellar ataxia 2 183090 AD ATXN2 12q24 coding CAG 15 24 32 200 FALSE - 12 112036754 112036823 23.3 97 0 70 NA http://strcat.teamerlich.org/chart/chr12/112036754/112036823
SCA3 Machado-Joseph disease 109150 AD ATXN3 14q32.1 coding CAG 13 36 61 84 FALSE - 14 92537355 92537396 14 84 0 42 NA http://strcat.teamerlich.org/chart/chr14/92537355/92537396
SCA6 Spinocerebellar ataxia 6 183086 AD CACNA1A 19p13 coding CAG 4 17 21 33 FALSE - 19 13318673 13318712 13.3 100 0 40 NA http://strcat.teamerlich.org/chart/chr19/13318673/13318712
SCA7 Spinocerebellar ataxia 7 164500 AD ATXN7 3p14.1 coding CAG 4 35 37 306 FALSE + 3 63898361 63898392 10.7 100 0 32 NA http://strcat.teamerlich.org/chart/chr3/63898361/63898392
SCA8 Spinocerebellar ataxia 8 608768 AD ATXN8OS/ATXN8 13q21 utRNA CTG 16 34 74 74 TRUE + 13 70713516 70713561 15.3 100 0 46 NA http://strcat.teamerlich.org/chart/chr13/70713516/70713561
SCA10 Spinocerebellar ataxia 10 603516 AD ATXN10 22q13.31 intron ATTCT 10 20 500 4500 FALSE + 22 46191235 46191304 14 100 0 70 NA http://strcat.teamerlich.org/chart/chr22/46191235/46191304
SCA12 Spinocerebellar ataxia 12 604326 AD PPP2R2B 5q32 promotor CAG 7 45 55 78 FALSE - 5 146258291 146258322 10.7 100 0 32 NA http://strcat.teamerlich.org/chart/chr5/146258291/146258322
SCA17 Spinocerebellar ataxia 17 607136 AD TBP 6q27 coding CAG 25 42 47 63 FALSE + 6 170870995 170871105 37 94 0 111 NA http://strcat.teamerlich.org/chart/chr6/170870995/170871105
SCA36 Spinocerebellar ataxia 36 614153 AD NOP56 20p13 intron GGCCTG 3 8 1500 2500 FALSE + 20 2633379 2633421 7.2 97 0 43 NA http://strcat.teamerlich.org/chart/chr20/2633379/2633421
FECD3 Fuchs endothelial corneal dystrophy 3 613267 AD TCF4 18q21.2 intron CTG 10 40 50 1300 TRUE - 18 53253385 53253460 25.3 100 0 76 NA NA
FAME1 Familial adult myoclonic epilepsy 1 601068 AD SAMD12 8q24 intron TTTCA 0 0 440 3680 FALSE - 8 119379052 119379155 0.6 3 NA NA
FAME6 Familial adult myoclonic epilepsy 6 618074 AD TNRC6A 16p12.1 intron TTTCA 0 0 TRUE + 16 24624851 24624853 0.6 3 NA NA
FAME7 Familial adult myoclonic epilepsy 7 618075 AD RAPGEF2 4q32.1 intron TTTCA 0 0 TRUE + 4 160263769 160263770 0.4 2 NA NA
CANVAS Cerebellar ataxia, neuropathy, and vestibular areflexia syndrome 614575 AR RFC1 4p14 intron TTCCC 0 0 400 2000 FALSE - 4 39350045 39350103 11.8 59 NA NA
@@ -1,27 +1,31 @@
### exSTRa repeat expansion disorder database ###
# Last updated 18th May 2017.
# Most fields are for informational purposes and not used by exSTRa.
# Requires: exSTRa 0.8
### exSTRa repeat expansion disorder GRCh37 database ###
# Last updated 26th June 2019.
# Most fields are for informational purposes and not used by exSTRa.
# Requires: exSTRa 0.8
locus long_name OMIM inheritance gene location gene_region motif norm_low norm_up aff_low aff_up aff_more strand chrom hg19_start hg19_end copyNum perMatch perIndel STR_size_bp score_size strcat
DM1 Myotonic dystrophy 1 160900 AD DMPK 19q13 3'UTR CTG 5 37 50 10000 FALSE - chr19 46273463 46273524 20.7 100 0 62 NA http://strcat.teamerlich.org/chart/chr19/46273463/46273524
DM2 Myotonic dystrophy 2 602668 AD ZNF9/CNBP 3q21.3 intron CCTG 10 26 75 11000 FALSE - chr3 128891420 128891502 20.8 92 0 83 NA http://strcat.teamerlich.org/chart/chr3/128891420/128891502
DRPLA Dentatorubral-pallidoluysian atrophy 125370 AD DRPLA/ATN1 12p13.31 coding CAG 7 34 49 88 FALSE + chr12 7045880 7045938 19.7 92 0 59 NA http://strcat.teamerlich.org/chart/chr12/7045880/7045938
EPM1A Myoclonic epilepsy of Unverricht and Lundborg 254800 AR CSTB 21q22.3 promotor CCCCGCCCCGCG 2 3 40 80 FALSE - chr21 45196324 45196360 3.1 100 0 37 NA http://strcat.teamerlich.org/chart/chr21/45196324/45196360
FRAXA Fragile-X site A 309550 X FMR1 Xq27.3 5'UTR CGG 6 54 200 1000 TRUE + chrX 146993555 146993629 25 90 5 75 NA http://strcat.teamerlich.org/chart/chrX/146993555/146993629
FRAXE Fragile-X site E 309548 X FMR2 Xq28 5'UTR CCG 4 39 200 900 FALSE + chrX 147582159 147582204 15.3 100 0 46 NA http://strcat.teamerlich.org/chart/chrX/147582125/147582273
FRDA Friedreich ataxia 229300 AR FXN 9q13 intron GAA 6 32 200 1700 FALSE + chr9 71652201 71652220 6.7 100 0 20 NA http://strcat.teamerlich.org/chart/chr9/71652201/71652220
FTDALS1 Amyotrophic lateral sclerosis-frontotemporal dementia 105550 AD C9orf72 9p21 intron GGGGCC 2 19 250 1600 FALSE - chr9 27573483 27573544 10.8 74 8 62 NA http://strcat.teamerlich.org/chart/chr9/27573483/27573544
HD Huntington disease 143100 AD HTT 4p16.3 coding CAG 6 34 36 100 TRUE + chr4 3076604 3076667 21.3 96 0 64 NA http://strcat.teamerlich.org/chart/chr4/3076604/3076667
HDL2 Huntington disease-like 2 606438 AD JPH3 16q24.3 exon CTG 7 28 66 78 FALSE + chr16 87637889 87637935 15.3 95 4 47 NA http://strcat.teamerlich.org/chart/chr16/87637889/87637935
SBMA Kennedy disease 313200 X AR Xq12 coding CAG 9 35 38 62 FALSE + chrX 66765159 66765261 33.3 86 9 103 NA http://strcat.teamerlich.org/chart/chrX/66765159/66765261
SCA1 Spinocerebellar ataxia 1 164400 AD ATXN1 6p23 coding CAG 6 38 39 82 FALSE - chr6 16327865 16327955 30.3 95 0 91 NA http://strcat.teamerlich.org/chart/chr6/16327865/16327955
SCA2 Spinocerebellar ataxia 2 183090 AD ATXN2 12q24 coding CAG 15 24 32 200 FALSE - chr12 112036754 112036823 23.3 97 0 70 NA http://strcat.teamerlich.org/chart/chr12/112036754/112036823
SCA3 Machado-Joseph disease 109150 AD ATXN3 14q32.1 coding CAG 13 36 61 84 FALSE - chr14 92537355 92537396 14 84 0 42 NA http://strcat.teamerlich.org/chart/chr14/92537355/92537396
SCA6 Spinocerebellar ataxia 6 183086 AD CACNA1A 19p13 coding CAG 4 17 21 33 FALSE - chr19 13318673 13318712 13.3 100 0 40 NA http://strcat.teamerlich.org/chart/chr19/13318673/13318712
SCA7 Spinocerebellar ataxia 7 164500 AD ATXN7 3p14.1 coding CAG 4 35 37 306 FALSE + chr3 63898361 63898392 10.7 100 0 32 NA http://strcat.teamerlich.org/chart/chr3/63898361/63898392
SCA17 Spinocerebellar ataxia 17 607136 AD TBP 6q27 coding CAG 25 42 47 63 FALSE + chr6 170870995 170871105 37 94 0 111 NA http://strcat.teamerlich.org/chart/chr6/170870995/170871105
DRPLA Dentatorubral-pallidoluysian atrophy 125370 AD DRPLA/ATN1 12p13.31 coding CAG 7 34 49 88 FALSE + chr12 7045880 7045938 19.7 92 0 59 NA http://strcat.teamerlich.org/chart/chr12/7045880/7045938
HDL2 Huntington disease-like 2 606438 AD JPH3 16q24.3 exon CTG 7 28 66 78 FALSE + chr16 87637889 87637935 15.3 95 4 47 NA http://strcat.teamerlich.org/chart/chr16/87637889/87637935
FRAXA Fragile-X site A 309550 X FMR1 Xq27.3 5'UTR CGG 6 54 200 1000 TRUE + chrX 146993555 146993629 25 90 5 75 NA http://strcat.teamerlich.org/chart/chrX/146993555/146993629
FRAXE Fragile-X site E 309548 X FMR2 Xq28 5'UTR CCG 4 39 200 900 FALSE + chrX 147582159 147582204 15.3 100 0 46 NA http://strcat.teamerlich.org/chart/chrX/147582125/147582273
DM1 Myotonic dystrophy 1 160900 AD DMPK 19q13 3'UTR CTG 5 37 50 10000 FALSE - chr19 46273463 46273524 20.7 100 0 62 NA http://strcat.teamerlich.org/chart/chr19/46273463/46273524
FRDA Friedreich ataxia 229300 AR FXN 9q13 intron GAA 6 32 200 1700 FALSE + chr9 71652201 71652220 6.7 100 0 20 NA http://strcat.teamerlich.org/chart/chr9/71652201/71652220
DM2 Myotonic dystrophy 2 602668 AD ZNF9/CNBP 3q21.3 intron CCTG 10 26 75 11000 FALSE - chr3 128891420 128891502 20.8 92 0 83 NA http://strcat.teamerlich.org/chart/chr3/128891420/128891502
FTDALS1 Amyotrophic lateral sclerosis-frontotemporal dementia 105550 AD C9orf72 9p21 intron GGGGCC 2 19 250 1600 FALSE - chr9 27573483 27573544 10.8 74 8 62 NA http://strcat.teamerlich.org/chart/chr9/27573483/27573544
SCA36 Spinocerebellar ataxia 36 614153 AD NOP56 20p13 intron GGCCTG 3 8 1500 2500 FALSE + chr20 2633379 2633421 7.2 97 0 43 NA http://strcat.teamerlich.org/chart/chr20/2633379/2633421
SCA8 Spinocerebellar ataxia 8 608768 AD ATXN8OS/ATXN8 13q21 utRNA CTG 16 34 74 74 TRUE + chr13 70713516 70713561 15.3 100 0 46 NA http://strcat.teamerlich.org/chart/chr13/70713516/70713561
SCA10 Spinocerebellar ataxia 10 603516 AD ATXN10 22q13.31 intron ATTCT 10 20 500 4500 FALSE + chr22 46191235 46191304 14 100 0 70 NA http://strcat.teamerlich.org/chart/chr22/46191235/46191304
EPM1A Myoclonic epilepsy of Unverricht and Lundborg 254800 AR CSTB 21q22.3 promotor CCCCGCCCCGCG 2 3 40 80 FALSE - chr21 45196324 45196360 3.1 100 0 37 NA http://strcat.teamerlich.org/chart/chr21/45196324/45196360
SCA12 Spinocerebellar ataxia 12 604326 AD PPP2R2B 5q32 promotor CAG 7 45 55 78 FALSE - chr5 146258291 146258322 10.7 100 0 32 NA http://strcat.teamerlich.org/chart/chr5/146258291/146258322
SCA8 Spinocerebellar ataxia 8 608768 AD ATXN8OS/ATXN8 13q21 utRNA CTG 16 34 74 74 TRUE + chr13 70713516 70713561 15.3 100 0 46 NA http://strcat.teamerlich.org/chart/chr13/70713516/70713561

SCA17 Spinocerebellar ataxia 17 607136 AD TBP 6q27 coding CAG 25 42 47 63 FALSE + chr6 170870995 170871105 37 94 0 111 NA http://strcat.teamerlich.org/chart/chr6/170870995/170871105
SCA36 Spinocerebellar ataxia 36 614153 AD NOP56 20p13 intron GGCCTG 3 8 1500 2500 FALSE + chr20 2633379 2633421 7.2 97 0 43 NA http://strcat.teamerlich.org/chart/chr20/2633379/2633421
FECD3 Fuchs endothelial corneal dystrophy 3 613267 AD TCF4 18q21.2 intron CTG 10 40 50 1300 TRUE - chr18 53253385 53253460 25.3 100 0 76 NA NA
FAME1 Familial adult myoclonic epilepsy 1 601068 AD SAMD12 8q24 intron TTTCA 0 0 440 3680 FALSE - chr8 119379052 119379155 0.6 3 NA NA
FAME6 Familial adult myoclonic epilepsy 6 618074 AD TNRC6A 16p12.1 intron TTTCA 0 0 TRUE + chr16 24624851 24624853 0.6 3 NA NA
FAME7 Familial adult myoclonic epilepsy 7 618075 AD RAPGEF2 4q32.1 intron TTTCA 0 0 TRUE + chr4 160263769 160263770 0.4 2 NA NA
CANVAS Cerebellar ataxia, neuropathy, and vestibular areflexia syndrome 614575 AR RFC1 4p14 intron TTCCC 0 0 400 2000 FALSE - chr4 39350045 39350103 11.8 59 NA NA
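Comparing the new `repeat_expansion_disorders_grch37.txt` rows with the `chr`-prefixed rows in the hunk above, the loci and coordinates shown match and only the `chrom` column differs (`19`, `X` versus `chr19`, `chrX`). A minimal sketch, assuming the database is already loaded as a data frame with a `chrom` column, converts hg19-style names to GRCh37 style in base R. The function `hg19_to_grch37_chrom()` is hypothetical and not part of exSTRa; the three rows are copied from this diff and trimmed to a few columns.

```r
# Hypothetical helper, not part of exSTRa: convert hg19-style chromosome
# names ("chr19", "chrX") to GRCh37-style names ("19", "X").
hg19_to_grch37_chrom <- function(chrom) {
  # hg19 chrM and the GRCh37 mitochondrial sequence differ,
  # so renaming alone would be wrong there.
  if ("chrM" %in% chrom) {
    stop("chrM cannot be converted by renaming alone")
  }
  sub("^chr", "", chrom)
}

# A few rows copied from the diff above, trimmed to three columns
db <- data.frame(
  locus = c("DM1", "HD", "FRAXA"),
  chrom = c("chr19", "chr4", "chrX"),
  hg19_start = c(46273463, 3076604, 146993555),
  stringsAsFactors = FALSE
)
db$chrom <- hg19_to_grch37_chrom(db$chrom)
db$chrom  # "19" "4" "X"
```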
8 changes: 6 additions & 2 deletions inst/tools/prepare_exSTRa_input_db.R
@@ -18,6 +18,10 @@ GTEx_median_tpm_file <- ".../path/to/.../GTEx_Analysis_2016-01-15_v7_RNASeQCv1.1
# Specify the median TPM value used as the threshold for considering a gene expressed in brain
brain_median_tpm_thresh <- 1

# Specify minimum and maximum motif size (in base pairs) to search for
min_motif_size <- 2
max_motif_size <- 6

# Download and install ANNOVAR (http://annovar.openbioinformatics.org/)
table_annovar_script <- ".../path/to/.../table_annovar.pl"
humandb_annovar_dir <- ".../path/to/.../humandb"
@@ -29,8 +33,8 @@ simpleRepeat <- readr::read_delim(simpleRepeat_file, delim="\t", col_names=FALSE
colnames(simpleRepeat) <- c("bin", "chrom", "chromStart", "chromEnd", "name", "period", "copyNum", "consensusSize", "perMatch", "perIndel", "score", "A", "C", "G", "T", "entropy", "sequence")
simpleRepeat <- as.data.frame(simpleRepeat, stringsAsFactors=FALSE)

# Filter to repeats with a 2-6 bp period
simpleRepeat <- simpleRepeat[(simpleRepeat$period >= 2) & (simpleRepeat$period <= 6), ]
# Filter based on repeat motif size
simpleRepeat <- simpleRepeat[(simpleRepeat$consensusSize >= min_motif_size) & (simpleRepeat$consensusSize <= max_motif_size), ]

# Download GTEx portal median TPM table from https://gtexportal.org/home/datasets:
# GTEx_Analysis_2016-01-15_v7_RNASeQCv1.1.8_gene_median_tpm.gct.gz
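The change above replaces the hard-coded `period` filter with a filter on `consensusSize` bounded by `min_motif_size` and `max_motif_size`. The sketch below wraps that same condition in a hypothetical helper, `filter_motif_size()` (not part of the script), and applies it to a toy table whose values are invented for illustration; only the column name `consensusSize` follows the `simpleRepeat` layout used in the script.

```r
# Hypothetical helper: keep rows whose consensus motif size lies in
# [min_size, max_size], mirroring the filter in prepare_exSTRa_input_db.R.
filter_motif_size <- function(df, min_size, max_size) {
  stopifnot(min_size <= max_size)
  keep <- df$consensusSize >= min_size & df$consensusSize <= max_size
  df[keep, , drop = FALSE]
}

# Toy rows with invented values; only the column names match simpleRepeat
toy <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr3"),
  consensusSize = c(1, 3, 6, 12),
  stringsAsFactors = FALSE
)
filtered <- filter_motif_size(toy, min_size = 2, max_size = 6)
filtered$consensusSize  # 3 6
```

Keeping the bounds as named variables, as the commit does, lets the same filter be reused for other motif ranges without editing the condition itself.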
2 changes: 1 addition & 1 deletion man/read_exstra_db.Rd
