Skip to content

Latest commit

 

History

History
142 lines (104 loc) · 6.36 KB

README.md

File metadata and controls

142 lines (104 loc) · 6.36 KB

GDCRNATools - an R/Bioconductor package for downloading, organizing, and integrative analyzing lncRNA, mRNA, and miRNA data in GDC

  • The GDCRNATools Manual and R code of the GDCRNATools Workflow has been updated in 10-30-2018.

  • If you use GDCRNATools in your published research, please cite:
    Li, R., Qu, H., Wang, S., Wei, J., Zhang, L., Ma, R., Lu, J., Zhu, J., Zhong, W., and Jia, Z. (2018). GDCRNATools: an R/Bioconductor package for integrative analysis of lncRNA, miRNA and mRNA data in GDC. Bioinformatics 34, 2515-2517. https://doi.org/10.1093/bioinformatics/bty124.

  • Please add my WeChat: rli012 or email to [email protected] if you have further questions.


1. Introduction

The Genomic Data Commons (GDC) maintains standardized genomic, clinical, and biospecimen data from National Cancer Institute (NCI) programs including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research To Generate Effective Treatments (TARGET), It also accepts high quality datasets from non-NCI supported cancer research programs, such as genomic data from the Foundation Medicine.

GDCRNATools is an R/Bioconductor package which provides a standard, easy-to-use and comprehensive pipeline for downloading, organizing, and integrative analyzing RNA expression data in the GDC portal with an emphasis on deciphering the lncRNA-mRNA related ceRNA regulatory network in cancer.

2. Manual and R script (updated in 10-30-2018)

The comprehensive manual of GDCRNATools is available here: GDCRNATools Manual

R code of the workflow is available here: GDCRNATools Workflow

3. Installation

3.1 Installation via Bioconductor

  • The stable release version of GDCRNATools requires R(>=3.5.0) and Bioconductor(>=3.8). Please start R and enter:
## try http:// if https:// URLs are not supported
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("GDCRNATools")
  • To install the development version of GDCRNATools, please update your R and Biocondutor to the latest version and run:
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("GDCRNATools", version = "devel")

3.2 Installation locally

Please download the compressed package here: GDCRNATools_1.1.5.tar.gz

3.2.1 On Windows system

  • Make sure that your R is installed in 'c:\program files'

  • Install Rtools in 'c:\program files'

  • Add R and Rtools to the Path Variable on the Environment Variables panel, including

    c:\program files\Rtools\bin

    c:\program files\Rtools\gcc-4.6.3\bin

    c:\program files\R\R.3.x.x\bin\i386

    c:\program files\R\R.3.x.x\bin\x64

  • Run the following code in R

install.packages('GDCRNATools_1.1.5.tar.gz', repos = NULL, type='source')

3.2.2 On Linux and Mac systems

Just run the following code in R

install.packages('GDCRNATools_1.1.5.tar.gz', repos = NULL, type='source')

3.3 Note

If GDCRNATools cannot be installed due to the lack of dependencies, please run the following code ahead to install those pacakges either simutaneously or separately:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

### install packages simutaneously ###
BiocManager::install(c('limma', 'edgeR', 'DESeq2', 'clusterProfiler', 'DOSE', 'org.Hs.eg.db', 'biomaRt', 'BiocParallel', 'GenomicDataCommons'))
install.packages(c('shiny', 'jsonlite', 'rjson', 'survival', 'survminer', 'ggplot2', 'gplots', 'Hmisc', 'DT', 'matrixStats', 'xml2'))

### install packages seperately ###
BiocManager::install('limma')
BiocManager::install('edgeR')
BiocManager::install('DESeq2')
BiocManager::install('clusterProfiler')
BiocManager::install('DOSE')
BiocManager::install('org.Hs.eg.db')
BiocManager::install('biomaRt')
BiocManager::install('BiocParallel')
BiocManager::install('GenomicDataCommons')

install.packages('shiny')
install.packages('jsonlite')
install.packages('rjson')
install.packages('survival')
install.packages('survminer')
install.packages('ggplot2')
install.packages('gplots')
install.packages('Hmisc')
install.packages('DT')
install.packages('matrixStats')
install.packages('xml2')

4. Frequently Asked Questions.

Q1: gdcRNADownload() function doesn't work with the following error:
Error in FUN(X[[i]], ...):
unused arguments(desination_dir=directory, overwrite=TRUE)
A1: This error occurs when the default API method for downloading fails. Please add method='gdc-client' to the gdcRNADownload() function.

####### Download RNAseq data #######
project <- 'TCGA-CHOL'
rnadir <- paste(project, 'RNAseq', sep='/')
gdcRNADownload(project.id     = 'TCGA-CHOL', 
               data.type      = 'RNAseq', 
               write.manifest = FALSE,
               method         = 'gdc-client', ### use 'gdc-client' to download data
               directory      = rnadir)

Q2: gdcRNAMerge() doesn't work with the following error:
Error in open.connection(file, 'rt'): cannot open the connection
In addition: Warning message:
In open.connection(file, 'rt'):
cannot open compressed file 'TCGA-XXXX/RNAseq/xxx-xxx-xxx-xxx.htseq.counts.gz', probable reason 'No such file or directory'.
A2: This is usually because the data for different samples are downloaded in separate folders. Please add organized=FALSE to the gdcRNAMerge() function.

####### Merge RNAseq data #######
rnaCounts <- gdcRNAMerge(metadata  = metaMatrix.RNA, 
                         path      = rnadir,
                         organized = FALSE, # if the data are in separate folders
                         data.type = 'RNAseq')