CERENKOV

1. Computational Elucidation of the REgulatory NonKOding Variome

CERENKOV is a software pipeline and associated machine-learning framework for identifying regulatory single nucleotide polymorphisms (rSNPs) in the noncoding genome for post-analysis of genetic regions identified in genome-wide association studies (GWAS). CERENKOV was created by Yao Yao, Zheng Liu, Satpreet Singh, Qi Wei, and Stephen Ramsey at Oregon State University.

The March 2017 data files for CERENKOV can be accessed on the Ramsey Lab file server. For more information about which data files are used in which parts of CERENKOV, see the README.md files under the subdirectories of the CERENKOV project area on GitHub.

2. Reproducing the results of CERENKOV2

Building on the 2017 ACM-BCB version, we developed CERENKOV2 and submitted a methodology article, "CERENKOV2: data-space geometric features improve machine learning-based detection of functional noncoding SNPs", to BMC Bioinformatics in July 2018.

To reproduce the results reported in this submission, please follow the README file in the experiments/CERENKOV2 folder.

We revised our manuscript and code in the following weeks and resubmitted the article in November 2018 under a new title, "CERENKOV2: improved detection of functional noncoding SNPs using data-space geometric features".

To reproduce the results reported in the revised submission, please follow the README file in the experiments/CERENKOV2_revision folder.

We also provide an R script, install_packages.R, that installs all of the dependencies needed to reproduce these results.

3. Reproducing the results of the 2017 ACM-BCB article

We presented the first version of CERENKOV at the 2017 ACM-BCB conference in Boston in August 2017. The accompanying full research article in the proceedings, "CERENKOV: Computational Elucidation of the Regulatory Noncoding Variome", describes CERENKOV and demonstrates its accuracy for discriminating rSNPs from nonfunctional SNPs.

The corresponding code is archived in release v0.1-alpha.

To reproduce the results of our article, please follow the instructions below.

3.1. Guide to source code files in CERENKOV v0.1-alpha:

  • cerenkov_ml_compare_models.R: obtains the comparative machine-learning performance results that were used to make Figure 3 of the article (the script cerenkov_analyze_ml_results_compare_models.R actually generates the plot).

The cerenkov_ml_compare_models.R script requires the R packages PRROC, parallel, xgboost (version 0.6-4), ranger (version 0.6.0), Matrix, and pbapply. The cerenkov_analyze_ml_results_compare_models.R script requires the R packages ggplot2 and reshape2.

  • cerenkov_ml_xgboost_importance.R: obtains the feature importance scores for the CERENKOV method that were used to make Figure 2 in the article.

The script requires the R packages Matrix, parallel, pbapply, and xgboost.

  • cerenkov_ml_tune_xgboost.R: obtains the grid-search tuning machine-learning performance results that were used to make Figure 1b,c in the article (the script cerenkov_analyze_ml_results_tune_xgboost.R actually generates the plot).

The cerenkov_ml_tune_xgboost.R script requires the R packages PRROC, parallel, xgboost (version 0.6-4), Matrix, and pbapply. The cerenkov_analyze_ml_results_tune_xgboost.R script requires the R packages ggplot2 and reshape2.
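Because some of the packages listed above are pinned to specific versions, it can be useful to check the local R library before running the scripts. The following is a minimal sketch, not part of CERENKOV: the helper name check_packages is hypothetical, and the package list is the one given above for cerenkov_ml_compare_models.R. It only reports what is installed; it does not install or change anything.

```r
# Packages named above for cerenkov_ml_compare_models.R.
# NA means no specific version is stated for that package.
required <- c(PRROC = NA, parallel = NA, xgboost = "0.6-4",
              ranger = "0.6.0", Matrix = NA, pbapply = NA)

# Hypothetical helper (an assumption for illustration, not CERENKOV code):
# returns one row per package with a status of "ok", "missing",
# or "version differs".
check_packages <- function(required) {
  status <- vapply(names(required), function(pkg) {
    # requireNamespace() returns FALSE if the package is not installed
    if (!requireNamespace(pkg, quietly = TRUE)) return("missing")
    wanted <- required[[pkg]]
    if (is.na(wanted)) return("ok")
    if (packageVersion(pkg) == package_version(wanted)) "ok" else "version differs"
  }, character(1))
  data.frame(package = names(required),
             required = unname(required),
             status = unname(status),
             stringsAsFactors = FALSE)
}

print(check_packages(required))
```

The same helper can be pointed at the package lists for the other scripts by changing the `required` vector. A "version differs" status does not necessarily mean a script will fail, only that the installed version is not the one stated above.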

3.2. Data files for download

The following .Rdata files (the "201703 data supplement" for CERENKOV) accompany the article CERENKOV: Computational elucidation of the regulatory noncoding variome by Yao Yao, Zheng Liu, Satpreet Singh, Qi Wei, and Stephen A. Ramsey (submitted to the ACM-BCB conference, April 2017). The data files are available at the following links (all are HTTP links to the file server files.cgrb.oregonstate.edu):