This folder contains the results obtained during the GREEKC hackathon (May 23-26, 2019), and the subsequent development until July 9, 2019. After this date, the workflow will be further developed in a separate repository.
https://github.com/YvonFrid/cisreg-GWAS/
- Yvon Mbouamboua, Aix-Marseille Université (AMU), France
- Jacques van Helden, Institut Français de Bioinformatique + Aix-Marseille Université
- Benoît Ballester
- Aziz Kahn
- Thomas Rosnet
- Thuy Nga Thi Nguyen
- Andrew Parton
- Ferran Moratalla Navarro
The aim of the project is to apply bioinformatic methods to detect non-coding disease-associated variants that may affect transcriptional regulation by modifying transcription factor binding sites. The approach integrates information collected automatically from various genomic databases (BioMart, dbSNP, Ensembl, HaploReg), and selects the variations that may affect regulation by combining specialized bioinformatic resources: Regulatory Sequence Analysis Tools (RSAT) and ChIP-seq data from ReMap.
For this, we are developing an analysis workflow in the R statistical language, with Bioconductor and CRAN libraries, which invokes remote resources (Web services). The tool is designed generically, and can be adapted to study the regulatory variants of any disease documented in the GWAS catalog.
To facilitate its use by biologists, the tool automatically generates (in R markdown) an analysis report illustrated by figures and tables.
- Interfaces
  - There is currently no Web service for ReMap
  - RSAT Web services were originally based on SOAP/WSDL, which is no longer supported in R
  - We are currently developing a REST interface (efforts are focused on the tools that will be used for this hackathon)
  - ...
- IDs
  - Cross-links between factor names in ReMap, matrix names from RSAT, matrices from Jaspar, proteins in Uniprot, genes in Ensembl, ...
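As a rough illustration of the identifier problem, the sketch below joins a list of ReMap factor names to a table of Jaspar matrices with a case-insensitive match on the factor name. The two tables, their column names and the matrix identifiers (`matrix_1`, `matrix_2`) are placeholders invented for this example, not real exports of either resource. Matching on names alone is an assumption and would miss synonyms.

```r
# Toy lookup tables; column names and matrix IDs are placeholders.
remap_factors <- data.frame(
  remap_name = c("CTCF", "GATA1", "FOXA1"),
  stringsAsFactors = FALSE
)
jaspar_matrices <- data.frame(
  matrix_id   = c("matrix_1", "matrix_2"),
  jaspar_name = c("CTCF", "Gata1"),
  stringsAsFactors = FALSE
)

# Join the two tables on an upper-cased, whitespace-trimmed factor name.
# Factors without a matching matrix are kept, with NA in the matrix columns.
cross_reference <- function(remap, jaspar) {
  remap$key  <- toupper(trimws(remap$remap_name))
  jaspar$key <- toupper(trimws(jaspar$jaspar_name))
  merged <- merge(remap, jaspar, by = "key", all.x = TRUE)
  merged[order(merged$key), c("remap_name", "matrix_id", "jaspar_name")]
}

xref <- cross_reference(remap_factors, jaspar_matrices)

# Factors that still need a manual or synonym-based mapping.
unmatched <- xref$remap_name[is.na(xref$matrix_id)]
```

Keeping the unmatched factors in the result makes the gaps in the cross-reference visible, rather than dropping them silently.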
The table below provides the URL of each resource used by the workflow, and indicates its API if available.
Resource name | Data types | URL | Access mode in the workflow |
---|---|---|---|
GWAS catalog | SNPs associated with a query disease | https://www.ebi.ac.uk/gwas/ | ftp download |
HaploReg | Collect the SNPs in linkage disequilibrium (LD) | https://pubs.broadinstitute.org/mammals/haploreg/ | R package |
BioMart | Collect SNP missing data | http://www.biomart.org | R package |
ReMap | Collect transcriptional regulators ChIP-seq experiments | http://remap.cisreg.eu/ | Web interface, to be converted to REST |
Jaspar | Collect all matrices corresponding to transcription factor names | http://jaspar2018.genereg.net | ftp download, to be converted to REST |
RSAT | Prediction of polymorphic variations affecting transcription factor binding | http://rsat.sb-roscoff.fr/ | Web interface, to be converted to REST |
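For the two resources accessed through R packages, the calls could look like the sketch below. It assumes the `haploR` and `biomaRt` packages are installed and that the remote servers are reachable. The helper names, the LD threshold, the population and the selected BioMart attributes are illustrative choices, not necessarily those of the workflow.

```r
# Keep only unique, well-formed dbSNP identifiers (e.g. "rs123").
clean_rsids <- function(ids) {
  ids <- unique(trimws(ids))
  ids[grepl("^rs[0-9]+$", ids)]
}

# Collect the SNPs in linkage disequilibrium with the query SNPs (HaploReg).
# ld_threshold and population are illustrative defaults.
collect_ld_snps <- function(rsids, ld_threshold = 0.8, population = "EUR") {
  haploR::queryHaploreg(query = clean_rsids(rsids),
                        ldThresh = ld_threshold,
                        ldPop = population)
}

# Retrieve coordinates and alleles from the Ensembl variation mart (BioMart).
collect_snp_info <- function(rsids) {
  mart <- biomaRt::useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
  biomaRt::getBM(attributes = c("refsnp_id", "chr_name", "chrom_start", "allele"),
                 filters = "snp_filter",
                 values = clean_rsids(rsids),
                 mart = mart)
}
```

Both query functions need network access. Only `clean_rsids()` runs offline, which makes it a convenient place to reject malformed identifiers before any remote call.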
The workflow is written in R code embedded in an R markdown document, which automatically generates a report in HTML, PDF or Word (.docx) format.
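Report generation in the three formats can be sketched with `rmarkdown::render()`. The file name `workflow.Rmd` and the helper names are hypothetical. PDF output additionally requires a LaTeX installation.

```r
# Map a short format name to the corresponding rmarkdown output format.
output_format_for <- function(format = c("html", "pdf", "word")) {
  format <- match.arg(format)
  switch(format,
         html = "html_document",
         pdf  = "pdf_document",
         word = "word_document")
}

# Render the report; "workflow.Rmd" is a hypothetical file name.
# params is passed to the parameters declared in the document's YAML header.
render_report <- function(rmd_file = "workflow.Rmd", format = "html", params = NULL) {
  rmarkdown::render(input = rmd_file,
                    output_format = output_format_for(format),
                    params = params)
}
```

For example, `render_report("workflow.Rmd", format = "word")` would produce a `.docx` file next to the source document.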
Main R packages
- biomaRt
- jsonlite
- haploR
- httr
- GenomicRanges
- RCurl
- ReMapEnrich
- XGR
- xml2
- Replace the downloads and manual analyses with programmatic access
  - Use R JASPAR package or RESTful API to download all matrices
  - REST interface for RSAT
  - REST interface for ReMap
- Cross-references between RSAT and Jaspar matrices
- Cross-references between ReMap factors and Jaspar
- REST API development
- Shiny interface
- Occasional help of the developers of the mobilized resources
At the end of the hackathon, we aim to provide a fully automated workflow relying as much as possible on APIs, without having to download the full datasets and parse them locally.
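As an example of API-based access, the sketch below requests the Jaspar matrices for a single factor name instead of downloading the whole collection. The base URL, the `name` and `format` query parameters and the `results` field of the response are assumptions about the Jaspar REST API that should be checked against its documentation. The function names are hypothetical.

```r
# Build the query URL for one transcription factor name.
# base_url and the query parameter names are assumptions (see above).
jaspar_matrix_url <- function(tf_name,
                              base_url = "http://jaspar.genereg.net/api/v1/matrix/") {
  query <- c(name = tf_name, format = "json")
  values <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
  paste0(base_url, "?", paste0(names(query), "=", values, collapse = "&"))
}

# Fetch and parse the matching matrices (requires jsonlite and network access).
# The "results" field is assumed to hold the list of matrices.
fetch_jaspar_matrices <- function(tf_name) {
  response <- jsonlite::fromJSON(jaspar_matrix_url(tf_name))
  response$results
}
```

Separating URL construction from the download keeps the query easy to inspect and test without contacting the server.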
After day 1, ...
After day 2, ...
- A workflow integrated in an R markdown document that automatically runs the analysis and generates a report
- YAML-based management of the workflow parameters
- Usage examples with selected study cases
- A Shiny-based Web interface to the workflow
- Full code of the workflow available on GitHub
- User documentation enabling biologists to run the analyses on their own computer
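The YAML-based parameter management could follow the pattern sketched below: default values are defined in R and overridden by the entries of a user-supplied YAML file. The parameter names and default values are hypothetical, chosen only for illustration. Reading the file assumes the `yaml` package is installed.

```r
# Hypothetical default parameters; names and values are placeholders.
default_params <- list(
  disease      = "asthma",
  ld_threshold = 0.8,
  population   = "EUR"
)

# Override the defaults with user values, rejecting unknown parameter names.
merge_params <- function(user, defaults = default_params) {
  unknown <- setdiff(names(user), names(defaults))
  if (length(unknown) > 0) {
    stop("Unknown parameter(s): ", paste(unknown, collapse = ", "))
  }
  modifyList(defaults, user)
}

# Read a YAML parameter file and merge it with the defaults.
read_params <- function(yaml_file) {
  merge_params(yaml::read_yaml(yaml_file))
}
```

A file containing the single line `ld_threshold: 0.9` would then change only that value and leave the other defaults untouched. Rejecting unknown names catches typos in the parameter file early.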