Hackathon team: gene expression analysis for Covid-19 Virtual Biohackathon (vBH)
https://github.com/virtual-biohackathons/covid-19-bh20
https://github.com/virtual-biohackathons/covid-19-bh20/wiki/GeneExpression
As part of the virtual BioHackathon 2020, we formed a working group that focused on the analysis of gene expression in the context of COVID-19. More specifically, we performed transcriptome analyses on published datasets in order to better understand the interaction between the human host and the SARS-CoV-2 virus.
Fig. 1. Project structure and interaction. Projects 1 and 2, along with literature research, will provide a list of candidate genes for Projects 3 and 4, which will take into account external factors (comorbidities and potential drug treatments). All data analyzed during this project are fully available to the medical community and meet the FAIR principles. Finally, Project 5 allows the efforts of all the previous projects to be detailed as workflows for increased reproducibility.
Biological: Perform a global RNA-Seq analysis of SARS-CoV-2-infected datasets to search for new candidate genes to test experimentally.
Methodological: Create a packaged, reproducible pipeline in Docker so that scientists can easily process their RNA-Seq data, and so that we can reuse it when new datasets come out.
The report describing all the work and results generated during the virtual BioHackathon can be found here.
The ideas proposed during this hackathon were divided into five projects (Fig. 1):
- SARS-CoV-2 infection global analyses: Understanding how global gene expression in human cells responds to infection by the SARS-CoV-2 virus, including changes in gene regulatory networks.
- Human-virus interaction analyses: Identification of human RNA-binding proteins that might be key in the interaction between human cells and the RNA genome of SARS-CoV-2.
- Increased risk factors analyses: Investigating gene expression in other datasets with the goal of identifying commonalities and differences with the two previous analyses, focusing on specific genes.
- Identification of potential pharmacological treatments: Searching for potential drugs that could affect the expression of human genes that are important for the interaction between human and virus.
- Workflows for reproducibility of analysis: Packaging the workflows devised within the Gene Expression group to enable seamless integration and reproducibility of the approach.
Projects 1 and 2 aim to identify human genes that are important in the process of viral infection of human cells. Projects 3 and 4 aim to take the candidate genes identified in Projects 1 and 2, as well as by independent studies, and relate them to clinical information and to possible therapeutic interventions. All data analyzed during this study are fully available and meet the FAIR principles of Findability, Accessibility, Interoperability, and Reusability. Finally, Project 5 aims to package and containerize the software and workflows used and generated here in a reusable manner, ultimately providing scalable and reproducible workflows.
Please see the project folder for details on each individual project.
See the contributors table for a full list of the amazing people who have contributed to the project.
We are tracking progress on project-specific boards here: https://github.com/avantikalal/covid-gene-expression/projects
This working group is dedicated to Open Science. All code in this repository is licensed under the MIT license. Data generated during the course of this project is licensed under the CC0 license.
This repository and the results created in it are subject to ongoing research and have NOT yet undergone any scientific peer review. That is, the contents cannot be considered free of errors and must be treated with caution!