ISMB2023_Tutorial
Applied computational techniques for omics-phenotype association discovery
Ali Rahnavard: George Washington University; Himel Mallick: Merck Research Laboratories; Daniel Kerchner: George Washington University
Intended audience: computational biologists, bioinformaticians, principal investigators, and their research teams, including Ph.D. students beyond the second year of their program, who have basic familiarity with upstream multi-omics bioinformatics tools and want to get started with current best practices in downstream multi-omics data analysis. Beginner-level familiarity with R is required.
Methodological advances, paired with multi-omics data measured using high-throughput technologies, make it possible to capture a comprehensive snapshot of distinct biological entities. In particular, low-cost, culture-independent omics profiling has made metabolomics surveys of human health, other hosts, and the environment possible at an unprecedented scale. The resulting data have stimulated the development of new statistical and computational approaches to analyze and integrate omics data, including human gene expression, microbial gene products, metabolites, and proteins, among others. Multi-omics data generated from diverse platforms are often fed into generic downstream analysis software without proper appreciation of the inherent data properties, resulting in incorrect interpretations. Specifically, omics data are typically high dimensional, zero inflated, and characterized by a variety of statistical distributions. Further, the collection of downstream analysis software platforms is extensive, and selecting the most appropriate tool can be overwhelming for untrained researchers and non-specialists.
We present a high-level introduction to computational multi-omics, highlighting the state of the art in the field and outstanding challenges, with a focus on downstream analysis methods. The workshop will include formulating biological hypotheses and the statistical methods currently available to test them. We will begin with an overview of the statistical challenges inherent to analyzing the high-dimensional data in multi-omics studies. Introductory lectures will include: 1) the challenges associated with applying machine learning (ML) techniques for GWAS (gene-wise association studies) and precisely testing for multivariable association in population-scale meta-omics studies, and 2) challenges and advances in pathway enrichment analyses.
The workshop will cover computational techniques for gene-wise association of sequencing data using machine learning, association discovery in multi-omics data, and pathway enrichment analysis. Workshop participants will have the opportunity to apply at least two different software tools to each of these three scenarios. The workshop is project-focused and uses a hands-on approach. Participants are encouraged to attend with a specific study or project in mind so that the content can be applied in the short term. The workshop will use real data for the exercises. Workshop tools and resources will be openly available on GitHub and will adhere to FAIR principles to the greatest extent possible. This workshop will be presented as a collaboration between George Washington University and Merck Research Laboratories. Researchers from industry and academia will come together to share diverse perspectives on the topic, from both drug discovery and basic science angles, enabling attendees to achieve a holistic view of multi-omics and clinical data integration through state-of-the-art tools applied to motivating examples and use cases.
Workshop attendees will:
- Understand the concepts and theoretical basis for:
  - Machine learning for GWAS
  - Multi-omics data association testing
  - Pathway enrichment analysis
- Gain hands-on experience using tools for association in multi-omics, including:
  - DeepGS and deepBreaks: Association discovery with the phenotype of interest using multi-alignment sequencing data from a population
  - DESeq2, Tweedieverse, and Maaslin2: Statistical frameworks for differential analysis of multi-omics
  - omePath and IPA: Omics pathway enrichment analysis
- Practice generating publication-quality figures and effective visualization of the results.
The workshop can accommodate up to 50 attendees.
- Welcome and introduction to multi-omics (20 min)
- ML techniques for inferring from sequencing data and GWAS
  - Theoretical background (15 min)
  - DeepGS and deepBreaks tutorials using an HIV study (40 min)
- Multivariable association testing: challenges and techniques
  - Conceptual background (30 min)
  - DESeq2, Tweedieverse, and Maaslin2 tutorial and application to cancer metabolites (45 min)
- Coffee break (30 min)
- Pathway enrichment analysis: challenges and advancements
  - Conceptual overview (15 min)
  - omePath hands-on tutorial with application to cancer metabolites (30 min)
- Q&A and wrap-up, tips for visualization of results (15 min)
Total time: Approximately 4 hours