This repository contains the Matlab scripts used in our paper.
Analysing data from complex diseases in a personalised manner to identify disrupted pathways can improve our understanding of how the disease progresses. We have developed a personalised statistical method that robustly models gene expression time-course datasets from complex heterogeneous diseases and summarises the gene-level results at the level of pathways, i.e. groups of genes involved in particular biological processes that drive the disease. By analysing three Type 1 Diabetes datasets with this method, we demonstrate that the personalised approach reveals more insight than non-personalised (combined) methods into the biological processes involved in disease progression, both over time and within specific time intervals. Because it robustly identifies numerous disease-relevant pathways, the method could be developed further for predicting events in the progression of heterogeneous diseases, for pursuing preventive treatments and even for biomarker identification.
We present a method that models time-course data in a personalised manner using Gaussian processes to identify differentially expressed genes (DEGs), and combines the DEG lists at the pathway level using permutation-based empirical hypothesis testing to overcome the gene-level variability and inconsistencies prevalent in datasets from heterogeneous diseases. The method can be applied to study both the time-course dynamics and specific time windows of heterogeneous diseases.
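For orientation, the sketch below illustrates one common way to score a single case-control pair with Gaussian processes: compare the log marginal likelihood of separate GP models for the case and the control against one shared GP fitted to both. It is a hedged illustration, not the repository's code. We assume (as the script name `compute_ratios.m` hints, but does not confirm) that the per-pair score is a likelihood ratio of this kind; the squared-exponential kernel, the fixed hyperparameters `variance`, `lengthscale` and `noise`, and every function name are hypothetical. The actual scripts rely on GPstuff and will differ, for example in kernel choice and in how hyperparameters are fitted.

```python
import math

def rbf_kernel(xs, ys, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance matrix between two lists of time points."""
    return [[variance * math.exp(-0.5 * ((x - y) / lengthscale) ** 2) for y in ys]
            for x in xs]

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def gp_log_marginal_likelihood(t, y, variance=1.0, lengthscale=1.0, noise=0.1):
    """log p(y | t) for a zero-mean GP with an RBF kernel plus Gaussian noise.

    Hyperparameters are fixed (hypothetical values); `noise` is the noise variance.
    """
    n = len(t)
    K = rbf_kernel(t, t, variance, lengthscale)
    for i in range(n):
        K[i][i] += noise
    L = cholesky(K)
    # Forward substitution solves L z = y, so y^T K^{-1} y = z^T z
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)

def log_likelihood_ratio(t_case, y_case, t_ctrl, y_ctrl, **hyp):
    """Separate GPs for case and control versus one shared GP (inputs are lists)."""
    separate = (gp_log_marginal_likelihood(t_case, y_case, **hyp)
                + gp_log_marginal_likelihood(t_ctrl, y_ctrl, **hyp))
    shared = gp_log_marginal_likelihood(t_case + t_ctrl, y_case + y_ctrl, **hyp)
    return separate - shared
```

A large positive score means the case and control profiles are better explained by two separate curves than by one, i.e. the probe-set looks differentially expressed for that particular pair; thresholding such per-pair scores is one way a personalised DEG list could be formed.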
These scripts require the following software:
- Matlab (>= R2016a)
- GPstuff 4.7
For personalised time-course analysis:
1. Run `compute_ratios.m` and pass the required parameters.
   - The `probeset_file` is a path to a `.mat` file that contains the expression values for each probe-set (or gene) of a case-control pair, with the case and control expressions stored separately. It also contains the time points of the case and control measurements as well as the time of seroconversion (or disease diagnosis).
   - The
2. Run `child_mapping.m` and pass the required parameters to map the differentially expressed probe-sets to genes. Perform this step only if probe-sets are used in the analysis.
3. To compute the adjusted geometric mean for each pathway, run `pathway_overlap.m`.
4. Run `pathway_rand_overlap.m` to generate the null distribution as described in our paper.
5. Compute the empirical p-value for each pathway by comparing the results of `pathway_overlap.m` with the null distribution from `pathway_rand_overlap.m`. Then correct the p-values for multiple testing using the Benjamini-Hochberg procedure.
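The final step (empirical p-values followed by Benjamini-Hochberg correction) is not tied to a named script, so a minimal sketch of both computations may help. It assumes, hypothetically, that `pathway_overlap.m` yields one observed score per pathway and `pathway_rand_overlap.m` yields a list of scores for the same pathway under the permutation null, with larger scores meaning stronger evidence. The function names are illustrative, and the (1 + count) / (1 + N) estimator is a common convention that keeps p-values above zero; the paper's exact estimator may differ.

```python
def empirical_p_value(observed, null_scores):
    """One-sided permutation p-value: (1 + #{null >= observed}) / (1 + N)."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (1 + exceed) / (1 + len(null_scores))

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so the adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Calling `empirical_p_value` once per pathway and passing the resulting list to `benjamini_hochberg` gives adjusted values in the same pathway order; pathways below a chosen false discovery rate (for example 0.05) would then be reported.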
For personalised time-window analysis:
1. Run `compute_KL.m` and pass the required parameters.
   - The `probeset_file` is a path to a `.mat` file that contains the expression values for each probe-set (or gene) of a case-control pair, with the case and control expressions stored separately. It also contains the time points of the case and control measurements as well as the time of seroconversion (or disease diagnosis).
   - The
2. Perform steps 2, 3, 4 and 5 as in the personalised time-course analysis.
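Beyond its name, this README does not spell out how `compute_KL.m` scores a time window. Assuming it compares the case and control GP predictive distributions with the Kullback-Leibler (KL) divergence, a minimal sketch for univariate Gaussian predictives on a time grid could look like the following. The grid, the `(mean, variance)` pairs, the direction KL(case || control) and the choice to average over grid points inside the window are all hypothetical.

```python
import math

def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL( N(mu_p, var_p) || N(mu_q, var_q) ) for univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def window_kl(times, case_pred, ctrl_pred, start, end):
    """Average KL(case || control) over grid points with start <= t <= end.

    case_pred and ctrl_pred are hypothetical lists of (mean, variance) pairs,
    one per entry of `times`, e.g. GP predictions on a common time grid.
    """
    vals = [kl_gaussian(mc, vc, mk, vk)
            for t, (mc, vc), (mk, vk) in zip(times, case_pred, ctrl_pred)
            if start <= t <= end]
    if not vals:
        raise ValueError("no grid points inside the time window")
    return sum(vals) / len(vals)
```

KL divergence is asymmetric, so swapping the case and control arguments changes the score; summing both directions is a common symmetrised alternative.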
Please cite this work as:
Somani, J., Ramchandran, S., & Lähdesmäki, H. (2019). A personalised approach for identifying disease-relevant pathways in heterogeneous diseases. bioRxiv, 738062.
This project is licensed under the MIT License - see the LICENSE file for details.