ALLIUM (ALL subtype Identification Using Machine learning) is a multimodal classifier of molecular subtypes in pediatric acute lymphoblastic leukemia (ALL) that uses DNA methylation (DNAm) and gene expression (GEX) data. The model uses the Homo_sapiens.GRCh38.103 reference genome.
Krali, O., Marincevic-Zuniga, Y., Arvidsson, G. et al. Multimodal classification of molecular subtypes in pediatric acute lymphoblastic leukemia. npj Precis. Onc. 7, 131 (2023). https://doi.org/10.1038/s41698-023-00479-5
This repository contains:
- the ALLIUM models
- GEX and DNAm prediction clients
- test data
Conda must be installed on your system. You need to activate the allium conda environment before running any subsequent commands.
- Install: conda env create -f environment.yml
- Activate: conda activate allium
- Update (after changes to environment.yml): conda env update --file environment.yml --prune
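Because every later command assumes the allium environment is active, a script can check this before doing any work. The sketch below is a hypothetical helper, not part of the ALLIUM repository. It relies on conda setting the CONDA_DEFAULT_ENV variable when an environment is activated. If the environment was activated by path rather than by name, the variable holds the path, and this check reports it as inactive.

```python
import os
from typing import Mapping, Optional


def is_env_active(name: str = "allium", environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the conda environment `name` is the currently active one.

    `conda activate <name>` sets CONDA_DEFAULT_ENV to the environment name.
    `environ` can be passed explicitly for testing; it defaults to os.environ.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") == name


if __name__ == "__main__":
    # Fail fast with a readable message instead of a later ImportError.
    if not is_env_active():
        raise SystemExit("Run 'conda activate allium' before running ALLIUM scripts.")
```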
Run python test_client.py to run GEX and DNAm prediction on the test datasets.
Run pytest to execute the test suite.
Preprocessing tools are available in the ALLIUM PrePro repository.
Because of legacy dependency issues, the models were trained with an older version of scikit-learn. Both scikit-learn and Python should preferably be upgraded when the models are retrained. Because of this older scikit-learn version, the current prediction client does not work on macOS.
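To surface these limitations before any model is loaded, a pre-flight check can report the platform and the installed scikit-learn version. This is a minimal, hypothetical sketch that is not part of the ALLIUM client. The exact scikit-learn version used for training is not stated here, so the sketch only reports the installed version and does not compare it against a required one. It also assumes that environment.yml installs scikit-learn into the allium environment. On macOS, platform.system() returns "Darwin".

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional


def installed_sklearn_version() -> Optional[str]:
    """Return the installed scikit-learn version, or None if it is not installed."""
    try:
        return version("scikit-learn")
    except PackageNotFoundError:
        return None


def environment_warnings(system: Optional[str], sklearn_version: Optional[str]) -> List[str]:
    """Collect warnings about known incompatibilities before loading the models.

    `system` defaults to platform.system() when None ("Darwin" on macOS).
    `sklearn_version` is None when scikit-learn is not installed.
    """
    system = platform.system() if system is None else system
    warnings: List[str] = []
    if system == "Darwin":
        warnings.append("The current ALLIUM prediction client does not work on macOS.")
    if sklearn_version is None:
        # Assumption: scikit-learn comes from environment.yml.
        warnings.append("scikit-learn is not installed; activate the allium environment.")
    return warnings


if __name__ == "__main__":
    sk_version = installed_sklearn_version()
    print("scikit-learn version:", sk_version)
    for message in environment_warnings(None, sk_version):
        print("WARNING:", message)
```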