You should use a GPU to train and evaluate s-enformer, ideally one with 24 GB of memory.
Unless you are running this code from Imperial College London's HPC and have access to the directory neurogenomics-lab/live/Projects/enformer_bigbird/, you'll need to download the data to train, validate, and test s-enformer. The data is hosted in the Google Cloud Storage bucket gs://basenji_barnyard/data. To download it, run:
gsutil cp -r gs://basenji_barnyard/data .
If you want to change the download location, replace the trailing . with your desired location.
If not already installed, you'll need to install gsutil.
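If you would rather script the download, the sketch below wraps the same gsutil command in Python. The function names build_download_command and download_data are hypothetical helpers introduced here for illustration; they are not part of the s-enformer repository. The only thing assumed about gsutil is the cp -r invocation shown above.

```python
import shutil
import subprocess

# Bucket path taken from the gsutil command in this README.
BUCKET_URL = "gs://basenji_barnyard/data"


def build_download_command(destination="."):
    """Return the gsutil command as an argument list (hypothetical helper)."""
    return ["gsutil", "cp", "-r", BUCKET_URL, destination]


def download_data(destination="."):
    """Run the download, failing early if gsutil is not on the PATH."""
    if shutil.which("gsutil") is None:
        raise RuntimeError("gsutil not found; install it before downloading the data")
    # check=True raises CalledProcessError if gsutil exits with an error.
    subprocess.run(build_download_command(destination), check=True)
```

Calling download_data("/path/to/data") copies the bucket contents to that directory, which is the location you would then point the data path at.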
You will then need to change the path to the data in utils/
Inside of the train directory, run the command below to train the model:
Depending on the specifications of your machine, you might want to alter the training parameters.
The model will be saved to the ./models directory.
The evaluation directory contains all of the code to evaluate the model. All of these scripts must be run within the evaluation directory.
Evaluate the model by measuring its correlation across the four genomic track types:
- DNase-Seq & ATAC-Seq
- Histone modification ChIP-Seq
- Transcription factor ChIP-Seq
- CAGE (cap analysis of gene expression)
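As a rough illustration of a per-track-type correlation, the sketch below computes the Pearson correlation for each track and averages it within each track type. It assumes Pearson correlation is the metric and that predictions and targets are available as plain lists keyed by track name; both are assumptions, and the function names are hypothetical rather than taken from the evaluation scripts.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_correlation_by_type(predictions, targets, track_types):
    """Average per-track Pearson correlation within each track type.

    predictions, targets: dict mapping track name -> list of values (assumed layout).
    track_types: dict mapping track name -> track type label, e.g. "CAGE".
    """
    per_type = {}
    for name, preds in predictions.items():
        r = pearson(preds, targets[name])
        per_type.setdefault(track_types[name], []).append(r)
    return {t: sum(rs) / len(rs) for t, rs in per_type.items()}
```

Averaging within each type keeps a type with many tracks from dominating the summary, but the actual scripts may aggregate differently.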
Measure how similar the predictions of s-enformer and enformer are.
Measure how much memory a model uses when training and how quickly it trains.
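To give a feel for this kind of measurement, here is a minimal sketch that times repeated calls to a training step and records peak memory. It is not the repository's benchmarking script: profile_steps and step_fn are hypothetical names, and tracemalloc only tracks Python allocations on the host, so measuring GPU memory would need a framework-specific tool instead.

```python
import time
import tracemalloc


def profile_steps(step_fn, n_steps=10):
    """Time n_steps calls of step_fn and record peak host-side Python memory.

    step_fn is a hypothetical zero-argument callable that runs one training step.
    """
    tracemalloc.start()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    # get_traced_memory returns (current, peak) in bytes since start().
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "seconds_per_step": elapsed / n_steps,
        "peak_host_bytes": peak_bytes,
    }
```

Running the same function on two models with identical inputs and step counts gives a like-for-like comparison of speed.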
Measure the size of a model's receptive field.
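One common way to estimate a receptive field empirically is to perturb one input position at a time and check whether a chosen output changes. The sketch below applies that idea to a toy stand-in model; toy_model and receptive_field_size are hypothetical names, and this perturbation approach is an assumption rather than a description of how the repository's script works.

```python
def toy_model(xs, radius=2):
    """Hypothetical stand-in model: each output sums the inputs within `radius` positions."""
    n = len(xs)
    return [sum(xs[max(0, i - radius): min(n, i + radius + 1)]) for i in range(n)]


def receptive_field_size(model, length, out_index):
    """Count the span of input positions that influence model output `out_index`."""
    baseline = model([0.0] * length)
    influential = []
    for pos in range(length):
        perturbed = [0.0] * length
        perturbed[pos] = 1.0  # perturb a single input position
        if model(perturbed)[out_index] != baseline[out_index]:
            influential.append(pos)
    if not influential:
        return 0
    # Span from the first to the last influential position, inclusive.
    return influential[-1] - influential[0] + 1
```

For the toy model with radius 2, the output at the centre of a 21-position input depends on 5 input positions. A real model would take one-hot DNA sequences, and a gradient-based check may be cheaper than perturbing every position.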
Some of the figures for the report are created in create_figures.ipynb.