From a731e9bd932231f03826b400f329ac914db709d6 Mon Sep 17 00:00:00 2001
From: Henry Addison
Date: Wed, 12 Jun 2024 23:40:13 +0100
Subject: [PATCH] a few updates to the README instructions

---
 README.md | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 4647f4baa..21a769508 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,8 @@
 A machine learning emulator of a CPM based on a diffusion model.
 
+This is the code for the paper Addison et al. (2024) "Machine learning emulation of precipitation from km-scale regional climate simulations using a diffusion model".
+
 Diffusion model implementation forked from the PyTorch implementation for the paper [Score-Based Generative Modeling through Stochastic Differential Equations](https://openreview.net/forum?id=PxTIG12RRHS) by [Yang Song](https://yang-song.github.io), [Jascha Sohl-Dickstein](http://www.sohldickstein.com/), [Diederik P. Kingma](http://dpkingma.com/), [Abhishek Kumar](http://users.umiacs.umd.edu/~abhishek/), [Stefano Ermon](https://cs.stanford.edu/~ermon/), and [Ben Poole](https://cs.stanford.edu/~poole/).
 
 ## Dependencies
 
@@ -10,12 +12,20 @@
 2. Create conda environment: `conda env create -f environment.lock.yml` (or add the dependencies to your own environment: `conda env update -f environment.txt`)
 3. Activate the conda environment (if not already active)
 4. Install ml_downscaling_emulator locally: `pip install -e .`
-5. Install U-Net code: `git clone --depth 1 https://github.com/henryaddison/Pytorch-UNet.git src/ml_downscaling_emulator/unet`
+5. \[Optional\] Install U-Net code: `git clone --depth 1 https://github.com/henryaddison/Pytorch-UNet.git src/ml_downscaling_emulator/unet` - this is only necessary if you wish to use the deterministic comparison models.
 6. Configure application behaviour with environment variables. See `.env.example` for variables that can be set.
 
 Any datasets are assumed to be found in `${DERIVED_DATA}/moose/nc-datasets/{dataset_name}/`. In particular, the config key config.data.dataset_name is the name of the dataset to use to train the model.
 
-## Usage
+## Diffusion Model Usage
+
+### Data
+
+Datasets for use with the emulator can be created using [mlde-data](https://github.com/henryaddison/mlde-data).
+That repo contains further information about dataset specification.
+The datasets used in the paper can be found on [Zenodo](https://doi.org/10.5281/zenodo.11504859).
+
+**NB** the interface commonly takes just the name of a dataset. It is expected to be found at `${DERIVED_DATA}/moose/nc-datasets/{dataset_name}/` (where DERIVED_DATA is a configurable environment variable).
 
 ### Smoke test
 
@@ -28,7 +38,7 @@ Recommended to run with a sample of the dataset.
 
 ### Training
 
-Train models through `bin/main.py`, e.g.
+Train models through `bin/main.py`, e.g. to train the model used in the paper:
 
 ```sh
 python bin/main.py --config src/ml_downscaling_emulator/score_sde_pytorch/configs/subvpsde/ukcp_local_pr_12em_cncsnpp_continuous.py --workdir ${DERIVED_DATA}/path/to/models/paper-12em --mode train
@@ -64,12 +74,12 @@ Functionalities can be configured through config files, or more conveniently, th
 Once you have trained a model, create samples from it with `bin/predict.py`, e.g.
 
 ```sh
-python bin/predict.py --checkpoint epoch-20 --dataset bham_60km-4x_12em_psl-sphum4th-temp4th-vort4th_eqvt_random-season --split test --ensemble-member 01 --input-transform-dataset bham_60km-4x_12em_psl-sphum4th-temp4th-vort4th_eqvt_random-season --input-transform-key pixelmmsstan --num-samples 1 ${DERVIED_DATA}/path/to/models/paper-12em
+python bin/predict.py --checkpoint epoch_20 --dataset bham_60km-4x_12em_psl-sphum4th-temp4th-vort4th_eqvt_random-season --split test --ensemble-member 01 --input-transform-dataset bham_60km-4x_12em_psl-sphum4th-temp4th-vort4th_eqvt_random-season --input-transform-key pixelmmsstan --num-samples 1 ${DERIVED_DATA}/path/to/models/paper-12em
 ```
 
 This example command will:
 * use the checkpoint of the model in `${DERIVED_DATA}/path/to/models/paper-12em/checkpoints/{checkpoint}.pth` and the model config from training `${DERIVED_DATA}/path/to/models/paper-12em/config.yml`.
-* store samples generated in `${DERVIED_DATA}/path/to/models/paper-12em/samples/{dataset}/{input_transform_data}-{input_transform_key}/{split}/{ensemble_member}/`. Sample files and named like `predictions-{uuid}.nc`.
+* store generated samples in `${DERIVED_DATA}/path/to/models/paper-12em/samples/{dataset}/{input_transform_data}-{input_transform_key}/{split}/{ensemble_member}/`. Sample files are named like `predictions-{uuid}.nc`.
 * generate samples conditioned on examples from ensemble member `01` in the `test` subset of the `bham_60km-4x_12em_psl-sphum4th-temp4th-vort4th_eqvt_random-season` dataset.
 * transform the inputs based on the `bham_60km-4x_12em_psl-sphum4th-temp4th-vort4th_eqvt_random-season` dataset using the `pixelmmsstan` approach.
 * generate 1 set of samples.
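The samples-directory pattern that this hunk documents can be sketched as a small shell snippet. This is only an illustration of the path template, not part of the patch: `DERIVED_DATA` is set to a hypothetical placeholder value, and the model workdir reuses the placeholder `path/to/models/paper-12em` from the README; the other variables mirror the flags in the `bin/predict.py` example above.

```shell
# Illustration only: where the README says bin/predict.py writes its samples.
# DERIVED_DATA and the workdir below are hypothetical placeholder values.
DERIVED_DATA="/tmp/derived_data"
workdir="${DERIVED_DATA}/path/to/models/paper-12em"

dataset="bham_60km-4x_12em_psl-sphum4th-temp4th-vort4th_eqvt_random-season"
input_transform_dataset="${dataset}"   # --input-transform-dataset
input_transform_key="pixelmmsstan"     # --input-transform-key
split="test"                           # --split
ensemble_member="01"                   # --ensemble-member

# Pattern from the README:
# {workdir}/samples/{dataset}/{input_transform_dataset}-{input_transform_key}/{split}/{ensemble_member}/
samples_dir="${workdir}/samples/${dataset}/${input_transform_dataset}-${input_transform_key}/${split}/${ensemble_member}"
echo "${samples_dir}"
```

Sample files named `predictions-{uuid}.nc` would then land under `${samples_dir}`, with one file per invocation of the sampling script.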