Enable multivariate emulator training and sampling #38

henryaddison · 2024-09-18T12:56:38Z

NB: merging this branch does not assume that multivariate models are "good" (for whatever definition of good), merely that they work (i.e. they don't blow up or produce entirely random output).

To Do:

Revert change of environment name
Non-alpha version of mlde_utils
Non-alpha version of this package (once merged)
Wait until paper is ready?

Covers:

Allowing multiple outputs to be modelled by emulator
Deterministic models using the score-sde training loop (and prediction script), leading to better deterministic U-Net performance
Update dataset names to be easier to read
Smoothing the val loss calculation (to use the same random starting points and noise levels each time)

in theory this should replace the mirroring deterministic package, but obviously we should test this once Blue Pebble is back up properly. Also includes a now more tuned config for the plain det U-Net and one that more closely resembles the old det U-Net config

so EMA can basically be disabled with a flag. This is another difference between a U-Net trained deterministically on the score_sde side and the separate deterministic training approach. In theory a decay rate of 1 should allow this, but it's complicated by a num_updates param too

henryaddison added 30 commits September 18, 2024 13:31

change conda env for training for now

31a349a

update version to an alpha stand-in around 0.2

4a720ce

for upcoming changes related to multivariate work

add start of a config file for multivariate model

7a4286c

make predict.py multivariate aware

892280f
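A minimal sketch of what "multivariate aware" could mean at prediction time, assuming the emulator returns one output channel per target variable in a fixed order. The helper `split_sample_channels` is hypothetical and not taken from predict.py; the variable names are the ones mentioned elsewhere in this PR.

```python
from typing import Dict, List, Sequence

# One sample for a single timestep, stored as channels x height x width.
Grid = List[List[float]]


def split_sample_channels(sample: Sequence[Grid], target_vars: Sequence[str]) -> Dict[str, Grid]:
    """Map each output channel to its target variable name.

    Hypothetical helper: assumes channel i of the model output holds
    target_vars[i], so the two sequences must have the same length.
    """
    if len(sample) != len(target_vars):
        raise ValueError(
            f"model produced {len(sample)} channels "
            f"but {len(target_vars)} target variables are configured"
        )
    return dict(zip(target_vars, sample))


# Usage: a toy 3-variable sample on a 1 x 2 grid.
sample = [[[0.1, 0.2]], [[15.0, 16.0]], [[80.0, 85.0]]]
per_var = split_sample_channels(sample, ["pr", "tmean150cm", "relhum150cm"])
```

Failing loudly on a channel/variable mismatch is a design choice here: silently zipping would drop variables without warning.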

correct a couple of bugs from multivariate samples

b99001d

update mv config to full cCPM dataset with 12em and relhum + temp + pr

f5e2a84

bump mlde_utils

6827f9b

for the xfm for relhum of recentring 0-100 percentages to -1 to 1
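One plausible form of that relhum transform is a simple linear map, x/50 - 1, which sends 0 to -1, 50 to 0 and 100 to 1. This is a sketch under that assumption; the function names are hypothetical and not mlde_utils' API.

```python
def recentre_percentage(x: float) -> float:
    """Map a 0-100 percentage (e.g. relative humidity) linearly onto [-1, 1]."""
    return x / 50.0 - 1.0


def uncentre_percentage(y: float) -> float:
    """Inverse transform: map a value in [-1, 1] back to a 0-100 percentage."""
    return (y + 1.0) * 50.0


print([recentre_percentage(v) for v in (0.0, 50.0, 100.0)])  # [-1.0, 0.0, 1.0]
```

The inverse does not clip, so emulator output slightly outside [-1, 1] would map to percentages slightly outside 0-100.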

rename the mv conda env for now

bbe98e6

add debug config for testing diff emulators in mv setting

3772a59

correct dataset name for mv config

a06139b

add a script to update references to dataset names in workdir configs

740e1b1

correct how config paths are found

ee3916d

not all configs have config.data.input_transform_dataset

9a97110
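A sketch of the per-config step of a dataset-rename script that tolerates configs lacking `input_transform_dataset`. It works on plain dicts; `rename_datasets`, the `dataset_name` key and the rename mapping are assumptions for illustration, and finding, loading and saving each workdir config is omitted.

```python
import copy
from typing import Any, Dict, Mapping

# Hypothetical key names: the commits only confirm that some configs have
# config.data.input_transform_dataset and others do not.
DATASET_KEYS = ("dataset_name", "input_transform_dataset")


def rename_datasets(config: Mapping[str, Any], renames: Mapping[str, str]) -> Dict[str, Any]:
    """Return a copy of `config` with old dataset names swapped for new ones.

    Keys missing from config["data"] are skipped instead of raising, and names
    not listed in `renames` are left unchanged.
    """
    updated = copy.deepcopy(dict(config))
    data = updated.get("data", {})
    for key in DATASET_KEYS:
        if key in data:
            data[key] = renames.get(data[key], data[key])
    return updated
```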

actually save configs

087fb32

a few more dataset name changes needed

61b2013

update default dataset names in configs

4a40214

attempt to stabilize validation loss

78f0115

by using the same random numbers for computing loss when not in train mode
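A stdlib sketch of the idea: training draws fresh diffusion times and noise, while evaluation re-seeds a generator so every validation pass sees identical random draws and changes in val loss reflect the model alone. The toy loss and function names are hypothetical stand-ins; in PyTorch, `torch.rand` and `torch.randn` accept a `generator=` argument, so a seeded `torch.Generator` would play the role of `random.Random` here.

```python
import random
from typing import List, Optional, Sequence, Tuple


def draw_times_and_noise(n: int, rng: random.Random, eps: float = 1e-5) -> Tuple[List[float], List[float]]:
    """Draw one diffusion time t ~ U(eps, 1) and one N(0, 1) noise value per example."""
    times = [eps + (1.0 - eps) * rng.random() for _ in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return times, noise


def toy_loss(batch: Sequence[float], times: Sequence[float], noise: Sequence[float]) -> float:
    # Stand-in for the real denoising loss; it only needs to depend on t and noise.
    return sum((x + t * z) ** 2 for x, t, z in zip(batch, times, noise)) / len(batch)


def compute_loss(batch: Sequence[float], train: bool, val_seed: int = 0,
                 rng: Optional[random.Random] = None) -> float:
    """Fresh randomness while training; a fixed, re-seeded generator otherwise."""
    if train:
        gen = rng if rng is not None else random.Random()
    else:
        gen = random.Random(val_seed)  # same t and noise on every evaluation
    times, noise = draw_times_and_noise(len(batch), gen)
    return toy_loss(batch, times, noise)
```

Re-seeding once at the start of each validation pass, rather than per batch as here, would be an equally valid choice.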

compute and log val loss before 1st epoch

899ada5

partly for debugging but also to see improvements

don't duplicate logging for validation loss each epoch

1e60348

don't bother recording val loss before any training

a8f1491

it's large

bump mlde_utils

84f0502

to be able to use hopefully better xfms for tmean150cm and relhum150cm

add memory size of model to model-size scripts

bb77ebb

compress samples on disk in predict.py

8535a2a

some issue with compressing all vars in predict.py

729e98e

instead just do the big ones
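A sketch of compressing only the big variables, assuming samples are written with xarray's `Dataset.to_netcdf`, whose netCDF4 encoding accepts per-variable `zlib` and `complevel` keys. The size threshold and variable names are hypothetical.

```python
from typing import Dict, Mapping


def compression_encoding(var_nbytes: Mapping[str, int], min_nbytes: int = 1_000_000,
                         complevel: int = 5) -> Dict[str, Dict[str, object]]:
    """Build a per-variable encoding that compresses only large variables.

    Variables below `min_nbytes` get no entry and are written uncompressed.
    """
    return {
        name: {"zlib": True, "complevel": complevel}
        for name, nbytes in var_nbytes.items()
        if nbytes >= min_nbytes
    }
```

Usage would look like `ds.to_netcdf(path, encoding=compression_encoding({name: ds[name].nbytes for name in ds.data_vars}))`.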

add a config for using models in a deterministic setup

28cc71a

rather than score-sde/diffusion setting

update training and sampling to handle deterministic approach

58724f7

as well as diffusion type models

add a smoke test for debugging det models in the main module

7814bdb

use decay/ema_rate to effectively disable EMA

edd70cc

a rate of 1 means no EMA this is backwards compatible unlike adding a new config attribute
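In score_sde_pytorch's `ExponentialMovingAverage`, when `num_updates` is tracked the decay actually used is `min(decay, (1 + num_updates) / (10 + num_updates))`, so a rate of 1 on its own still averages during warm-up, and without warm-up it would freeze the shadow weights rather than disable averaging. This sketch special-cases a rate of 1 as "shadow mirrors the live weights"; that convention and the function names are assumptions, not necessarily what the branch does.

```python
from typing import List, Optional, Sequence


def effective_decay(decay: float, num_updates: Optional[int]) -> float:
    """score_sde-style warm-up: while num_updates is small the decay is capped,
    so even decay == 1 still averages early on."""
    if num_updates is None:
        return decay
    return min(decay, (1 + num_updates) / (10 + num_updates))


def ema_update(shadow: Sequence[float], params: Sequence[float], decay: float,
               num_updates: Optional[int] = None) -> List[float]:
    """One EMA step over flat parameter lists.

    Hypothetical convention: decay == 1 means "no EMA", so the shadow copy
    simply mirrors the live parameters instead of being frozen.
    """
    if decay == 1.0:
        return list(params)
    d = effective_decay(decay, num_updates)
    # shadow <- shadow - (1 - d) * (shadow - param), i.e. d * shadow + (1 - d) * param
    return [s - (1.0 - d) * (s - p) for s, p in zip(shadow, params)]
```

Reusing the existing rate rather than adding a config attribute keeps old configs loadable unchanged.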

henryaddison added 23 commits September 18, 2024 13:31

add helper scripts for queuing model jobs on jasmin

1e518e1

allow for missing deterministic key on config

ff9def5

for backwards compatibility
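A sketch of the backwards-compatible lookup: configs written before the flag existed have no `deterministic` key, so a missing key is read as "diffusion model", and deterministic models train against plain MSE. The top-level key location and the helper names are assumptions.

```python
from typing import Any, Mapping, Sequence


def is_deterministic(config: Mapping[str, Any]) -> bool:
    """Older configs predate the flag, so a missing key means a diffusion model."""
    return bool(config.get("deterministic", False))


def training_objective(config: Mapping[str, Any]) -> str:
    """Name of the loss a config should train with (labels are illustrative)."""
    return "mse" if is_deterministic(config) else "score_matching"


def mse_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, the objective used for deterministic training."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
```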

use plain unet config as default for debug version

ef46ca1

correct name of config in a smoke test

dd97da1

re-add loc spec channels to smoke test

6f0a587

add loc spec channels to diff smoke test

8fc977a

organize smoke tests better

2bae4fe

initialize loc-spec params from gaussian

a8e03f6

use He initialization

e1b866d
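A stdlib sketch of He (Kaiming) normal initialization for location-specific parameters, drawing each value from N(0, 2 / fan_in). Which dimension counts as `fan_in` for these parameters is an assumption, so it is passed in explicitly; in PyTorch, `torch.nn.init.kaiming_normal_` provides the same kind of initialization for a tensor.

```python
import math
import random
from typing import List


def he_normal_loc_spec(n_channels: int, height: int, width: int, fan_in: int,
                       seed: int = 0) -> List[List[List[float]]]:
    """Location-specific parameters (channels x height x width) drawn from
    N(0, 2 / fan_in), i.e. He/Kaiming normal initialization."""
    std = math.sqrt(2.0 / fan_in)
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(height)]
        for _ in range(n_channels)
    ]
```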

Merge pull request #37 from henryaddison/randn-init-loc-params

5d1c28b

Random initialization of location-specific parameters

bump GH action for installing mamba

2fe5cd8

bump mlde_utils version

a3ea315

add configs for stand alone relhum and tmean at 1.5m models

18426d2

try without micromamba caching in CI GH workflow

279cacf

allow overriding of target transform per variable

122b88d

though I think this can't be done entirely on the fly from CLI yet
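A sketch of resolving a target transform per variable: every variable uses the default transform unless an override names it, and overrides for unconfigured variables are rejected. The function name and the transform keys are hypothetical placeholders.

```python
from typing import Dict, Mapping, Optional, Sequence


def resolve_target_transforms(
    target_vars: Sequence[str],
    default_key: str,
    overrides: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Pick a target transform key for each variable, falling back to the default."""
    overrides = overrides or {}
    unknown = set(overrides) - set(target_vars)
    if unknown:
        raise KeyError(f"overrides for unknown variables: {sorted(unknown)}")
    return {var: overrides.get(var, default_key) for var in target_vars}
```

Usage: `resolve_target_transforms(["pr", "relhum150cm"], "xfm_a", {"relhum150cm": "xfm_b"})` keeps `pr` on the default while `relhum150cm` gets its own transform.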

install cuda via conda

e5ed606

rather than rely on it being pre-installed by another means, but still allow for compilation of custom extensions

Merge pull request #39 from henryaddison/cuda-packages

fb2ece8

Switch to using cuda package from nvidia's conda channels

update to full 0.2 release of mlde_utils

4a62319

remove deterministic package (AMAP)

f3cf227

now can do deterministic (MSE) training on the score_sde_pytorch side

remove score_sde_pytorch namespace

f76547f

no reason anymore to have it in this namespace now that parallel deterministic namespace has been removed

update sample helper used for bilinear interpolation

4df883a

Merge pull request #40 from henryaddison/remove-old-unet-approach

c262366

Remove deterministic package

rename conda env to mlde

7cfca01

so not quite what it was before but much easier to remember

henryaddison merged commit 93b2765 into main Oct 22, 2024
3 checks passed

henryaddison deleted the multivariate branch October 22, 2024 10:07
