Pierzyna, M., Saathof, R., and Basu, S. "Π-ML: A dimensional analysis-based machine learning parameterization of optical turbulence in the atmospheric surface layer". Optics Letters, vol. 48, no. 17, 2023, pp. 4484-4487.
DOI: https://doi.org/10.1364/OL.492652 (also available on arXiv)
- Clone or download this git repository and its submodules (if the submodules are missing afterwards, see the first sketch after this list):

  ```bash
  git clone --recurse-submodules https://github.com/mpierzyna/piml
  ```
- Set up the required Python packages in a new conda environment:

  ```bash
  conda env create -f environment.yml
  ```

  Note: The exact package versions from the environment file have to be used to guarantee that trained models can be loaded later.
An example workspace set up to model optical turbulence strength ($C_n^2$) is provided in `workspace/cn2_mlo`.
- Create a new workspace by copying the template folder:

  ```bash
  cp -r workspace/template workspace/my_workspace
  ```
- For convenience, set the path to your workspace as an environment variable:

  ```bash
  export PIML_WORKSPACE=workspace/my_workspace
  ```

  Note: This needs to be repeated every time you open a new terminal (see the second sketch after this list for a way to persist it).
- Follow the example/tutorial in `workspace/cn2_mlo` to set up your own `config.yml`.
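If the repository was cloned without `--recurse-submodules`, the submodules can still be fetched afterwards with a standard git command:

```bash
# Fetch missing submodules in an already-cloned repository
git submodule update --init --recursive
```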
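To avoid re-exporting `PIML_WORKSPACE` in every new terminal, the variable could be appended to your shell profile. This is only a sketch, assuming a bash shell, that you run it from the repository root, and that the scripts accept an absolute workspace path:

```bash
# Optional: persist the workspace path (absolute, so it works from any directory)
echo "export PIML_WORKSPACE=$(pwd)/workspace/my_workspace" >> ~/.bashrc
```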
Activate the conda environment:

```bash
conda activate piml
```

Then run the pipeline step by step:
- `step_1_make_pi_sets.py`: Generates all possible $\Pi$-sets based on variables in `config.yml` and saves them to `my_workspace/1_raw/pi_sets_full.joblib` (for a quick way to inspect this file, see the first sketch after this list).
- `step_2_constrain_pi_sets.py`: Applies the following constraints to reduce the number of possible $\Pi$-sets. Please refer to our paper for more details. If you require different or additional constraints, you need to modify the code.
  - Each $\Pi$-set can only contain a single $\Pi$-group that is a function of the model output/target.
  - Signed dimensional variables have to retain their sign, so, e.g., squared versions of such a variable are not allowed.
- `step_3_split_train_test.py`: Splits the dimensional dataset into training and testing portions and makes sure it is valid for training.
- `step_4_train_ensemble.py`: Trains an ensemble of models for each valid $\Pi$-set. By default, training happens sequentially, which might take a long time. To train models in parallel, supply the `--pi_set=...` flag to train only a specific $\Pi$-set, and use the array functionality of your HPC scheduler to run multiple jobs with increasing integer values for `--pi_set=...` (see the second sketch after this list).
- `step_5_eval_ensemble.py`: Evaluates the trained ensemble of models on the test dataset and plots diagnostic figures.
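To sanity-check the output of step 1, the saved file can be opened with `joblib` from the activated environment. This is only a sketch; it assumes the file holds a list-like collection of $\Pi$-sets:

```bash
# Count the candidate Pi-sets produced by step 1 (assumes a list-like object)
python -c "import joblib; ps = joblib.load('workspace/my_workspace/1_raw/pi_sets_full.joblib'); print(len(ps), 'candidate Pi-sets')"
```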
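For step 4, the `--pi_set=...` flag pairs naturally with an HPC array job. The following SLURM script is only a sketch: the array range (here 0-9, assuming ten valid $\Pi$-sets indexed from 0), the conda activation, and any resource directives depend on your cluster and on the output of step 2:

```bash
#!/bin/bash
#SBATCH --job-name=piml_train
#SBATCH --array=0-9   # assumption: 10 valid Pi-sets, indexed from 0

# Make conda available in non-interactive batch jobs, then activate the env
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate piml

export PIML_WORKSPACE=workspace/my_workspace

# Each array task trains the ensemble for exactly one Pi-set
python step_4_train_ensemble.py --pi_set=${SLURM_ARRAY_TASK_ID}
```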
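Putting it all together, and assuming each step is invoked directly as a Python script from the repository root (with the environment activated and `PIML_WORKSPACE` set), a full sequential run might look like this:

```bash
conda activate piml
export PIML_WORKSPACE=workspace/my_workspace

python step_1_make_pi_sets.py        # enumerate all possible Pi-sets
python step_2_constrain_pi_sets.py   # filter them using the paper's constraints
python step_3_split_train_test.py    # split data into train/test portions
python step_4_train_ensemble.py      # train ensembles (sequential by default)
python step_5_eval_ensemble.py       # evaluate on test data, plot diagnostics
```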