Home
- Uses Python 3
- Requires cloning the Python repo and installing hdbscan
- e.g., mine is python3_20200417
- An environment example is in “environment_hdbscan”
- Choose the cluster method in the SCS_* settings scripts:
- ClusterMeth='hdbscan' # current options are ['HandK','hdbscan']
- The cluster-breakup option is ClusterBreakup=1, also set in the SCS_* settings scripts.
- Variable-dependence plots and data are created via the ipynb file in the VariableImportance directory.
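The ClusterMeth switch can be sketched as a simple dispatch. This is an illustrative sketch only: `run_clustering` and `original_handk_clustering` are hypothetical names, not the actual functions in the SCS_* scripts.

```python
# Illustrative sketch of how the ClusterMeth setting could select the
# clustering backend. run_clustering and original_handk_clustering are
# hypothetical names, not the actual code in the SCS_* scripts.

def original_handk_clustering(data):
    # Placeholder for the original 'HandK' method (not sketched here).
    raise NotImplementedError("HandK clustering not sketched here")

def run_clustering(data, cluster_meth='hdbscan'):
    """Dispatch to the clustering backend named by ClusterMeth."""
    if cluster_meth == 'HandK':
        return original_handk_clustering(data)
    elif cluster_meth == 'hdbscan':
        import hdbscan  # requires the cloned Python env with hdbscan installed
        return hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(data)
    raise ValueError(f"Unknown ClusterMeth: {cluster_meth!r}")
```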
1. Set in the SCS_* settings scripts
- rgrNrOfExtremes=10 #[6,10,15,30]
- This is the original way to set the number of extreme days; it simply picks the top 10, 30, etc. days.
2. Set a threshold in the code to use only days with at least XX reports per day
- This requires manually commenting/uncommenting and modifying code within the programs themselves (SearchOptimum_XWT-Combination.py and Centroids-and-Scatterplot.py).
- Search for “KRF”
- Uncommenting these lines and hardcoding the number overrides anything set in the SCS_* settings scripts.
- The default is 7 days, based on Andy’s experience and recommendation.
- This can be changed by editing a line in Centroids-and-Scatterplot.py and Functions_Extreme_WTs.py (in the function XWT).
- Search for “KRF” and/or “MinDistDD”
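The minimum-separation idea can be sketched as follows. The function name `select_extremes` and its arguments are hypothetical, chosen for illustration; the real logic lives in the programs listed above.

```python
import numpy as np

def select_extremes(report_counts, n_extremes, min_dist_days=7):
    """Pick the n_extremes days with the most reports while enforcing a
    minimum separation in days. Sketch of the MinDistDD idea only; not
    the actual implementation in Functions_Extreme_WTs.py."""
    order = np.argsort(report_counts)[::-1]  # day indices, busiest first
    chosen = []
    for day in order:
        # keep a candidate day only if it is far enough from all picks so far
        if all(abs(int(day) - d) >= min_dist_days for d in chosen):
            chosen.append(int(day))
        if len(chosen) == n_extremes:
            break
    return sorted(chosen)
```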
- SCS_search_settings.py
- Settings for the search program
- runSearch_hdbscan.csh
- Job script that submits SearchOptimum_XWT-Combination.py
- SearchOptimum_XWT-Combination.py
- Program that searches over all variable combinations and returns the optimum configuration based on the provided settings
- SCS_XWT_apply_settings.py
- Settings for applying the optimum configuration
- Centroids-and-Scatterplot.py
- Program that applies the optimum to the clustering algorithm and returns WTs and plots
- VariableImportance/VariableImportance.ipynb
- Jupyter notebook for plotting variable-importance heat maps and writing csv files of the top variable combinations and their skill scores.
There are basically two parts to the XWT workflow, and the two parts do more or less the same thing, so it is almost like having two nearly identical programs. This can make modifying the code cumbersome, since changes need to be made in two places.
The first part (steps 2-3) searches over and evaluates all combinations of variables and outputs the optimum settings, where the optimum is defined by minimizing the average of two skill scores. The second part (steps 4-5) applies the settings output by step 3. You can also simply modify the apply settings in steps 4-5 to experiment with various changes.
1. Set environment (e.g. see environment_hdbscan file)
- Assumes you have cloned the python3 repo and installed hdbscan
2. Modify settings in SCS_search_settings.py
- Basic settings like regions, months, paths, etc., are essentially identical to those in step 4; make sure these match when making changes.
3. Submit the job runSearch_hdbscan.csh:
```
sbatch runSearch_hdbscan.csh
```
Output appears in the job output file, e.g.:
```
====== OPTIMAL SETTINGS ======
Region: MDW.poly
Months: 6-7-8
VARIABLES
UV850
CAPE
CAPE-Shear
Extreme Nr : 42
Domain Size : M
Annual Cy. Rem.: 1
Smoothing : 0.5
Average Score : 0.49
====================================
```
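If you copy these values into the apply settings often, a small helper like the one below can parse the block into a dict to reduce transcription errors. This helper is hypothetical and not part of the repo.

```python
def parse_optimal_settings(text):
    """Parse the 'OPTIMAL SETTINGS' block from the job output into a dict.
    Hypothetical convenience helper; not part of the repo."""
    settings, variables = {}, []
    in_vars = False
    for line in text.splitlines():
        line = line.strip()
        if line == 'VARIABLES':
            in_vars = True          # following bare lines are variable names
        elif ':' in line:
            in_vars = False         # back to 'Key : value' entries
            key, _, val = line.partition(':')
            settings[key.strip()] = val.strip()
        elif in_vars and line and not line.startswith('='):
            variables.append(line)
    settings['Variables'] = variables
    return settings
```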
4. Use the output from step 3 to modify the settings in SCS_XWT_apply_settings.py
- Basic settings like regions, months, paths, etc., are essentially identical to those in step 2; make sure to update them here too.
- rgsWTvars, VarsFullName, and rgsWTfolders must be in the correct order and are case sensitive; use the commented block in the file for guidance.
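A quick sanity check can catch length mismatches between the three parallel lists before a run. The check below is hypothetical and not part of the repo; only the list names come from SCS_XWT_apply_settings.py.

```python
def check_var_lists(rgsWTvars, VarsFullName, rgsWTfolders):
    """Raise if the three parallel settings lists are misaligned.
    Hypothetical check; the list names match SCS_XWT_apply_settings.py,
    but this function is illustrative only."""
    if not (len(rgsWTvars) == len(VarsFullName) == len(rgsWTfolders)):
        raise ValueError(
            'rgsWTvars, VarsFullName, and rgsWTfolders must each have one '
            'entry per variable, in the same order')
```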
5. Run the clustering and plotting program:
```
./Centroids-and-Scatterplot.py
```
Output:
- data/ directory (name set in SCS_XWT_apply_settings.py)
- plot/ directory (name set in SCS_XWT_apply_settings.py)
- Subdirectories by region