
General Information

  • Uses Python 3
  • Requires cloning the python repo and installing hdbscan
  • e.g. mine is python3_20200417
  • Environment example is in “environment_hdbscan”
  • Choose the cluster method in the SCS_* settings scripts (see the sketch after this list):
  • ClusterMeth='hdbscan' # current options are ['HandK','hdbscan']
  • The cluster breakup option is ClusterBreakup=1, set in the SCS_* settings scripts.
  • Variable dependence plots and data are created via the ipynb file in the VariableImportance directory
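
For orientation, here is a minimal sketch of the cluster-related lines in an SCS_* settings script; ClusterMeth and ClusterBreakup are the names used above, and the values shown are just the ones documented on this page:

    # minimal sketch of the cluster-related entries in an SCS_* settings script
    ClusterMeth = 'hdbscan'   # current options are ['HandK', 'hdbscan']
    ClusterBreakup = 1        # cluster breakup option (value shown on this page)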

Setting Number of Extreme Days (2 methods)

1. Set in the SCS_* settings scripts

  • rgrNrOfExtremes=10 #[6,10,15,30]
  • The original way to set the number of extreme days; it simply picks a fixed number of days (10, 30, etc.)

2. Set a threshold in the code to use days with XX reports per day

  • Requires manually commenting/uncommenting and modifying code within the programs themselves (SearchOptimum_XWT-Combination.py and Centroids-and-Scatterplot.py).
  • Search for “KRF”
  • Uncommenting these lines and hardcoding the number overrides anything set in the SCS_* settings scripts (see the sketch below)
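
As a rough illustration only (the variable names below are hypothetical; the real lines are the ones marked “KRF” in the two programs), the override boils down to replacing the fixed count with a report-count threshold:

    import numpy as np

    # hypothetical sketch of the "KRF"-style override -- names and data are illustrative, not the actual code
    daily_reports = np.array([3, 12, 0, 25, 9, 41])   # storm reports per day (example data)
    MinReportsPerDay = 10                             # keep only days with at least this many reports
    rgrNrOfExtremes = int(np.sum(daily_reports >= MinReportsPerDay))  # overrides the settings-script value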

Adjust minimum distance of days between extreme days

  • The original value is 7 days, based on Andy’s experience and recommendation
  • This can be modified by editing a line in Centroids-and-Scatterplot.py and in Functions_Extreme_WTs.py (in the function XWT)
  • Search for “KRF” and/or “MinDistDD” (see the sketch below)
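
The edit itself is just a value change; a minimal sketch, assuming MinDistDD is the variable that holds the separation (only the name MinDistDD is confirmed here):

    # minimal sketch -- the real line sits near the "KRF"/"MinDistDD" comments in the files above
    MinDistDD = 7   # minimum separation, in days, between selected extreme days (default 7)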

Scripts

  • SCS_search_settings.py
    • Settings for the search program
  • runSearch_hdbscan.csh
    • Job script that submits SearchOptimum_XWT-Combination.py
  • SearchOptimum_XWT-Combination.py
    • Program that searches for and returns the optimum configuration based on the provided settings
  • SCS_XWT_apply_settings.py
    • Settings for applying the optimum configuration
  • Centroids-and-Scatterplot.py
    • Program that applies the optimum to the clustering algorithm and returns WTs and plots
  • VariableImportance/VariableImportance.ipynb
    • Jupyter notebook for plotting variable-importance heat maps and writing csv files of the top variable combinations and skill scores.

Procedures on Casper

There are basically two parts to the XWT workflow, and the two parts do more or less the same thing, so it is almost like having two identical programs. This can make modifying the code cumbersome, since changes need to be made in two places.

The first part (steps 2-3) performs the search: it evaluates all combinations of variables and outputs the optimum settings, where the optimum is defined as the minimum of the average of two skill scores. The second part (steps 4-5) applies the settings output by step 3. One can also simply modify the apply settings in steps 4-5 to experiment with changes to the configuration.
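
For concreteness, a minimal sketch of the selection criterion; the score names and values are illustrative, and only the “average of two skill scores” rule comes from this page:

    import numpy as np

    # illustrative only: choose the candidate settings combination with the lowest average of two skill scores
    skill_score_1 = np.array([0.62, 0.55, 0.49])   # one score per candidate combination (example values)
    skill_score_2 = np.array([0.58, 0.51, 0.49])
    average_score = 0.5 * (skill_score_1 + skill_score_2)
    best_combination = int(np.argmin(average_score))  # index of the optimum configuration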

1. Set environment (e.g. see environment_hdbscan file)

  • Assumes you have cloned the python3 repo and installed hdbscan (a quick check is sketched below)
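
A quick, optional sanity check that the environment is active (this snippet is not part of the workflow itself):

    # confirm Python 3 is active and hdbscan is importable before submitting jobs
    import sys
    print(sys.version)    # should report a Python 3.x interpreter from the cloned repo
    import hdbscan        # raises ImportError if hdbscan was not installed into this environment
    print('hdbscan import OK')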

2. Modify settings in SCS_search_settings.py

  • Basic settings such as regions, months, and paths are essentially identical to those in Step 4; make sure they match when making changes (see the sketch below).
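
The note above is only about keeping the shared values in sync between the two settings files; the variable names below are hypothetical, and only the region and month values echo the example job output further down:

    # hypothetical illustration -- the actual names in SCS_search_settings.py may differ
    Region = 'MDW.poly'                    # region polygon (reported as "Region" in the job output)
    Months = [6, 7, 8]                     # season, 6-7-8 in the job output
    DataPath = '/path/to/predictor/data/'  # placeholder path; keep both settings files consistent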

3. Submit job runSearch_hdbscan.csh
sbatch runSearch_hdbscan.csh

The output appears in the job output file, e.g.:
====== OPTIMAL SETTINGS ======
Region: MDW.poly
Months: 6-7-8
VARIABLES
UV850
CAPE
CAPE-Shear
Extreme Nr : 42
Domain Size : M
Annual Cy. Rem.: 1
Smoothing : 0.5
Average Score : 0.49
====================================

4. Use output in (3) to modify settings in SCS_XWT_apply_settings.py

  • Basic settings such as regions, months, and paths are essentially identical to those in Step 2; make sure to modify/update them here too.
  • rgsWTvars, VarsFullName, and rgsWTfolders must be in the correct order and are case sensitive; use the commented block in the file for guidance (see also the sketch below)
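
As an illustration of how the optimal output above maps onto these lists (the folder paths and internal variable names are hypothetical; only the three variable labels come from the example output, and the three lists must stay index-aligned):

    # illustrative only -- align the three lists element by element; case matters
    VarsFullName = ['UV850', 'CAPE', 'CAPE-Shear']   # labels from the VARIABLES block in step 3
    rgsWTvars    = ['uv850', 'cape', 'cape_shear']   # hypothetical internal variable names
    rgsWTfolders = ['/path/to/UV850/',               # hypothetical data folders, one per variable
                    '/path/to/CAPE/',
                    '/path/to/CAPE-Shear/']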

5. Run the clustering and plotting program
./Centroids-and-Scatterplot.py

Output:

  • data/ directory (name set in SCS_XWT_apply_settings.py)
  • plot/ directory (name set in SCS_XWT_apply_settings.py)
    • Subdirectories by region