Feature 345 develop gha node20 (#347)
* Prepare for next release

* add missing end quote to fix package install

* reset_index is performed on the float value #322 (#323)

* Update release notes (#328)

* Update release-notes.rst formatting

* Update and rename 2.1.0_wcoss2 to 3.0.0_wcoss2

* loop over statistics only once to avoid data multiplication #330 (#331)

* Added sphinx_rtd_theme to extensions

* Updated requirements.txt

* Added pillow

* feature 497 headers (#336)

* changing header for continuity

* Modified the other headers in the file to be consistent with other repos

---------

Co-authored-by: Julie Prestopnik <[email protected]>

* Beta2 release (#338)

* Next version

* Feature 332 di doc (#333)

* Add difficulty index documentation

* Add more documentation

* Add more definition

* Fix indent

* Add figure

* fix indentation

* fix equations

* Add table

* Added remaining tables

* fix table issue

* Add links

* formatting

* change link to latest

---------

Co-authored-by: Tracy <[email protected]>

* Additions to beta2 release (#340)

* Next beta

* Bugfix 329 negative bcmse (#344)

* Issue #329 return 0 if negative BCMSE value is calculated

* Issue #329 add test for calculate_bcmse() in the sl1l2_statistics module

* Issue #392 added test_sl1l2.py to the list of pytests to run

* command line updates

* updating yaml file

---------

Co-authored-by: Hank Fisher <[email protected]>
Co-authored-by: George McCabe <[email protected]>
Co-authored-by: John Halley Gotway <[email protected]>
Co-authored-by: Tatiana Burek <[email protected]>
Co-authored-by: jprestop <[email protected]>
Co-authored-by: Tracy Hertneky <[email protected]>
Co-authored-by: Tracy <[email protected]>
Co-authored-by: bikegeek <[email protected]>
9 people authored Feb 2, 2024
1 parent 5a4bfd6 commit 15d1cc0
Showing 12 changed files with 241 additions and 36 deletions.
1 change: 1 addition & 0 deletions .github/workflows/unit_tests.yml
@@ -64,6 +64,7 @@ jobs:
pytest test_utils.py
pytest test_validate_mv_python.py
pytest test_future_warnings.py
pytest test_sl1l2.py
coverage run -m pytest test_agg_eclv.py test_agg_stats_and_boot.py test_agg_stats_with_groups.py test_calc_difficulty_index.py test_convert_lon_indices.py test_event_equalize.py test_event_equalize_against_values.py test_lon_360_to_180.py test_statistics.py test_tost_paired.py test_utils.py test_future_warnings.py
coverage html
171 changes: 171 additions & 0 deletions docs/Users_Guide/difficulty_index.rst
@@ -0,0 +1,171 @@
****************
Difficulty Index
****************

Description
===========

This module calculates the difficulty of a decision based on a set of forecasts,
such as an ensemble, for quantities such as wind speed or significant wave height,
as a function of space and time.

Example
=======

An example use case that runs the Difficulty Index for wind speed can be found in the `METplus documentation <https://metplus.readthedocs.io/en/latest/generated/model_applications/medium_range/UserScript_fcstGEFS_Difficulty_Index.html#sphx-glr-generated-model-applications-medium-range-userscript-fcstgefs-difficulty-index-py>`_.

Decision Difficulty Index Computation
=====================================

Consider the following formulation of a forecast decision difficulty index:

.. math:: d_{i,j} = \frac{A(\bar{x}_{i,j})}{2}\left(\frac{(\sigma/\bar{x})_{i,j}}{(\sigma/\bar{x})_{ref}}+\left[1-\frac{1}{2}\left|P(x_{i,j}\geq thresh)-P(x_{i,j}<thresh)\right|\right]\right)

where :math:`\sigma` is the ensemble standard deviation, :math:`\bar{x}` is the ensemble mean,
:math:`P(x_{i,j}\geq thresh)` is the ensemble (sample) probability of being greater than or equal
to the threshold, and :math:`P(x_{i,j}<thresh)` is the ensemble probability of being less than
the threshold. The :math:`(\sigma/\bar{x})` expression is a measure of spread normalized by the
mean, and it allows one to identify situations of truly significant uncertainty. Because the
difficulty index is defined only for positive definite quantities such as wind speed or significant
wave height, division by zero is avoided. :math:`(\sigma/\bar{x})_{ref}` is a (scalar) reference
value, for example the maximum value of :math:`(\sigma/\bar{x})` obtained over the last 5 days as
a function of geographic region.
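
As a rough illustration of this computation (a minimal sketch, not the METcalcpy API;
the function name and arguments below are hypothetical), the index at a single point
can be evaluated from an ensemble of values with NumPy:

.. code-block:: python

   import numpy as np

   def difficulty_index(ens, thresh, ref_spread, A=1.0):
       """Sketch of d = (A/2) * (normalized spread + probability term).

       ens: 1-D array of ensemble member values at one point and time
       thresh: decision threshold, in the same units as ens
       ref_spread: scalar reference value of sigma/xbar, e.g. a recent maximum
       A: heuristic weighting for the forecast value (described below)

       Assumes a positive definite quantity, so division by the mean is safe.
       """
       xbar = ens.mean()
       sigma = ens.std(ddof=1)
       spread_term = (sigma / xbar) / ref_spread         # normalized uncertainty
       p_ge = np.mean(ens >= thresh)                     # sample P(x >= thresh)
       prob_term = 1.0 - 0.5 * abs(p_ge - (1.0 - p_ge))  # largest for an even split
       return 0.5 * A * (spread_term + prob_term)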

The first term in the outer brackets is large when the uncertainty in the current forecast is
large relative to a reference. The second term is at a minimum when all the probability is either
above or below the threshold, and at a maximum when the probability is evenly distributed about the
threshold. It therefore penalizes the split case, where the ensemble members are close to evenly
split on either side of the threshold. The *A* term outside the brackets is a weighting that accounts
for heuristic forecast difficulty situations. Its values for winds are given below.

| A = 0 if :math:`\bar{x}` is above 50kt
| A = 0 if :math:`\bar{x}` is below 5kt
| A = 1.5 if :math:`\bar{x}` is between 28kt and 34kt
| A = :math:`1.5 - 1.5[\frac{\bar{x}(kt)-34kt}{16kt}]` for 34kt :math:`\leq\bar{x}\leq` 50kt
| A = :math:`1.5[\frac{\bar{x}(kt)-5kt}{23kt}]` for 5kt :math:`\leq\bar{x}\leq` 28kt

.. _difficulty_index_fig1:

.. figure:: figure/weighting_wind_speed_difficulty_index.png

   Weighting applied to the wind difficulty index.

The weighting ramps up to a value of 1.5 when :math:`\bar{x}` is slightly below the threshold.
This reflects the notion that a forecast is more difficult when the mean is slightly below the
threshold than when it is slightly above it. The value of *A* then ramps down to zero for large
values of :math:`\bar{x}_{i,j}`.
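
A minimal sketch of this weighting (illustrative only; the function name is hypothetical
and METcalcpy's implementation may differ) follows directly from the values listed above:

.. code-block:: python

   import numpy as np

   def wind_weight(xbar_kt):
       """Piecewise weighting A for wind speed in knots, per the values above."""
       x = np.asarray(xbar_kt, dtype=float)
       A = np.zeros_like(x)                      # A = 0 below 5 kt and above 50 kt
       up = (x >= 5) & (x < 28)
       A[up] = 1.5 * (x[up] - 5.0) / 23.0        # ramp from 0 at 5 kt to 1.5 at 28 kt
       A[(x >= 28) & (x <= 34)] = 1.5            # plateau at and slightly below threshold
       down = (x > 34) & (x <= 50)
       A[down] = 1.5 - 1.5 * (x[down] - 34.0) / 16.0   # ramp back down to 0 at 50 kt
       return A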

To gain a sense of how the difficulty index performs, consider the interplay between probability of
exceedance, normalized ensemble spread, and the mean forecast value (which sets the value of
*A*) shown in Tables 3.1-3.3. Each row is for a different probability of threshold exceedance,
:math:`P(x_{i,j} \geq thresh)`, and each column is for a different value of normalized uncertainty,
quantized as small, :math:`(\sigma/\bar{x})/(\sigma/\bar{x})_{ref}=0.01`, medium,
:math:`(\sigma/\bar{x})/(\sigma/\bar{x})_{ref}=0.5`, and large,
:math:`(\sigma/\bar{x})/(\sigma/\bar{x})_{ref}=1.0`. Each cell contains the calculation of
:math:`d_{i,j}` for that case.

When :math:`\bar{x}` is very large or very small, the difficulty index is dominated by *A*.
Regardless of the spread or the probability of exceedance, the difficulty index takes on a value near
zero and the forecast is considered to be easy (:numref:`table_1`).

When :math:`\bar{x}` is near the threshold (e.g. 25kt or 37kt), the situation is a bit more complex
(:numref:`table_2`). For small values of spread the only interesting case is when the probability is
equally distributed about the threshold. For large spread, all probability values deserve a look, and
the case where the probability is equally distributed about the threshold is deemed difficult.

When :math:`\bar{x}` is close to but slightly below the threshold (e.g. between 28kt and 34kt),
almost all combinations of probability of exceedance and spread deserve a look, and all values of the
difficulty index for medium and large spread are difficult or nearly difficult (:numref:`table_3`).

.. _table_1:

.. list-table:: Example of an easy forecast where :math:`\bar{x}` is very large (e.g. 48 kt) or very small (e.g. 7kt), making :math:`A/2=0.1/2=0.05`.
:widths: auto
:header-rows: 1

* - Prob of Thresh Exceedance
- Small Spread
- Medium Spread
- Large Spread
* - 1
- 0.05*(0.01+0.5) = 0.026
- 0.05*(0.5+0.5) = 0.05
- 0.05*(1+0.5) = 0.075
* - 0.75
- 0.05*(0.01+0.75) = 0.038
- 0.05*(0.5+0.75) = 0.063
- 0.05*(1+0.75) = 0.088
* - 0.5
- 0.05*(0.01+1) = 0.051
- 0.05*(0.5+1) = 0.075
- 0.05*(1+1) = 0.1
* - 0.25
- 0.05*(0.01+0.75) = 0.038
- 0.05*(0.5+0.75) = 0.063
- 0.05*(1+0.75) = 0.088
* - 0
- 0.05*(0.01+0.5) = 0.026
- 0.05*(0.5+0.5) = 0.05
- 0.05*(1+0.5) = 0.075

.. _table_2:

.. list-table:: Example of a forecast that could be difficult if the conditions are right, where :math:`\bar{x}` is moderately close to the threshold (e.g. 25kt or 37kt), making :math:`A/2=1/2=0.5`.
:widths: auto
:header-rows: 1

* - Prob of Thresh Exceedance
- Small Spread
- Medium Spread
- Large Spread
* - 1
- 0.5*(0.01+0.5) = 0.26
- 0.5*(0.5+0.5) = 0.5
- 0.5*(1+0.5) = 0.75
* - 0.75
- 0.5*(0.01+0.75) = 0.38
- 0.5*(0.5+0.75) = 0.63
- 0.5*(1+0.75) = 0.88
* - 0.5
- 0.5*(0.01+1) = 0.51
- 0.5*(0.5+1) = 0.75
- 0.5*(1+1) = 1.0
* - 0.25
- 0.5*(0.01+0.75) = 0.38
- 0.5*(0.5+0.75) = 0.63
- 0.5*(1+0.75) = 0.88
* - 0
- 0.5*(0.01+0.5) = 0.26
- 0.5*(0.5+0.5) = 0.5
- 0.5*(1+0.5) = 0.75

.. _table_3:

.. list-table:: Example of a situation that is almost always difficult, where :math:`\bar{x}` is at or slightly below the threshold (e.g. 28kt to 34kt), making :math:`A/2=1.5/2=0.75`.
:widths: auto
:header-rows: 1

* - Prob of Thresh Exceedance
- Small Spread
- Medium Spread
- Large Spread
* - 1
- 0.75*(0.01+0.5) = 0.38
- 0.75*(0.5+0.5) = 0.75
- 0.75*(1+0.5) = 1.13
* - 0.75
- 0.75*(0.01+0.75) = 0.57
- 0.75*(0.5+0.75) = 0.94
- 0.75*(1+0.75) = 1.31
* - 0.5
- 0.75*(0.01+1) = 0.76
- 0.75*(0.5+1) = 1.13
- 0.75*(1+1) = 1.5
* - 0.25
- 0.75*(0.01+0.75) = 0.57
- 0.75*(0.5+0.75) = 0.94
- 0.75*(1+0.75) = 1.31
* - 0
- 0.75*(0.01+0.5) = 0.38
- 0.75*(0.5+0.5) = 0.75
- 0.75*(1+0.5) = 1.13
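
As a quick arithmetic check (plain Python, not library code), the medium-spread,
:math:`P(x_{i,j}\geq thresh)=0.75` cell of :numref:`table_3` works out as follows:

.. code-block:: python

   A_half = 1.5 / 2                        # xbar at or slightly below the threshold
   spread = 0.5                            # medium normalized spread
   prob_term = 1 - 0.5 * abs(0.75 - 0.25)  # P >= thresh is 0.75, P < thresh is 0.25
   d = A_half * (spread + prob_term)       # 0.75 * (0.5 + 0.75) = 0.9375, rounds to 0.94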
docs/Users_Guide/figure/weighting_wind_speed_difficulty_index.png (binary image file; no preview available)
1 change: 1 addition & 0 deletions docs/Users_Guide/index.rst
@@ -64,6 +64,7 @@ National Center for Atmospheric Research (NCAR) is sponsored by NSF.

installation
vertical_interpolation
difficulty_index
release-notes

**Indices and tables**
36 changes: 19 additions & 17 deletions docs/Users_Guide/release-notes.rst
@@ -9,33 +9,35 @@ METcalcpy Release Notes
=======================


METcalcpy Version 2.1.0 release notes (20230728)
----------------------------------------------------
METcalcpy Version 3.0.0-beta2 release notes (20231114)
------------------------------------------------------

.. dropdown:: New Functionality

* **Convert plot_tcmpr.R calculations to Python** (`#189 <https://github.com/dtcenter/METcalcpy/issues/189>`_)

.. dropdown:: Enhancements

* **Upgrade to using Python 3.10.4** (`#270 <https://github.com/dtcenter/METcalcpy/issues/270>`_)
* Enhance the Release Notes by adding dropdown menus (`#292 <https://github.com/dtcenter/METcalcpy/issues/292>`_)
* **Updated Analysis tools to handle TC-RMW functionality** (`#308 <https://github.com/dtcenter/METcalcpy/issues/308>`_)

.. dropdown:: Internal

* Add 'License.txt' to the METcalcpy repo (`#279 <https://github.com/dtcenter/METcalcpy/issues/279>`_)
* Update cloud python to 3.10 (`#170 <https://github.com/dtcenter/METcalcpy/issues/170>`_)
* Update cloud python to 3.8 (`#169 <https://github.com/dtcenter/METcalcpy/issues/169>`_)
* Changed Project to Cycle assignment in github (`#301 <https://github.com/dtcenter/METcalcpy/issues/301>`_)
* Partial completion of logging for STIGS (`#46 <https://github.com/dtcenter/METplus-internal/issues/46>`_)
* Change second person references to third (`#315 <https://github.com/dtcenter/METcalcpy/issues/315>`_)
* Enhanced documentation for Difficulty index (`#332 <https://github.com/dtcenter/METcalcpy/issues/332>`_)

.. dropdown:: Bugfixes

* **Remaining reference to ARIMA in utils.py compute_std_err_from_median_variance_inflation_factor()** (`#254 <https://github.com/dtcenter/METcalcpy/issues/254>`_)
* **Replace frame.append method to pandas.concat in agg_stat.py** (`#296 <https://github.com/dtcenter/METcalcpy/issues/296>`_)
* **Fixed Ratio Derived curve not generating** (`#302 <https://github.com/dtcenter/METcalcpy/issues/302>`_)
* **Fixed PSTD_ROC_AUC calculation** (`#306 <https://github.com/dtcenter/METcalcpy/issues/306>`_)
* Add missing reliability statistics (`#330 <https://github.com/dtcenter/METcalcpy/issues/330>`_)

METcalcpy Version 3.0.0-beta1 release notes (20230915)
------------------------------------------------------

.. dropdown:: New Functionality

.. dropdown:: Enhancements

.. dropdown:: Internal

.. dropdown:: Bugfixes

* Remove reset_index from various calculations (`#322 <https://github.com/dtcenter/METcalcpy/issues/322>`_)


METcalcpy Upgrade Instructions
==============================
20 changes: 12 additions & 8 deletions docs/index.rst
@@ -1,14 +1,14 @@
###########################
===========================
METcalcpy version |version|
###########################
===========================

Developed by the `Developmental Testbed Center <https://dtcenter.org/>`_,
Boulder, CO

.. image:: _static/METplus_banner_photo_web.png

**History**

History
-------

The Model Evaluation Tools (MET) were developed by the Developmental Testbed
Center (DTC) and released in January 2008. The goal of the tools was to
@@ -37,7 +37,8 @@ diagnostics capability for NOAA's UFS and a component of NCAR's SIMA
modeling frameworks. It is being actively developed by NCAR, ESRL, EMC
and is open to community contributions.

**METplus Concept**
METplus Concept
---------------

METplus is the overarching, or umbrella, repository and hence framework for
the Unified Forecast System verification capability. It is intended to be
@@ -76,7 +77,8 @@ was developed because CESM is comprised of a number of different components
that are developed and managed independently. Each component also may have
additional "external" dependencies that need to be maintained independently.

**Acronyms**
Acronyms
--------

* **MET** - Model Evaluation Tools
* **DTC** - Developmental Testbed Center
@@ -93,7 +95,8 @@ additional "external" dependencies that need to be maintained independently.
* **GSD** - Global Systems Division


**Authors**
Authors
-------

Many authors, listed below in alphabetical order, have contributed to the documentation of METplus.
To cite this documentation in publications, please refer to the METcalcpy User's Guide :ref:`Citation Instructions<citations>`.
@@ -121,7 +124,8 @@ To cite this documentation in publications, please refer to the METcalcpy User's



**Indices**
Indices
=======

* :ref:`genindex`
* :ref:`modindex`
2 changes: 1 addition & 1 deletion docs/version
@@ -1 +1 @@
__version__="2.1.0"
__version__="3.0.0-beta3"
2.1.0_wcoss2 renamed to 3.0.0_wcoss2
@@ -3,7 +3,7 @@
## METcalcpy
##
proc ModulesHelp { } {
puts stderr "Sets up the paths and environment variables to use the METcalcpy-2.1.0.
puts stderr "Sets up the paths and environment variables to use the METcalcpy-3.0.0.
*** For help see the official MET webpage at http://www.dtcenter.org/met/users ***"
}

@@ -12,4 +12,4 @@ module load intel
module use /apps/dev/modulefiles/
module load ve/evs/2.0

prepend-path PYTHONPATH /apps/sw_review/emc/METcalcpy/2.1.0
prepend-path PYTHONPATH /apps/sw_review/emc/METcalcpy/3.0.0
8 changes: 1 addition & 7 deletions metcalcpy/util/pstd_statistics.py
@@ -41,16 +41,13 @@ def calculate_pstd_brier(input_data, columns_names):
t_table = df_pct_perm['n_i'].sum()
o_bar_table = df_pct_perm['oy_i'].sum() / t_table
o_bar = input_data[0, get_column_index_by_name(columns_names, 'o_bar')]

t_table.reset_index(inplace=True, drop=True)

reliability = calc_reliability(t_table, df_pct_perm)
resolution = calc_resolution(t_table, df_pct_perm, o_bar)
uncertainty = calc_uncertainty(o_bar_table)

brier = reliability - resolution + uncertainty
result = round_half_up(brier, PRECISION)
except (TypeError, ZeroDivisionError, Warning, ValueError):
except (TypeError, ZeroDivisionError, Warning, ValueError, AttributeError):
result = None
warnings.filterwarnings('ignore')
return result
@@ -75,8 +72,6 @@ def calculate_pstd_bss_smpl(input_data, columns_names):
t_table = df_pct_perm['n_i'].sum()
o_bar_table = df_pct_perm['oy_i'].sum() / t_table
o_bar = input_data[0, get_column_index_by_name(columns_names, 'o_bar')]

t_table.reset_index(inplace=True, drop=True)
reliability = calc_reliability(t_table, df_pct_perm)
resolution = calc_resolution(t_table, df_pct_perm, o_bar)
uncertainty = calc_uncertainty(o_bar_table)
@@ -161,7 +156,6 @@ def calculate_pstd_resolution(input_data, columns_names):
df_pct_perm = _calc_common_stats(columns_names, input_data)
o_bar = input_data[0, get_column_index_by_name(columns_names, 'o_bar')]
t_table = df_pct_perm['n_i'].sum()
t_table.reset_index(inplace=True, drop=True)
resolution = calc_resolution(t_table, df_pct_perm, o_bar)
result = round_half_up(resolution, PRECISION)
except (TypeError, ZeroDivisionError, Warning, ValueError):
3 changes: 3 additions & 0 deletions metcalcpy/util/sl1l2_statistics.py
@@ -524,10 +524,13 @@ def calculate_bcmse(input_data, columns_names, aggregation=False):
"""
warnings.filterwarnings('error')
try:

mse = calculate_mse(input_data, columns_names, aggregation)
me = calculate_me(input_data, columns_names, aggregation)
result = mse - me ** 2
result = round_half_up(result, PRECISION)
if result < 0:
return 0.
except (TypeError, Warning):
result = None
warnings.filterwarnings('ignore')
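
The clamp added above reflects that BCMSE = MSE - ME^2 is, mathematically, the variance
of the errors and therefore non-negative; a negative value can only arise from rounding
or aggregation artifacts. A standalone sketch of the same guard (illustrative only, not
the METcalcpy function itself):

def bcmse_from_components(mse, me):
    # bias-corrected MSE: clamp tiny negative rounding artifacts to zero
    result = mse - me ** 2
    return max(result, 0.0)
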
9 changes: 8 additions & 1 deletion metcalcpy/util/utils.py
@@ -691,7 +691,14 @@ def equalize_axis_data(fix_vals_keys, fix_vals_permuted, params, input_data, axi

for fcst_var, fcst_var_stats in fcst_var_val.items():
series_data_for_ee = pd.DataFrame()
for fcst_var_stat in fcst_var_stats:
fcst_var_stats_current = fcst_var_stats
# if the data comes from the agg_stat workflow it doesn't contain statistics values.
# They will be calculated later. In this case the code should not loop over all
# requested statistics but instead should do it only once to avoid data multiplication
if 'stat_name' not in input_data.keys():
fcst_var_stats_current = [fcst_var_stats[0]]

for fcst_var_stat in fcst_var_stats_current:
# for each series for the specified axis
if len(params['series_val_' + axis]) == 0:
series_data_for_ee = input_data
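
The single-pass guard above matters because, for agg_stat input (no 'stat_name' column,
statistics computed later), every pass through the statistics loop would append the same
rows again. A hypothetical minimal frame (not METcalcpy data) shows the effect:

import pandas as pd

df = pd.DataFrame({'model': ['A', 'B'], 'value': [1.0, 2.0]})  # no 'stat_name' column
stats = ['ME', 'RMSE', 'BCMSE']

# looping over every requested statistic would concatenate identical copies of the data
duplicated = pd.concat([df] * len(stats))  # 6 rows: the data is multiplied
# looping once keeps the 2 original rows for the later statistics calculation
once = pd.concat([df])                     # 2 rows: correct
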
(1 additional changed file not shown)