Feature 345 develop gha node20 (#347)
* Prepare for next release

* add missing end quote to fix package install

* reset_index is performed on the float value #322 (#323)

* Update release notes (#328)

* Update release-notes.rst formatting

* Update and rename 2.1.0_wcoss2 to 3.0.0_wcoss2

* loop over statistics only once to avoid data multiplication #330 (#331)

* Added sphinx_rtd_theme to extensions

* Updated requirements.txt

* Added pillow

* feature 497 headers (#336)

* changing header for continuity

* Modified the other headers in the file to be consistent with other repos

---------

Co-authored-by: Julie Prestopnik <[email protected]>

* Beta2 release (#338)

* Next version

* Feature 332 di doc (#333)

* Add difficulty index documentation

* Add more documentation

* Add more definition

* Fix indent

* Add figure

* fix indentation

* fix equations

* Add table

* Added remaining tables

* fix table issue

* Add links

* formatting

* change link to latest

---------

Co-authored-by: Tracy <[email protected]>

* Additions to beta2 release (#340)

* Next beta

* Bugfix 329 negative bcmse (#344)

* Issue #329 return 0 if negative BCMSE value is calculated

* Issue #329 add test for calculate_bcmse() in the sl1l2_statistics module

* Issue #392 added test_sl1l2.py to the list of pytests to run

* command line updates

* updating yaml file

---------

Co-authored-by: Hank Fisher <[email protected]>
Co-authored-by: George McCabe <[email protected]>
Co-authored-by: John Halley Gotway <[email protected]>
Co-authored-by: Tatiana Burek <[email protected]>
Co-authored-by: jprestop <[email protected]>
Co-authored-by: Tracy Hertneky <[email protected]>
Co-authored-by: Tracy <[email protected]>
Co-authored-by: bikegeek <[email protected]>
9 people authored Feb 2, 2024
1 parent 5a4bfd6 commit 15d1cc0
Showing 12 changed files with 241 additions and 36 deletions.
1 change: 1 addition & 0 deletions .github/workflows/unit_tests.yml
@@ -64,6 +64,7 @@ jobs:
pytest test_utils.py
pytest test_validate_mv_python.py
pytest test_future_warnings.py
pytest test_sl1l2.py
coverage run -m pytest test_agg_eclv.py test_agg_stats_and_boot.py test_agg_stats_with_groups.py test_calc_difficulty_index.py test_convert_lon_indices.py test_event_equalize.py test_event_equalize_against_values.py test_lon_360_to_180.py test_statistics.py test_tost_paired.py test_utils.py test_future_warnings.py
coverage html
171 changes: 171 additions & 0 deletions docs/Users_Guide/difficulty_index.rst
@@ -0,0 +1,171 @@
****************
Difficulty Index
****************

Description
===========

This module calculates the difficulty of a decision based on a set of forecasts,
such as an ensemble, for quantities such as wind speed or significant wave height,
as a function of space and time.

Example
=======

An example use case that runs the Difficulty Index for wind speed can be found in the `METplus documentation <https://metplus.readthedocs.io/en/latest/generated/model_applications/medium_range/UserScript_fcstGEFS_Difficulty_Index.html#sphx-glr-generated-model-applications-medium-range-userscript-fcstgefs-difficulty-index-py>`_.

Decision Difficulty Index Computation
=====================================

Consider the following formulation of a forecast decision difficulty index:

.. math:: d_{i,j} = \frac{A(\bar{x}_{i,j})}{2}\left(\frac{(\sigma/\bar{x})_{i,j}}{(\sigma/\bar{x})_{ref}}+\left[1-\frac{1}{2}\left|P(x_{i,j}\geq thresh)-P(x_{i,j}<thresh)\right|\right]\right)

where :math:`\sigma` is the ensemble standard deviation, :math:`\bar{x}` is the ensemble mean,
:math:`P(x_{i,j}\geq thresh)` is the ensemble (sample) probability of being greater than or equal
to the threshold, and :math:`P(x_{i,j}<thresh)` is the ensemble probability of being less than
the threshold. The :math:`(\sigma/\bar{x})` expression is a measure of spread normalized by the
mean, and it allows one to identify situations of truly significant uncertainty. Because the
difficulty index is defined only for positive definite quantities such as wind speed or significant
wave height, division by zero is avoided. :math:`(\sigma/\bar{x})_{ref}` is a (scalar) reference
value, for example the maximum value of :math:`(\sigma/\bar{x})` obtained over the last 5 days as
a function of geographic region.
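
As a rough illustration of this computation (a minimal sketch, not the METcalcpy API;
the function name and arguments below are hypothetical), the index at a single point
can be evaluated from an ensemble of values with NumPy:

.. code-block:: python

   import numpy as np

   def difficulty_index(ens, thresh, ref_spread, A=1.0):
       """Sketch of d = (A/2) * (normalized spread + probability term).

       ens: 1-D array of ensemble member values at one point and time
       thresh: decision threshold, in the same units as ens
       ref_spread: scalar reference value of sigma/xbar, e.g. a recent maximum
       A: heuristic weighting for the forecast value (described below)

       Assumes a positive definite quantity, so division by the mean is safe.
       """
       xbar = ens.mean()
       sigma = ens.std(ddof=1)
       spread_term = (sigma / xbar) / ref_spread         # normalized uncertainty
       p_ge = np.mean(ens >= thresh)                     # sample P(x >= thresh)
       prob_term = 1.0 - 0.5 * abs(p_ge - (1.0 - p_ge))  # largest for an even split
       return 0.5 * A * (spread_term + prob_term)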

The first term in the outer brackets is large when the uncertainty in the current forecast is
large relative to a reference. The second term is at a minimum when all the probability is either
above or below the threshold, and at a maximum when the probability is evenly distributed about the
threshold. It therefore penalizes the split case, where the ensemble members are close to evenly
split on either side of the threshold. The *A* term outside the brackets is a weighting that accounts
for heuristic forecast difficulty situations. Its values for winds are given below.

| A = 0 if :math:`\bar{x}` is above 50kt
| A = 0 if :math:`\bar{x}` is below 5kt
| A = 1.5 if :math:`\bar{x}` is between 28kt and 34kt
| A = :math:`1.5 - 1.5[\frac{\bar{x}(kt)-34kt}{16kt}]` for 34kt :math:`\leq\bar{x}\leq` 50kt
| A = :math:`1.5[\frac{\bar{x}(kt)-5kt}{23kt}]` for 5kt :math:`\leq\bar{x}\leq` 28kt

.. _difficulty_index_fig1:

.. figure:: figure/weighting_wind_speed_difficulty_index.png

   Weighting applied to the wind difficulty index.

The weighting ramps up to a value of 1.5 when :math:`\bar{x}` is slightly below the threshold.
This reflects the notion that a forecast is more difficult when the mean is slightly below the
threshold than when it is slightly above it. The value of *A* then ramps down to zero for large
values of :math:`\bar{x}_{i,j}`.
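
A minimal sketch of this weighting (illustrative only; the function name is hypothetical
and METcalcpy's implementation may differ) follows directly from the values listed above:

.. code-block:: python

   import numpy as np

   def wind_weight(xbar_kt):
       """Piecewise weighting A for wind speed in knots, per the values above."""
       x = np.asarray(xbar_kt, dtype=float)
       A = np.zeros_like(x)                      # A = 0 below 5 kt and above 50 kt
       up = (x >= 5) & (x < 28)
       A[up] = 1.5 * (x[up] - 5.0) / 23.0        # ramp from 0 at 5 kt to 1.5 at 28 kt
       A[(x >= 28) & (x <= 34)] = 1.5            # plateau at and slightly below threshold
       down = (x > 34) & (x <= 50)
       A[down] = 1.5 - 1.5 * (x[down] - 34.0) / 16.0   # ramp back down to 0 at 50 kt
       return A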

To gain a sense of how the difficulty index performs, consider the interplay between probability of
exceedance, normalized ensemble spread, and the mean forecast value (which sets the value of
*A*) shown in Tables 3.1-3.3. Each row is for a different probability of threshold exceedance,
:math:`P(x_{i,j} \geq thresh)`, and each column is for a different value of normalized uncertainty,
quantized as small, :math:`(\sigma/\bar{x})/(\sigma/\bar{x})_{ref}=0.01`, medium,
:math:`(\sigma/\bar{x})/(\sigma/\bar{x})_{ref}=0.5`, and large,
:math:`(\sigma/\bar{x})/(\sigma/\bar{x})_{ref}=1.0`. Each cell contains the calculation of
:math:`d_{i,j}` for that case.

When :math:`\bar{x}` is very large or very small, the difficulty index is dominated by *A*.
Regardless of the spread or the probability of exceedance, the difficulty index takes on a value near
zero and the forecast is considered to be easy (:numref:`table_1`).

When :math:`\bar{x}` is near the threshold (e.g. 25kt or 37kt), the situation is a bit more complex
(:numref:`table_2`). For small values of spread the only interesting case is when the probability is
equally distributed about the threshold. For large spread, all probability values deserve a look, and
the case where the probability is equally distributed about the threshold is deemed difficult.

When :math:`\bar{x}` is close to but slightly below the threshold (e.g. between 28kt and 34kt),
almost all combinations of probability of exceedance and spread deserve a look, and all values of the
difficulty index for medium and large spread are difficult or nearly difficult (:numref:`table_3`).

.. _table_1:

.. list-table:: Example of an easy forecast where :math:`\bar{x}` is very large (e.g. 48 kt) or very small (e.g. 7kt), making :math:`A/2=0.1/2=0.05`.
:widths: auto
:header-rows: 1

* - Prob of Thresh Exceedance
- Small Spread
- Medium Spread
- Large Spread
* - 1
- 0.05*(0.01+0.5) = 0.026
- 0.05*(0.5+0.5) = 0.05
- 0.05*(1+0.5) = 0.075
* - 0.75
- 0.05*(0.01+0.75) = 0.038
- 0.05*(0.5+0.75) = 0.063
- 0.05*(1+0.75) = 0.088
* - 0.5
- 0.05*(0.01+1) = 0.051
- 0.05*(0.5+1) = 0.075
- 0.05*(1+1) = 0.1
* - 0.25
- 0.05*(0.01+0.75) = 0.038
- 0.05*(0.5+0.75) = 0.063
- 0.05*(1+0.75) = 0.088
* - 0
- 0.05*(0.01+0.5) = 0.026
- 0.05*(0.5+0.5) = 0.05
- 0.05*(1+0.5) = 0.075

.. _table_2:

.. list-table:: Example of a forecast that could be difficult if the conditions are right, where :math:`\bar{x}` is moderately close to the threshold (e.g. 25kt or 37kt), making :math:`A/2=1/2=0.5`.
:widths: auto
:header-rows: 1

* - Prob of Thresh Exceedance
- Small Spread
- Medium Spread
- Large Spread
* - 1
- 0.5*(0.01+0.5) = 0.26
- 0.5*(0.5+0.5) = 0.5
- 0.5*(1+0.5) = 0.75
* - 0.75
- 0.5*(0.01+0.75) = 0.38
- 0.5*(0.5+0.75) = 0.63
- 0.5*(1+0.75) = 0.88
* - 0.5
- 0.5*(0.01+1) = 0.51
- 0.5*(0.5+1) = 0.75
- 0.5*(1+1) = 1.0
* - 0.25
- 0.5*(0.01+0.75) = 0.38
- 0.5*(0.5+0.75) = 0.63
- 0.5*(1+0.75) = 0.88
* - 0
- 0.5*(0.01+0.5) = 0.26
- 0.5*(0.5+0.5) = 0.5
- 0.5*(1+0.5) = 0.75

.. _table_3:

.. list-table:: Example of a situation that is almost always difficult, where :math:`\bar{x}` is at or slightly below the threshold (e.g. 28kt to 34kt), making :math:`A/2=1.5/2=0.75`.
:widths: auto
:header-rows: 1

* - Prob of Thresh Exceedance
- Small Spread
- Medium Spread
- Large Spread
* - 1
- 0.75*(0.01+0.5) = 0.38
- 0.75*(0.5+0.5) = 0.75
- 0.75*(1+0.5) = 1.13
* - 0.75
- 0.75*(0.01+0.75) = 0.57
- 0.75*(0.5+0.75) = 0.94
- 0.75*(1+0.75) = 1.31
* - 0.5
- 0.75*(0.01+1) = 0.76
- 0.75*(0.5+1) = 1.13
- 0.75*(1+1) = 1.5
* - 0.25
- 0.75*(0.01+0.75) = 0.57
- 0.75*(0.5+0.75) = 0.94
- 0.75*(1+0.75) = 1.31
* - 0
- 0.75*(0.01+0.5) = 0.38
- 0.75*(0.5+0.5) = 0.75
- 0.75*(1+0.5) = 1.13
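
As a quick arithmetic check (plain Python, not library code), the medium-spread,
:math:`P(x_{i,j}\geq thresh)=0.75` cell of :numref:`table_3` works out as follows:

.. code-block:: python

   A_half = 1.5 / 2                        # xbar at or slightly below the threshold
   spread = 0.5                            # medium normalized spread
   prob_term = 1 - 0.5 * abs(0.75 - 0.25)  # P >= thresh is 0.75, P < thresh is 0.25
   d = A_half * (spread + prob_term)       # 0.75 * (0.5 + 0.75) = 0.9375, rounds to 0.94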
docs/Users_Guide/figure/weighting_wind_speed_difficulty_index.png (binary image file; no preview available)
1 change: 1 addition & 0 deletions docs/Users_Guide/index.rst
@@ -64,6 +64,7 @@ National Center for Atmospheric Research (NCAR) is sponsored by NSF.

installation
vertical_interpolation
difficulty_index
release-notes

**Indices and tables**
36 changes: 19 additions & 17 deletions docs/Users_Guide/release-notes.rst
@@ -9,33 +9,35 @@ METcalcpy Release Notes
=======================


METcalcpy Version 2.1.0 release notes (20230728)
----------------------------------------------------
METcalcpy Version 3.0.0-beta2 release notes (20231114)
------------------------------------------------------

.. dropdown:: New Functionality

* **Convert plot_tcmpr.R calculations to Python** (`#189 <https://github.com/dtcenter/METcalcpy/issues/189>`_)

.. dropdown:: Enhancements

* **Upgrade to using Python 3.10.4** (`#270 <https://github.com/dtcenter/METcalcpy/issues/270>`_)
* Enhance the Release Notes by adding dropdown menus (`#292 <https://github.com/dtcenter/METcalcpy/issues/292>`_)
* **Updated Analysis tools to handle TC-RMW functionality** (`#308 <https://github.com/dtcenter/METcalcpy/issues/308>`_)

.. dropdown:: Internal

* Add 'License.txt' to the METcalcpy repo (`#279 <https://github.com/dtcenter/METcalcpy/issues/279>`_)
* Update cloud python to 3.10 (`#170 <https://github.com/dtcenter/METcalcpy/issues/170>`_)
* Update cloud python to 3.8 (`#169 <https://github.com/dtcenter/METcalcpy/issues/169>`_)
* Changed Project to Cycle assignment in github (`#301 <https://github.com/dtcenter/METcalcpy/issues/301>`_)
* Partial completion of logging for STIGS (`#46 <https://github.com/dtcenter/METplus-internal/issues/46>`_)
* Change second person references to third (`#315 <https://github.com/dtcenter/METcalcpy/issues/315>`_)
* Enhanced documentation for Difficulty index (`#332 <https://github.com/dtcenter/METcalcpy/issues/332>`_)

.. dropdown:: Bugfixes

* **Remaining reference to ARIMA in utils.py compute_std_err_from_median_variance_inflation_factor()** (`#254 <https://github.com/dtcenter/METcalcpy/issues/254>`_)
* **Replace frame.append method to pandas.concat in agg_stat.py** (`#296 <https://github.com/dtcenter/METcalcpy/issues/296>`_)
* **Fixed Ratio Derived curve not generating** (`#302 <https://github.com/dtcenter/METcalcpy/issues/302>`_)
* **Fixed PSTD_ROC_AUC calculation** (`#306 <https://github.com/dtcenter/METcalcpy/issues/306>`_)
* Add missing reliability statistics (`#330 <https://github.com/dtcenter/METcalcpy/issues/330>`_)

METcalcpy Version 3.0.0-beta1 release notes (20230915)
------------------------------------------------------

.. dropdown:: New Functionality

.. dropdown:: Enhancements

.. dropdown:: Internal

.. dropdown:: Bugfixes

* Remove reset_index from various calculations (`#322 <https://github.com/dtcenter/METcalcpy/issues/322>`_)


METcalcpy Upgrade Instructions
==============================
20 changes: 12 additions & 8 deletions docs/index.rst
@@ -1,14 +1,14 @@
###########################
===========================
METcalcpy version |version|
###########################
===========================

Developed by the `Developmental Testbed Center <https://dtcenter.org/>`_,
Boulder, CO

.. image:: _static/METplus_banner_photo_web.png

**History**

History
-------

The Model Evaluation Tools (MET) were developed by the Developmental Testbed
Center (DTC) and released in January 2008. The goal of the tools was to
@@ -37,7 +37,8 @@ diagnostics capability for NOAA's UFS and a component of NCAR's SIMA
modeling frameworks. It is being actively developed by NCAR, ESRL, EMC
and is open to community contributions.

**METplus Concept**
METplus Concept
---------------

METplus is the overarching, or umbrella, repository and hence framework for
the Unified Forecast System verification capability. It is intended to be
@@ -76,7 +77,8 @@ was developed because CESM is comprised of a number of different components
that are developed and managed independently. Each component also may have
additional "external" dependencies that need to be maintained independently.

**Acronyms**
Acronyms
--------

* **MET** - Model Evaluation Tools
* **DTC** - Developmental Testbed Center
@@ -93,7 +95,8 @@ additional "external" dependencies that need to be maintained independently.
* **GSD** - Global Systems Division


**Authors**
Authors
-------

Many authors, listed below in alphabetical order, have contributed to the documentation of METplus.
To cite this documentation in publications, please refer to the METcalcpy User's Guide :ref:`Citation Instructions<citations>`.
@@ -121,7 +124,8 @@ To cite this documentation in publications, please refer to the METcalcpy User's



**Indices**
Indices
=======

* :ref:`genindex`
* :ref:`modindex`
2 changes: 1 addition & 1 deletion docs/version
@@ -1 +1 @@
__version__="2.1.0"
__version__="3.0.0-beta3"
2.1.0_wcoss2 renamed to 3.0.0_wcoss2
@@ -3,7 +3,7 @@
## METcalcpy
##
proc ModulesHelp { } {
puts stderr "Sets up the paths and environment variables to use the METcalcpy-2.1.0.
puts stderr "Sets up the paths and environment variables to use the METcalcpy-3.0.0.
*** For help see the official MET webpage at http://www.dtcenter.org/met/users ***"
}

@@ -12,4 +12,4 @@ module load intel
module use /apps/dev/modulefiles/
module load ve/evs/2.0

prepend-path PYTHONPATH /apps/sw_review/emc/METcalcpy/2.1.0
prepend-path PYTHONPATH /apps/sw_review/emc/METcalcpy/3.0.0
8 changes: 1 addition & 7 deletions metcalcpy/util/pstd_statistics.py
@@ -41,16 +41,13 @@ def calculate_pstd_brier(input_data, columns_names):
t_table = df_pct_perm['n_i'].sum()
o_bar_table = df_pct_perm['oy_i'].sum() / t_table
o_bar = input_data[0, get_column_index_by_name(columns_names, 'o_bar')]

t_table.reset_index(inplace=True, drop=True)

reliability = calc_reliability(t_table, df_pct_perm)
resolution = calc_resolution(t_table, df_pct_perm, o_bar)
uncertainty = calc_uncertainty(o_bar_table)

brier = reliability - resolution + uncertainty
result = round_half_up(brier, PRECISION)
except (TypeError, ZeroDivisionError, Warning, ValueError):
except (TypeError, ZeroDivisionError, Warning, ValueError, AttributeError):
result = None
warnings.filterwarnings('ignore')
return result
@@ -75,8 +72,6 @@ def calculate_pstd_bss_smpl(input_data, columns_names):
t_table = df_pct_perm['n_i'].sum()
o_bar_table = df_pct_perm['oy_i'].sum() / t_table
o_bar = input_data[0, get_column_index_by_name(columns_names, 'o_bar')]

t_table.reset_index(inplace=True, drop=True)
reliability = calc_reliability(t_table, df_pct_perm)
resolution = calc_resolution(t_table, df_pct_perm, o_bar)
uncertainty = calc_uncertainty(o_bar_table)
@@ -161,7 +156,6 @@ def calculate_pstd_resolution(input_data, columns_names):
df_pct_perm = _calc_common_stats(columns_names, input_data)
o_bar = input_data[0, get_column_index_by_name(columns_names, 'o_bar')]
t_table = df_pct_perm['n_i'].sum()
t_table.reset_index(inplace=True, drop=True)
resolution = calc_resolution(t_table, df_pct_perm, o_bar)
result = round_half_up(resolution, PRECISION)
except (TypeError, ZeroDivisionError, Warning, ValueError):
3 changes: 3 additions & 0 deletions metcalcpy/util/sl1l2_statistics.py
@@ -524,10 +524,13 @@ def calculate_bcmse(input_data, columns_names, aggregation=False):
"""
warnings.filterwarnings('error')
try:

mse = calculate_mse(input_data, columns_names, aggregation)
me = calculate_me(input_data, columns_names, aggregation)
result = mse - me ** 2
result = round_half_up(result, PRECISION)
if result < 0:
return 0.
except (TypeError, Warning):
result = None
warnings.filterwarnings('ignore')
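
The clamp added above reflects that BCMSE = MSE - ME^2 is, mathematically, the variance
of the errors and therefore non-negative; a negative value can only arise from rounding
or aggregation artifacts. A standalone sketch of the same guard (illustrative only, not
the METcalcpy function itself):

def bcmse_from_components(mse, me):
    # bias-corrected MSE: clamp tiny negative rounding artifacts to zero
    result = mse - me ** 2
    return max(result, 0.0)
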
9 changes: 8 additions & 1 deletion metcalcpy/util/utils.py
@@ -691,7 +691,14 @@ def equalize_axis_data(fix_vals_keys, fix_vals_permuted, params, input_data, axi

for fcst_var, fcst_var_stats in fcst_var_val.items():
series_data_for_ee = pd.DataFrame()
for fcst_var_stat in fcst_var_stats:
fcst_var_stats_current = fcst_var_stats
# if the data comes from the agg_stat workflow it doesn't contain statistics values.
# They will be calculated later. In this case the code should not loop over all
# requested statistics but instead should do it only once to avoid data multiplication
if 'stat_name' not in input_data.keys():
fcst_var_stats_current = [fcst_var_stats[0]]

for fcst_var_stat in fcst_var_stats_current:
# for each series for the specified axis
if len(params['series_val_' + axis]) == 0:
series_data_for_ee = input_data
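
The single-pass guard above matters because, for agg_stat input (no 'stat_name' column,
statistics computed later), every pass through the statistics loop would append the same
rows again. A hypothetical minimal frame (not METcalcpy data) shows the effect:

import pandas as pd

df = pd.DataFrame({'model': ['A', 'B'], 'value': [1.0, 2.0]})  # no 'stat_name' column
stats = ['ME', 'RMSE', 'BCMSE']

# looping over every requested statistic would concatenate identical copies of the data
duplicated = pd.concat([df] * len(stats))  # 6 rows: the data is multiplied
# looping once keeps the 2 original rows for the later statistics calculation
once = pd.concat([df])                     # 2 rows: correct
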
(1 additional changed file not shown)