Implementation Details

Alex R edited this page Mar 29, 2021 · 14 revisions

This section of the wiki documents the functions central to the package: identification, tracking, quantification, and visualization.

Identification

Divide the precipitation field at each time step into individual storms by calling identify.

Signature

  • data: the precipitation data, given as an array of dimensions Time x Rows x Cols. To identify storms in a single time slice, reshape the array to dimensions (1, Rows, Cols).
  • morph_structure: the structural set used to perform morphological operations, given as an array.

identify returns an array of time slice maps with the dimensions of data containing individual storms labeled sequentially in each time slice.
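
As a minimal sketch of preparing the two inputs, the snippet below adds a leading time axis to a single time slice and builds a disk-shaped structure using NumPy only. The helper make_disk, the sample array, and the commented-out call are illustrative assumptions; the package's import path is not shown on this page.

```python
import numpy as np

def make_disk(radius):
    """Return a (2r+1) x (2r+1) array that is 1 inside a disk of the given radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return (x ** 2 + y ** 2 <= radius ** 2).astype(np.uint8)

# Hypothetical single time slice of precipitation (Rows x Cols).
single_slice = np.arange(20, dtype=float).reshape(4, 5)

# identify expects Time x Rows x Cols, so add a leading time axis of length 1.
data = single_slice.reshape(1, *single_slice.shape)

# A radius-1 disk is a 3 x 3 plus shape.
morph_structure = make_disk(1)

# labeled = identify(data, morph_structure)  # assumed call; import path depends on your install
```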

Tips

  • When choosing a structural set for the morphological operations, a disk of a chosen radius is usually the best option: because a disk is symmetric, a point in any direction from the segment being eroded or dilated is treated the same way. For more on creating a structural set, see the Tutorial.
  • Since erosion and dilation use the same structure, a larger structure will likely yield fewer large storms (since more of the map is eroded) and more connections between storms (since more regions are joined in almost-connected component labeling). Likewise, a smaller structure yields more large storms and fewer connections between non-contiguous regions.
  • For the same reason, keep in mind that tuning the structure to force (or prevent) a particular connection in one part of a map may drastically change not only other regions in the same time slice but the entire run, since the same structure is applied everywhere.
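
The erosion half of that trade-off can be illustrated with a small, self-contained sketch. It uses a plain NumPy binary erosion written for this example (not the package's own routine) on a hypothetical 7 x 7 wet region, and counts how many cells survive as the disk radius grows.

```python
import numpy as np

def make_disk(radius):
    """Boolean disk of the given radius, used as the structure."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x ** 2 + y ** 2 <= radius ** 2

def erode(mask, structure):
    """Binary erosion: keep a cell only if the structure, centred there, fits inside mask."""
    r = structure.shape[0] // 2
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    rows, cols = mask.shape
    out = np.ones_like(mask, dtype=bool)
    # AND together one shifted copy of the mask per active cell of the structure.
    for dy, dx in zip(*np.nonzero(structure)):
        out &= padded[dy:dy + rows, dx:dx + cols]
    return out

# Hypothetical wet region: a 7 x 7 block inside an 11 x 11 field.
mask = np.zeros((11, 11), dtype=bool)
mask[2:9, 2:9] = True

# Surviving cell count for each radius; it shrinks as the structure grows.
remaining = {radius: int(erode(mask, make_disk(radius)).sum()) for radius in (1, 2, 3)}
print(remaining)
```

The same structure is later used for dilation, so a region that erosion shrinks heavily is also one whose neighbors are more readily joined to it.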

Tracking

Track rainstorm events over time by calling track.

Signature

  • labeled_maps: the identified storms returned by the identification algorithm, given as an array of dimensions Time x Rows x Cols.
  • precip_data: the precipitation data corresponding to the identified storms, with the same dimensions as labeled_maps.
  • tau: the threshold at which a storm is considered similar enough to another to possibly be linked through time, given as a float.
  • phi: the constant to be used in computing similarity between storms, given as a float.
  • km: the number of grid cells equivalent to 120km, given as a float.
  • test: turn on/off optional testing printouts to help tune parameters, given as a boolean with default value False.

track returns an array with the dimensions of labeled_maps containing tracked storms labeled sequentially through time.
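
The km argument depends on the grid resolution of the data. A minimal sketch of the conversion, assuming a grid whose cells have a known, roughly constant width in kilometers; the helper name and the 12 km example are hypothetical, and the commented-out call simply follows the parameter order listed above.

```python
def cells_per_120km(cell_width_km):
    """Number of grid cells spanning 120 km for a grid with the given cell width."""
    if cell_width_km <= 0:
        raise ValueError("cell_width_km must be positive")
    return 120.0 / cell_width_km

# Hypothetical grid with 12 km cells: 120 km corresponds to 10 cells.
km = cells_per_120km(12.0)

# tracked = track(labeled_maps, precip_data, tau, phi, km, test=True)  # assumed call
```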

Tips

  • As noted in the README, tracking is computationally heavy: the algorithm takes a long time to run and uses a large amount of memory. For these reasons, it is highly recommended that runs of any substance be done on machines designed for such workloads.
  • Tracking accuracy improves with smaller intervals between snapshots. The package was validated on data with 3-hour time steps, the same interval used in the paper.
  • Much of the success in tracking storms is predicated on effectively tuning the user-specified parameters tau and phi (after a successful identification run). The tau threshold is the minimum value of the similarity measure deemed large enough to signify a possible match between storms. The constant phi, on the other hand, controls the output of the similarity measure. Directly, a greater constant yields a greater bias against points far apart in distance; indirectly, this means similarity measures as a whole are reduced. As always, please see the paper for more information and calculation specifics.
  • As tracking may require a number of runs to fine-tune, try adjusting parameters on a small temporal subset first. Setting test = True and reading the printouts can help in interpreting the results of a run and finding good parameters.
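
Taking a small temporal subset is a matter of slicing the first axis of both inputs identically. A minimal sketch with placeholder arrays standing in for real data; the array sizes and the window length are arbitrary assumptions.

```python
import numpy as np

# Hypothetical stand-ins for the real arrays (Time x Rows x Cols).
labeled_maps = np.zeros((40, 6, 8), dtype=int)
precip_data = np.zeros((40, 6, 8), dtype=float)

n_slices = 8  # size of the trial window; choose what fits your data
labeled_subset = labeled_maps[:n_slices]
precip_subset = precip_data[:n_slices]

# Both inputs must keep matching dimensions after slicing.
assert labeled_subset.shape == precip_subset.shape
```

The subsets can then be passed to track in place of the full arrays while tau and phi are being tuned.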

Quantification

Quantitatively describe individual storms in terms of duration, size, mean intensity, and central location by calling quantify.

Signature

  • tracked_storms: the tracked storms returned by the tracking algorithm, given as an array of dimensions Time x Rows x Cols.
  • precip_data: the precipitation data corresponding to the tracked storms data, with the same dimensions as tracked_storms.
  • lat_data: the latitude data corresponding to each [row][col] location in tracked_storms, given as an array of dimensions 1 x Rows x Cols.
  • long_data: the longitude data corresponding to each [row][col] location in tracked_storms, given as an array of dimensions 1 x Rows x Cols.
  • time_interval: the period between temporal 'snapshots', given as a float. The user should interpret the duration results in terms of the unit of time that is implied.
  • pixel_size: the area one grid cell represents in the data. The user should interpret the size and average intensity results in terms of the unit of area that is implied.

quantify returns a tuple of size four containing the duration of each storm, as well as its size, mean intensity, and central location at each time step, in this order.
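
If the coordinates are available only as 1-D vectors, the 1 x Rows x Cols arrays can be built with NumPy. This is a sketch that assumes a regular grid where latitude varies by row and longitude by column; the coordinate values are made up.

```python
import numpy as np

lats = np.array([40.0, 39.5, 39.0])             # one value per row
lons = np.array([-75.0, -74.5, -74.0, -73.5])   # one value per column

# meshgrid returns two Rows x Cols arrays: longitude first, then latitude.
lon_grid, lat_grid = np.meshgrid(lons, lats)

# Add the leading axis of length 1 expected for lat_data and long_data.
lat_data = lat_grid[np.newaxis, ...]
long_data = lon_grid[np.newaxis, ...]
```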

Tips

  • For each of the 'metric arrays' returned in the tuple excluding duration, the resulting data for storm 12 in time slice 7 can be found at [7][12]. For duration, simply specify the storm [12] for the storm's duration.
  • Excluding duration, if a storm is not present in a time slice, its metrics in that time slice will be reported as 0.
  • To interpret the central location results, the user should first become familiar with the latitude and longitude data associated with the tracking results, including its orientation. Producing an intensity plot is also a great help. With these in hand, the results can be read much like a traditional center of mass calculation: positive values in either dimension correspond to positive shifts in latitude or longitude away from the unweighted center, towards areas with more precipitation.
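
The indexing convention from the first two tips can be sketched with mock arrays. The shapes (Time x Storms for the per-slice metrics, one entry per storm for duration) and all values are assumptions standing in for real quantify output.

```python
import numpy as np

# Mock stand-ins shaped like the returned metric arrays (assumed Time x Storms).
n_times, n_storms = 10, 15
size = np.zeros((n_times, n_storms))
mean_intensity = np.zeros((n_times, n_storms))
duration = np.zeros(n_storms)

# Pretend storm 12 is present in time slices 6 to 8.
size[6:9, 12] = [4.0, 9.0, 5.0]
mean_intensity[6:9, 12] = [1.5, 2.5, 2.0]
duration[12] = 3.0

# Per-slice metrics are indexed [time][storm]; duration is indexed [storm].
storm, t = 12, 7
print(size[t][storm], mean_intensity[t][storm], duration[storm])

# A 0 means the storm is absent, so mask those slices before averaging over its lifetime.
present = size[:, storm] > 0
lifetime_mean_intensity = mean_intensity[present, storm].mean()
```

Averaging over all time slices without the mask would pull the result towards 0 for short-lived storms.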

Visualization

To visualize the results of the identification and tracking algorithms and the associated precipitation data, or to produce a histogram of the frequency of precipitation values, call the functions below.

Signatures

histogram produces a histogram of the frequency of precipitation values over a set of time slices, to aid in finding an optimal precipitation threshold. The histogram can be shown and/or saved.

  • data: the data from which to construct the histogram.
  • bins: the number of equal-width bins in the range. (From Matplotlib.)
  • bin_range: the lower and upper range of the bins. (Same source as above.)
  • title: the title both of the histogram and the PNG file. The default title is 'Frequency of Precipitation Intensities'.
  • show_save: determines whether images produced are shown, saved, or both. The default argument is 'both', while 'show' only shows images and 'save' only saves them.

histogram returns None.
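
To preview what a given bins and bin_range choice does before plotting, the counts can be computed with numpy.histogram, whose bins and range arguments have the same meaning as in Matplotlib's hist. This sketch does not call the package; the sample values are made up, and flattening the array first is an assumption about how the data is pooled.

```python
import numpy as np

# Hypothetical precipitation data (Time x Rows x Cols).
precip = np.array([[[0.0, 0.2, 1.1], [3.4, 7.9, 0.0]],
                   [[0.6, 2.2, 9.5], [0.0, 4.1, 5.0]]])

# Five equal-width bins between 0 and 10, pooled over all time slices.
counts, edges = np.histogram(precip.ravel(), bins=5, range=(0.0, 10.0))

for lower, upper, count in zip(edges[:-1], edges[1:], counts):
    print(f"{lower:4.1f} to {upper:4.1f}: {count}")
```

A first bin that dominates the counts, as it does here, is typical when many cells are dry or nearly dry, which is the kind of pattern a precipitation threshold is meant to cut off.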


intensities produces an intensity plot for data containing precipitation intensities through time. A PNG image of each time slice is produced, as well as a GIF of all time slices. The PNGs can be shown and saved, while the GIF can only be saved.

  • data: the storm data to be plotted, given as an array of dimensions Time x Rows x Cols containing precipitation data. To plot precipitation in a single time slice, reshape the array to dimensions (1, Rows, Cols).
  • colormap: the colormap given to the precipitation data, given as a LinearSegmentedColormap used by Matplotlib.
  • title: the title both of the plot and the GIF file, given as a string. PNGs are named with a combination of this title and the time step plotted.
  • unit: the unit of the precipitation values plotted.
  • start_time: the number of the first time slice in the data to be plotted, used in file naming and noting time in image outputs, where applicable, given as an integer. The default value is 0.
  • show_save: (as above)
  • dpi: the resolution in dots per inch, given as an integer. The default value is 300.

intensities returns None.


storms produces a plot for data containing labeled storms, where each label is plotted in the same color across all time slices. The plot also displays the number of the current time slice and, optionally, ticks corresponding to array indices for testing purposes.

  • data: the storm data to be plotted, given as an array of dimensions Time x Rows x Cols containing labeled storms (fully-identified, tracked, or otherwise). To plot storms in a single time slice, reshape the array to dimensions (1, Rows, Cols).
  • colors: the colormap given to the storm labels. If colors is a LinearSegmentedColormap from Matplotlib, each label is given a unique color (when a cyclic colormap is not given), though color differentiation may suffer with longer runs. If colors is a Python list of HEX color codes (e.g. '#FF0000'), labels are mapped by cycling through the list as a ListedColormap, which can lead to unfortunate color collisions.
  • Parameters title, start_time, show_save, and dpi are as above.
  • ticks: the toggle for including ticks in the plot, given as a boolean. The default value is False. Turning on ticks stops the current time slice from being displayed, to avoid overlap.

storms returns None.
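
Why a list of HEX codes can produce color collisions is easy to see in a small sketch. It assumes labels are mapped by modulo indexing into the list, which is one natural reading of "cycling through the list"; the package's exact mapping may differ, and the palette and labels are hypothetical.

```python
colors = ['#FF0000', '#00FF00', '#0000FF']  # hypothetical palette

def color_for(label, palette):
    """Pick a color by cycling through the palette (modulo indexing, an assumption)."""
    return palette[label % len(palette)]

labels = [1, 2, 3, 4, 7]
assigned = {label: color_for(label, colors) for label in labels}

# Labels 1, 4 and 7 share a color because they are congruent modulo 3.
for label, color in assigned.items():
    print(label, color)
```

A longer list postpones collisions but cannot rule them out once the number of storms exceeds the number of colors.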

Tips

  • A continuous colormap makes storms hard to tell apart in longer runs. In that case, the user should pass a list of colors instead, though this has its own problem: neighboring storms may be given the same color, and a list that avoids this usually has to be tuned plot by plot.
  • If storms appear to 'touch' when plotted, increasing dpi may help.
  • tutorial_files/plot_with_map.py provides an example of how to plot results of the core algorithms over a map. Its structure can largely be reused for the user's own work, though a different map will very likely be needed; see the Basemap examples for the many good options available. The script also includes a colorbar showing the storms currently active in each time slice, which may be of use but becomes impractical with a large number of storms.