Methodology

Alex R edited this page Aug 14, 2020 · 11 revisions

This page gives a high-level overview of the novel algorithms provided in the package. For detailed implementation notes, please see the comments in the associated files identification.py, tracking.py, and quantification.py. For more information on the plotting functions, see visualization.py.

Identification

The identification of individual storms within each time slice is computed as follows.

Algorithm Overview

  1. Find all contiguous precipitation regions. That is, perform (fully) connected-component labeling.
  2. Classify a storm region as large if it has one or more remaining grid cells left after an erosion operation and small otherwise.
  3. For regions in the set of large regions:
    1. Find smoothed regions using an opening operation.
    2. Perform almost-connected-component labeling on them.
    3. Group the large regions based on the clustering results.
  4. For regions in the set of small regions:
    1. Dilate each region.
    2. If the dilated region overlaps any large regions, add it to the cluster with which it shares the largest number of grid cells.
    3. Otherwise, perform almost-connected-component labeling on the small regions that were not added to any cluster of large regions.
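
Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the code in identification.py: the function name `split_large_small`, the `threshold` parameter, and the choice of a 3x3 structuring element for both the labeling (8-connectivity) and the erosion are assumptions made for the example.

```python
import numpy as np
from scipy import ndimage


def split_large_small(precip, threshold=0.0):
    """Label contiguous precipitation regions and split them by size.

    Hypothetical helper: a region counts as large if at least one of its
    grid cells survives a single erosion, and as small otherwise.
    """
    mask = precip > threshold
    # Assumed structuring element: 3x3 block, so diagonal neighbours connect.
    structure = np.ones((3, 3), dtype=bool)
    labeled, n_regions = ndimage.label(mask, structure=structure)

    large, small = [], []
    for region in range(1, n_regions + 1):
        region_mask = labeled == region
        eroded = ndimage.binary_erosion(region_mask, structure=structure)
        if eroded.any():
            large.append(region)
        else:
            small.append(region)
    return labeled, large, small
```

With this structuring element, a 3x3 block of wet cells keeps its centre cell after erosion and is classed as large, while an isolated wet cell is eroded away and classed as small.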

Tracking

Once the rainstorm segments for all time slices are identified, we link them through consecutive time steps to form rainstorm events evolving over time.

Algorithm Overview

  1. At t=0, assign the rainstorm segments to different rainstorm events as their starting segments.
  2. For t=1 onwards:
    1. Link each segment to one of the segments in the previous time step based on the similarity measure and the magnitude of the displacement vector. More specifically, link the two if both of the following conditions are satisfied:
      1. The shapes of the two segments are similar enough that the value of the similarity measure between them exceeds the threshold tau, which is 0.05 in summer and 0.01 in winter, and
      2. The link does not result in too drastic a change of storm location in the opposite direction to its original movement. That is, we allow linkage only when the magnitude of the displacement vector of the two segments is less than (the equivalent of) 120 km (in grid cells) regardless of direction, or the angle between the displacement vector and the displacement between the segment in the previous time slice and its predecessor is less than 120 degrees.
    2. If no events satisfy the criteria, let the segment initialize a new rainstorm event as its starting segment.
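
The two linking conditions can be expressed as a small predicate. The sketch below is illustrative only: the function name `can_link`, its parameters, and the handling of a segment with no predecessor (only the distance test can pass) are assumptions, and `max_cells` stands for the 120 km limit converted to grid cells for the data at hand.

```python
import math


def can_link(similarity, displacement, prev_displacement,
             tau, max_cells, max_angle_deg=120.0):
    """Return True if a segment may be linked to a previous-step segment.

    displacement:      (dx, dy) from the previous segment to the current one.
    prev_displacement: (dx, dy) from the predecessor to the previous segment,
                       or None if the previous segment has no predecessor.
    """
    # Condition 1: the similarity measure must exceed the tau threshold.
    if similarity <= tau:
        return False

    # Condition 2a: a short displacement is accepted regardless of direction.
    magnitude = math.hypot(*displacement)
    if magnitude < max_cells:
        return True

    # Condition 2b: otherwise the turn relative to earlier movement must be
    # smaller than the angle limit. Assumed: no earlier movement means no link.
    if prev_displacement is None:
        return False
    prev_magnitude = math.hypot(*prev_displacement)
    if magnitude == 0.0 or prev_magnitude == 0.0:
        return False
    dot = (displacement[0] * prev_displacement[0]
           + displacement[1] * prev_displacement[1])
    cos_angle = max(-1.0, min(1.0, dot / (magnitude * prev_magnitude)))
    return math.degrees(math.acos(cos_angle)) < max_angle_deg
```

A segment for which `can_link` is false against every candidate in the previous time step would start a new rainstorm event, as in step 2.2.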

Similarity Measure Overview

While most of the calculations integral to tracking storms are relatively well known, the algorithm for computing the proposed similarity measure is very technical and largely unintuitive. For this reason, please also see the extensive comments provided in the implementation in tracking.py. As suggested there, working through a small example will likely be very helpful in understanding not only this specific implementation, but also the idea behind it.


Because of the double summation, a vectorized solution is crucial here, though it is enormously memory-heavy. To limit this issue while maintaining some speed, the similarity measure is computed over the union of cell locations in the two storms, since overlapping cells have no effect on similarity. This leads to greater memory usage when there is no overlap, but has had a significantly positive effect when there is some overlap and the storms are very large, which is where computation is normally derailed with a more straightforward implementation.

  1. For each of the two storms, compute the relative weight for each grid cell, preserving the shape of the array.
  2. Again for each, find the coordinates of the non-zero precipitation data corresponding to each storm and compute their union, since overlapping cells will not affect our result.
  3. Reshape the location arrays into 1d arrays and compute their union.
  4. Place these coordinates in identical 1d arrays.
  5. Create two new arrays of weights, where each weight is added to the array only if its coordinates exist in the union of coordinates and placed where those coordinates exist in the union of coordinates.
  6. Reshape both arrays of weights into 1d arrays, one as a column and one as a row, and compute their matrix multiplication.
  7. Similarly, compute the distances between each pair of coordinates in the coordinate arrays. (We now have two arrays where the distance between each relevant cell pairing of the two storms in the array of multiplied, relative weights can be found at the same location in the distance array.)
  8. Compute the element-wise exponential involving phi on the array of distances.
  9. Compute the element-wise multiplication of this resulting array with the array of multiplied, relative weights.
  10. The summation of this array gives the similarity measure of the two storms.
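
The steps above can be condensed into a short NumPy sketch. It is not the implementation in tracking.py. The function name `similarity`, the default value of `phi`, the use of Euclidean distance in grid cells, and in particular the kernel form `exp(-phi * distance)` are assumptions, since this page only says that an exponential involving phi is applied to the distances.

```python
import numpy as np


def similarity(storm_a, storm_b, phi=0.5):
    """Sketch of a weighted, distance-based similarity of two storms.

    storm_a, storm_b: 2D precipitation arrays of the same shape, each zero
    outside its storm and containing at least one non-zero cell.
    """
    # Step 1: relative weight of each grid cell, shape preserved.
    weights_a = storm_a / storm_a.sum()
    weights_b = storm_b / storm_b.sum()

    # Steps 2-4: union of the non-zero cell locations of both storms.
    union_mask = (storm_a > 0) | (storm_b > 0)
    coords = np.argwhere(union_mask)          # shape (n, 2), row-major order

    # Step 5: weights laid out in the same order as the union coordinates.
    wa = weights_a[union_mask]                # shape (n,)
    wb = weights_b[union_mask]                # shape (n,)

    # Step 6: column vector times row vector gives all weight products.
    weight_products = wa[:, None] * wb[None, :]

    # Step 7: pairwise distances between union coordinates (grid-cell units).
    diffs = coords[:, None, :] - coords[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=-1))

    # Steps 8-10: assumed kernel, element-wise product, and summation.
    kernel = np.exp(-phi * distances)
    return float((kernel * weight_products).sum())
```

Under these assumptions, two identical single-cell storms give a similarity of 1, and the value decays as the storms move apart. The `distances` and `weight_products` arrays are both n x n for n cells in the union, which is the memory cost discussed above.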

Physical Characteristics

Once rainstorm events have been tracked through time, we are able to characterize each individual rainstorm event with four metrics: duration, size, mean intensity, and central location. In the case of central location, be sure to review its explanation in the original publication and the usage tips given in the Implementation Details.

Duration

  • Create a new dictionary.
  • Find all the storms in the tracked storm data.
  • Create a new array of length equal to the number of storms found.
  • For each time slice:
    1. Compute the storms that appear in that time slice.
    2. For each storm in the set of all storms:
      1. If that storm is in the set of storms that appear in this time slice:
        1. If the storm is not already in the dictionary, add it with value 1.
        2. Otherwise, increment the value found at the key equal to that storm.
  • For each key, value pair in the dictionary:
    1. If the key isn't the background:
      1. Set the value found at [key] of the array to the key's value in the dictionary.
  • Multiply each value of the array to be returned by the time interval.
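
As a rough sketch of the duration steps, assuming the tracked data is an integer array of shape (time, rows, columns) in which 0 marks the background and storms are labeled consecutively from 1 (the function name `storm_durations` and these conventions are assumptions for illustration, not taken from quantification.py):

```python
import numpy as np


def storm_durations(labeled, time_interval=1.0):
    """Duration of each storm, indexed by its label.

    Assumes the background label 0 is present and storm labels are
    consecutive, so the array of unique labels can be indexed by label.
    """
    counts = {}
    for time_slice in labeled:
        # Count each storm once per time slice in which it appears.
        for storm in np.unique(time_slice):
            key = int(storm)
            counts[key] = counts.get(key, 0) + 1

    all_storms = np.unique(labeled)
    durations = np.zeros(len(all_storms))
    for storm, n_slices in counts.items():
        if storm != 0:                        # skip the background
            durations[storm] = n_slices
    return durations * time_interval
```

Iterating only over the labels present in each slice gives the same counts as checking every storm against every slice, which is what the outline describes.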

Size

  • Create an array with dimensions number of time slices x number of storms.
  • For each time slice:
    1. Find the storms that appear in it.
    2. For each storm that appears in this time slice:
      1. Compute the number of grid cells belonging to it.
      2. Place this result at the corresponding [time][storm] location in the array.
  • Multiply the number of grid cells by the specified grid cell size for the data.
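
A minimal sketch of the size computation, under the same assumed data layout (integer labels, shape (time, rows, columns), background 0). The function name `storm_sizes` and the `grid_cell_size` parameter name are hypothetical:

```python
import numpy as np


def storm_sizes(labeled, grid_cell_size=1.0):
    """Size of each storm in each time slice, as a [time][storm] array."""
    n_times = labeled.shape[0]
    n_storms = int(labeled.max()) + 1         # column 0 is the background
    sizes = np.zeros((n_times, n_storms))
    for t, time_slice in enumerate(labeled):
        # Labels present in this slice and their grid-cell counts.
        storms, cell_counts = np.unique(time_slice, return_counts=True)
        for storm, count in zip(storms, cell_counts):
            if storm != 0:
                sizes[t, storm] = count
    # Convert cell counts to an area using the grid cell size of the data.
    return sizes * grid_cell_size
```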

Average intensity

  • Create an array with dimensions number of time slices x number of storms.
  • For each time slice:
    1. Find the storms that appear in it.
    2. For each storm that appears in this time slice:
      1. Find and sum the precipitation belonging to the storm in the current time slice.
      2. Find the storm's average precipitation in this time slice.
      3. Place this result at the corresponding [time][storm] location in the array.
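
The average intensity steps might look like the following, again assuming integer labels with background 0 and a precipitation array of the same shape as the label array (the name `storm_mean_intensity` is hypothetical):

```python
import numpy as np


def storm_mean_intensity(labeled, precip):
    """Mean precipitation of each storm in each time slice."""
    n_times = labeled.shape[0]
    n_storms = int(labeled.max()) + 1         # column 0 is the background
    intensity = np.zeros((n_times, n_storms))
    for t in range(n_times):
        for storm in np.unique(labeled[t]):
            if storm == 0:
                continue
            cells = labeled[t] == storm
            total = precip[t][cells].sum()    # summed precipitation
            intensity[t, storm] = total / cells.sum()
    return intensity
```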

Central location

  1. Create an array with dimensions number of time slices x number of storms to store the results of our computations, but of type object to allow us to store an array in each cell.
  2. Create arrays of x, y, and z values corresponding to the latitude and longitude data converted into the Cartesian grid in R3.
  3. For each time slice:
    1. Find the storms that appear in it.
    2. For each storm that appears in this time slice:
      1. Find the sum of the precipitation values belonging to the storm.
      2. Compute the intensity weighted averages corresponding to the grid in R3 for the storm.
      3. Find the nearest point on Earth's surface.
      4. Place this result at the corresponding [time][storm] location in the array.
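
For a single storm in a single time slice, the central location steps can be sketched as below. The sketch assumes a spherical Earth of unit radius, so that the nearest surface point is found by normalizing the weighted Cartesian average. The function name `central_location` and its argument layout are assumptions for illustration; see the original publication for the actual definition.

```python
import numpy as np


def central_location(lat_deg, lon_deg, precip):
    """Intensity-weighted central location of one storm, as (lat, lon).

    lat_deg, lon_deg, precip: arrays of equal shape covering the storm's
    grid cells, with a non-zero precipitation total.
    """
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    weights = np.asarray(precip, dtype=float)

    # Latitude/longitude converted to Cartesian coordinates in R3.
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)

    # Intensity-weighted averages of the Cartesian coordinates.
    total = weights.sum()
    cx = (weights * x).sum() / total
    cy = (weights * y).sum() / total
    cz = (weights * z).sum() / total

    # Nearest point on the unit sphere: scale the average back to radius 1.
    # Undefined if the average falls exactly at the sphere's centre.
    norm = np.sqrt(cx ** 2 + cy ** 2 + cz ** 2)
    lat_c = np.degrees(np.arcsin(np.clip(cz / norm, -1.0, 1.0)))
    lon_c = np.degrees(np.arctan2(cy, cx))
    return float(lat_c), float(lon_c)
```

Each returned pair would then be stored at the corresponding [time][storm] location of the object array described in step 1.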