Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch splitting catchments: get_split_catchments(), get_raindrop_trace(), method ambiguity and errors #420

Open
mjcashman opened this issue Jan 14, 2025 · 5 comments

Comments

@mjcashman
Copy link

I have a use-case where I need to split NHD catchments at around ~3,000 locations where I have USGS gages and biological sample data. In getting this workflow operational, I have encountered several issues and ambiguities that I'm documenting below:

According to the documentation in get_split_catchment(), it is recommended that I use get_raindrop_trace() prior to catchment splitting, as points not along a flowline (which may occur in my use-case due to poor GPS location data) may instead delineate a small sub-catchment off the flowline. While this is useful, ambiguity in its implementation is my first question.

  • When does the distance between the lat/long point and the flowline change between snapping to the flowline and delineating a flowline/catchment split, and delineating a catchment off the flowline occur? Surely there must be a tolerance under the hood that's performing the operation. This would be good to know. For example, I have the following two points: point <- tibble(lat =39.24767, lon= -76.71308) and alt_point <- tibble(lat =39.24949, lon= -76.71325) , less than 200 m apart, located at an office and across the street, both "off a flowline". However, they result in on-flowline and off-flowline split catchment delineations respectively, as seen below:

Image.
If I have points that switch arbitrarily between on and off flowpath delinations, that could be a problem, especially if its unclear when this happens, so I can't create a catch for that.

  • This naturally leads to using the get_raindrop_trace() function to force all locations to find the nearest flowline to help in the delineation. However, I have run into several common errors that occur when executing this function across a batch of sites.
  1. The first error type is: "Error executing process: Flowtrace intersected a nodata FDR value; cannot continue downhill.", which can be reproduced using USGS gage 07144780 NF NINNESCAH R AB CHENEY RE, KS (lat =37.86261, lon= -98.01385). However, when checking this site, the point is located next to the flowline, and this lat/long intersects with the flow direction grid contained in NHDPlus. Flow is going from left to right.:
    Image

  2. A second type of common error occurs with: "Error executing process: Input geometry segment overlaps with the splitter.", with a reproducible example at USGS gage 07315700 Mud Creek near Courtney, OK (lat = 34.00426, lon = -97.56697). I have not been able to dig as much into the characteristics of this error type, but still we're only doing the raindrop tracing, and not even doing splitting. If you skip the raindrop trace, it successfully splits the catchment along the flowline when using get_split_catchment(), since the point is also pretty close to the flowline.

Image

  1. A third type of error also occurs: "Error executing process: index out of range" with a reproducible example at USGS gage 03588500 SHOAL CREEK AT IRON CITY, TN (lat = 35.02412, -lon = 87.57903). This seems like it's on a tiny flowline fragment, but split catchment still does a delineation ( see below). Perhaps the flow direction takes the point outside of the source catchment before it intersects with the flowline?

Image

Before I write some error handlers that default back to the default catchment splitting whenever the raindrop trace fails, I really would like to understand how these functions are working more closely, why they're throwing errors, and if I might be missing alternative problems (and accidentally resulting in off-flowline splitting) if I implement this fallback measure.

Happy to discuss more

@dblodgett-usgs
Copy link
Collaborator

Thanks for the thorough set of examples.

I think most of them will require specifics of the implementation of the flow trace and split catchment functions.

https://nldi-flowtools.readthedocs.io/en/latest/

Scanning the docs over there, there's not much on edge cases. @Anders-Hopkins may be able to help with some specifics about the cases you are outlining. I could wave my hands around and make some guesses, but better to get the person who knows involved.

The alignment of flowlines and the raster is normally good, but there are a ton of edge cases that can cause issues -- as you are finding. It's usually necessary to just go to the source data and to spend a few minutes figuring out why the raster is the way it is. There's a lot of complexity there, but in any instance, it's usually one of a handful of not too complicated edge cases.

I just reached out to Anders on Teams. Hopefully he'll pop in here with some ideas.

@Anders-Hopkins
Copy link

@mjcashman,

Thank you for detailing these edge cases for me. Its super helpful to have this stuff to make nldi-flowtools better.

Regarding your first example with the two points and two basins: auto-snapping has been written into the splitcatchment() function. At this point, snapping might be unnecessary since I have recommended users to use the flowtrace() function first to ensure that a delineation point in on a flowline. But this snapping was necessary at the beginning, and so it has remained. This package uses Pyflwdir to handle the delineations, and in NLDI-Flowtools I have delineation points snap to Strahler streams with a magnitude of 3 or great. This is totally arbitrary, but it was what seems to yield the best results when I first started writing this. I'll try turning off this snapping and see what happens. Another idea I've had is to make snapping optional in splitcatchment(), but that would require us to add an additional input to the function and to the services. We'll see.

  1. In this example, I understand that you are running get_raindrop_trace() for (lat =37.86261, lon= -98.01385), a point which is close to the flowline vector geometry. This function traces the flow direction grid until it falls "on" the flowline. Declaring a point to be "on" a flowline is something I have really struggled with because the vector flowline can intersect cells which are not "stream" cells. But this is what it currently is: the stream intersection point needs to fall in a FDR raster cell which intersects the NHD flowline geometry, and the flow accumulation value of that cell needs to be greater than 1100. Again, this is totally arbitrary, but through trial and error it is what has yielded the best result. Here in your example, somewhere in the flowtrace process, the flowpath has come across a nodata flow direction cell. Because it has no value, the algorithm cannot keep tracing downstream and so the process breaks before it finds the intersection with the stream. This is a data problem and not something I can fix. But I will examine this location more closely to see if I can tweak the algorithm to make this point work (or at the least, figure out why it fails).

  2. In this example, something wrong must have happened when the flowline is supposed to be split by the intersection point. I haven't seen this error before, so I'll use your point from the example to try to replicate the problem.

  3. Which function is producing this last error? The splitcatchment() or get_raindrop_trace()? I have no idea what is happening here, but that is a short flowline and quite a small basin, and so this is a good example of an edge case. I'll try to replicate this error.

The rest of my week is dedicated to another project. But I should be able to get to this early next week. If you what to pick thru the code and understand what is it doing, here is a link to the repo: https://code.usgs.gov/wma/nhgf/toolsteam/nldi-flowtools

@Anders-Hopkins
Copy link

Anders-Hopkins commented Jan 21, 2025

For problem #1, this is an issue because the point from which the flowtrace starts is quite near the edge of the catchment and the end of the local NHD flowline segment. The requirements to be 'on' a flowline are that the raster cell intersects the NHD flowline geomtery, and that the flow accumulation (FAC) be larger than 1100. In this picture, the raster pictured is the FAC in which the white cells represent the course of the river (according the the FAC grid). The X marks the query point. The green line is the NHD catchment boundary. The brown line is the NHD flowline. And the purple arrows are the flow directions.

Starting at the query point, and following the FDR grid, the path flows north one cell, and intersects the NHD flowline. However, this cell has a very low FAC value, so it is not considered 'on' the flowline, because if we were to delineate from this point, the basin would not include all of the upstream cells for the stream. While the delineation would be correct, it would confuse users for at point 'on' the flowline to not return a basin including the area around the flowline. So the algorithm continues to look for an 'on' cell and steps north again. This time, it lands on a white cell (an FAC cell with a high value), but since we are now outside of the NHD catchment, the NHD flowline geometry does not intersect this cell. So the algorithm continues downstream until it reaches the edge of the raster and returns that NoData error.

Image

This problem is uncommon and can literally be called an 'edge case'. There are two ways I can think to solve this. One on your end, @mjcashman, and the other on my end. Your fix can be to snap this point to the NHD flowline for this river. I notice that stream gage lon, lats usually are for the gage house and not the middle of the stream channel. This introduces errors when delineating basins from that shifted lon, lat.

My fix will be to add functionality to nldi-flowtools. I think grabbing the downstream's catchment and data will help solve this problem. The current algorithm will be to be updated, but it will solve this points where the flowtrace runs outside of the local NHD catchment.

@Anders-Hopkins
Copy link

For that first problem (un-numbered), I turned off the stream-order snapping and ran the point. The green polygon in the picture is the basin which got returned. Even though it is only three raster cells, its a correct delineation for the point. I like this functionality better now that the standard method is to use flowtrace to 'snap' a point to the stream grid/NHD flowline. In the next release of nldi-flowtools, I'll have snapping turned off. This change might trip up some people since this will change the basins which some points are returning, but this method is a more true form basin delineation.

Image

@mjcashman
Copy link
Author

Great, thanks Anders. My own investigation into problem 3 above is similar to what you reproduced, the flow direction grid will actually take the search to the next downstream catchment.

Right now, while we're looking into what's happening with the tools, I have a tryCatch set up so that I'll use the flowpath trace, but if that fails, I'll default to the streamline snapping that's built into the split catchment function. However, if you turn off snapping that might break my workaround! I could probably add another manual snapping step with a local search, depending on your next push. But you do identify a common issue - the lat/lon points for gages being at the gage house (if you're lucky) is standard procedure. Honestly, many points can be even worse, like in parking lots. So having a suggested fix for this, if raindrop still might have some edge-case errors, might be needed.

Think we might be able to hop on a call later this week? Can message in Teams

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants