-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
As a user, I want the WRDS-based thresholds output in the WRES output to be fully qualified so that I can concretely identify where they came from #281
Comments
Original Redmine Comment Just quickly tested a code change to see what impact it would have. If I construct the @ThresholdOuter@ instances for WRDS thresholds from rating curves so that they include text after the name indicating the source of the rating curve from which the threshold is derived, that is enough to ensure that, when the rating curve source is not declared by the user, all thresholds from all rating curves are evaluated (see #98837). For example:
If I further change the the key in a map that stores the thresholds to use the full label, I can obtain outputs for the raw thresholds as well as the rating curve thresholds when a user does not clarify what they want. For example:
I can provide more information about the test I performed, if desired. The code change was to the method @GeneralThresholdDefinition.getThresholds@; the modified code is below, shared for posterity because I plan to undo it. However, my only goal was to confirm that insufficiently identifying thresholds was the cause of the problem noted in #98837, and, indeed, it is. I also confirmed a run where the features were pooled appropriately identified the thresholds by name. Thanks, Hank ===================================================================
|
Original Redmine Comment James, Jesse: Is there any value in pursuing what I described in #97209-7 as an interim solution to evaluating all WRDS thresholds implied by a declaration and relaying that identifier to the output? Recall that, if a user does not specify a rating curve provider, then currently the thresholds evaluated are the last thresholds found for a given name (e.g., "flood", "major", "action", etc.) while processing a map of threshold name -> threshold value. It could be a mix of NRLDB and USGS Rating Depot calculated thresholds. With the inelegant fix above, the name of the rating curve source is included in the name of the threshold. The real fix is an update to the data model, but that will require significantly more time to complete (at least for me, since its not a part of the code I'm familiar with). Thanks, Hank |
Original Redmine Comment I am not against it, largely because it is "better" than what exists. It's a bit yucky to make a half-***** fix, but it is better than broken, so I would go for it, on reflection. At the same time, you should probably anticipate the half-***** solution to exist semi-permanently ;) |
Original Redmine Comment In short, I don't know. A threshold of @>= 50 m3/s@ is identical to a threshold of @>= 50 m3/s@, regardless of name, and so it makes sense to merge them into one threshold for the purposes of data processing. On the other hand, the names matter because we might want to group by "NWS Major Flood" or plot that way, so we need to track the names and keep them separate. Maybe there is a way to track the threshold itself as a key and then a list of names as the value. That does sound like a more involved change. I don't have an objection to the qualifier in the name like you have above, except that it can be shorter like "NRLDB major" or "USGS Rating Depot bankfull." |
Original Redmine Comment The permanent fix is pretty clear, I think, namely to expand the @ThresholdOuter@ (and hence the canonical @Threshold@ it wraps) to allow a ratings qualifier. Once that is done, we can make any downstream choices we want w/r to de-duplication. Currently, thresholds are de-duplicated on the basis of climatological probabilities only (edit: that is to say, different probabilities that map to the same real-valued threshold), so thresholds with distinct names will not be de-duplicated, even if the threshold values are the same. Implementing the proper fix is probably something like "32". Implementing the half-***** fix is probably "0.32" or something. Note that the main reason for "32" is that it will create some additional slots in some of the statistics output formats. It shouldn't break any system tests because there are none with different ratings. |
Original Redmine Comment Thanks, James and Jesse. Yeah, 0.32 hours is pretty close... I'm guessing it would take me about 1-2 hours to complete the change. The code change is already in place; I just need to run some system tests to see if anything breaks, because it may also impact WRDS JSON threshold tests which I think we include in one of the 800s, iirc. We'll also need to agree upon the label (Jesse's shorter proposal looks fine to me). I'll try to get to it later this week; for now, I need to prioritize the proxy issues, deploying 5.16, and completing at least some initial tests of Chris's output service stuff. Thanks, Hank |
Original Redmine Comment But you won't be adding the qualifier unless there are duplicates, I assume(?) And I further assume that scenario802 does not contain thresholds with several different sources. Could be wrong on either of these, but I wouldn't add the qualification unless it is needed to de-duplicate. That way, we retain existing naming except where the extra qualification is needed (because the naming isn't really the correct spot for this qualification). |
Original Redmine Comment I can add an if-check to see if "major" already exists and, if so, add the rating curve identifier. No problem. Of course, I expect that to often be the case for the most "important" forecast points where an RFC has defined various raw thresholds already. I'll check the 800 series thresholds to see what is used. I never looked closely. Hank |
Original Redmine Comment scenario802 has two sets of thresholds from different threshold sources (as opposed to rating curves). It should trigger the same problem if the declaration does not clarify which source is desired. However, scenario802 does set the @Provider@, so it won't be an issue. Hank |
Original Redmine Comment Right, so there are two extra qualifiers and possibly nothing to stop N>2 qualifiers in future, so then you get into a situation where you may just want a -bag- map of qualifiers, i.e., key-value pairs and a corresponding builder method to add a new key-value qualifier pair. Even the proper fix is a bit yucky due to the complexity of thresholds. |
Original Redmine Comment Exactly. By the way, an alternative is to force the user to be very specific about the thresholds they want from WRDS in such a way that duplication is not possible. We can require the source and rating source be specified, for example, if the format is WRDS. Then we map to only a single set of thresholds and can rely on WRDS labeling to avoid duplication of names. Or we can throw an exception when a duplicate is encountered and point to the attributes in the exception message ("Did you configure the provider and ratingProvider?"). I'm actually not opposed to implementing either of those, instead, since it makes no sense to me to mix thresholds from NRLDB, USGS rating depot, etc. I'm just not sure that everyone will think the way I do. Perhaps someone out there has a good reason to evaluate thresholds from both sets of curves. If either of you prefer one of those alternative solutions, let me know. Hank |
Original Redmine Comment I think that would be pretty harsh, assuming a user wanted to evaluate the commonly-named thresholds from several different key-value pairs. I think your proposed hack is the right hack. |
Original Redmine Comment Sounds good. I'll try to get to it later in the week, perhaps even as soon as tomorrow. Hank |
Original Redmine Comment James, Did anything change related to this ticket as a byproduct of changes to thresholds recently put in place as a byproduct of moving to YAML? Hank |
Original Redmine Comment The most foolproof way of knowing is to run an evaluation and witness the behavior :-) I forget what we did for these one:many feature relationships w/r to thresholds. |
Original Redmine Comment I note the Description references the fact that WRDS can return RFC thresholds for multiple NWS locations when asked for based on NWM feature. This issue was raised as part of the UAT in #116116. Let me find the conversation there or in a related ticket, Hank |
Original Redmine Comment The output when using WRDS thresholds only include the threshold value and WRDS label, such as, "SAMPLE SIZE >= 0.0 ft3/s (low)". It could also include a reference to the rating curve identifier, such as, "MEAN ERROR >= 64.0 ft3/s (NRLDB low)". However, there is no indication of the source being WRDS, though it may be understood based on the label shown. There is also no indication of the feature for which the thresholds were found. So, for example, if NWS thresholds are obtained using a NWM feature id, then thresholds could be returned for multiple NWS location ids. Hence, it would make sense to discern between them in the WRES outputs. As discussed in more recent tickets, the idea of requesting NWS thresholds by an NWM feature id is flawed from the start, so perhaps we should not worry about the location id being included. I guess the question is, do we need some way to let users know that thresholds used in an evaluation were obtained from WRDS as opposed to, say, a CSV file or in-line declaration? But that would mean both WRDS and custom thresholds are evaluated in a single run. That sounds like a highly unlikely scenario. At this point, I'm not sure that the thresholds are qualified enough in our outputs, but the only use cases where I think this could lead to confusion are not very likely, so I'm lowering the priority. I reserve the right to reject this ticket if we decide the benefit doesn't justify the added code. Hank |
Author Name: Hank (Hank)
Original Redmine Issue: 97209, https://vlab.noaa.gov/redmine/issues/97209
Original Date: 2021-10-06
See #97142. This can happen if a user evaluates NWM against USGS using NWS thresholds. For some NWM feature ids, thresholds from multiple NWS locations can be returned. The code needs to handle that; that is what #97142 is for. However, when seeing the thresholds in the output, I would like them qualified sufficiently to identify the WRDS location for which the thresholds were returned. It should also show the rating provider (e.g., USGS or NRLB) and rating source. There may be other important metadata to share.
This ticket applies to CSV2 and protobuf, I assume, but I'm not sure about the others. Images should not include that metadata as there is no good way to present it in a graphic, I think.
Thanks,
Hank
Related issue(s): #288
Redmine related issue(s): 98151, 99233, 99428
The text was updated successfully, but these errors were encountered: