-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: ngen_cal_model_observations
model plugin hook and default implementation
#155
Conversation
simulation_interval: pd.Timedelta = pd.Timedelta(3600, unit="s") | ||
obs = self._hooks.ngen_cal_model_observations( | ||
id=location.station_id, | ||
start_time=start_time, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In reality I think the caller should pass start_time + 1 dt
(e.g. 3600s
) since the model outputs will not contain values for the actual start_time
(left exclusive). I don't think this is a problem now, but just wanted to note it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does matter and things will break if we don't properly account for this. This only works right now b.c. csv_output
's dt
is 300s
which means the first value is start_time
+ 5min
. When we resample the simulation output to the hour using .resample("h").first()
the simulated value at 5min
is backfilled to start_time
and lines up with the nwis observations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This likely heavily depends on how the data is "merged", in search.py _objective_function
. Currently this done with a pandas merge
with left_index=True, right_index=True
which should result in a data frame of the overlapping indicies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple things to consider, we can chat about this directly.
The returned pandas Series should be in units of cubic meters per | ||
second. | ||
|
||
`id`: USGS gage station id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id
doesn't strictly have to be USGS gage id, but more generally some identifier linked/referenced to the hydrofabric...we just happen to use gage id's for now. Perhaps this should ultimately be the Hydrofabric feature id
instead of the obs id? Or we may need a feature id
and a linked id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we pass a Nexus
feature here which should be able to embody additional context extracted from the hydrofabric which can be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great questions! Glad to hear that we are thinking on the same page. Two thoughts preface with: I went with this design b.c. I decided to just keep it as simple as possible for now. Since we are pre-1.0
, I figured we might change this as we learn things along the way.
- I'd originally written up a complex object that encapsulates what Mike's group refers to a point-of-interest. Like you are saying, the thinking being that the
id
for the model data is likely not the sameid
for the observation (truth) data. I decided against that to err on the side of simplicity for now. I am open changing that now or in the future. - I also though about passing a
Nexus
, but thinking back to previous conversations we'd had about keeping things model agnostic, it seemed passing aNextGen
specific idea went against that. If we ever do go in the direction of something like a POI interface as a solution to this problem, I figured we could always create a type that implements the POI interface for aNexus
for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conversation with @hellkite500, in theory a HydroLocation
or Nexus
makes the most sense here (really a Nexus
b.c. its more general). We have decided to go with Nexus
for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Nexus
is the right way to go for now, it encapsulates a (eventually) list of hydrolocation
which can reference different entities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related: NOAA-OWP/hypy#37
2169ed2
to
2d13584
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a follow up PR, we should create a "user" hook which is not in core which reads a typical USGS csv/tabular file and create a quick example of how to use that instead of the default hook via config
ngen_cal_model_observations
is called during each calibration iteration to provide truth / observation values in the form of apandas.Series
, indexed by time with a record everysimulation_interval
. The returnedpandas.Series
should be in units of cubic meters per second.A default implementation is included in this PR the pulls data from the USGS NWIS IV data service using
hydrotools.nwis_client
. This is enabled by default and does not require modifying configuration files.Hook signature:
Users who would like to load observation data from a file, for example, should implement this plugin hook and specify it as a
model
plugin in yourngen.cal
configuration file. For example:Likewise, if you would like to, instead, save the observations used during calibration, you can implement this plugin as a "wrapper" style plugin. For example:
Related to #93
Related to #111
Additions
ngen_cal_model_observations
- called during each calibration iteration to provide truth / observation values in the form of apandas.Series
, indexed by time with a record everysimulation_interval
.