-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BMI get_value and grid type #304
Comments
Thanks for tackling these issues. I have found one more (possible) deviation from BMI in Wflow.jl: In the bmi-example-julia this is implemented as: repr(eltype(m.temperature)) However, in Wflow.jl this is implemented as string(typeof(param(model, name))) So instead of |
Thanks for spotting this! |
I think we need to decide what model grids we want to expose through BMI, with the following two options:
Finally, some variables in Wflow have an extra (3rd) dimension, for example |
I'm a bit torn on this. My first thought was that the grids need to be unstructured to have a 1:1 mapping for performance reasons. Though from a usability perspective unstructured is really quite a bit more difficult, and as you say in our output files we also act like it is a 2D model. The BMI grid implementation is just not so well suited for what we do internally. It is a structured grid, we just don't store it as such internally. I suggest we go for option 2. I think copying data from 1D internal to 2D BMI is not too costly if we do that for every timestep, provided we re-use the 2D memory. We can measure it first to be sure.
Is that assuming picking option 2? And "1D (and not 2D)" for option 1? Why do you prefer this? If we go for option 2, I think offering 3D grids to BMI here is the most obvious choice. |
Thanks for your input @visr ! You are probably aware of this: Arrays are always flattened in BMI. A user always needs to convert the data (1D Array) to a 2D Array if this is required, for both option 1 and 2. But I agree, from a user perspective an unstructured grid with Below a bit more information on the different BMI grid types that are required for option 1 and 2 (including 3D grids). If we go for option 1, the following (unstructured) grid types are required:
So, a total of 10 different grid types are required for option 1. For option 2, the components kinematic wave (river, land and subsurface), vertical, lake and reservoir can be stored on a structured grid. If we go for option 2, I think we need the following grid types:
So, a total of 6 grid types. I think it is preferred to store the local inertial model as an unstructured grid (although this is not (yet) part of Wflow output), it is more explicit and not depending on certain assumptions (like a variable stored at a grid node is actually stored at a downstream edge), which should also make model couplings with these components more straightforward.
This would be the case for both option 1 and 2. I am suggesting this, because 3D data/grids make up a small part of Wflow, users are probably mostly interested in a subset of this data (for example soil moisture for only the upper soil layer and not 4 layers), and it would require less grid types. I am also fine with offering 3D data through BMI (as we also do as part of Wflow output). I am also a bit torn on this. In the end I think option 2 is cleaner (and probably closer to how BMI is intended), and I agree, it is probably not too costly (compared to option 1). In addition, a possible risk with option 1 is that we end up with many different grid types over time, when more variables are added to Wflow. |
I read about flattening in the BMI best practices, and actually I don't really understand it, at least in the context of Julia. I wrote about it in these notes starting from the best practices quote. If flattening is just a reshape/reinterpretation of the same memory, how does flattening a vector avoid row/column major confusion? And in a julia-to-julia BasicModelInterface, I don't see any benefit in flattening, it just destroys the true shape information, but is otherwise functionally the same, especially with Julia's linear indexing into multi dimensional arrays. So in the bmi-example-julia I just returned a Matrix, since I didn't see the point in calling @BSchilperoort maybe you also have some ideas on this, since you also mentioned a flat vector should be returned by |
The main reason I care about that, is that this is the BMI specification. Exactly following the specification allows others to write code that should work on any model implementing a BMI (e.g. eWaterCycle and grpc4bmi). For example, in eWaterCycle we have a routine "get_value_as_xarray" which wraps the BMI functions and returns the data in an xarray DataArray. This only works if there is a standard across languages.
Yes, for these cases it's often a hassle to flatten. Same with for example the dest argument being required: this doesn't really make sense for languages such as Python or Julia.
I had not noticed that yet, but that should then be updated, yes.
In my opinion fewer grid types would be better, as I think it will get confusing to users of the BMI quite fast. |
Ok yes I was struggling to come up with an example in Julia where the Vector/Matrix difference matters, but when PythonCall.jl converts a Matrix to a 2D numpy array then indeed that can cause issues. Updated the example in csdms/bmi-example-julia#5. |
Ok, but for the BMI uniform rectilinear grid type, you need to know if the flattened array is from a row or column major language (the node location functions are not listed as part of the uniform rectilinear grids)? Or am I missing something? |
Yes this part is also not that clear to me yet. The BMI best practices state.
The summary table states what dimensions 1, 2 and 3 should be. If that needs to be true for both row and column-major languages, the data needs to be permuted, which is expensive. If say we want to write to GeoTIFF from Julia we often use (y,x) rather than (x,y) such that the row-major interpretation of that is (x,y) again.
|
As far as I can see, this is sadly not specified (yet). There is a discussion about this topic here: csdms/bmi#118 |
After reading through issue csdms/bmi#118, I am leaning to use unstructured grid types in Wflow for now, at least until this issue has been resolved. I agree that this grid type can be more difficult from a user perspective, on the other hand a student was able to couple Modflow 6 and Wflow without too many issues a couple of years ago. As indeed fewer grid types would be better, I think we need only two grid types:
|
Although fewer grid types would be better from a user perspective, at the moment it does not seem worth to support this with the use of a buffer (@visr and I concluded this after a brief discussion). For example. to support reservoirs as part of a river grid type, a reference (for example with For the use of fewer grid types it makes more sense to solve this at the |
Hi Bart (@BSchilperoort), do the code changes in branch BMI now work fine for eWaterCycle (Albrecht got the impression it is working now during a workshop with escience this week)? The branch will also go through an internal code review, but would be good to also have some feedback from your side. |
Hi Willem, as far as I know the BMI is now complete and compliant with BMI 2.0. It worked great during the workshop. Thank you for implementing these fixes! If we encounter anything else at some point we'll open up an issue. The only thing not working yet for eWaterCycle is putting the model in a container and communicating with it through gRPC, as a Julia native implementation of a gRPC server does not exist. Trying to pass the BMI through Python or C has led to segfaults and other issues. We'll still have to work on this, and possibly use a different communication protocol. However, this is not related to this PR or any code in Wflow.jl. |
Wflow version checks
I have checked that this issue has not already been reported.
I have checked that this bug exists on the latest version of Wflow.
Reproducible Example
Request of grid type seems wrong for some variables. As an example, for
lateral.subsurface.to_river
: request of grid type is incorrect. With BMI.get_var_grid, grid=2 is returned. However, the Vector that is returned with BMI.get_value has the size of grid=4.Current behaviour
BMI.get_value deviates from the BasicModelInterface.jl implementation (
dest
argument is missing).Grid type for some variables seems wrong (see also Reproducible Example).
BMI.get_value should always return a flattened array, this is not the case for variables with an extra dimension (e.g.
layer
), then a Vector of SVectors is returned.Desired behaviour
BMI.get_value should have
dest
as argument.Grid type should be correct for each variable. This has already been resolved in the zmq_server branch (coupling with OpenDA), but good to include in a separate branch that addresses this issue.
BMI.get_value should always return a flattened array (probably need to make use of get_grid_z).
Additional Context
These BMI issues were reported during an update of eWaterCycle to support Wflow.jl.
The text was updated successfully, but these errors were encountered: