-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remaining corpus of idx mapping functionality #523
Conversation
@martindurant , just to make sure I understand, this is awaiting your review? |
Yes, but it's only 12hr old :) @emfdavid is asking for feedback above, so if you are in a place to do that, we would appreciate that too. |
@martindurant, sorry, didn't mean to suggest impatience! 🙃 |
Recently meet Taylor who has a different take on the same concept... his approach is more in the on-the-fly category more akin to hypergrib. |
@rsignell I am impatient 😆 did you kick the tires yet? |
@emfdavid , heh heh, sorry, got distracted with another burning fire, but I just tried it out. Running into some xarray/datatree issue, perhaps because the latest release (2024.10.0) of xarray has datatree built in now and there is some conflict if you try to not use the built in version?
|
The CI is showing that too, so yes, probably need to pin xarray or not-install tree |
Ideally this code would use the now-built-in xarray datatree -- would that be an easy fix for you @emfdavid ? |
So I did manage to run the test notebook using Do the extracted data at the end of the rendered notebook (Cell [21]) look okay? Seem a bit strange... |
I updated the code to use the now built in datatree 🎉 I can't run your notebook tonight but those nan values do look funny. Building the axes is really finicky... I can try to take a look tomorrow. I did run down the issue with the variable names dswrf -> sdswrf, ulwrf -> sulwrf Is that something we can ask them to undo? |
I was able to run a slightly-modified version of the notebook using the latest xarray (2024.10.0) and your latest fixes to #523 ! When I extracted some data values from the virtual dataset, however, they seem unlikely to be correct. @emfdavid, @Anu-Ra-g, or @martindurant, is there a simple alternate method by which we could check to see if the data values in the last cell are correct? Perhaps we should hold off merging until we decide whether the data extraction is working properly? |
I will see if I can debug your notebook tonight. If you have a chance to try the one in this PR I would appreciate it. As far as when to merge... do you (@martindurant) want the tests added to this already large PR? |
@emfdavid okay, I'm testing the notebook in the repo now... And it took awhile (like 10 minutes), but it worked: https://nbviewer.org/gist/rsignell/64234851f1eafeab261ed8a774aea5ca I used this conda environment file to create the environment to run it. |
The data extraction is working properly as there are minimal code changes and refactors from the original code. But I used GEFS grib data for making that notebook. Maybe that could be a reason! As per the original demonstrations by @emfdavid with the GFS data, the data extraction works fine. If you want, I could change it from GEFS to some other model's data. |
I'd like to make sure it's actually working as expected on the GEFS data - we don't want to sidestep a potential bug, right? If those NaN values are correct, then yes, perhaps switching to GFS would make a more pleasing notebooks. |
@rsignell There is a white space error in the double for loop. The notebook is only processing the 18z forecast for each day which is why there are three nan value between each real value. I also noticed there are a lot of duplicate attrs in that much older 2017 file. The GEFS model changed significantly around 2020-09-25. I think these are really issues with the NOAA Grib file and its compatibility with cfgrib. I don't see this on the more recent GEFS data, but I have only really looked at the geavg files so far. |
@martindurant I tried to fix the build issue, with ImportError "HybridMappingProxy" for just the 3.10 build but I think I made it worse with d40cca4 We can definitely rebase away the notebooks before merging - those are not to be checked in here like this. |
Whoa, great find on that white space indent error @emfdavid ! The notebook indeed works fine now: https://nbviewer.org/gist/rsignell/c3fd58368ed9d0ae50c26807b6a51678 |
It does. Actually, xarray is noarch, but it has compiled deps. It does install into a fresh env, though.
should be
? |
OK, I am happy to push this in, as is - it's fine to include the example notebook, and we can always iterate yet. OK, everyone? |
Sure - your call on the notebook - it is a bit chunky to have in the git tree at ~12mb... I will work on the tests shortly. There will be some moderate size test fixture files for that. |
Can we put in the un-evaluated notebook? |
Pushed the change - can you squash merge? |
This is the remainder of the idx mapping code - primarily reinflate_grib_store and supporting methods that allow creating datasets from the idx files.
This PR is pretty large. I think the notebook example should go to https://github.com/ProjectPythia/kerchunk-cookbook ?
But I wanted you to be able to see the rest and how it works. Happy to break it up however you want. I know the methods need better doc strings, but I wanted you to give me a little more feedback on how to proceed first.
There are also pretty extensive tests and I will move those too but they require a bit more work to convert from unittest to pytest.