Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hourly reanalysis downloading #296

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

ejsimley
Copy link
Collaborator

@ejsimley ejsimley commented Jun 4, 2024

This pull request adds two new functions to the utils/downloader module to download hourly reanalysis data: get_era5_hourly and get_merra2_hourly. These functions are modified from get_era5_monthly and get_merra2_monthly contributed by @charlie9578, and similarly download data from the CDS and GES DISC services.

Note that it can take a long time to download historical data (~1 day for a 20-year time series). Downloading era5 data seems to be faster than merra2 for me, though.

@ejsimley ejsimley requested a review from RHammond2 June 4, 2024 16:02
@codecov-commenter
Copy link

codecov-commenter commented Jun 4, 2024

Codecov Report

Attention: Patch coverage is 0% with 124 lines in your changes missing coverage. Please review.

Project coverage is 70.14%. Comparing base (a53308e) to head (33c940f).
Report is 13 commits behind head on develop.

Files Patch % Lines
openoa/utils/downloader.py 0.00% 124 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #296      +/-   ##
===========================================
- Coverage    72.49%   70.14%   -2.35%     
===========================================
  Files           29       29              
  Lines         3690     3815     +125     
  Branches       796      819      +23     
===========================================
+ Hits          2675     2676       +1     
- Misses         826      950     +124     
  Partials       189      189              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@RHammond2 RHammond2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together, it's a great addition to the toolkit! The only comment holding me back from an approval was the one regarding the monthly files from ERA5, and then all the base files from the MERRA2. I'm thinking we don't really need them, and should ditch them, but I could be swayed towards leaving it as-is as well.

Comment on lines +507 to +510
lat + node_spacing,
lon - node_spacing,
lat - node_spacing,
lon + node_spacing,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that I quite follow the node_spacing logic, could you just explain that a bit?

# set up cds-api client
try:
c = cdsapi.Client()
except Exception as e:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what's raised from the API? I thought it was more specific for some reason, or are you just trying to capture any failed connection, regardless of reason?

],
"year": None,
"month": None,
"day": [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could do [f"{i:02d}" for i in range(1, 32)] just to trim the size of the list

"31",
],
"time": [
"00:00",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, but with [f"{i:02d}:00" for i in range(24)].

Comment on lines +358 to +360
As well as returning the data as a dataframe, the data is also saved as monthly NetCDF files and
a csv file with the concatenated data. These are located in the "save_pathname" directory, with
"save_filename" prefix. This allows future loading without download from the CDS service.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this, and I think we should just delete the monthly files, and save as one single netcdf and csv file as it's less onerous on the user who'll have to delete the originals if they're no longer required.


for month in months:
# get the file names from the GES DISC site for the year
result = requests.get(base_url + str(year) + "/%02d" % month)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string portion could get tidied up to f"{base_url}{year}/{month:02d}"
if I followed that correctly.

Comment on lines +822 to +824
months = list(range(1, end_date.month + 1, 1))
else:
months = list(range(1, 12 + 1, 1))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra 1 at the end is the default, even when the starting point is changed. Though if that's just a style thing, then it doesn't bother me, but thought I'd flag it just in case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants