latest and nearesttime fail with missing directory on AWS #77
Comments
This actually happens even without dataflow issues. The easiest way to replicate it is to request full disk imagery just after the top of the hour, before any images have arrived for that hour (so that hour's directory does not exist on AWS yet). It will fail, citing the missing directory. |
Hey @csteele2, did you ever figure out a workaround for this? I am able to replicate the issue with the GOES Clear Sky Mask product. The 'noaa-goes16/ABI-L2-ACMC/2022/014/23' hour is empty and produces the error: FileNotFoundError: noaa-goes16/ABI-L2-ACMC/2022/014/23, while hour 22 ('noaa-goes16/ABI-L2-ACMC/2022/014/22') didn't produce an error even though it is missing scans after minute 31. Did you find some way to check whether a file exists first? My first thought is to catch this as an exception. Additionally, a way to return the number of missing scans in a time range (even sub-hourly, when no error is thrown) would also be relevant to the cloud frequency problem I'm trying to solve. |
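One way to approach both questions above (treating a missing hour directory as "no files" and counting missing scans) might look like the sketch below. It is not goes2go code. `list_dir` is a hypothetical callable that lists a directory and raises FileNotFoundError when it is absent, and `expected_per_hour` is an assumption the caller supplies based on the product's scan cadence.

```python
import re
from typing import Callable, List

# Scan start stamp in ABI file names: "_s" + YYYY JJJ HH MM SS + tenths
_START = re.compile(r"_s(\d{4})(\d{3})(\d{2})(\d{2})(\d{2})(\d)_")


def list_hour(list_dir: Callable[[str], List[str]], path: str) -> List[str]:
    """Return the files under an hour directory, or [] if it does not exist."""
    try:
        return list_dir(path)
    except FileNotFoundError:
        return []


def missing_scans(files: List[str], expected_per_hour: int) -> int:
    """Estimate missing scans from the distinct start minutes that are present."""
    minutes = set()
    for name in files:
        match = _START.search(name)
        if match:
            minutes.add(int(match.group(4)))  # group 4 is the start minute
    return max(expected_per_hour - len(minutes), 0)
```

With this, an empty or missing hour simply reports every expected scan as missing instead of raising.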
Not yet. I work around the top-of-the-hour problem by generating a list only after I know the first file has hit AWS. I was deciding whether to hack on goes2go and try to fix it, or go my own completely different way. What's pulling me away from goes2go is the day/night blending in satpy. I suppose I could try to come up with a fix within goes2go, but it probably won't be soon. Probably more like spring. Tangentially related is incomplete downloads. They are easy enough to work around when they happen, but ideally something would be included here to redownload if a partial file is returned (which happens to me very frequently). |
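A redownload-on-partial-file check like the one described above could be sketched roughly as follows. This is an illustration only, not an existing goes2go feature: `fetch` and `remote_size` are hypothetical callables supplied by the caller (one that copies a remote object to a local path, one that reports the remote object's size in bytes).

```python
import os
from typing import Callable


def download_with_retry(
    fetch: Callable[[str, str], None],
    remote_size: Callable[[str], int],
    remote: str,
    local: str,
    attempts: int = 3,
) -> bool:
    """Fetch a file, retrying while the local size differs from the remote size."""
    expected = remote_size(remote)
    for _ in range(attempts):
        fetch(remote, local)
        if os.path.exists(local) and os.path.getsize(local) == expected:
            return True  # sizes match, treat the download as complete
    return False  # still partial after all attempts
```

A size comparison only catches truncated files; it would not detect corrupted content of the right length.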
This seems quick and dirty, but it could be a way to skip over hours that don't have any data in the AWS bucket. It seems to work at least for letting bulk downloads continue with the timerange function. Any suggestions for how to modify this to report how many scans within an hour are missing? In data.py, line 139: `
` Input example case: download data for a specified time range. Output for example case before change: Output for example case after change: This doesn't fix or alert on partially filled buckets: for example, 2022-01-15 02:00:00 only has scans for minutes 41, 46, 51, 56. I apologize for the formatting, new to GitHub. |
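The code from the comment above did not survive in this thread, so the following is only a rough sketch of the general idea (skipping hour directories that do not exist), not the commenter's actual change. `list_dir` is a hypothetical callable that raises FileNotFoundError for a missing directory; the path layout follows the bucket/product/year/day-of-year/hour pattern seen in the error messages.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterator, List, Tuple


def hour_paths(bucket: str, product: str, start: datetime, end: datetime) -> Iterator[str]:
    """Yield one 'bucket/product/YYYY/JJJ/HH' path per hour from start to end."""
    t = start.replace(minute=0, second=0, microsecond=0)
    while t <= end:
        yield f"{bucket}/{product}/{t:%Y/%j/%H}"
        t += timedelta(hours=1)


def list_range(
    list_dir: Callable[[str], List[str]],
    bucket: str,
    product: str,
    start: datetime,
    end: datetime,
) -> Tuple[List[str], List[str]]:
    """Collect files across the range and record hour directories that are missing."""
    files, skipped = [], []
    for path in hour_paths(bucket, product, start, end):
        try:
            files.extend(list_dir(path))
        except FileNotFoundError:
            skipped.append(path)  # note the gap and keep going
    return files, skipped
```

Returning the skipped paths alongside the files lets the caller report gaps rather than silently dropping them.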
I just ran into this same issue when trying to plot a week-long series of GOES images that have a data gap in the middle. I was going to implement a fix along the lines of the code from @vwgeiser in the prior comment, and then discovered this thread. Cutting and pasting his code solves the general problem when using goes_timerange. While it does not anticipate the specific missing files to report them one-by-one, at least noting the empty directory and not crashing is helpful. This saves having to go through the AWS file listing and figure out the missing hours to avoid them manually. I am in favor of including the above code change. |
This happens because a file covering the last ten minutes of an hour, but created in the first seconds of the next hour, is placed in the folder for the hour of its scan interval, not the hour of its creation. For example, the file OR_ABI-L2-FDCF-M6_G18_s20242382050213_e20242382059521_c20242382100018.nc is in folder 20, but the code looks for it in folder 21, which does not exist yet. That folder will only be created ten minutes later (with the next file in the sequence). The code from @vwgeiser fixes it. Tested with data.py
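To make the folder-versus-creation-time mismatch concrete, here is a small sketch that parses the start (s), end (e), and creation (c) stamps from the file name quoted above. The helper name `scan_times` is made up for illustration, and the trailing tenth-of-a-second digit of each stamp is dropped.

```python
import re
from datetime import datetime
from typing import Dict


def scan_times(filename: str) -> Dict[str, datetime]:
    """Parse the s/e/c stamps (YYYYJJJHHMMSS + tenths) from an ABI file name."""
    stamps = dict(re.findall(r"_([sec])(\d{13})\d", filename))
    return {key: datetime.strptime(val, "%Y%j%H%M%S") for key, val in stamps.items()}


name = "OR_ABI-L2-FDCF-M6_G18_s20242382050213_e20242382059521_c20242382100018.nc"
times = scan_times(name)
folder = times["s"].strftime("%Y/%j/%H")    # hour folder implied by the scan start
created = times["c"].strftime("%Y/%j/%H")   # hour in which the file was created
print(folder, created)  # 2024/238/20 2024/238/21
```

The scan start lands in hour 20 while the creation stamp falls in hour 21, which matches the behavior described above.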
|
I know this is mostly an upstream issue, but I imagine it shouldn't be too hard to handle. The nearesttime and latest functions seem to fail if an hour is missing from the directories on AWS.
For example, today, 4-Dec-2023 (day 338), the 18Z directory has not been created yet in noaa-goes16/ABI-L2-MCMIPF/2023/338, and the current time is 1946Z. This results in errors like
FileNotFoundError: noaa-goes16/ABI-L2-MCMIPF/2023/338/18
even when specifying a time like 1943Z on this day.
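One possible way to tolerate a missing hour in this situation is to step back hour by hour until a non-empty directory turns up. The sketch below assumes a hypothetical `list_dir` callable that raises FileNotFoundError for a missing directory, and a made-up `max_hours_back` limit; it is not how goes2go currently behaves.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Optional, Tuple


def latest_existing_hour(
    list_dir: Callable[[str], List[str]],
    bucket: str,
    product: str,
    when: datetime,
    max_hours_back: int = 6,
) -> Optional[Tuple[str, List[str]]]:
    """Return (path, files) for the most recent non-empty hour at or before `when`."""
    t = when.replace(minute=0, second=0, microsecond=0)
    for _ in range(max_hours_back + 1):
        path = f"{bucket}/{product}/{t:%Y/%j/%H}"
        try:
            files = list_dir(path)
        except FileNotFoundError:
            files = []  # directory not created yet, or hour missing entirely
        if files:
            return path, files
        t -= timedelta(hours=1)
    return None  # nothing found within the search window
```

Capping the search keeps a long outage from turning into an unbounded walk through the bucket.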