Key error for some stations #5
Can you please provide an example of code that produces this error?
Thank you for the quick response!
```python
df_precip_temp = response.to_dataframe()
```

This is an example of one of the stations that throws this error for me. There are lots, such as
This gives a similar error, but with a different request:

```python
x = ["FIPS:56"]
```
Both of these errors come from bugs in how this library handles missing data. In the first example, there appears to be a gap in the data for that station between 1951 and 2003, and the missing years are causing the errors. In the second, certain stations are missing the elevation parameter. I've patched the issue on GitHub, but it will probably be a while before I do a new release. In the meantime, you can install the development code directly from GitHub.
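Here is a minimal, self-contained sketch of why missing data shows up as a `KeyError`. It is not pyncei's code. The records are made up, and treating `keys` as the union of all record keys is an assumption. Only the dict comprehension is copied from the traceback in this thread. If one record lacks a field that another record has, `val[k]` raises, while `val.get(k)` fills the gap with `None`.

```python
# Hypothetical records: the second day is missing the "station" field,
# mimicking the gaps described above. Not real NCEI output.
records = [
    {"date": "2003-01-01", "station": "GHCND:USC00140637", "value": 0.0},
    {"date": "2003-01-02", "value": 2.5},
]

# Desired column order (analogous to a key_order list).
key_order = ["station", "date", "value"]

# Assumption: keys = every field seen in any record.
keys = set().union(*records)


def strict_rows(records):
    # Same shape as the comprehension in the traceback: raises KeyError
    # as soon as a record lacks a field that appears in `keys`.
    for val in records:
        yield {k: val[k] for k in key_order if k in keys}


def tolerant_rows(records):
    # Same rows, but missing fields become None instead of raising.
    for val in records:
        yield {k: val.get(k) for k in key_order if k in keys}


try:
    list(strict_rows(records))
except KeyError as exc:
    print("strict version failed:", exc)  # strict version failed: 'station'

print(list(tolerant_rows(records)))
```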
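And here is a version of your code that should catch the missing years. It does require you to use the development code.

```python
from datetime import date

from pyncei import NCEIBot, NCEIResponse

ncei = NCEIBot("YOUR_TOKEN", cache_name="ncei")  # placeholder; use your own API token

# Request one calendar year at a time so years with no data are reported and skipped
response = NCEIResponse()
for year in range(2000, 2024):
    resp = ncei.get_data(
        datasetid="GHCND",
        stationid="GHCND:USC00140637",
        datatypeid=["PRCP"],
        startdate=date(year, 1, 1),
        enddate=date(year, 12, 31),
    )
    if resp:
        response.extend(resp)
    else:
        print(f"No data found for {year}")
response.to_dataframe()
```

Let me know if that solves the problem for you.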
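The loop above follows a general pattern: request one calendar year at a time and record which years come back empty. Below is a minimal sketch with the fetch step abstracted away. `yearly_windows` and `fetch_by_year` are hypothetical helpers, not part of pyncei. The commented-out wiring assumes `ncei.get_data` returns a falsy value for an empty year, which is what the `if resp:` check above relies on.

```python
from datetime import date


def yearly_windows(first_year, last_year):
    """Return (start, end) date pairs for each calendar year, inclusive of last_year."""
    return [(date(y, 1, 1), date(y, 12, 31)) for y in range(first_year, last_year + 1)]


def fetch_by_year(fetch, first_year, last_year):
    """Call fetch(start, end) once per calendar year.

    Returns (results, missing_years). A falsy return value from fetch is
    treated as "no data for that year".
    """
    results, missing_years = [], []
    for start, end in yearly_windows(first_year, last_year):
        resp = fetch(start, end)
        if resp:
            results.append(resp)
        else:
            missing_years.append(start.year)
    return results, missing_years


# Stand-in for the real API call: pretend 2001 and 2002 have no data.
def fake_fetch(start, end):
    return [] if start.year in (2001, 2002) else [f"records for {start.year}"]


results, missing_years = fetch_by_year(fake_fetch, 2000, 2003)
print(missing_years)  # [2001, 2002]

# Hypothetical wiring to pyncei, mirroring the loop above:
# results, missing_years = fetch_by_year(
#     lambda s, e: ncei.get_data(datasetid="GHCND", stationid="GHCND:USC00140637",
#                                datatypeid=["PRCP"], startdate=s, enddate=e),
#     2000, 2023,
# )
```

Note that `range(2000, 2024)` in the original loop covers 2000 through 2023, so the equivalent call here passes `2023` as the inclusive last year.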
Thanks for digging into this! It works now for
Can you please provide the code that is producing the error? When I plug those stations into the code above, it seems to run fine.
Here is my uncommented, data-science-esque code in all of its inefficient glory... Maybe I did something wrong and am still using the original pyncei code?

```python
from datetime import date

import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
from pyncei import NCEIBot, NCEIResponse

lat = 35.00
lon = -97.05
distance = 125  # km

# Load station coordinates and project them so distances come out in metres
df_stations = pd.read_csv('stations.csv')
gdf_stations = gpd.GeoDataFrame(
    df_stations,
    geometry=gpd.points_from_xy(df_stations['longitude'], df_stations['latitude']),
    crs='EPSG:4326',
)
gdf_stations_proj = gdf_stations.to_crs('EPSG:3395')
site = gpd.GeoSeries([Point(lon, lat)], crs='EPSG:4326').to_crs('EPSG:3395')
gdf_stations_proj['distance'] = gdf_stations_proj.distance(site[0])
gdf_ref = gdf_stations_proj[gdf_stations_proj['distance'] <= distance * 1000]  # Filter for distances within set distance

# One client for all requests (token redacted)
ncei = NCEIBot("********************************", cache_name="ncei")
df_precip = pd.DataFrame()
for station_id in gdf_ref["id"].unique():
    response = NCEIResponse()
    for year in range(2000, 2024):
        resp = ncei.get_data(
            datasetid="GHCND",
            stationid=station_id,
            datatypeid=["PRCP"],
            startdate=date(year, 1, 1),
            enddate=date(year, 12, 31),
        )
        if resp:
            response.extend(resp)
        else:
            print(f"No data found for {station_id} in {year}")
    df_precip_temp = response.to_dataframe()
    df_precip = pd.concat([df_precip, df_precip_temp])
```

I am also attaching a copy of
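This is separate from the `KeyError`, but the station filter is worth double-checking. EPSG:3395 is World Mercator, whose scale factor at latitude φ is 1/cos φ. At 35°N, projected distances therefore come out roughly 22% too long, and a 125 km cutoff actually keeps stations within about 100 km. Below is a sketch of a haversine-based filter that avoids the projection. `haversine_km` and `within_radius` are hypothetical helpers, the station tuples are made up, and 6371 km is the usual mean-Earth-radius approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(stations, lat, lon, radius_km):
    """Keep (station_id, lat, lon) tuples within radius_km of (lat, lon)."""
    return [s for s in stations if haversine_km(lat, lon, s[1], s[2]) <= radius_km]


# Mercator scale factor at 35°N, i.e. how much EPSG:3395 inflates distances there
print(round(1 / math.cos(math.radians(35.0)), 3))  # 1.221

# Hypothetical stations: A is 0.5° north of the site (~56 km), B is 2° north (~222 km)
stations = [("A", 35.5, -97.05), ("B", 37.0, -97.05)]
print(within_radius(stations, 35.00, -97.05, 125))  # [('A', 35.5, -97.05)]
```

To stay within geopandas instead, projecting to a local equal-distance or UTM CRS would address the same distortion.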
Hmm, I can't reproduce the error without falling back to the release version on PyPI. I'm a little mystified that the error pops up for these stations but not for the station we discussed earlier. Is the traceback the same? Can you run

And a friendly word of warning: you don't want to share an API token publicly. I tried to obscure it above, but it's still in the comment history. Be careful when pasting code in a public forum.
Yep, you're right. I didn't install the updated version correctly the first time (still not sure how that station ran; I checked it like 3 times). Anyway, I appreciate the help!
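A small standard-library check can confirm which copy of pyncei Python is actually importing, either the PyPI release or the GitHub code. `installed_version` and `module_location` are hypothetical helper names. From a shell, `pip show pyncei` reports similar information. If the development code has not bumped the version number, the version string alone may not tell the two apart, so it is worth checking the import location too.

```python
import importlib.metadata
import importlib.util


def installed_version(dist_name):
    """Return the installed distribution's version string, or None if it isn't installed."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def module_location(module_name):
    """Return the file Python would import for a top-level module, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None


print("pyncei version:", installed_version("pyncei"))
print("imported from:", module_location("pyncei"))
```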
Hello, I'm having a similar issue. I checked the version of the package, and it is 1.0. I also checked the raw response from NOAA: the server seems to be providing data, but the package can't convert it into a DataFrame. I would appreciate it if you could help me with this.
There is a good chance this is a user error, but I am running into the following error, specifically when pulling `GHCND` and `PRCP` data. If I follow the example and generate a response, there appears to be data, but calling `to_dataframe()` throws the following error for some stations:

```
  File "/PycharmProjects/Regenerate/.venv/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-15-7a2ac4d974af>", line 1, in <module>
    response.to_dataframe()
  File "/PycharmProjects/Regenerate/.venv/lib/python3.12/site-packages/pyncei/bot.py", line 1068, in to_dataframe
    df = pd.DataFrame(self.values())
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/PycharmProjects/Regenerate/.venv/lib/python3.12/site-packages/pandas/core/frame.py", line 832, in __init__
    data = list(data)
           ^^^^^^^^^^
  File "/PycharmProjects/Regenerate/.venv/lib/python3.12/site-packages/pyncei/bot.py", line 1010, in values
    yield {k: val[k] for k in self.key_order if k in keys}
              ~~~^^^
KeyError: 'station'
```

Is there anything I can do to avoid this, or to check for this issue before running `to_dataframe()`, so that I don't error out?
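One way to avoid erroring out is to catch the `KeyError` for each station and move on. When the records are available as plain dicts, you can also report which ones are missing which fields. In the sketch below, `to_dataframe_or_none` and `missing_fields` are hypothetical helpers, and `BrokenResponse` is a stand-in that fails the same way as the traceback above. How to extract raw records from an `NCEIResponse` is not covered here. This approach only skips the affected stations; the actual fix is the patched development code discussed in this thread.

```python
def to_dataframe_or_none(response, label=""):
    """Return response.to_dataframe(), or None if a record is missing a field."""
    try:
        return response.to_dataframe()
    except KeyError as exc:
        print(f"Skipping {label}: missing field {exc}")
        return None


def missing_fields(records, required):
    """Map each record's index to the required fields it lacks (complete records omitted)."""
    report = {}
    for i, rec in enumerate(records):
        absent = [k for k in required if k not in rec]
        if absent:
            report[i] = absent
    return report


class BrokenResponse:
    """Stand-in that fails the same way as the traceback above."""

    def to_dataframe(self):
        raise KeyError("station")


print(to_dataframe_or_none(BrokenResponse(), "GHCND:USC00140637"))
# Hypothetical records, e.g. parsed JSON results:
print(missing_fields([{"station": "A", "date": "2003-01-01"}, {"date": "2003-01-02"}],
                     ["station", "date"]))  # {1: ['station']}
```

In a per-station loop, you would assign the wrapper's result and concatenate it only when it is not `None`.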