Key error for some stations #5

Open
skfrost01 opened this issue Feb 16, 2024 · 12 comments
Comments

@skfrost01

There is a good chance this is a user error, but I am running into the following error, specifically when pulling GHCND and PRCP data. If I follow the example and generate a response, there appears to be data, but using to_dataframe() throws the following error for some stations:

  File "/PycharmProjects/Regenerate/.venv/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-15-7a2ac4d974af>", line 1, in <module>
    response.to_dataframe()
  File "/PycharmProjects/Regenerate/.venv/lib/python3.12/site-packages/pyncei/bot.py", line 1068, in to_dataframe
    df = pd.DataFrame(self.values())
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/PycharmProjects/Regenerate/.venv/lib/python3.12/site-packages/pandas/core/frame.py", line 832, in __init__
    data = list(data)
           ^^^^^^^^^^
  File "/PycharmProjects/Regenerate/.venv/lib/python3.12/site-packages/pyncei/bot.py", line 1010, in values
    yield {k: val[k] for k in self.key_order if k in keys}
              ~~~^^^
KeyError: 'station'

Is there anything I can do to avoid this or check for this issue before running to_dataframe() to avoid erroring out?
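One way to check for this kind of problem up front is to scan the raw records for missing keys before building a DataFrame. The sketch below is library-independent: it assumes the records are available as a plain list of dicts (how to obtain them from a pyncei response is not shown here), and `find_missing_keys` and the sample records are hypothetical names and data used only for illustration.

```python
def find_missing_keys(records, required_keys):
    """Return {index: [missing keys]} for records lacking any required key."""
    problems = {}
    for i, rec in enumerate(records):
        missing = [k for k in required_keys if k not in rec]
        if missing:
            problems[i] = missing
    return problems


# Hypothetical records shaped like API results; the second lacks "station".
records = [
    {"date": "2003-01-01", "datatype": "PRCP", "station": "GHCND:USC00140637", "value": 0},
    {"date": "2003-01-02", "datatype": "PRCP", "value": 5},
]

problems = find_missing_keys(records, ["date", "datatype", "station", "value"])
print(problems)  # {1: ['station']}
```

An empty result means every record carries all of the required keys, so a strict lookup such as `val[k]` would not raise a KeyError on them.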

@adamancer
Owner

Can you please provide an example of code that produces this error?

@skfrost01
Author

Thank you for the quick response!
year = 2023
response = NCEIResponse()
while year >= 2000:
    response.extend(
        ncei.get_data(
            datasetid="GHCND",
            stationid='GHCND:USC00140637',
            datatypeid=["PRCP"],
            startdate=date(year, 1, 1),
            enddate=date(year, 12, 31)
        )
    )

    year -= 1

df_precip_temp = response.to_dataframe()

This is an example of one of the stations that throws this error for me. There are lots, such as GHCND:USC00347390 that work as expected.

@skfrost01
Author

This gives a similar error, but with a different key: KeyError: 'elevation'

x = ["FIPS:56"]
stations = ncei.get_stations(
    datasetid="GHCND",
    datatypeid=["PRCP"],
    locationid=x,
    startdate=mindate,
    enddate=maxdate,
)
df_stations = stations.to_dataframe()

@adamancer
Owner

Both these errors result from bugs in how this library handles missing data. In the first example, it looks like there is a gap in the data for that station between 1951 and 2003; the missing years are causing the errors. In the second, certain stations are missing the elevation parameter. I've patched the issue on GitHub but expect it will be a bit before I do a new release. In the meantime, you can install the GitHub code as follows:

git clone https://github.com/adamancer/pyncei
cd pyncei
pip install .

And here is a version of your code that should catch the missing years. It does require you to use the development code.

response = NCEIResponse()
for year in range(2000, 2024):
    resp = ncei.get_data(
        datasetid="GHCND",
        stationid='GHCND:USC00140637',
        datatypeid=["PRCP"],
        startdate=date(year, 1, 1),
        enddate=date(year, 12, 31)
    )

    if resp:
        response.extend(resp)
    else:
        print(f"No data found for {year}")

response.to_dataframe()

Let me know if that solves the problem for you.
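The skip-empty-years pattern above can be tried out without calling the API. The sketch below is a generic version that takes any callable returning a list of records. `collect_by_year` and `fake_fetch` are hypothetical names, and the stand-in fetcher simply pretends that data exists from 2003 onward.

```python
def collect_by_year(fetch, start_year, end_year):
    """Call fetch(year) for each year and merge the non-empty results.

    fetch is any callable returning a list of records (possibly empty).
    Returns (records, empty_years).
    """
    records, empty_years = [], []
    for year in range(start_year, end_year + 1):
        rows = fetch(year)
        if rows:
            records.extend(rows)
        else:
            # Keep track of gaps instead of merging an empty result.
            empty_years.append(year)
    return records, empty_years


# Stand-in for a real API call: pretend data exists only from 2003 onward.
def fake_fetch(year):
    return [{"year": year, "value": 1}] if year >= 2003 else []


records, empty_years = collect_by_year(fake_fetch, 2000, 2005)
print(len(records), empty_years)  # 3 [2000, 2001, 2002]
```

Returning the list of empty years, rather than only printing them, makes it easier to report gaps per station when looping over many stations.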

@skfrost01
Author

Thanks for digging into this! It works now for USC00140637 and generally seems to be getting a higher success rate, but there are still some stations that are failing. For example:
USC00031459
USC00145870
USC00340017
I have this set up to pull stations within a radius of a point, so these are just a random selection of ones that failed.

@adamancer
Owner

Can you please provide the code that is producing the error? When I plug those stations into the code above, it seems to run fine.

@skfrost01
Author

skfrost01 commented Feb 16, 2024

Here is my uncommented, data science-esque code in all of its inefficient glory... Maybe I did something wrong and am still using the original pyncei code?

lat = 35.00
lon = -97.05
distance = 125 #km

df_stations = pd.read_csv('stations.csv')
gdf_stations = gpd.GeoDataFrame(df_stations,
                                geometry=gpd.points_from_xy(df_stations['longitude'], df_stations['latitude']),
                                crs='EPSG:4326')

gdf_stations_proj = gdf_stations.to_crs('EPSG:3395')
site = gpd.GeoSeries([Point(lon, lat)], crs='EPSG:4326').to_crs('EPSG:3395')

gdf_stations_proj['distance'] = gdf_stations_proj.distance(site[0])
gdf_ref = gdf_stations_proj[gdf_stations_proj['distance'] <= distance * 1000]  # Filter for distances within set distance
df_precip = pd.DataFrame()

for id in gdf_ref["id"].unique():
    year = 2023
    ncei = NCEIBot("********************************", cache_name="ncei")
    response = NCEIResponse()
    for year in range(2000, 2024):
        resp = ncei.get_data(
            datasetid="GHCND",
            stationid=id,
            datatypeid=["PRCP"],
            startdate=date(year, 1, 1),
            enddate=date(year, 12, 31)
            )
        if resp:
            response.extend(resp)
        else:
            print(f"No data found for {year}")

    df_precip_temp = response.to_dataframe()
    df_precip = pd.concat([df_precip, df_precip_temp])

I am also attaching a copy of stations.csv which is a bulk pull using ncei.get_stations
stations.csv

@adamancer
Owner

Hmm, I can't reproduce the error without falling back to the release version on PyPI. I'm a little mystified by the error popping up for these stations but not for the station we discussed earlier. Is the traceback the same?

Can you run pip freeze in your command line and locate pyncei in the output? If you've installed it from PyPI, it should show up as pyncei==1.0, otherwise there should be a path to a file on your computer.
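The same check can be done from inside Python, which helps when the interpreter in use (for example an IDE's virtual environment) may not be the one that pip on the command line points to. This is a minimal sketch using the standard-library `importlib.metadata`. `describe_install` is a hypothetical helper name, and the "index (e.g. PyPI)" label is an assumption made whenever no `direct_url.json` record is present.

```python
import json
from importlib import metadata


def describe_install(package):
    """Return version and install source for a package, or None if absent."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    # direct_url.json (PEP 610) is written for installs from a path or VCS URL;
    # it is normally absent for plain installs from a package index.
    raw = dist.read_text("direct_url.json")
    source = json.loads(raw)["url"] if raw else "index (e.g. PyPI)"
    return {"version": dist.version, "source": source}


print(describe_install("pyncei"))
```

Running this in the same interpreter that raises the KeyError shows which copy of the package that interpreter actually imports.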

And a friendly word of warning--you don't want to share an API token publicly. I tried to obscure it above but it's still in the comment history. Be careful pasting code in a public forum.

@skfrost01
Author

Yep, you're right, I didn't install the updated version correctly the first time (still not sure how that station ran, I checked it like 3 times). Anyways, appreciate the help!

@YufanZheng

Screenshot 2024-05-03 130027 Screenshot 2024-05-03 125935 Screenshot 2024-05-03 125750

@YufanZheng

Hello, I'm having a similar issue. I checked the version of the package and the version is 1.0. I also checked if NOAA has a request response, it seems that the server is providing data, but the package can't convert it into a data frame.

I would appreciate it if you could help me with this.

@YufanZheng

Weixin Image_20240503132402 Weixin Image_20240503132502

I fixed the issue. When the response is 1, the data is actually missing. As a result, errors occur when stitching together data from different years. I modified the code to handle this exception.
