Hi, I was trying out some of the data sources, and I noticed that some of them can take a while to run, while also requiring an active internet connection. I know this suggestion introduces further headaches, but perhaps you should consider setting up a cache for the non-real-time datasets?
For the requests-based datasets, this would be as trivial as adding in requests-cache:
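Something along these lines (the cache name, backend, and expiry are just illustrative):

```python
import requests_cache

# Transparently cache every call made through the requests library.
# expire_after=None keeps entries until they are explicitly cleared.
requests_cache.install_cache("datasets_cache", backend="sqlite", expire_after=None)
```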
Along with some kind of helpful expire_cache() call for students to use if the remote data changes for whatever reason.
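That helper is hypothetical, but it could be little more than a wrapper around requests-cache's own clearing hook, e.g.:

```python
import requests_cache

def expire_cache():
    """Hypothetical student-facing helper: throw away all cached responses."""
    requests_cache.clear()
```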
The SPARQLWrapper stuff would probably be a bit messier, since that's using urllib under the hood. But it probably wouldn't be too hard to just make a little decorator for it. Heck, you could probably even reuse the architecture from requests-cache and keep it all in one place.
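Roughly something like this (the cache directory and function names are invented for the sketch; you'd decorate whatever function actually fires the SPARQL query and returns JSON-serializable results):

```python
import hashlib
import json
import os
from functools import wraps

CACHE_DIR = os.path.expanduser("~/.datasets_cache")  # illustrative location

def cached_query(func):
    """Cache a function's JSON-serializable result on disk, keyed on its arguments."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        os.makedirs(CACHE_DIR, exist_ok=True)
        key = hashlib.sha256(repr((args, kwargs)).encode()).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".json")
        if os.path.exists(path):
            with open(path) as f:
                return json.load(f)
        result = func(*args, **kwargs)
        with open(path, "w") as f:
            json.dump(result, f)
        return result
    return wrapper
```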
If this seems worthwhile, I'm willing to turn it into a Pull Request. But I wanted to get a sense of whether this is a worthwhile direction first.
This is a good idea, thank you. We will need to do a bit of exploration before we can say it is something we can use without unintended side effects.
As it stands, we are already caching some of the OSM data internally. Can you elaborate on which datasets were slow on your end so we can investigate a bit further?
Cory, we are indeed caching some of the larger datasets, like OpenStreetMap and the NOAA elevation map. Is there something more or better we should be doing?
It's a little tough to tell exactly what I was working on then, but I believe it was the WikiData dataset.
My perspective was that all datasets should be cached, with some clever mechanism for easily letting students clear out that local cache. I'm less worried about speed than about internet stability and not having to worry about staying connected. I was expecting something like what Sinbad does. There are headaches and issues, but it seemed like a worthwhile fight to me.