Cache data sources upon access #54

acbart · 2020-01-30T13:28:45Z

Hi, I was trying out some of the data sources, and I noticed that some of them can take a while to run and also require an active internet connection. I know this suggestion introduces further headaches, but perhaps you should consider setting up a cache for the non-real-time datasets?

For the requests-based datasets, this would be as trivial as adding in requests-cache:

import requests_cache
requests_cache.install_cache('bridges_datasets')

This would go along with some kind of helpful expire_cache() call for students to use if the remote data changes for whatever reason.
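A minimal sketch of how the cache and a student-facing expiry helper might fit together. The names `enable_cache` and `expire_cache` and the one-week default are assumptions for illustration, not part of the existing library; `install_cache` and `clear` are the requests-cache calls being relied on.

```python
from datetime import timedelta

CACHE_NAME = "bridges_datasets"  # same cache name as in the snippet above


def enable_cache(max_age=timedelta(days=7)):
    """Patch requests so responses are stored locally and reused.

    The one-week default is an arbitrary choice for this sketch.
    """
    # Imported lazily so this module still loads where the third-party
    # package (pip install requests-cache) is not installed.
    import requests_cache

    requests_cache.install_cache(CACHE_NAME, expire_after=max_age)


def expire_cache():
    """Hypothetical helper for students: forget every cached response."""
    import requests_cache

    requests_cache.clear()
```

A student would call `enable_cache()` once at the top of a script, and `expire_cache()` whenever the remote data is known to have changed.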

The SPARQLWrapper stuff would probably be a bit messier, since that's using urllib under the hood. But it probably wouldn't be too hard to just make a little decorator for it. Heck, you could probably even reuse the architecture for requests_cache and keep it all in one place.
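One way the "little decorator" could look, as a rough sketch using only the standard library. It assumes the wrapped function returns JSON-serializable data; `disk_cached`, `clear_disk_cache`, and the `bridges_cache` directory are hypothetical names, and nothing here touches SPARQLWrapper itself.

```python
import functools
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("bridges_cache")  # hypothetical cache location


def disk_cached(func):
    """Store each result on disk, keyed by the function name and arguments."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Build a stable key from the call so identical queries share a file.
        key_source = json.dumps(
            [func.__name__, args, kwargs], sort_keys=True, default=str
        )
        key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
        path = CACHE_DIR / f"{key}.json"
        if path.exists():
            # Cache hit: no network access needed.
            return json.loads(path.read_text(encoding="utf-8"))
        result = func(*args, **kwargs)
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(result), encoding="utf-8")
        return result

    return wrapper


def clear_disk_cache():
    """Delete every cached result so the next call fetches fresh data."""
    if CACHE_DIR.is_dir():
        for path in CACHE_DIR.glob("*.json"):
            path.unlink()
```

Decorating whichever function actually sends the SPARQL query would mean a repeated query is served from disk, which also covers the offline case.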

If this seemed worthwhile, I'm willing to turn this into a Pull Request. But I wanted to get a sense of whether this is a worthwhile direction.

AlecGoncharow · 2020-01-30T22:00:02Z

This is a good idea, thank you. We will need to do a bit of exploration before we can say it is something we can use without unintended side effects.

As it stands we are already caching some of the OSM data internally. Can you elaborate on which ones were slow on your end so we can investigate a bit further?

krs-world · 2020-03-19T15:17:12Z

Cory, we are indeed caching some of the larger datasets, like OpenStreetMap and the NOAA elevation map. Is there something more or better we should be doing?

acbart · 2020-03-24T16:13:53Z

It's a little tough to tell exactly what I was working on then, but I believe it was the WikiData dataset.

My perspective was that all datasets should be cached, with some clever mechanism for easily letting students clear out that local cache. I'm a little less worried about speed than I am about internet stability and the need to not worry about being connected and such. I was expecting something like what Sinbad does. There are headaches and issues, but it seemed like a worthwhile fight to me.
