This would be a separate, py2store-dependent repository.
The objective of this project is to offer easy and consistent access to various datasets.
We'll start with dataset providers that have a lot of data (so that we get a lot out of each py2store wrapper we make).
The interface should start off like other hierarchical explorers, such as those for files (folders, subfolders, files) or DBs (e.g. mongo host > dbs > collections, or sql connection > dbs > tables). The first level of listing would list the data providers or other named groups (with a misc group as the catch-all for anything unclassified). For example:
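A minimal sketch of such a hierarchy, using only the standard library rather than py2store itself. The `Explorer` class and the provider and dataset names are hypothetical, chosen purely for illustration:

```python
from collections.abc import Mapping


class Explorer(Mapping):
    """Read-only view over a nested dict; sub-dicts are wrapped as Explorers."""

    def __init__(self, tree):
        self._tree = tree

    def __getitem__(self, key):
        value = self._tree[key]
        # Descend one level if the value is itself a group, else return the leaf.
        return Explorer(value) if isinstance(value, dict) else value

    def __iter__(self):
        return iter(self._tree)

    def __len__(self):
        return len(self._tree)


# Hypothetical provider and dataset names, for illustration only.
catalog = Explorer({
    "provider_a": {"dataset_1": b"raw bytes of dataset_1"},
    "provider_b": {"dataset_2": b"raw bytes of dataset_2"},
    "misc": {},  # catch-all group for unclassified datasets
})

print(list(catalog))                # first level: providers and named groups
print(list(catalog["provider_a"]))  # second level: datasets of one provider
```

Each level behaves like a read-only dict, so the same `list(...)` and `[...]` operations work at every depth, much as they would for folders or DB collections.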
Check whether there's already a Python library to connect to the data provider (mall).
Check out the provider's API.
If the API is easy to use with py2request (all we need is listing and download capabilities), use the raw API. If not, use the Python library, if one is available.
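Since listing and download are the only two capabilities needed, a provider wrapper could be sketched as a read-only mapping built from two callables. This is an assumption about the shape of the wrapper, not py2store's or py2request's actual API; `ProviderStore`, `list_keys`, `download`, and the in-memory `_fake_remote` stand-in are all hypothetical:

```python
from collections.abc import Mapping


class ProviderStore(Mapping):
    """Read-only mapping built from a listing function and a download function."""

    def __init__(self, list_keys, download):
        self._list_keys = list_keys  # callable returning an iterable of keys
        self._download = download    # callable mapping a key to its data

    def __getitem__(self, key):
        # Only download keys that the provider actually lists.
        if key not in set(self._list_keys()):
            raise KeyError(key)
        return self._download(key)

    def __iter__(self):
        return iter(self._list_keys())

    def __len__(self):
        return sum(1 for _ in self._list_keys())


# Stand-in for a remote provider; a real wrapper would call the API
# or the provider's Python library inside these two callables.
_fake_remote = {"iris.csv": b"a,b\n1,2\n", "wine.csv": b"x,y\n3,4\n"}

store = ProviderStore(
    list_keys=lambda: sorted(_fake_remote),
    download=_fake_remote.__getitem__,
)

print(list(store))
print(store["iris.csv"])
```

Whether the two callables are backed by the raw API or by an existing Python library is then an implementation detail hidden behind the same interface.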
Caching
We want to use caching smartly and automatically (with automatic refreshes on a schedule, and/or warnings when a refresh hasn't happened for a while).
We want to cache listings as well as metadata and data.
Depending on the context, the cache could work in many ways. For example:
If listings are long, it is better to cache them (and hope the API has some "anything new since DATE" function).
Even if listings are short, we'd like to cache them for offline use of the object.
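One way the listing cache described above might look, as a rough sketch under stated assumptions: `CachedListing`, `max_age`, `warn_age`, and the `offline` flag are hypothetical names, the cache lives in memory only, and the refresh is triggered lazily on access rather than by a background scheduler:

```python
import time
import warnings


class CachedListing:
    """Cache a listing, refresh it when too old, warn when it is stale."""

    def __init__(self, fetch, max_age, warn_age, clock=time.time):
        self._fetch = fetch        # callable returning the fresh listing
        self._max_age = max_age    # seconds after which an online call refreshes
        self._warn_age = warn_age  # seconds after which a stale warning is issued
        self._clock = clock        # injectable clock, handy for testing
        self._keys = None
        self._fetched_at = None

    def keys(self, offline=False):
        now = self._clock()
        age = None if self._fetched_at is None else now - self._fetched_at
        # Refresh when online and the cache is empty or past max_age.
        if not offline and (age is None or age >= self._max_age):
            self._keys = list(self._fetch())
            self._fetched_at = now
            age = 0
        if self._keys is None:
            raise LookupError("nothing cached yet and offline=True")
        if age > self._warn_age:
            warnings.warn(f"listing is {age:.0f}s old; consider refreshing")
        return self._keys


# Hypothetical dataset names; a real fetch would query the provider.
listing = CachedListing(
    fetch=lambda: ["dataset_1", "dataset_2"],
    max_age=3600,    # refresh at most once an hour
    warn_age=86400,  # warn if the cached listing is more than a day old
)
print(listing.keys())
print(listing.keys(offline=True))  # served from the cache, no fetch
```

Persisting the cache to disk and asking the API only for changes since the last fetch would be natural extensions, but both depend on what each provider offers.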
Dataset providers
More links
https://www.freecodecamp.org/news/https-medium-freecodecamp-org-best-free-open-data-sources-anyone-can-use-a65b514b0f2d/