Allow reading from remote resources over http #19
This looks great! Thanks for finding out about fsspec; it looks like a really promising package for providing a consistent interface to filesystem access.
To answer your question about other protocols, I think GCS and GDrive are the two main ones that come to mind.
One thing I'm not clear about is how to reconcile the drivers that fsspec offers with the virtual filesystems that GDAL supports. The worst case is that there's some storage backend that fsspec supports and GDAL does not. But I'm also cautiously optimistic that we might be able to find a workaround, for example if we can pass GDAL an open file object. Anyway, what do you think about this challenge?
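To make the worst case concrete: GDAL ships its own virtual file systems for some remote stores, such as /vsicurl/ for HTTP(S), /vsis3/ for Amazon S3 and /vsigs/ for Google Cloud Storage. The sketch below uses a hypothetical helper, to_gdal_path (not part of geometamaker, fsspec or GDAL), that translates an fsspec-style URL into a GDAL path and returns None when its small mapping has no matching handler, which is exactly the case that would need a workaround. The mapping is an assumption covering only a few protocols; GDrive, for example, is deliberately left unmapped.

```python
from typing import Optional

# Assumed, deliberately small mapping from fsspec-style protocols to GDAL
# virtual file system prefixes. Only well-known GDAL handlers are listed.
_CURL_PROTOCOLS = {"http", "https"}
_BUCKET_PREFIXES = {
    "s3": "/vsis3/",   # s3://bucket/key -> /vsis3/bucket/key
    "gs": "/vsigs/",   # gs://bucket/key -> /vsigs/bucket/key
    "gcs": "/vsigs/",  # gcsfs also accepts the "gcs" protocol name
}


def to_gdal_path(url: str) -> Optional[str]:
    """Translate a URL into a path GDAL can open, or None if unmapped.

    Hypothetical helper for illustration only.
    """
    if "://" not in url:
        return url  # plain local path: GDAL can open it directly
    protocol, rest = url.split("://", 1)
    protocol = protocol.lower()
    if protocol == "file":
        return rest  # file:///tmp/x.tif -> /tmp/x.tif
    if protocol in _CURL_PROTOCOLS:
        # /vsicurl/ takes the full URL, scheme included.
        return "/vsicurl/" + url
    if protocol in _BUCKET_PREFIXES:
        return _BUCKET_PREFIXES[protocol] + rest
    # fsspec may support this protocol, but this mapping has no GDAL
    # handler for it, so the caller needs some other strategy.
    return None
```

When to_gdal_path returns None, one possible fallback is to read the bytes through fsspec and hand them to GDAL's in-memory /vsimem/ file system; whether that performs acceptably for large rasters is untested here.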
@@ -2,10 +2,13 @@
# --------------------
# This file records the packages and requirements needed in order for
# the library to work as expected. And to run tests.
aiohttp
Is aiohttp implicitly required by fsspec? Or does it change the behavior of fsspec if it's available at runtime?
It changes the behavior. The http protocol did not seem to be supported at all without it:
>>> of = fsspec.open('https://storage.googleapis.com/gef-ckan-public-data/awc-isric-soilgrids/awc.tif.yml')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\dmf\projects\geometamaker\env-test\Lib\site-packages\fsspec\core.py", line 459, in open
out = open_files(
^^^^^^^^^^^
File "C:\Users\dmf\projects\geometamaker\env-test\Lib\site-packages\fsspec\core.py", line 283, in open_files
fs, fs_token, paths = get_fs_token_paths(
^^^^^^^^^^^^^^^^^^^
File "C:\Users\dmf\projects\geometamaker\env-test\Lib\site-packages\fsspec\core.py", line 623, in get_fs_token_paths
chain = _un_chain(urlpath0, storage_options or {})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dmf\projects\geometamaker\env-test\Lib\site-packages\fsspec\core.py", line 332, in _un_chain
cls = get_filesystem_class(protocol)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dmf\projects\geometamaker\env-test\Lib\site-packages\fsspec\registry.py", line 238, in get_filesystem_class
raise ImportError(bit["err"]) from e
ImportError: HTTPFileSystem requires "requests" and "aiohttp" to be installed
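Because this ImportError only surfaces once a URL is actually opened, it may be useful to check for the HTTP dependencies up front. Below is a minimal sketch using only the standard library's importlib.util.find_spec; the helper name missing_http_dependencies is hypothetical, and the check only confirms that the modules are importable, not that their versions are compatible with the installed fsspec.

```python
import importlib.util
from typing import List

# Modules that fsspec's HTTPFileSystem needs, per the ImportError above.
HTTP_DEPENDENCIES = ("requests", "aiohttp")


def missing_http_dependencies() -> List[str]:
    """Return the names of HTTP dependencies that cannot be imported."""
    return [
        name for name in HTTP_DEPENDENCIES
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_http_dependencies()
    if missing:
        print("http(s) URLs unavailable; install: " + ", ".join(missing))
    else:
        print("http(s) dependencies are importable")
```

An alternative that defers to fsspec itself is to call fsspec.get_filesystem_class("https") inside a try/except ImportError, since that is the function raising the error in the traceback above.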
Got it. It's strange to me that aiohttp is listed as an extra requirement in fsspec's setup.py but requests is not. Oh well!
I agree. Do you think it makes sense to always install those dependencies (…)? If I understand correctly, files on GCS or GDrive could be referenced with …
This looks great from my end! It seems to cover the items we wanted. Happy to test it out once it's ready.
I personally love the idea of …
It's really awesome that public files can already be accessed over … I think we should explore this option a little more before deciding on support for reading from private filesystems and writing to these filesystems. Open questions on my mind include: …
Yeah, good question. I guess it goes hand in hand with whether we want to support file protocols other than HTTP. If so, we would need to figure out how to have GDAL open files on those other protocols.
Okay, great. In that case, I think we don't need to add any other dependencies right now, and this PR may be complete enough for this case.
Great points, thanks for thinking about this! I'll put these notes in another issue.
This PR uses fsspec to open files. fsspec.open will detect the correct protocol to use based on the filepath string. fsspec supports a large number of protocols:
https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations
https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations
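As a rough illustration of what detecting the protocol from the filepath string means, the hypothetical guess_protocol below splits on "://" and falls back to the local file protocol. It is a simplified sketch, not fsspec's implementation: fsspec also handles chained URLs such as "zip::https://...", storage options, and other edge cases.

```python
from typing import Tuple


def guess_protocol(path: str) -> Tuple[str, str]:
    """Split a path into (protocol, remainder), defaulting to "file".

    Hypothetical, simplified illustration; not fsspec's own parser.
    """
    if "://" in path:
        protocol, remainder = path.split("://", 1)
        # Treat a one-letter "protocol" as a Windows drive letter (e.g. "C://").
        if len(protocol) > 1:
            return protocol, remainder
    # No recognizable protocol: assume a local filesystem path.
    return "file", path
```

In the PR itself none of this is needed: calling fsspec.open(path) performs the lookup and returns an OpenFile that can be used in a with statement.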
Availability depends on which extra dependencies are installed (see extras_require in fsspec's setup.py). So far this PR adds requests and aiohttp as requirements in order to support the http and https protocols. Are there others we know we want to support?

If a user is creating metadata for a remote dataset, GDAL drivers handle reading the dataset itself, while geometamaker, via fsspec, checks for and reads any existing remote MCF for that dataset. If a user wishes to write metadata docs for a remote resource, they will have to use the new workspace arg in MetadataControl.write to give a local directory where files can be written.

Fixes #18
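The read/write split described in the PR description can be sketched as follows. This assumes, based on the awc.tif.yml URL in the traceback earlier in this thread, that an MCF sidecar is named by appending ".yml" to the dataset path. mcf_paths is a hypothetical helper, not geometamaker's API, and the real MetadataControl.write may resolve paths differently.

```python
import os
import posixpath
from typing import Optional, Tuple
from urllib.parse import urlparse


def mcf_paths(dataset_path: str, workspace: Optional[str] = None) -> Tuple[str, str]:
    """Return (read_path, write_path) for a dataset's MCF sidecar.

    Hypothetical helper; assumes the sidecar is "<dataset_path>.yml".
    """
    # Existing metadata is looked up next to the dataset, local or remote;
    # a remote read_path would be opened with fsspec.open.
    read_path = dataset_path + ".yml"
    is_remote = "://" in dataset_path

    if workspace is not None:
        # Write into the user-supplied local directory, keeping the file name.
        if is_remote:
            name = posixpath.basename(urlparse(dataset_path).path)
        else:
            name = os.path.basename(dataset_path)
        return read_path, os.path.join(workspace, name + ".yml")

    if is_remote:
        # Remote locations are treated as read-only in this sketch.
        raise ValueError(
            "remote dataset: pass a local workspace to write metadata")

    # Local dataset with no workspace: write alongside the dataset.
    return read_path, read_path
```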