
Add support for reading files from S3 #73

Open
metasim opened this issue May 21, 2018 · 3 comments

Comments

@metasim
Contributor

metasim commented May 21, 2018

Support for this via the "External File" data source option should be pretty easy and extremely helpful for those of us using AWS EMR for deployment. I attempted to implement this myself, but couldn't figure out how to make sure the hadoop-aws library was included in the local-mode runtime:

https://github.com/s22s/seahorse/commit/909d37852975d88a205d4c3ee98f769e4a3430d9

So I couldn't test the implementation, at least not in local/development mode.
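For a plain local-mode Spark session, one common way to put hadoop-aws on the classpath is Spark's `spark.jars.packages` setting, which resolves Maven coordinates when the session starts. This is not necessarily how Seahorse launches its own local-mode runtime; that is the open question above. The following is a minimal sketch under two assumptions: the hadoop-aws version must match the Hadoop version your Spark build was compiled against (`HADOOP_VERSION` is a placeholder), and `local_s3_conf` is a hypothetical helper.

```python
from typing import Dict

# Placeholder: must match the Hadoop version bundled with your Spark build.
HADOOP_VERSION = "2.7.3"


def local_s3_conf(hadoop_version: str = HADOOP_VERSION) -> Dict[str, str]:
    """Hypothetical helper: Spark settings for reading s3a:// paths in local mode."""
    return {
        # Run the driver and executors in-process, using all local cores.
        "spark.master": "local[*]",
        # Ask Spark to resolve hadoop-aws (and its transitive AWS SDK
        # dependency) from Maven when the session starts.
        "spark.jars.packages": f"org.apache.hadoop:hadoop-aws:{hadoop_version}",
        # Keys prefixed with "spark.hadoop." are copied into the Hadoop
        # Configuration. This maps s3a:// to its FileSystem class, which is
        # often already the default and is set here only to be explicit.
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


if __name__ == "__main__":
    for key, value in local_s3_conf().items():
        print(f"{key}={value}")
```

With PySpark installed, you could apply each pair through `SparkSession.builder.config(key, value)` before `getOrCreate()`. `spark.jars.packages` only takes effect if it is set before the session (and its JVM) is first created.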

FWIW: The following schemes are supported by org.apache.hadoop:hadoop-aws:

| Scheme | Service Provider |
|--------|------------------|
| `s3`   | `org.apache.hadoop.fs.s3.S3FileSystem` |
| `s3a`  | `org.apache.hadoop.fs.s3a.S3AFileSystem` |
| `s3n`  | `org.apache.hadoop.fs.s3native.NativeS3FileSystem` |
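The scheme-to-class mapping can be expressed as a lookup table. An "External File" option could use it to validate a user-supplied path and derive the matching Hadoop `fs.<scheme>.impl` property. This is a minimal sketch: `HADOOP_AWS_SCHEMES` and `hadoop_impl_for` are hypothetical names and are not part of Seahorse or Hadoop.

```python
from typing import Dict, Tuple
from urllib.parse import urlparse

# Hypothetical table: S3 scheme -> FileSystem class shipped in hadoop-aws.
HADOOP_AWS_SCHEMES: Dict[str, str] = {
    "s3": "org.apache.hadoop.fs.s3.S3FileSystem",
    "s3a": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "s3n": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
}


def hadoop_impl_for(path: str) -> Tuple[str, str]:
    """Return (Hadoop property key, FileSystem class) for an S3-style path.

    Raises ValueError if the path's scheme is not one hadoop-aws provides.
    """
    # urlparse already lower-cases the scheme; .lower() is kept for clarity.
    scheme = urlparse(path).scheme.lower()
    if scheme not in HADOOP_AWS_SCHEMES:
        raise ValueError(f"unsupported scheme {scheme!r} in {path!r}")
    # Hadoop looks up a scheme's implementation under fs.<scheme>.impl.
    return f"fs.{scheme}.impl", HADOOP_AWS_SCHEMES[scheme]
```

For example, `hadoop_impl_for("s3a://my-bucket/data.csv")` (placeholder bucket) returns `("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")`. A local path such as `/tmp/data.csv` raises `ValueError`.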

cc: @mteldridge @mobsy74

@jaroslaw-osmanski

@metasim some info:

@metasim
Contributor Author

metasim commented Jun 27, 2018

@jaroslaw-osmanski Thanks for the additional, helpful info. All that sounds good to us, and we're comfortable with those constraints/requirements.

@metasim
Contributor Author

metasim commented Jul 12, 2018

Proof of concept:

#92

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants