Spark job with AWS S3 or S3 compatible storage, s3n file size limit? #130
Unanswered · heungheung asked this question in Query
- Are we still using s3n:// for Spark jobs with AWS S3 or S3-compatible storage? If yes, is there any plan to switch to s3a:// for AWS? And what about other S3-compatible storage?
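  For reference, this is a rough sketch of what I imagine a job-level s3a:// setup would look like (assuming hadoop-aws and its AWS SDK dependency are on the classpath; the endpoint, credentials, and bucket names below are placeholders, not anything from this project):

  ```scala
  import org.apache.spark.sql.SparkSession

  // Sketch only: placeholder endpoint/credentials/bucket, hadoop-aws assumed
  // on the classpath. For an S3-compatible store, the endpoint is pointed at
  // that service and path-style access is usually enabled.
  object S3aExample {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("s3a-example")
        .config("spark.hadoop.fs.s3a.endpoint", "https://s3.example.internal")
        .config("spark.hadoop.fs.s3a.path.style.access", "true")
        .config("spark.hadoop.fs.s3a.access.key", "PLACEHOLDER_ACCESS_KEY")
        .config("spark.hadoop.fs.s3a.secret.key", "PLACEHOLDER_SECRET_KEY")
        .getOrCreate()

      // Read and write through the s3a:// scheme instead of s3n://.
      val df = spark.read.text("s3a://my-bucket/some/large/input")
      df.write.text("s3a://my-bucket/some/output")

      spark.stop()
    }
  }
  ```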
I understand at least one adopter has run into problems when the file size is large; the error will be something like:

Diving into the storeFile function, it seems to me that storeFile
https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.java#L106
should automatically detect whether to use storeLargeFile instead of a plain putObject, since putObject has a size limitation.

How can the storeLargeFile path be enabled? Is it possible to set/override this at the job level (see the sketch below), or does it have to be configured for the whole Spark cluster?
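To make the question concrete, here is a rough sketch of the kind of per-job override I have in mind, using the fs.s3n.multipart.uploads.* keys from Hadoop's core-default.xml; the bucket name is a placeholder, and I may be misreading how Jets3tNativeFileSystemStore decides between putObject and storeLargeFile:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: my understanding is that the s3n store switches from a single
// putObject to the multipart storeLargeFile path when
// fs.s3n.multipart.uploads.enabled is true and the file exceeds the block
// size. Passing the keys via spark.hadoop.* would make this a per-job
// override rather than a cluster-wide core-site.xml change.
object S3nMultipartExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3n-multipart-example")
      .config("spark.hadoop.fs.s3n.multipart.uploads.enabled", "true")
      // Split uploads into 64 MB parts (value is in bytes).
      .config("spark.hadoop.fs.s3n.multipart.uploads.block.size",
              (64L * 1024 * 1024).toString)
      .getOrCreate()

    // Any s3n:// output larger than the block size should then be uploaded
    // in parts instead of as one large putObject.
    spark.range(0, 100000000L).write.parquet("s3n://my-bucket/large-output")

    spark.stop()
  }
}
```

If the project already wires these settings somewhere, or expects a different mechanism, please point me to it.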