How to set configuration to access files on S3 bucket #90

richiesgr opened this issue Sep 27, 2016 · 6 comments

@richiesgr

Hi
I managed to start the job, but my files are located in an S3 bucket, so I always get:

Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).
    at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:70) ~[?:?]

I've tried to set the keys every way I know of, but nothing helps.
I tried putting them in properties like this:

"properties": {
        "fs.s3n.awsAccessKeyId": "xx",
        "fs.s3n.awsSecretAccessKey": "xx"}

or in the context like this:

"context": {
        "fs.s3n.awsAccessKeyId": "xx",
        "fs.s3n.awsSecretAccessKey": "xx",

I've also tried to set the S3 keys directly on the Spark cluster, but it seems that's not working anymore.
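For reference, the error message itself also says the keys can be specified as the username and password of the s3n URL; a rough sketch of that form, with placeholder keys, bucket, and path:

s3n://ACCESS_KEY:SECRET_KEY@some-bucket/some/path/file.gz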
Do you have any idea what I can do?

Thanks

@drcrallen
Contributor

Try using s3 instead of s3n in your properties, i.e. fs.s3.awsAccessKeyId.
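For example, a sketch of the same properties block with the s3 scheme (values are placeholders):

"properties": {
    "fs.s3.awsAccessKeyId": "xx",
    "fs.s3.awsSecretAccessKey": "xx"
}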

@richiesgr
Author

Hi
Thanks for your help. I'm still trying to get it working with Spark 2.0.0.

I've added spark-core:2.0.0 as a hadoop-dependency; it can't work with s3n because it doesn't include the necessary packages.

So, as you said, I've gone back to the old S3 protocol: I'm using an s3:// URL and providing the access key and secret key as properties.

Now I get this:
org.jets3t.service.S3ServiceException: Service Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your key and signing method. xxxxxx GET Wed, 28 Sep 2016 13:30:14 GMT /inneractive-events/20160801_15/aggregationevent_-1_7f947d_1470081545369.gz

So the REST API gets the access key and secret key but refuses to access the file.
I know it's not really related to your extension, but maybe you can help me with this.
Thanks

@richiesgr
Author

By the way, I checked the bug you opened on Druid regarding the classpath.
Maybe it's related: I can make Spark use the s3n protocol if I add all the missing dependencies, but it doesn't help because of this:

com.fasterxml.jackson.databind.JsonMappingException: Jackson version is too old 2.4.6

In fact, Druid's own dependency is still first on the classpath, so it still doesn't work.

@drcrallen
Contributor

I'm still trying to get Druid working internally on 0.9.2 + Spark 2.0.

Whenever it does work the branch will be here: https://github.com/metamx/druid/tree/0.9.2-mmx

Right now I'm on https://github.com/metamx/druid/tree/0.9.2-mmx-fix3148, which is under very active modification.

I have had to modify the Jackson version in the master Druid POM to accommodate Spark 2.0.

A _2.10 release of this repository should still be compatible with a spark 1.6.x deployment (assuming scala 2.10), though.

In short, spark 2.0 is causing various problems for me as well, and I haven't had a stable version running on druid 0.9.2-rc yet.

@richiesgr
Author

Thanks, I'll switch back to 1.6.2 and try again with Scala 2.10.
Are you planning to deploy your extension to the Metamarkets repository, so that pull-deps will work?
Thanks

@richiesgr
Author

OK, still no luck here.
I set up a Spark cluster on EC2 with Spark 1.6.2 / Hadoop 2.4, changed

druid.indexer.task.defaultHadoopCoordinates=["org.apache.spark:spark-core_2.10:1.6.2"]

and ran pull-deps to get it.
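Roughly this invocation, assuming a standard Druid install layout; the exact classpath and flags may vary by version:

java -classpath "lib/*" io.druid.cli.Main tools pull-deps -h "org.apache.spark:spark-core_2.10:1.6.2"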
Now I get:
Caused by: java.lang.NoSuchMethodError: org.jets3t.service.impl.rest.httpclient.RestS3Service.<init>(Lorg/jets3t/service/security/AWSCredentials;)V
at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.initialize(Jets3tFileSystemStore.java:97) ~[?:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_101]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_101]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_101]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) ~[?:?]
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) ~[?:?]
at com.sun.proxy.$Proxy193.initialize(Unknown Source) ~[?:?]

In fact, there is a version of JetS3t in the Hadoop distribution.
I've tried replacing the jar directly in the spark-core directory, but it doesn't help.

How can I tell whether the job failed on Druid or on the Spark cluster?
When I check the log I can see Druid contacting Spark, but no job is started on the Spark cluster.
Maybe I could add some other dependency to your project and build an uber jar with the new jar?
