How to set configuration to access files on S3 bucket #90

richiesgr opened this issue Sep 27, 2016 · 6 comments

@richiesgr

Hi
I managed to start the job, but my files are located in an S3 bucket, so I always get:

Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).
    at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:70) ~[?:?]

I've tried to set the keys every way I know of, but nothing helps.
I tried putting them in properties like this:

"properties": {
        "fs.s3n.awsAccessKeyId": "xx",
        "fs.s3n.awsSecretAccessKey": "xx"}

or in the context like this:

"context": {
        "fs.s3n.awsAccessKeyId": "xx",
        "fs.s3n.awsSecretAccessKey": "xx",

I've also tried to set the S3 keys directly on the Spark cluster, but it seems that's not working anymore.
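For reference, the error message itself also says the keys can be specified as the username and password of the s3n URL; a rough sketch of that form, with placeholder keys, bucket, and path:

s3n://ACCESS_KEY:SECRET_KEY@some-bucket/some/path/file.gz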
Do you have any idea what I can do?

Thanks

@drcrallen
Contributor

Try using s3 instead of s3n in your properties, i.e. fs.s3.awsAccessKeyId.
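For example, a sketch of the same properties block with the s3 scheme (values are placeholders):

"properties": {
    "fs.s3.awsAccessKeyId": "xx",
    "fs.s3.awsSecretAccessKey": "xx"
}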

@richiesgr
Author

Hi
Thanks for your help. I'm still trying to get it working with Spark 2.0.0.

I've added spark-core:2.0.0 as a hadoop-dependency; it can't work with s3n because it doesn't include the necessary packages.

So, as you said, I've gone back to the old S3 protocol: I'm using an s3:// URL and providing the access key and secret key as properties.

Now I get this:
org.jets3t.service.S3ServiceException: Service Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your key and signing method. xxxxxx GET Wed, 28 Sep 2016 13:30:14 GMT /inneractive-events/20160801_15/aggregationevent_-1_7f947d_1470081545369.gz

So the REST API gets the access key and secret key but refuses to access the file.
I know it's not really related to your extension, but maybe you can help me with this.
Thanks

@richiesgr
Author

By the way, I checked the bug you opened on Druid regarding the classpath.
Maybe it's related: I can make Spark use the s3n protocol if I add all the missing dependencies, but it doesn't help because of this:

com.fasterxml.jackson.databind.JsonMappingException: Jackson version is too old 2.4.6

In fact, Druid's own dependency is still first on the classpath, so it still doesn't work.

@drcrallen
Contributor

I'm still trying to get Druid working internally on 0.9.2 + Spark 2.0.

Whenever it does work the branch will be here: https://github.com/metamx/druid/tree/0.9.2-mmx

Right now I'm on https://github.com/metamx/druid/tree/0.9.2-mmx-fix3148, which is under very active modification.

I have had to modify the Jackson version in the master Druid POM to accommodate Spark 2.0.

A _2.10 release of this repository should still be compatible with a spark 1.6.x deployment (assuming scala 2.10), though.

In short, spark 2.0 is causing various problems for me as well, and I haven't had a stable version running on druid 0.9.2-rc yet.

@richiesgr
Author

Thanks, I'll switch back to 1.6.2 and try again with Scala 2.10.
Are you planning to deploy your extension to the Metamarkets repository, so that pull-deps will work?
Thanks

@richiesgr
Author

OK, still no luck here.
I set up a Spark cluster on EC2 with Spark 1.6.2 / Hadoop 2.4, changed

druid.indexer.task.defaultHadoopCoordinates=["org.apache.spark:spark-core_2.10:1.6.2"]

and ran pull-deps to get it.
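Roughly this invocation, assuming a standard Druid install layout; the exact classpath and flags may vary by version:

java -classpath "lib/*" io.druid.cli.Main tools pull-deps -h "org.apache.spark:spark-core_2.10:1.6.2"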
Now I get:
Caused by: java.lang.NoSuchMethodError: org.jets3t.service.impl.rest.httpclient.RestS3Service.<init>(Lorg/jets3t/service/security/AWSCredentials;)V
at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.initialize(Jets3tFileSystemStore.java:97) ~[?:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_101]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_101]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_101]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) ~[?:?]
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) ~[?:?]
at com.sun.proxy.$Proxy193.initialize(Unknown Source) ~[?:?]

In fact, there is a version of JetS3t in the Hadoop distribution.
I've tried replacing the jar directly in the spark-core directory, but it doesn't help.

How can I tell whether the job failed on Druid or on the Spark cluster?
When I check the log I can see Druid contacting Spark, but no job is started on the Spark cluster.
Maybe I could add some other dependency to your project and build an uber jar with the new jar?
