Spark 2.2.0 compatibility #8

glaphil · 2017-12-08T13:31:13Z

Hello Takeshi,

I'm working on a POC with kinesis + spark structured streaming.

I read that Spark Structured Streaming is marked as GA only from 2.2.0, so I'd like to use your lib with spark 2.2.0, but I got the following error :

{code}
17/12/08 13:43:09 ERROR StreamExecution: Query [id = 95bccb80-d234-449b-aaa7-ab19dda71d2b, runId = c6f1c512-7e63-4ead-ba42-1fe67318ddb2] terminated with error
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.json.JSONOptions.(Lscala/collection/immutable/Map;)V
at org.apache.spark.sql.execution.datasources.json.JsonKinesisValueFormat.buildReader(JsonKinesisValueFormat.scala:105)
at org.apache.spark.sql.kinesis.KinesisSource.getBatch(KinesisSource.scala:477)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2$$anonfun$apply$7.apply(StreamExecution.scala:607)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2$$anonfun$apply$7.apply(StreamExecution.scala:603)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at org.apache.spark.sql.execution.streaming.StreamProgress.flatMap(StreamProgress.scala:25)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2.apply(StreamExecution.scala:603)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$2.apply(StreamExecution.scala:603)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch(StreamExecution.scala:602)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:306)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:290)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
{code}

Would you know how I can fix it ? Or should I come back to spark 2.1.0 (it runs on it) ?

I saw that you're no more maintaining this lib, you refered to databricks but I can't find any link to download a lib from their site or github... It seems that the only way to use their connector is by using their integrated data analysis solution....
Would you know if kinesis will be soon "officialy" supported by spark ?

Thanks!

Philippe

kavyaMariGowda · 2018-07-13T02:46:25Z

Hi ,

Is this issues reloved?

glaphil · 2018-07-13T07:27:06Z

I came back to spark 2.1.0..
I hope databricks will open source their connector...
Be careful with that lib, at every run it will create dynamodb checkpoint tables on N. Virginia zone (due KCL lib used under the hood), and you can't set its names or zone...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spark 2.2.0 compatibility #8

Spark 2.2.0 compatibility #8

glaphil commented Dec 8, 2017

kavyaMariGowda commented Jul 13, 2018

glaphil commented Jul 13, 2018

Spark 2.2.0 compatibility #8

Spark 2.2.0 compatibility #8

Comments

glaphil commented Dec 8, 2017

kavyaMariGowda commented Jul 13, 2018

glaphil commented Jul 13, 2018