Hello,
I have been testing Deequ.
So far I have had mixed results when using Redshift as a data source.
I am using the Spark Redshift library to load a DataFrame from Redshift.
One example of the problems I have run into involves the uniqueness verification of a column.
I get the following error:
java.sql.SQLException: Exception thrown in awaitResult:
    at io.github.spark_redshift_community.spark.redshift.JDBCWrapper.executeInterruptibly(RedshiftJDBCWrapper.scala:172) ~[spark-redshift_2.12-6.1.0-spark_3.4.jar:6.1.0-spark_3.4]
    at io.github.spark_redshift_community.spark.redshift.JDBCWrapper.executeInterruptibly(RedshiftJDBCWrapper.scala:145) ~[spark-redshift_2.12-6.1.0-spark_3.4.jar:6.1.0-spark_3.4]
    at io.github.spark_redshift_community.spark.redshift.RedshiftRelation.UnloadDataToS3(RedshiftRelation.scala:328) ~[spark-redshift_2.12-6.1.0-spark_3.4.jar:6.1.0-spark_3.4]
    at io.github.spark_redshift_community.spark.redshift.RedshiftRelation.$anonfun$buildScanFromSQL$1(RedshiftRelation.scala:271) ~[spark-redshift_2.12-6.1.0-spark_3.4.jar:6.1.0-spark_3.4]
    at scala.Option.orElse(Option.scala:447) ~[scala-library-2.12.17.jar:?]
    at io.github.spark_redshift_community.spark.redshift.RedshiftRelation.buildScanFromSQL(RedshiftRelation.scala:271) ~[spark-redshift_2.12-6.1.0-spark_3.4.jar:6.1.0-spark_3.4]
    at io.github.spark_redshift_community.spark.redshift.pushdown.RedshiftScanExec$$anon$1.call(RedshiftScanExec.scala:53) ~[spark-redshift_2.12-6.1.0-spark_3.4.jar:6.1.0-spark_3.4]
    at io.github.spark_redshift_community.spark.redshift.pushdown.RedshiftScanExec$$anon$1.call(RedshiftScanExec.scala:49) ~[spark-redshift_2.12-6.1.0-spark_3.4.jar:6.1.0-spark_3.4]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: com.amazon.redshift.util.RedshiftException: ERROR: cannot cast type boolean to double precision
    at com.amazon.redshift.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2613) ~[redshift-jdbc42-2.1.0.23.jar:?]
    at com.amazon.redshift.core.v3.QueryExecutorImpl.processResultsOnThread(QueryExecutorImpl.java:2281) ~[redshift-jdbc42-2.1.0.23.jar:?]
    at com.amazon.redshift.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886) ~[redshift-jdbc42-2.1.0.23.jar:?]
    at com.amazon.redshift.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1878) ~[redshift-jdbc42-2.1.0.23.jar:?]
The data itself is the Sample Database provided by AWS.
The verification code is as follows:
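A minimal sketch of a uniqueness check of this kind, not the exact code from this report. It assumes the TICKIT sample schema, so the `users` table and `userid` column are illustrative guesses. The `REDSHIFT_JDBC_URL` and `REDSHIFT_TEMP_DIR` environment variables are hypothetical placeholders for the cluster URL and the S3 staging directory. The data source format name is taken from the package shown in the stack trace above.

```scala
import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}
import org.apache.spark.sql.SparkSession

object RedshiftUniquenessCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("deequ-redshift-uniqueness")
      .getOrCreate()

    // Load a table through the community Spark Redshift connector.
    // Table name and environment variables are assumptions for illustration.
    val df = spark.read
      .format("io.github.spark_redshift_community.spark.redshift")
      .option("url", sys.env("REDSHIFT_JDBC_URL"))
      .option("dbtable", "users")
      .option("tempdir", sys.env("REDSHIFT_TEMP_DIR"))
      .option("forward_spark_s3_credentials", "true")
      .load()

    // Ask Deequ to verify that every value in the (assumed) key column is unique.
    val result = VerificationSuite()
      .onData(df)
      .addCheck(
        Check(CheckLevel.Error, "uniqueness check")
          .isUnique("userid"))
      .run()

    if (result.status == CheckStatus.Success) {
      println("Uniqueness constraint satisfied")
    } else {
      println(s"Verification finished with status ${result.status}")
    }

    spark.stop()
  }
}
```

In a setup like this, the failure reported above would surface when `run()` triggers the Redshift scan, since the trace shows the exception being raised while the connector unloads query results to S3.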
Is using Redshift as a datasource supported by Deequ?