
[BUG] Deequ 2.0.7 - Spark CodeGenerator ERROR - Expression is not an rvalue #592

Open
pawelpinkos opened this issue Oct 22, 2024 · 2 comments
Labels
bug Something isn't working

Comments


Describe the bug
Beginning with Deequ 2.0.7 (all Spark releases), the library makes Catalyst codegen fail in Spark. The exception is caught, so it does not fail the job at runtime, but the error shows up in the logs (e.g. run MaximumTest from the Deequ tests and inspect the log).

I have investigated, and in my opinion the root cause is this change: 34d8f3a

The error is thrown when AnalysisRunner calls dataframe.agg() here, depending on the parameters provided. For the example in the "To Reproduce" section, before Deequ 2.0.7 the parameters were:

  • min(CAST(size AS DOUBLE))
  • max(CAST(size AS DOUBLE))
  • CAST(sum(size) AS DOUBLE)
  • CAST(count(size) AS BIGINT)
  • stateful_stddev_pop(size)
  • CAST(sum(size) AS DOUBLE)

And there was no error. With Deequ 2.0.7 the parameters are:

  • min(CAST(element_at(array(InScopeData AS source, size AS selection), 2) AS DOUBLE))
  • max(CAST(element_at(array(InScopeData AS source, size AS selection), 2) AS DOUBLE))
  • CAST(sum(size) AS DOUBLE)
  • CAST(count(size) AS BIGINT)
  • stateful_stddev_pop(size)
  • CAST(sum(size) AS DOUBLE)

And the error is thrown.
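For illustration, the sketch below (my own minimal code, not Deequ's; the literal and alias names just mirror the parameters above) builds the same element_at(array(...)) expression shape directly with the Spark functions API, which should trigger the same CodeGenerator error on affected Spark versions:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ExprShapeRepro {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("repro").master("local").getOrCreate()
    import spark.implicits._

    val df = Seq(10, 20, 30).toDF("size")

    // Same shape as the 2.0.7 parameters above: element_at over an array of a
    // string literal and the column, followed by a cast to double.
    val selection = element_at(
      array(lit("InScopeData").as("source"), col("size").as("selection")), 2)

    df.agg(min(selection.cast("double")), max(selection.cast("double"))).show()
  }
}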

This causes a lot of errors in the logs of applications that use Deequ. I tried to bump Deequ to 2.0.7 in my project, but because of this issue I had to postpone the upgrade.

24/10/22 10:13:20 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 91, Column 1: Expression "hashAgg_isNull_21" is not an rvalue
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 91, Column 1: Expression "hashAgg_isNull_21" is not an rvalue
	at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12021)
	at org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7575)
	at org.codehaus.janino.UnitCompiler.compileContext2(UnitCompiler.java:4377)
	at org.codehaus.janino.UnitCompiler.access$6700(UnitCompiler.java:226)
	at org.codehaus.janino.UnitCompiler$15$1.visitAmbiguousName(UnitCompiler.java:4326)
	at org.codehaus.janino.UnitCompiler$15$1.visitAmbiguousName(UnitCompiler.java:4323)
	at org.codehaus.janino.Java$AmbiguousName.accept(Java.java:4429)
	at org.codehaus.janino.UnitCompiler$15.visitLvalue(UnitCompiler.java:4323)
	at org.codehaus.janino.UnitCompiler$15.visitLvalue(UnitCompiler.java:4319)
	at org.codehaus.janino.Java$Lvalue.accept(Java.java:4353)
	at org.codehaus.janino.UnitCompiler.compileContext(UnitCompiler.java:4319)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3838)
	at org.codehaus.janino.UnitCompiler.access$6100(UnitCompiler.java:226)
	at org.codehaus.janino.UnitCompiler$13.visitAssignment(UnitCompiler.java:3799)
	at org.codehaus.janino.UnitCompiler$13.visitAssignment(UnitCompiler.java:3779)
	at org.codehaus.janino.Java$Assignment.accept(Java.java:4690)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3779)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2366)
	at org.codehaus.janino.UnitCompiler.access$1800(UnitCompiler.java:226)
	at org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1497)
	at org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1490)
	at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:3064)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1490)
	at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1573)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1559)
	at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:226)
	at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1496)
	at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1490)
	at org.codehaus.janino.Java$Block.accept(Java.java:2969)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1490)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2475)
	at org.codehaus.janino.UnitCompiler.access$1900(UnitCompiler.java:226)
	at org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1498)
	at org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1490)
	at org.codehaus.janino.Java$IfStatement.accept(Java.java:3140)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1490)
	at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1573)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3420)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1362)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1335)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:807)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:975)
	at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:226)
	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:392)
	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:384)
	at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1445)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:384)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1312)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:833)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:410)
	at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:226)
	at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:389)
	at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:384)
	at org.codehaus.janino.Java$PackageMemberClassDeclaration.accept(Java.java:1594)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:384)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:362)
	at org.codehaus.janino.UnitCompiler.access$000(UnitCompiler.java:226)
	at org.codehaus.janino.UnitCompiler$1.visitCompilationUnit(UnitCompiler.java:336)
	at org.codehaus.janino.UnitCompiler$1.visitCompilationUnit(UnitCompiler.java:333)
	at org.codehaus.janino.Java$CompilationUnit.accept(Java.java:363)
	at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:333)
	at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:235)
	at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:464)
	at org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:314)
	at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:237)
	at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:205)
	at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1490)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1587)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1584)
	at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
	at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
	at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
	at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
	at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
	at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
	at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1437)
	at org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:726)
	at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:725)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:135)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD(ShuffleExchangeExec.scala:135)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.mapOutputStatisticsFuture$lzycompute(ShuffleExchangeExec.scala:140)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.mapOutputStatisticsFuture(ShuffleExchangeExec.scala:139)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.$anonfun$submitShuffleJob$1(ShuffleExchangeExec.scala:68)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.submitShuffleJob(ShuffleExchangeExec.scala:68)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.submitShuffleJob$(ShuffleExchangeExec.scala:67)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.submitShuffleJob(ShuffleExchangeExec.scala:115)
	at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.shuffleFuture$lzycompute(QueryStageExec.scala:174)
	at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.shuffleFuture(QueryStageExec.scala:174)
	at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.doMaterialize(QueryStageExec.scala:176)
	at org.apache.spark.sql.execution.adaptive.QueryStageExec.materialize(QueryStageExec.scala:82)
	at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$5(AdaptiveSparkPlanExec.scala:258)
	at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$5$adapted(AdaptiveSparkPlanExec.scala:256)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:256)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:228)
	at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:367)
	at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:340)
	at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3868)
	at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:3120)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3858)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3856)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3856)
	at org.apache.spark.sql.Dataset.collect(Dataset.scala:3120)
	at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:327)
	at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:320)
	at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:169)
	at com.amazon.deequ.analyzers.runners.AnalysisRunBuilder.run(AnalysisRunBuilder.scala:110)
	at com.amazon.deequ.profiles.ColumnProfiler$.profile(ColumnProfiler.scala:195)
	at com.amazon.deequ.profiles.ColumnProfilerRunner.run(ColumnProfilerRunner.scala:72)
	at com.amazon.deequ.profiles.ColumnProfilerRunBuilder.run(ColumnProfilerRunBuilder.scala:185)
	at DeequTest$.main(DeequTest.scala:25)
	at DeequTest.main(DeequTest.scala)
24/10/22 10:13:20 INFO CodeGenerator: 
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
/* 004 */
/* 005 */ // codegenStageId=1
/* 006 */ final class GeneratedIteratorForCodegenStage1 extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 007 */   private Object[] references;
/* 008 */   private scala.collection.Iterator[] inputs;
/* 009 */   private boolean hashAgg_initAgg_0;
/* 010 */   private boolean hashAgg_bufIsNull_0;
/* 011 */   private double hashAgg_bufValue_0;
/* 012 */   private boolean hashAgg_bufIsNull_1;
/* 013 */   private double hashAgg_bufValue_1;
/* 014 */   private boolean hashAgg_bufIsNull_2;
/* 015 */   private long hashAgg_bufValue_2;
/* 016 */   private boolean hashAgg_bufIsNull_3;
/* 017 */   private long hashAgg_bufValue_3;
/* 018 */   private boolean hashAgg_bufIsNull_4;
/* 019 */   private double hashAgg_bufValue_4;
/* 020 */   private boolean hashAgg_bufIsNull_5;
/* 021 */   private double hashAgg_bufValue_5;
/* 022 */   private boolean hashAgg_bufIsNull_6;
/* 023 */   private double hashAgg_bufValue_6;
/* 024 */   private scala.collection.Iterator localtablescan_input_0;
/* 025 */   private boolean hashAgg_subExprValue_0;
/* 026 */   private double hashAgg_subExprValue_1;
/* 027 */   private boolean hashAgg_subExprIsNull_0;
/* 028 */   private boolean hashAgg_hashAgg_isNull_27_0;
/* 029 */   private boolean hashAgg_hashAgg_isNull_29_0;
/* 030 */   private boolean hashAgg_hashAgg_isNull_32_0;
/* 031 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] hashAgg_mutableStateArray_0 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[1];
/* 032 */
/* 033 */   public GeneratedIteratorForCodegenStage1(Object[] references) {
/* 034 */     this.references = references;
/* 035 */   }
/* 036 */
/* 037 */   public void init(int index, scala.collection.Iterator[] inputs) {
/* 038 */     partitionIndex = index;
/* 039 */     this.inputs = inputs;
/* 040 */
/* 041 */     localtablescan_input_0 = inputs[0];
/* 042 */     hashAgg_mutableStateArray_0[0] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(7, 0);
/* 043 */
/* 044 */   }
/* 045 */
/* 046 */   private void hashAgg_doAggregateWithoutKey_0() throws java.io.IOException {
/* 047 */     // initialize aggregation buffer
/* 048 */     hashAgg_bufIsNull_0 = true;
/* 049 */     hashAgg_bufValue_0 = -1.0;
/* 050 */     hashAgg_bufIsNull_1 = true;
/* 051 */     hashAgg_bufValue_1 = -1.0;
/* 052 */     hashAgg_bufIsNull_2 = true;
/* 053 */     hashAgg_bufValue_2 = -1L;
/* 054 */     hashAgg_bufIsNull_3 = false;
/* 055 */     hashAgg_bufValue_3 = 0L;
/* 056 */     hashAgg_bufIsNull_4 = false;
/* 057 */     hashAgg_bufValue_4 = 0.0D;
/* 058 */     hashAgg_bufIsNull_5 = false;
/* 059 */     hashAgg_bufValue_5 = 0.0D;
/* 060 */     hashAgg_bufIsNull_6 = false;
/* 061 */     hashAgg_bufValue_6 = 0.0D;
/* 062 */
/* 063 */     while ( localtablescan_input_0.hasNext()) {
/* 064 */       InternalRow localtablescan_row_0 = (InternalRow) localtablescan_input_0.next();
/* 065 */       ((org.apache.spark.sql.execution.metric.SQLMetric) references[0] /* numOutputRows */).add(1);
/* 066 */       long localtablescan_value_0 = localtablescan_row_0.getLong(0);
/* 067 */
/* 068 */       hashAgg_doConsume_0(localtablescan_row_0, localtablescan_value_0);
/* 069 */       // shouldStop check is eliminated
/* 070 */     }
/* 071 */
/* 072 */   }
/* 073 */
/* 074 */   private void hashAgg_subExpr_1(long hashAgg_expr_0_0) {
/* 075 */     ArrayData hashAgg_arrayData_1 = ArrayData.allocateArrayData(
/* 076 */       -1, 2L, " createArray failed.");
/* 077 */
/* 078 */     hashAgg_arrayData_1.update(0, ((UTF8String) references[2] /* literal */));
/* 079 */
/* 080 */     boolean hashAgg_isNull_24 = false;
/* 081 */     UTF8String hashAgg_value_24 = null;
/* 082 */     if (!false) {
/* 083 */       hashAgg_value_24 = UTF8String.fromString(String.valueOf(hashAgg_expr_0_0));
/* 084 */     }
/* 085 */     hashAgg_arrayData_1.update(1, hashAgg_value_24);
/* 086 */
/* 087 */     UTF8String hashAgg_value_21 = null;
/* 088 */
/* 089 */     int hashAgg_elementAtIndex_1 = (int) 2;
/* 090 */     if (hashAgg_arrayData_1.numElements() < Math.abs(hashAgg_elementAtIndex_1)) {
/* 091 */       hashAgg_isNull_21 = true;
/* 092 */     } else {
/* 093 */       if (hashAgg_elementAtIndex_1 == 0) {
/* 094 */         throw QueryExecutionErrors.sqlArrayIndexNotStartAtOneError();
/* 095 */       } else if (hashAgg_elementAtIndex_1 > 0) {
/* 096 */         hashAgg_elementAtIndex_1--;
/* 097 */       } else {
/* 098 */         hashAgg_elementAtIndex_1 += hashAgg_arrayData_1.numElements();
/* 099 */       }
/* 100 */
/* 101 */       {
/* 102 */         hashAgg_value_21 = hashAgg_arrayData_1.getUTF8String(hashAgg_elementAtIndex_1);
/* 103 */       }
/* 104 */     }
/* 105 */     boolean hashAgg_isNull_20 = false;
/* 106 */     double hashAgg_value_20 = -1.0;
/* 107 */     if (!false) {
/* 108 */       final String hashAgg_doubleStr_1 = hashAgg_value_21.toString();
/* 109 */       try {
/* 110 */         hashAgg_value_20 = Double.valueOf(hashAgg_doubleStr_1);
/* 111 */       } catch (java.lang.NumberFormatException e) {
/* 112 */         final Double d = (Double) Cast.processFloatingPointSpecialLiterals(hashAgg_doubleStr_1, false);
/* 113 */         if (d == null) {
/* 114 */           hashAgg_isNull_20 = true;
/* 115 */         } else {
/* 116 */           hashAgg_value_20 = d.doubleValue();
/* 117 */         }
/* 118 */       }
/* 119 */     }
/* 120 */     hashAgg_subExprIsNull_0 = hashAgg_isNull_20;
/* 121 */     hashAgg_subExprValue_1 = hashAgg_value_20;
/* 122 */   }
/* 123 */
/* 124 */   private void hashAgg_doAggregate_max_0() throws java.io.IOException {
/* 125 */     hashAgg_hashAgg_isNull_29_0 = true;
/* 126 */     double hashAgg_value_29 = -1.0;
/* 127 */
/* 128 */     if (!hashAgg_bufIsNull_1 && (hashAgg_hashAgg_isNull_29_0 ||
/* 129 */         (org.apache.spark.sql.catalyst.util.SQLOrderingUtil.compareDoubles(hashAgg_bufValue_1, hashAgg_value_29)) > 0)) {
/* 130 */       hashAgg_hashAgg_isNull_29_0 = false;
/* 131 */       hashAgg_value_29 = hashAgg_bufValue_1;
/* 132 */     }
/* 133 */
/* 134 */     if (!hashAgg_subExprIsNull_0 && (hashAgg_hashAgg_isNull_29_0 ||
/* 135 */         (org.apache.spark.sql.catalyst.util.SQLOrderingUtil.compareDoubles(hashAgg_subExprValue_1, hashAgg_value_29)) > 0)) {
/* 136 */       hashAgg_hashAgg_isNull_29_0 = false;
/* 137 */       hashAgg_value_29 = hashAgg_subExprValue_1;
/* 138 */     }
/* 139 */
/* 140 */     hashAgg_bufIsNull_1 = hashAgg_hashAgg_isNull_29_0;
/* 141 */     hashAgg_bufValue_1 = hashAgg_value_29;
/* 142 */   }
/* 143 */
/* 144 */   private void hashAgg_doConsume_0(InternalRow localtablescan_row_0, long hashAgg_expr_0_0) throws java.io.IOException {
/* 145 */     // do aggregate
/* 146 */     // common sub-expressions
/* 147 */
/* 148 */     hashAgg_subExpr_1(hashAgg_expr_0_0);
/* 149 */
/* 150 */     hashAgg_subExpr_0(hashAgg_expr_0_0);
/* 151 */
/* 152 */     // evaluate aggregate functions and update aggregation buffers
/* 153 */     hashAgg_doAggregate_min_0();
/* 154 */     hashAgg_doAggregate_max_0();
/* 155 */     hashAgg_doAggregate_sum_0(hashAgg_expr_0_0);
/* 156 */     hashAgg_doAggregate_count_0();
/* 157 */     hashAgg_doAggregate_stateful_stddev_pop_0(hashAgg_expr_0_0);
/* 158 */
/* 159 */   }
/* 160 */
/* 161 */   private void hashAgg_doAggregate_stateful_stddev_pop_0(long hashAgg_expr_0_0) throws java.io.IOException {
/* 162 */     boolean hashAgg_isNull_39 = false;
/* 163 */     double hashAgg_value_39 = -1.0;
/* 164 */     if (!false && hashAgg_subExprValue_0) {
/* 165 */       hashAgg_isNull_39 = hashAgg_bufIsNull_4;
/* 166 */       hashAgg_value_39 = hashAgg_bufValue_4;
/* 167 */     } else {
/* 168 */       double hashAgg_value_41 = -1.0;
/* 169 */
/* 170 */       hashAgg_value_41 = hashAgg_bufValue_4 + 1.0D;
/* 171 */       hashAgg_isNull_39 = false;
/* 172 */       hashAgg_value_39 = hashAgg_value_41;
/* 173 */     }
/* 174 */     boolean hashAgg_isNull_44 = false;
/* 175 */     double hashAgg_value_44 = -1.0;
/* 176 */     if (!false && hashAgg_subExprValue_0) {
/* 177 */       hashAgg_isNull_44 = hashAgg_bufIsNull_5;
/* 178 */       hashAgg_value_44 = hashAgg_bufValue_5;
/* 179 */     } else {
/* 180 */       boolean hashAgg_isNull_46 = true;
/* 181 */       double hashAgg_value_46 = -1.0;
/* 182 */
/* 183 */       double hashAgg_value_53 = -1.0;
/* 184 */
/* 185 */       hashAgg_value_53 = hashAgg_bufValue_4 + 1.0D;
/* 186 */       boolean hashAgg_isNull_48 = false;
/* 187 */       double hashAgg_value_48 = -1.0;
/* 188 */       if (hashAgg_value_53 == 0) {
/* 189 */         hashAgg_isNull_48 = true;
/* 190 */       } else {
/* 191 */         boolean hashAgg_isNull_50 = false;
/* 192 */         double hashAgg_value_50 = -1.0;
/* 193 */         if (!false) {
/* 194 */           hashAgg_value_50 = (double) hashAgg_expr_0_0;
/* 195 */         }
/* 196 */
/* 197 */         double hashAgg_value_49 = -1.0;
/* 198 */
/* 199 */         hashAgg_value_49 = hashAgg_value_50 - hashAgg_bufValue_5;
/* 200 */
/* 201 */         hashAgg_value_48 = (double)(hashAgg_value_49 / hashAgg_value_53);
/* 202 */       }
/* 203 */       if (!hashAgg_isNull_48) {
/* 204 */         hashAgg_isNull_46 = false; // resultCode could change nullability.
/* 205 */
/* 206 */         hashAgg_value_46 = hashAgg_bufValue_5 + hashAgg_value_48;
/* 207 */
/* 208 */       }
/* 209 */       hashAgg_isNull_44 = hashAgg_isNull_46;
/* 210 */       hashAgg_value_44 = hashAgg_value_46;
/* 211 */     }
/* 212 */     boolean hashAgg_isNull_56 = false;
/* 213 */     double hashAgg_value_56 = -1.0;
/* 214 */     if (!false && hashAgg_subExprValue_0) {
/* 215 */       hashAgg_isNull_56 = hashAgg_bufIsNull_6;
/* 216 */       hashAgg_value_56 = hashAgg_bufValue_6;
/* 217 */     } else {
/* 218 */       boolean hashAgg_isNull_58 = true;
/* 219 */       double hashAgg_value_58 = -1.0;
/* 220 */
/* 221 */       boolean hashAgg_isNull_60 = true;
/* 222 */       double hashAgg_value_60 = -1.0;
/* 223 */       boolean hashAgg_isNull_62 = false;
/* 224 */       double hashAgg_value_62 = -1.0;
/* 225 */       if (!false) {
/* 226 */         hashAgg_value_62 = (double) hashAgg_expr_0_0;
/* 227 */       }
/* 228 */
/* 229 */       double hashAgg_value_61 = -1.0;
/* 230 */
/* 231 */       hashAgg_value_61 = hashAgg_value_62 - hashAgg_bufValue_5;
/* 232 */       boolean hashAgg_isNull_65 = true;
/* 233 */       double hashAgg_value_65 = -1.0;
/* 234 */       boolean hashAgg_isNull_67 = false;
/* 235 */       double hashAgg_value_67 = -1.0;
/* 236 */       if (!false) {
/* 237 */         hashAgg_value_67 = (double) hashAgg_expr_0_0;
/* 238 */       }
/* 239 */
/* 240 */       double hashAgg_value_66 = -1.0;
/* 241 */
/* 242 */       hashAgg_value_66 = hashAgg_value_67 - hashAgg_bufValue_5;
/* 243 */       double hashAgg_value_75 = -1.0;
/* 244 */
/* 245 */       hashAgg_value_75 = hashAgg_bufValue_4 + 1.0D;
/* 246 */       boolean hashAgg_isNull_70 = false;
/* 247 */       double hashAgg_value_70 = -1.0;
/* 248 */       if (hashAgg_value_75 == 0) {
/* 249 */         hashAgg_isNull_70 = true;
/* 250 */       } else {
/* 251 */         boolean hashAgg_isNull_72 = false;
/* 252 */         double hashAgg_value_72 = -1.0;
/* 253 */         if (!false) {
/* 254 */           hashAgg_value_72 = (double) hashAgg_expr_0_0;
/* 255 */         }
/* 256 */
/* 257 */         double hashAgg_value_71 = -1.0;
/* 258 */
/* 259 */         hashAgg_value_71 = hashAgg_value_72 - hashAgg_bufValue_5;
/* 260 */
/* 261 */         hashAgg_value_70 = (double)(hashAgg_value_71 / hashAgg_value_75);
/* 262 */       }
/* 263 */       if (!hashAgg_isNull_70) {
/* 264 */         hashAgg_isNull_65 = false; // resultCode could change nullability.
/* 265 */
/* 266 */         hashAgg_value_65 = hashAgg_value_66 - hashAgg_value_70;
/* 267 */
/* 268 */       }
/* 269 */       if (!hashAgg_isNull_65) {
/* 270 */         hashAgg_isNull_60 = false; // resultCode could change nullability.
/* 271 */
/* 272 */         hashAgg_value_60 = hashAgg_value_61 * hashAgg_value_65;
/* 273 */
/* 274 */       }
/* 275 */       if (!hashAgg_isNull_60) {
/* 276 */         hashAgg_isNull_58 = false; // resultCode could change nullability.
/* 277 */
/* 278 */         hashAgg_value_58 = hashAgg_bufValue_6 + hashAgg_value_60;
/* 279 */
/* 280 */       }
/* 281 */       hashAgg_isNull_56 = hashAgg_isNull_58;
/* 282 */       hashAgg_value_56 = hashAgg_value_58;
/* 283 */     }
/* 284 */
/* 285 */     hashAgg_bufIsNull_4 = hashAgg_isNull_39;
/* 286 */     hashAgg_bufValue_4 = hashAgg_value_39;
/* 287 */
/* 288 */     hashAgg_bufIsNull_5 = hashAgg_isNull_44;
/* 289 */     hashAgg_bufValue_5 = hashAgg_value_44;
/* 290 */
/* 291 */     hashAgg_bufIsNull_6 = hashAgg_isNull_56;
/* 292 */     hashAgg_bufValue_6 = hashAgg_value_56;
/* 293 */   }
/* 294 */
/* 295 */   private void hashAgg_subExpr_0(long hashAgg_expr_0_0) {
/* 296 */     boolean hashAgg_isNull_18 = false;
/* 297 */     double hashAgg_value_18 = -1.0;
/* 298 */     if (!false) {
/* 299 */       hashAgg_value_18 = (double) hashAgg_expr_0_0;
/* 300 */     }
/* 301 */
/* 302 */     hashAgg_subExprValue_0 = hashAgg_isNull_18;
/* 303 */   }
/* 304 */
/* 305 */   private void hashAgg_doAggregate_sum_0(long hashAgg_expr_0_0) throws java.io.IOException {
/* 306 */     hashAgg_hashAgg_isNull_32_0 = true;
/* 307 */     long hashAgg_value_32 = -1L;
/* 308 */     do {
/* 309 */       if (!hashAgg_bufIsNull_2) {
/* 310 */         hashAgg_hashAgg_isNull_32_0 = false;
/* 311 */         hashAgg_value_32 = hashAgg_bufValue_2;
/* 312 */         continue;
/* 313 */       }
/* 314 */
/* 315 */       if (!false) {
/* 316 */         hashAgg_hashAgg_isNull_32_0 = false;
/* 317 */         hashAgg_value_32 = 0L;
/* 318 */         continue;
/* 319 */       }
/* 320 */
/* 321 */     } while (false);
/* 322 */
/* 323 */     long hashAgg_value_31 = -1L;
/* 324 */
/* 325 */     hashAgg_value_31 = hashAgg_value_32 + hashAgg_expr_0_0;
/* 326 */
/* 327 */     hashAgg_bufIsNull_2 = false;
/* 328 */     hashAgg_bufValue_2 = hashAgg_value_31;
/* 329 */   }
/* 330 */
/* 331 */   private void hashAgg_doAggregate_count_0() throws java.io.IOException {
/* 332 */     long hashAgg_value_36 = -1L;
/* 333 */
/* 334 */     hashAgg_value_36 = hashAgg_bufValue_3 + 1L;
/* 335 */
/* 336 */     hashAgg_bufIsNull_3 = false;
/* 337 */     hashAgg_bufValue_3 = hashAgg_value_36;
/* 338 */   }
/* 339 */
/* 340 */   private void hashAgg_doAggregate_min_0() throws java.io.IOException {
/* 341 */     hashAgg_hashAgg_isNull_27_0 = true;
/* 342 */     double hashAgg_value_27 = -1.0;
/* 343 */
/* 344 */     if (!hashAgg_bufIsNull_0 && (hashAgg_hashAgg_isNull_27_0 ||
/* 345 */         (org.apache.spark.sql.catalyst.util.SQLOrderingUtil.compareDoubles(hashAgg_value_27, hashAgg_bufValue_0)) > 0)) {
/* 346 */       hashAgg_hashAgg_isNull_27_0 = false;
/* 347 */       hashAgg_value_27 = hashAgg_bufValue_0;
/* 348 */     }
/* 349 */
/* 350 */     if (!hashAgg_subExprIsNull_0 && (hashAgg_hashAgg_isNull_27_0 ||
/* 351 */         (org.apache.spark.sql.catalyst.util.SQLOrderingUtil.compareDoubles(hashAgg_value_27, hashAgg_subExprValue_1)) > 0)) {
/* 352 */       hashAgg_hashAgg_isNull_27_0 = false;
/* 353 */       hashAgg_value_27 = hashAgg_subExprValue_1;
/* 354 */     }
/* 355 */
/* 356 */     hashAgg_bufIsNull_0 = hashAgg_hashAgg_isNull_27_0;
/* 357 */     hashAgg_bufValue_0 = hashAgg_value_27;
/* 358 */   }
/* 359 */
/* 360 */   protected void processNext() throws java.io.IOException {
/* 361 */     while (!hashAgg_initAgg_0) {
/* 362 */       hashAgg_initAgg_0 = true;
/* 363 */
/* 364 */       long hashAgg_beforeAgg_0 = System.nanoTime();
/* 365 */       hashAgg_doAggregateWithoutKey_0();
/* 366 */       ((org.apache.spark.sql.execution.metric.SQLMetric) references[4] /* aggTime */).add((System.nanoTime() - hashAgg_beforeAgg_0) / 1000000);
/* 367 */
/* 368 */       // output the result
/* 369 */
/* 370 */       ((org.apache.spark.sql.execution.metric.SQLMetric) references[3] /* numOutputRows */).add(1);
/* 371 */       hashAgg_mutableStateArray_0[0].reset();
/* 372 */
/* 373 */       hashAgg_mutableStateArray_0[0].zeroOutNullBytes();
/* 374 */
/* 375 */       if (hashAgg_bufIsNull_0) {
/* 376 */         hashAgg_mutableStateArray_0[0].setNullAt(0);
/* 377 */       } else {
/* 378 */         hashAgg_mutableStateArray_0[0].write(0, hashAgg_bufValue_0);
/* 379 */       }
/* 380 */
/* 381 */       if (hashAgg_bufIsNull_1) {
/* 382 */         hashAgg_mutableStateArray_0[0].setNullAt(1);
/* 383 */       } else {
/* 384 */         hashAgg_mutableStateArray_0[0].write(1, hashAgg_bufValue_1);
/* 385 */       }
/* 386 */
/* 387 */       if (hashAgg_bufIsNull_2) {
/* 388 */         hashAgg_mutableStateArray_0[0].setNullAt(2);
/* 389 */       } else {
/* 390 */         hashAgg_mutableStateArray_0[0].write(2, hashAgg_bufValue_2);
/* 391 */       }
/* 392 */
/* 393 */       hashAgg_mutableStateArray_0[0].write(3, hashAgg_bufValue_3);
/* 394 */
/* 395 */       hashAgg_mutableStateArray_0[0].write(4, hashAgg_bufValue_4);
/* 396 */
/* 397 */       hashAgg_mutableStateArray_0[0].write(5, hashAgg_bufValue_5);
/* 398 */
/* 399 */       hashAgg_mutableStateArray_0[0].write(6, hashAgg_bufValue_6);
/* 400 */       append((hashAgg_mutableStateArray_0[0].getRow()));
/* 401 */     }
/* 402 */   }
/* 403 */
/* 404 */ }
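Looking at the dump above, the failure is consistent with a missing declaration: in hashAgg_subExpr_1, generated line 087 declares hashAgg_value_21, but hashAgg_isNull_21 is never declared anywhere in the class, yet generated line 091 assigns to it. Janino reports such an unresolved name as "not an rvalue", which suggests the subexpression-elimination codegen for the new element_at(array(...)) wrapping drops the null-flag declaration.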

To Reproduce
Create a project with the Deequ 2.0.7 dependency and run the code below:

import com.amazon.deequ.profiles.ColumnProfilerRunner
import org.apache.spark.sql.SparkSession

import java.sql.Date

object DeequTest {

  def main(args: Array[String]): Unit = {

    val spark: SparkSession = SparkSession.builder()
      .appName("data-quality")
      .master("local")
      .getOrCreate()

    import spark.implicits._

    val testData = Seq(
      TestEvent(),
      TestEvent(),
      TestEvent(),
    ).toDF()

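    // Column profiling runs Deequ's AnalysisRunner under the hood; its
    // dataframe.agg() call is what triggers the codegen error above.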
    val profiles = ColumnProfilerRunner()
      .onData(testData)
      .run()

  }


}

case class TestEvent(
  evenId: String = "bc60b4ca-e331-11ed-b5ea-0242ac120002",
  size: Int = 10,
  createdDate: Date = Date.valueOf("2023-04-24")
)
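If anyone needs a stopgap while this is open: since Spark already catches the compile error and falls back to interpreted evaluation, disabling whole-stage codegen may keep the ERROR out of the logs; results are unchanged, and of course the root cause is not fixed. A sketch, untested on my side:

val spark = SparkSession.builder()
  .appName("data-quality")
  .master("local")
  // Assumption: turning off whole-stage codegen avoids the failing
  // GeneratedIteratorForCodegenStage1 compilation path entirely.
  .config("spark.sql.codegen.wholeStage", "false")
  .getOrCreate()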

Expected behavior
No CodeGenerator errors in the logs: Deequ 2.0.7 should emit aggregation expressions that compile under whole-stage codegen, as 2.0.6 did.

@pawelpinkos pawelpinkos added the bug Something isn't working label Oct 22, 2024
@pawelpinkos (Author) commented Oct 22, 2024

@rdsharma26 - could you please take a look at this? Your change is probably the root cause. Thanks a lot!

@rdsharma26 (Contributor) commented

Thanks @pawelpinkos for bringing this to our attention. The details are extremely helpful. We will investigate this and get back to you soon.
