-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/100 0 n columns #111
Conversation
JaCoCo agent module code coverage report - spark:2 - scala 2.12.12
|
JaCoCo server module code coverage report - scala 2.12.12
|
measureName: String, | ||
resultValueType: ResultValueType.ResultValueType | ||
) extends Measure { | ||
private val columnExpression = countDistinct(col(controlCols.head), controlCols.tail.map(col): _*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure if this is what we want. I thought that it's about 'distinct -> count' operation, performed on a single column. Do you remember what was used in the old Atum & what can our users require @benedeki ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the suggested code it can be performed on single column or multiple columns. If this particular measure is expected to work differently please let me know. @benedeki
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the universality of supporting multi-column uniqueness 👍
Why not to repoint the PR then? It would be easier to do the review |
Good point, will change the target branch to be the feature/99-custom-measure-2. |
# Conflicts: # agent/src/main/scala/za/co/absa/atum/agent/model/Measurement.scala # agent/src/main/scala/za/co/absa/atum/agent/model/MeasuresBuilder.scala
agent/src/main/scala/za/co/absa/atum/agent/model/Measurement.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finished the review, I like the changes. There is only one tiny minor thing I found, and 1 thing regarding distinct count functionality. Otherwise LGTM and I will be happy to approve once these are addressed.
…cala Co-authored-by: Ladislav Sulak <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To consider:
I like class hierarchies. What you think of having 3 abstract classes
MeasureWithoutColumn
MeasureOneColumn
MeasureMultipleColumns
measureName: String, | ||
resultValueType: ResultValueType.ResultValueType | ||
) extends Measure { | ||
private val columnExpression = countDistinct(col(controlCols.head), controlCols.tail.map(col): _*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the universality of supporting multi-column uniqueness 👍
def apply(controlCol: String): AbsSumOfValuesOfColumn = AbsSumOfValuesOfColumn(measureName, controlCol) | ||
} | ||
|
||
case class SumOfHashesOfColumn private (measureName: String, controlCol: String) extends AtumMeasure { | ||
|
||
private val columnExpression: Column = sum(crc32(col(controlCol).cast("String"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this function also support a list of columns, concatenated.
Extra bonus is *
for all columns. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- pulled
- built
- test run
- code coverage - required merge of conflicting files, there were changes in tests
- in current test implementation I see not covered paths
I will continue in review after merge of conflict files.
@@ -28,15 +28,15 @@ class MeasurementTest extends AnyFlatSpec with Matchers with SparkTestBase { sel | |||
|
|||
"MeasurementProvided" should "be able to be converted to MeasurementProvided object when the result is Double" in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test speak about Double, but works with Big Decimals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok.
Just decide uf you want to add the hash function on multiple columns to add here or separate ticket+PR.
Oh and there's Miroslav's comment on one of the tests.
private def handleAggregationResult(dataType: DataType, result: Any): String = { | ||
val aggregatedValue = dataType match { | ||
case _: LongType => | ||
if (result == null) 0 else result | ||
case _: StringType => | ||
val value = | ||
if (collected == null) new java.math.BigDecimal(0) | ||
else collected.asInstanceOf[java.math.BigDecimal] | ||
if (result == null) new java.math.BigDecimal(0) | ||
else result.asInstanceOf[java.math.BigDecimal] | ||
value.stripTrailingZeros // removes trailing zeros (2001.500000 -> 2001.5, but can introduce scientific notation (600.000 -> 6E+2) | ||
.toPlainString // converts to normal string (6E+2 -> "600") | ||
case _ => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, this we might want to revisit and rethink. But not part of this PR. 😉
I will re-point this PR to master instead of feature/99-custom-measure-2. |
JaCoCo agent module code coverage report - spark:2 - scala 2.12.18
|
JaCoCo server module code coverage report - scala 2.12.18
|
Closes #100
Introduces support for 0-n columns (depends on individual measure).