Describe the bug
Beginning with version 2.0.7 of Deequ (all Spark releases), there is a bug in the library that causes Catalyst codegen in Spark to fail. The exception is caught, so it does not fail at runtime, but you can observe the issue in the logs (e.g. run MaximumTest from the Deequ tests and check the log).
I have investigated, and in my opinion the root cause of the issue is the change: 34d8f3a
The error is thrown when AnalysisRunner calls dataframe.agg(), depending on the parameters provided. E.g. before Deequ 2.0.7 (for the example provided in the "To Reproduce" section) the parameters were:
min(CAST(size AS DOUBLE))
max(CAST(size AS DOUBLE))
CAST(sum(size) AS DOUBLE)
CAST(count(size) AS BIGINT)
stateful_stddev_pop(size)
CAST(sum(size) AS DOUBLE)
And there was no error. For deequ 2.0.7 the parameters are:
min(CAST(element_at(array(InScopeData AS source, size AS selection), 2) AS DOUBLE))
max(CAST(element_at(array(InScopeData AS source, size AS selection), 2) AS DOUBLE))
CAST(sum(size) AS DOUBLE)
CAST(count(size) AS BIGINT)
stateful_stddev_pop(size)
CAST(sum(size) AS DOUBLE)
And the error is thrown.
This causes a lot of errors in the logs of applications that use Deequ. I tried to bump Deequ to 2.0.7 in my project, but because of this I have to postpone the upgrade.
To Reproduce
Create a project with a Deequ 2.0.7 dependency and run the code below:
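A minimal sketch of such a reproduction (not the original snippet). It assumes that the aggregation expressions listed above come from the Minimum, Maximum, Mean, StandardDeviation and Sum analyzers run on a single integer column named `size`. The sample data, the object name `CodegenRepro` and the local Spark session settings are illustrative assumptions.

```scala
import com.amazon.deequ.analyzers.{Maximum, Mean, Minimum, StandardDeviation, Sum}
import com.amazon.deequ.analyzers.runners.AnalysisRunner
import org.apache.spark.sql.SparkSession

// Hypothetical reproduction harness; names and data are assumptions.
object CodegenRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("deequ-codegen-repro")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data: one integer column called "size".
    val df = Seq(1, 2, 3, 4, 5).toDF("size")

    // All analyzers are evaluated together, so Deequ passes their
    // aggregation expressions to a single dataframe.agg() call.
    val result = AnalysisRunner
      .onData(df)
      .addAnalyzer(Minimum("size"))
      .addAnalyzer(Maximum("size"))
      .addAnalyzer(Mean("size"))
      .addAnalyzer(StandardDeviation("size"))
      .addAnalyzer(Sum("size"))
      .run()

    // The metrics are still computed; the codegen failure only
    // shows up in the Spark logs.
    result.metricMap.foreach { case (analyzer, metric) =>
      println(s"$analyzer -> ${metric.value}")
    }

    spark.stop()
  }
}
```

If the sketch matches the failing case, the run should complete normally, and the codegen error should appear only in the Spark log output.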