I think we should either have all the engines infer the column data types or have all the engines specify them, so the comparison is fair. It's not apples-to-apples if some engines are using int32 and others are using int64.
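One way to get the "all engines specify" option is a single shared schema that every solution reads, instead of each engine's own inference deciding the types. The sketch below is hypothetical and stdlib-only: `SCHEMA`, `INT_BOUNDS`, and `parse_row` are illustrative names, not part of any benchmark code. Only `id4`, `id5`, `id6`, and `v1` come from this discussion, and the int32 widths simply mirror the Polars setup. It parses CSV rows against the declared types and rejects values that don't fit.

```python
import csv
import io

# Hypothetical shared schema: one place that decides each column's type,
# so no engine falls back to its own inference. The widths mirror the
# Int32 choice described for the Polars solution.
SCHEMA = {"id4": "int32", "id5": "int32", "id6": "int32", "v1": "int32"}

# Inclusive value ranges for the supported signed integer types.
INT_BOUNDS = {
    "int32": (-(2**31), 2**31 - 1),
    "int64": (-(2**63), 2**63 - 1),
}


def parse_row(row, schema=SCHEMA):
    """Convert one CSV row (a dict of strings) using the declared types."""
    out = {}
    for name, type_name in schema.items():
        value = int(row[name])
        lo, hi = INT_BOUNDS[type_name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} does not fit in {type_name}")
        out[name] = value
    return out


sample = "id4,id5,id6,v1\n1,2,3,4\n"
rows = [parse_row(r) for r in csv.DictReader(io.StringIO(sample))]
print(rows)  # [{'id4': 1, 'id5': 2, 'id6': 3, 'v1': 4}]
```

In a real setup each solution would translate the same schema into its engine's own type declarations; the point is that the widths are chosen once rather than per engine.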
I agree that all engines should attempt to use the same types.
It's important to note, however, that some of the aggregations produce results that overflow int32 and need int64, even though all the inputs are int32. I think Polars ran into this issue somewhere.
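To make the overflow concern concrete, here is a minimal, engine-agnostic sketch in plain Python. It assumes a fixed-width 32-bit accumulator that wraps around in two's complement (emulated with modular arithmetic); real engines may instead widen the result type or raise an error, so this only shows why the result type matters, not what any particular engine does. `wrap_int32`, `sum_int32_wrapping`, and `sum_int64` are hypothetical helpers.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def wrap_int32(x):
    """Reduce x to the int32 range the way two's-complement overflow would."""
    return ((x + 2**31) % 2**32) - 2**31


def sum_int32_wrapping(values):
    """Sum with an emulated 32-bit accumulator: each step wraps on overflow."""
    acc = 0
    for v in values:
        acc = wrap_int32(acc + v)
    return acc


def sum_int64(values):
    """Sum with a 64-bit result; Python ints are unbounded, so check the range."""
    total = sum(values)
    if not INT64_MIN <= total <= INT64_MAX:
        raise OverflowError("sum does not fit in int64")
    return total


# Every input fits comfortably in int32...
column = [2_000_000_000, 2_000_000_000]
assert all(INT32_MIN <= v <= INT32_MAX for v in column)

# ...but their sum does not, so a 32-bit result silently goes wrong.
print(sum_int32_wrapping(column))  # -294967296
print(sum_int64(column))           # 4000000000
```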
Also, I only have limited time to work on this benchmark, and it is mostly spent on maintenance and updating solutions. I don't have much time to go through every solution to make sure the setup for each system is identical. I am happy to review PRs if they come up.
Here's a snippet from the Polars groupby benchmarks:
Looks like `id4`, `id5`, `id6`, and `v1` are using Int32 columns. Other engines, like Spark, are just inferring the column types: