feat: Add GroupColumn Decimal128Array (#13564)
Changes from 6 commits
```diff
@@ -22,6 +22,7 @@ use arrow_array::cast::AsArray;
 use arrow_array::{Array, ArrayRef, ArrowPrimitiveType, PrimitiveArray};
 use arrow_schema::DataType;
 use datafusion_execution::memory_pool::proxy::VecAllocExt;
+use datafusion_physical_expr::aggregate::utils::adjust_output_array;
 use itertools::izip;
 use std::iter;
 use std::sync::Arc;
```
```diff
@@ -190,9 +191,13 @@ impl<T: ArrowPrimitiveType, const NULLABLE: bool> GroupColumn
             assert!(nulls.is_none(), "unexpected nulls in non nullable input");
         }

-        let arr = PrimitiveArray::<T>::new(ScalarBuffer::from(group_values), nulls);
+        let arr = PrimitiveArray::<T>::new(ScalarBuffer::from(group_values), nulls)
+            .with_data_type(data_type.clone());
+        let array_ref = Arc::new(arr) as ArrayRef;

-        // Set timezone information for timestamp
-        Arc::new(arr.with_data_type(data_type))
+        adjust_output_array(&data_type, array_ref)
+            .expect("Failed to adjust array data type")
     }

     fn take_n(&mut self, n: usize) -> ArrayRef {
```

Comment: If we need

Comment: The precision and scale aren't kept in the generic when constructing the buffer, so I think I need to keep `with_data_type`.

Comment: It is kept in:

```rust
pub fn adjust_output_array(data_type: &DataType, array: ArrayRef) -> Result<ArrayRef> {
    let array = match data_type {
        DataType::Decimal128(p, s) => Arc::new(
            array
                .as_primitive::<Decimal128Type>()
                .clone()
                .with_precision_and_scale(*p, *s)?,
        ) as ArrayRef,
        // ...
```

Comment: Yeah, that is what I thought as well, but in ea6f77a it fails with precision/scale errors due to the original array not having

Comment: The error you mentioned can be fixed by adding support for Decimal256

Comment: I tried backing out the use of `adjust_data_type` and I didn't see any failures locally 🤔
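The point under discussion is that a generically constructed `PrimitiveArray::<T>` only knows the physical type, so a decimal array falls back to arrow's default `Decimal128(38, 10)` until `with_data_type` re-attaches the logical type from the schema. A std-only sketch of that behavior (the `DataType` and `DecimalArray` types here are hypothetical stand-ins that mirror the PR's names, not the real arrow API):

```rust
// Hypothetical stand-ins for arrow's DataType / PrimitiveArray, kept
// std-only so the sketch compiles without the arrow crates.
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Decimal128(u8, i8), // (precision, scale)
}

#[derive(Debug)]
struct DecimalArray {
    values: Vec<i128>,
    data_type: DataType,
}

impl DecimalArray {
    // Like PrimitiveArray::<T>::new: the generic constructor only sees the
    // physical type (i128), so it falls back to the default precision/scale.
    fn new(values: Vec<i128>) -> Self {
        DecimalArray {
            values,
            data_type: DataType::Decimal128(38, 10),
        }
    }

    // Like with_data_type: re-attach the logical type from the schema.
    fn with_data_type(mut self, data_type: DataType) -> Self {
        self.data_type = data_type;
        self
    }
}

fn main() {
    let schema_type = DataType::Decimal128(30, 3);

    let arr = DecimalArray::new(vec![1, 2, 3]);
    assert_eq!(arr.values.len(), 3);
    // Freshly constructed: carries the default type, not the schema's.
    assert_eq!(arr.data_type, DataType::Decimal128(38, 10));

    let arr = arr.with_data_type(schema_type.clone());
    // After adjustment: matches the schema again.
    assert_eq!(arr.data_type, schema_type);
}
```

This is why, as the author notes, dropping `with_data_type` and relying only on a later adjustment can surface precision/scale errors: the intermediate array still carries the default type.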
Something is wrong here. This change effectively turns off fuzz testing for sum with decimal:
When I reverted this change, the fuzz tests fail occasionally like this:
From my perspective, other than this change, this PR is ready to go.
Thank you @jonathanc-n
I'm getting this, should I still remove it?
The real issue is that we have Decimal(38, 10), which is the fixed precision for `sum`, and it mismatches with the fuzz test, which has random precision.
Decimal128(38, 3) for normal precision, Decimal128(30, 3) for grouping. Not sure why there is a mismatch in the fuzz test. We should either align the precision for both cases, or fix the fuzz schema check if they are not required to have the same precision, like slt.
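One way to see why the fuzz check trips: strict `DataType` equality treats `Decimal128(38, 3)` and `Decimal128(30, 3)` as different types even though, with equal scales, the stored values mean the same thing. A std-only sketch of the two possible checks (hypothetical helper names, not DataFusion's actual fuzz harness):

```rust
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Decimal128(u8, i8), // (precision, scale)
}

// Strict check, as an exact schema comparison would do:
// both precision and scale must match.
fn same_type(a: &DataType, b: &DataType) -> bool {
    a == b
}

// Looser check: treat two decimals as comparable when only the scale
// matches, since scale determines how the underlying i128 is interpreted.
fn comparable(a: &DataType, b: &DataType) -> bool {
    match (a, b) {
        (DataType::Decimal128(_, s1), DataType::Decimal128(_, s2)) => s1 == s2,
    }
}

fn main() {
    let normal = DataType::Decimal128(38, 3);
    let grouped = DataType::Decimal128(30, 3);

    // What the strict fuzz schema check sees: a mismatch.
    assert!(!same_type(&normal, &grouped));
    // What a scale-only check would accept.
    assert!(comparable(&normal, &grouped));
}
```

Aligning the precision for both cases corresponds to making `same_type` pass; relaxing the fuzz schema check corresponds to switching to something like `comparable`.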