Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add Date32/Date64 in aggregate fuzz testing #13041

Merged
merged 5 commits into from
Oct 26, 2024

Conversation

LeslieKid
Copy link
Contributor

@LeslieKid LeslieKid commented Oct 21, 2024

Which issue does this PR close?

Part of #12114 .

Rationale for this change

Supporting more types for dataset generator in fuzzer framework is needed according to this comment.

What changes are included in this PR?

  1. Refactor the PrimitiveArrayGenerator to make it easier to introduce new primitive types.
  2. Add coverage of Date32/Date64 types.

Are these changes tested?

Are there any user-facing changes?

No.

@github-actions github-actions bot added the core Core DataFusion crate label Oct 21, 2024
@alamb
Copy link
Contributor

alamb commented Oct 21, 2024

Thanks @LeslieKid -- I plan to review this tomorrow

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @LeslieKid -- I think this PR is ready to be rebased on main and up for review

@@ -392,6 +392,26 @@ impl RecordBatchGenerator {
Float64Type
)
}
DataType::Date32 => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Comment on lines 66 to 72
unsafe {
std::ptr::read(&adjusted_value as *const i64 as *const N)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to use unsafe -- perhaps adjusted_value.try_into() would work....

Copy link
Contributor Author

@LeslieKid LeslieKid Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried to use try_into().
But maybe it requires N: TryFrom<i64>? This would limit other primitive types like f32 in PrimitiveArrayGenerator.

And because of the A: ArrowPrimitiveType<Native = N> and A::DATA_TYPE is Date64, I think it can use unsafe here as an alternative.

@LeslieKid LeslieKid changed the title feat: support Date/Decimal/... types for dataset generator in fuzz testing feat: support Date32/Date64 types for dataset generator in fuzz testing Oct 23, 2024
@LeslieKid LeslieKid marked this pull request as ready for review October 23, 2024 05:06
alamb
alamb previously approved these changes Oct 23, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @LeslieKid -- this is looking close. If we can just figure out how to remove the unsafe it will be good to go

let date_value = self.rng.gen_range(i64::MIN..=i64::MAX);
let millis_per_day: i64 = 86_400_000;
let adjusted_value = date_value - (date_value % millis_per_day);
// SAFETY: here we can convert i64 to A::Native safely since we determine that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this unsafe block is only thing I am worried about.

I wonder if you could instead use

                        let date_value = self.rng.gen_range(i64::MIN..=i64::MAX);
                        let millis_per_day  = 86_400_000;
                        let adjusted_value = date_value - (date_value % millis_per_day);

And then return adjusted value 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb

We really need to avoid using unsafe.
I introduce a trait FromNative and remove the unsafe block to solve this problem.

@alamb alamb dismissed their stale review October 23, 2024 23:38

clicked wrong button

@alamb alamb changed the title feat: support Date32/Date64 types for dataset generator in fuzz testing feat: Test Date32/Date64 in aggregate fuzz testing Oct 25, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @LeslieKid -- this is great ❤️

I also verified we have added additional coverage by running

$ cargo llvm-cov --html test --test fuzz aggregate_fuzz

And then looking at the report.

Are you interested in helping expand out the report (e.g. Timestamp and Time and Decimal type support?

@alamb alamb changed the title feat: Test Date32/Date64 in aggregate fuzz testing feat: Add Date32/Date64 in aggregate fuzz testing Oct 25, 2024
@LeslieKid
Copy link
Contributor Author

LeslieKid commented Oct 26, 2024

Are you interested in helping expand out the report (e.g. Timestamp and Time and Decimal type support?

OK! I'd love to take this. I will work on it in the next few days. @alamb

@alamb alamb merged commit 73cfa6c into apache:main Oct 26, 2024
26 checks passed
@alamb
Copy link
Contributor

alamb commented Oct 26, 2024

Are you interested in helping expand out the report (e.g. Timestamp and Time and Decimal type support?

OK! I'd love to take this. I will work on it in the next few days. @alamb

Thanks again @LeslieKid 🙏 🚀

@LeslieKid LeslieKid deleted the data-generator branch October 28, 2024 06:59
@LeslieKid LeslieKid mentioned this pull request Nov 6, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants