Replace random test values by discrete ones #23956

Open
wants to merge 4 commits into
base: master
Conversation

@ges1227 ges1227 commented Nov 6, 2024

Description

Adds TestMode.java to parameterize tests with an arbitrary value and an upper-/lower-bounded value.

The discrete values are the upper and lower bounds of the previously used Random.getInt()/Random.getLong() functions. With Random.getInt()/Random.getLong() the lower bound was 0; where possible, it has been lowered to the most negative value of the data type.
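As a rough illustration of the idea described above, the sketch below shows how a small enum could hand out fixed values in place of random ones. It is not the TestMode.java from this PR: the enum name `DiscreteValueMode`, its constants, the accessor methods, and the arbitrary value 42 are all assumptions made for the example.

```java
// Hypothetical sketch; names and the arbitrary value are illustrative, not taken from the PR.
enum DiscreteValueMode
{
    ARBITRARY,
    LOWER_BOUND,
    UPPER_BOUND;

    // Returns a fixed int for this mode instead of asking a random generator.
    int intValue()
    {
        switch (this) {
            case LOWER_BOUND:
                return Integer.MIN_VALUE; // most negative int
            case UPPER_BOUND:
                return Integer.MAX_VALUE;
            default:
                return 42; // arbitrary, but identical on every run
        }
    }

    // Same idea for long-typed test data.
    long longValue()
    {
        switch (this) {
            case LOWER_BOUND:
                return Long.MIN_VALUE; // most negative long
            case UPPER_BOUND:
                return Long.MAX_VALUE;
            default:
                return 42L;
        }
    }
}
```

Because every mode maps to a constant, a failing test can be reproduced exactly, which is the property the random inputs lacked.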

In com/facebook/presto/parquet/batchreader/decoders/TestParquetUtils.java, the valueString on line 112 was chosen arbitrarily. This might deserve more attention during code review.

Motivation and Context

Using the Random() function in the parquet.batchreader.decoders tests may cause flakiness.
Resolves: #23840

Impact

Developer level: Reduced flakiness in tests.

Test Plan

CI should run.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@ges1227 ges1227 requested review from shangxinli and a team as code owners November 6, 2024 04:46

linux-foundation-easycla bot commented Nov 6, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@tdcmeehan tdcmeehan self-assigned this Nov 6, 2024
*/
package com.facebook.presto.parquet.batchreader.decoders;

public enum TestMode
Contributor

Instead of embedding the concept of TestMode into the data generator, I'm just wondering if we should have method overloads for each test mode. For example, instead of one method generatePlainValuesPage that takes in TestMode, we have three lightweight methods generatePlainValuesPageUpperBound etc. that generate the test data in different ways. They could delegate to a common inner method which does most of the work.
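One way to read this suggestion is sketched below. It is a simplified stand-in, not the PR's code: the real generatePlainValuesPage signature is not shown in this conversation, so the `List<Integer>` return type, the `valueCount` parameter, the `IntSupplier` argument, the class name `PlainValuesPages`, and the arbitrary value 42 are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// Hypothetical sketch of "lightweight methods delegating to a common inner method".
final class PlainValuesPages
{
    private PlainValuesPages() {}

    static List<Integer> generatePlainValuesPageLowerBound(int valueCount)
    {
        return generatePlainValuesPage(valueCount, () -> Integer.MIN_VALUE);
    }

    static List<Integer> generatePlainValuesPageUpperBound(int valueCount)
    {
        return generatePlainValuesPage(valueCount, () -> Integer.MAX_VALUE);
    }

    static List<Integer> generatePlainValuesPageArbitrary(int valueCount)
    {
        return generatePlainValuesPage(valueCount, () -> 42); // fixed arbitrary value
    }

    // Common inner method: does the shared work and only asks the supplier for each value.
    private static List<Integer> generatePlainValuesPage(int valueCount, IntSupplier valueSource)
    {
        List<Integer> values = new ArrayList<>(valueCount);
        for (int i = 0; i < valueCount; i++) {
            values.add(valueSource.getAsInt());
        }
        return values;
    }
}
```

Callers then pick a method by name, so no mode argument has to be threaded through the data generator.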

Contributor

+1 to going with the object-oriented approach. I think just having a superclass with the main logic and maybe delegating the mode-specific details to an abstract method implementation would be better than having to pass in the TestMode.
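A minimal sketch of that shape follows, assuming a much simpler data model than the actual decoder tests use. The class names `AbstractDecoderTestData`, `LowerBoundTestData`, `UpperBoundTestData`, and `ArbitraryTestData`, the `nextValue` hook, and the value 42 are hypothetical and do not come from the PR.

```java
// Hypothetical sketch: the superclass owns the main logic, subclasses supply the mode-specific value.
abstract class AbstractDecoderTestData
{
    // Mode-specific detail, implemented once per subclass.
    protected abstract long nextValue();

    // Shared logic: fills an array of the requested size using the subclass hook.
    final long[] generateValues(int count)
    {
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = nextValue();
        }
        return values;
    }
}

final class LowerBoundTestData
        extends AbstractDecoderTestData
{
    @Override
    protected long nextValue()
    {
        return Long.MIN_VALUE;
    }
}

final class UpperBoundTestData
        extends AbstractDecoderTestData
{
    @Override
    protected long nextValue()
    {
        return Long.MAX_VALUE;
    }
}

final class ArbitraryTestData
        extends AbstractDecoderTestData
{
    @Override
    protected long nextValue()
    {
        return 42L; // arbitrary but fixed
    }
}
```

Compared with passing a mode value around, this keeps each variant in its own small class, at the cost of three extra types.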

Author

For now, I have committed an approach that integrates both of your suggestions. TestMode is removed. It has been replaced by three lightweight methods, implemented in three subclasses that refer to TestParquetUtils, where the main part is still handled. I would be curious to know your thoughts, because I still have the feeling it might not be as clean as it could be.

Contributor

@ZacBlanco ZacBlanco left a comment

Agree with Tim, otherwise looks good

Author

ges1227 commented Nov 17, 2024

Thanks for the comments you two 👍

FYI: I’ll be able to access my notebook again next week. Just so you’re aware that I’m not ignoring this.

The use of Random() function in the parquet.batchreader.decoders tests may cause flakiness.

Adds TestMode.java to parameterize tests with an arbitrary value and an upper-/lower-bounded value.

Resolves: prestodb#23840
@ges1227 ges1227 force-pushed the 23840-deterministic-decoder-test-data branch from 2046ca0 to 8cc6878 Compare November 24, 2024 18:27
Successfully merging this pull request may close these issues.

[parquet] Use deterministic data in parquet.batchreader.decoders.TestValuesDecoders