Table Generation Guaranteed Values #30

stanbrub · 2023-02-08T20:06:19Z

During table generation for benchmark tests, random values are often used as a quick way to produce a non-sequential distribution of data. The problem is that some tests look up specific generated values, which may or may not be present. Consider the following snippet:

result = source.partition_by(['column3']).get_constituent(['random1'])

"random1" is a "column3" column value that is randomly generated. Depending on the scale selected, there is no guarantee that "random1" will exist as a value in the "column3" column.
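To get a feel for how likely the lookup is to miss, here is a minimal sketch. It assumes each row's value is drawn independently and uniformly from a fixed range, which may not match the actual generator; the function name and the example numbers are hypothetical.

```python
def probability_value_missing(range_size: int, row_count: int) -> float:
    """Chance that one specific value never appears in `row_count` draws.

    Assumes independent, uniform draws from `range_size` distinct values.
    """
    if range_size < 1:
        raise ValueError("range_size must be at least 1")
    return (1.0 - 1.0 / range_size) ** row_count


if __name__ == "__main__":
    # Hypothetical scales: a column with one million possible values.
    for rows in (1_000, 100_000, 10_000_000):
        print(rows, probability_value_missing(1_000_000, rows))
```

Under these assumptions the miss probability shrinks as the row count grows relative to the range, so the failure would show up mostly at small scales.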

Possible solutions:

- Replace random() in table generation with a generator that always emits the first value in the defined range, then draws randomly from then on.
- Don't use random values for columns at all. Generate incremental data with overlapping ranges (e.g. col1=[1-100], col2=[1-101]), then shuffle the rows.
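The two options above could be sketched roughly as follows. This is plain Python rather than the benchmark's actual table generator, and every function name, parameter, and seed here is hypothetical, chosen only to illustrate the idea.

```python
import random
from typing import List, Tuple


def guaranteed_random_column(low: int, high: int, row_count: int, seed: int = 0) -> List[int]:
    """Option 1: emit `low` first, then draw uniformly from [low, high]."""
    if row_count < 1:
        return []
    rng = random.Random(seed)  # fixed seed keeps benchmark data repeatable
    values = [low]  # the guaranteed value that tests can look up
    values.extend(rng.randint(low, high) for _ in range(row_count - 1))
    return values


def incremental_column(low: int, high: int, row_count: int) -> List[int]:
    """Option 2: cycle through low..high so every value in the range
    appears once row_count is at least the range size."""
    size = high - low + 1
    return [low + (i % size) for i in range(row_count)]


def shuffled_rows(columns: List[List[int]], seed: int = 0) -> List[Tuple[int, ...]]:
    """Zip the columns into rows and shuffle whole rows, so the data is
    non-sequential while each row's values stay together."""
    rows = list(zip(*columns))
    random.Random(seed).shuffle(rows)
    return rows


if __name__ == "__main__":
    col1 = incremental_column(1, 100, 1000)
    col2 = incremental_column(1, 101, 1000)
    rows = shuffled_rows([col1, col2])
    print(rows[:5])
```

With option 1 only the first value of the range is guaranteed, so lookups would need to target that value. With option 2 every value in the range is present once the row count reaches the range size, and the slightly different range lengths keep the column pairings from repeating in lockstep.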

stanbrub added the enhancement (New feature or request) label Feb 8, 2023

stanbrub mentioned this issue Sep 24, 2024

PartitionedBy Benchmarks #341

Closed
