Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ad-hoc or scheduled mutation based testing #14589

Open
edmondop opened this issue Feb 10, 2025 · 3 comments
Open

Ad-hoc or scheduled mutation based testing #14589

edmondop opened this issue Feb 10, 2025 · 3 comments

Comments

@edmondop
Copy link
Contributor

As described in #13661 we want to improve our pre-release testing. I propose we adopt mutation based testing using cargo mutants

TLDR: mutation based testing try to identify valid mutants of your code and send them through your unit tests. If your unit tests are good, mutants should be "killed" by a test failure. If you have insufficient assertions a mutant can "survive"

See this example of running mutation based testing

@alamb
Copy link
Contributor

alamb commented Feb 11, 2025

Hi @edmondop -- this sounds cool.

I don't fully understand what you have in mind for mutation testing -- it seems like mutation testing is designed to find holes in our test coverage (a good goal). However, I am worried about adding an automatic test like unless someone has the bandwidth to monitor and fix issues it finds.

We are already struggling to keep extended test running cleanly, for example

I worry if we added automatic mutation testing the results would simply be ignored -- in which case adding the tests doesn't add much value

Perhaps you can try to run mutation tests manually, look at the results, and see if they uncover anything good

@edmondop
Copy link
Contributor Author

@alamb they did, although they went out of space! If you click on the "this" hyperlink in the text of the issue, you get here https://github.com/edmondop/arrow-datafusion/actions/runs/13248520757/job/36980712855

before it run out of memory, it output

MISSED   datafusion/functions/src/math/log.rs:222:61: replace == with != in <impl ScalarUDFImpl for LogFunc>::simplify in 84.2s build + 6.9s test
MISSED   datafusion/functions/src/lib.rs:179:5: replace register_all -> Result<()> with Ok(()) in 18.1s build + 5.7s test
MISSED   datafusion/physical-optimizer/src/join_selection.rs:227:9: replace || with && in try_collect_left in 77.1s build + 0.4s test
MISSED   datafusion/physical-plan/src/aggregates/mod.rs:1278:27: replace <= with > in group_id_array in 90.6s build + 41.3s test
MISSED   datafusion/physical-optimizer/src/limit_pushdown.rs:77:9: replace <impl PhysicalOptimizerRule for LimitPushdown>::name -> &str with "" in 14.8s build + 0.3s test
MISSED   datafusion/functions-aggregate/src/array_agg.rs:383:13: replace - with + in <impl Accumulator for DistinctArrayAggAccumulator>::size in 70.4s build + 1.7s test
MISSED   datafusion/functions-aggregate/src/sum.rs:469:9: replace <impl Accumulator for DistinctSumAccumulator<T>>::size -> usize with 1 in 35.2s build + 1.6s test
MISSED   datafusion/functions-nested/src/concat.rs:292:35: delete ! in <impl ScalarUDFImpl for ArrayConcat>::return_type in 40.2s build + 5.2s test
MISSED   datafusion/proto-common/src/to_proto/mod.rs:247:9: replace <impl TryFrom for protobuf::Schema>::try_from -> Result<Self, Self::Error> with Ok(Default::default()) in 14.9s build + 2.7s test
MISSED   datafusion/optimizer/src/eliminate_duplicated_expr.rs:45:9: replace <impl PartialEq for SortExprWrapper>::eq -> bool with true in 46.4s build + 11.5s test
MISSED   datafusion/functions-aggregate/src/string_agg.rs:176:9: replace <impl Accumulator for StringAggAccumulator>::size -> usize with 0 in 41.0s build + 1.6s test
MISSED   datafusion/physical-plan/src/aggregates/group_values/single_group_by/bytes.rs:77:9: replace <impl GroupValues for GroupValuesByes<O>>::size -> usize with 0 in 23.8s build + 39.4s test
MISSED   datafusion/catalog/src/memory.rs:166:9: replace <impl SchemaProvider for MemorySchemaProvider>::table_names -> Vec<String> with vec!["xyzzy".into()] in 25.3s build + 0.6s test
MISSED   datafusion/expr-common/src/type_coercion/aggregates.rs:276:5: replace is_covariance_support_arg_type -> bool with false in 8.7s build + 5.7s test
MISSED   datafusion/physical-expr/src/window/window_expr.rs:294:20: delete ! in AggregateWindowExpr::get_result_column in 38.3s build + 9.0s test
TIMEOUT  datafusion/physical-plan/src/aggregates/row_hash.rs:1059:9: replace GroupedHashAggregateStream::update_merged_stream -> Result<()> with Ok(()) in 53.6s build + 300.0s test
MISSED   datafusion/physical-expr-common/src/sort_expr.rs:429:9: replace LexOrdering::from_lex_requirement -> LexOrdering with Default::default() in 7.7s build + 10.2s test
MISSED   datafusion/sql/src/statement.rs:1607:69: replace - with / in SqlToRel<'_, S>::show_variable_to_plan in 45.2s build + 9.4s test

Some of these are not not to be worried about, honestly. For example

impl UDF for MyFunc {
      fn func_name(self) -> String {
           // I will never test this, so mutating this code to return "bar" will make the mutant survive 
            return "foo".to_string()
      }
}

Cargo mutants provide different ways of skipping mutations. You can put an attribute you can put on "stuff that you don't want to be mutate".

impl UDF for MyFunc {
      #[cfg_attr(test, mutants::skip)]
      fn func_name(self) -> String {
           // I will never test this, so mutating this code to return "bar" will make the mutant survive 
            return "foo".to_string()
      }
}

or using exclusions (see mutants.toml)

exclude_re = [" <impl Accumulator for .*>::size "]

@2010YOUY01
Copy link
Contributor

@alamb they did, although they went out of space! If you click on the "this" hyperlink in the text of the issue, you get here https://github.com/edmondop/arrow-datafusion/actions/runs/13248520757/job/36980712855

before it run out of memory, it output

Looks like it's running out of disk space due to lots of recompilations
The default CI runner's disk budget is < 20GB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants