Fix BlockVector init bug #75

Heinrich-BR · 2025-01-06T11:15:32Z

This PR fixes a small bug which can crash runs unpredictably.

In ProblemOperatorInterface::Init, the BlockVector of a given variable has its size and offsets updated, with the relevant amount of memory allocated for the object. This is done by MFEM's Vector::SetSize which calls upon Memory<T>::New. The latter function allocates memory without initialising it, meaning any memory remains already existing in that space allocated for it will stay there. The BlockVector will then interpret those values as elements of its array. In principle this shouldn't be an issue because before the run begins properly these values will get initialised to their proper values or they will host the result of some calculation step. However, MFEM's NewtonSolver::Mult has a check that the BlockVector is finite before it gets initialised. Thus, if some memory remains in those spaces and any element gets interpreted by BlockVector as an inf or nan, this will fail MFEM's finiteness check and crash the run. Because the content of newly allocated memory spaces is effectively unpredictable, this crash may happen at any time. All of this can be prevented by simply initialising new BlockVectors with finite values. Here we choose zero.

codecov · 2025-01-06T11:22:11Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Files with missing lines	Coverage Δ
src/problem_operators/problem_operator_interface.C	`65.00% <100.00%> (+1.84%)`	⬆️

k-collie · 2025-01-08T11:33:14Z

However, MFEM's NewtonSolver::Mult has a check that the BlockVector is finite before it gets initialised.

Assuming leaving memory uninitialised is a choice to improve performance wouldn't this be a bug in MFEM?

Heinrich-BR · 2025-01-08T11:54:37Z

Assuming leaving memory uninitialised is a choice to improve performance wouldn't this be a bug in MFEM?

Potentially, yeah. Maybe they deemed that the circumstances under which this crash happens are rare enough that it doesn't compensate the performance cost of initialising vector elements. Indeed, this is the first time I've come across this issue and in my case it only happens in the AMD GPU build, while running on the host, for the specific case of the MOOSE variable transfer test, when we have more than one variable in the config file.

I'll check what happens with those memory spaces in other builds, but given no one seems to have raised this issue in the MFEM repo (as far as I know), I doubt they'd consider it a worthwhile bug to sacrifice performance for. Let's see what @alexanderianblair thinks about it.

Fix BlockVector init bug

1ebf876

Heinrich-BR requested a review from alexanderianblair January 6, 2025 12:44

Heinrich-BR self-assigned this Jan 6, 2025

Heinrich-BR added the bug Something isn't working label Jan 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix BlockVector init bug #75

Fix BlockVector init bug #75

Heinrich-BR commented Jan 6, 2025 •

edited

Loading

codecov bot commented Jan 6, 2025 •

edited

Loading

k-collie commented Jan 8, 2025

Heinrich-BR commented Jan 8, 2025

Fix BlockVector init bug #75

Are you sure you want to change the base?

Fix BlockVector init bug #75

Conversation

Heinrich-BR commented Jan 6, 2025 • edited Loading

codecov bot commented Jan 6, 2025 • edited Loading

Codecov Report

k-collie commented Jan 8, 2025

Heinrich-BR commented Jan 8, 2025

Heinrich-BR commented Jan 6, 2025 •

edited

Loading

codecov bot commented Jan 6, 2025 •

edited

Loading