Non-determinism in primal solution and derivatives #305

bernardopacini commented Mar 22, 2024

When running a coupled aerostructural derivative check with DAFoam and MELD through MPhys, we noticed that identical runs of the same script / case led to different analytic derivatives (ignore the FD mismatch; that is due to generally poor, but consistent, convergence from DAFoam):

Run 1: 
Full Model: 'maneuver.struct_post.eval_funcs.ks_vmfailure' wrt 'dvs.sweep_wing'
     Reverse Magnitude: 1.176269e-03
          Fd Magnitude: 6.548034e-04 (fd:forward)

Run 2:
Full Model: 'maneuver.struct_post.eval_funcs.ks_vmfailure' wrt 'dvs.sweep_wing'
     Reverse Magnitude: 1.527612e-03
          Fd Magnitude: 6.548034e-04 (fd:forward)
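
For reference, the output above is in the compact format produced by OpenMDAO's check_totals. A minimal sketch of the kind of call that generates it, assuming prob is the fully assembled MPhys problem (its construction is not shown, and the setup details here are assumptions):

prob.setup(mode="rev")  # reverse mode, matching the "Reverse Magnitude" entries
prob.run_model()
prob.check_totals(
    of="maneuver.struct_post.eval_funcs.ks_vmfailure",
    wrt="dvs.sweep_wing",
    method="fd",
    form="forward",  # matches the "(fd:forward)" tag in the output
)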

We were not sure where this came from, so we checked the aero solver, the load and displacement transfers, and the structural solver. In the primal solve, we found slight differences between the results, but only after the first pass of the structural solution. We started printing out the norm of the flattened values passing through MELD (a sketch of this instrumentation follows the transfer logs below). The first pass through the load transfer is consistent between runs:

Run 1: 
MELD Primal Load Transfer x_struct0 24702.433470988755
MELD Primal Load Transfer x_aero0 23658.198184194116
MELD Primal Load Transfer u_struct 0.0
MELD Primal Load Transfer f_aero 8100.573476032763
MELD Primal Load Transfer f_struct 1700.082296927264

Run 2:
MELD Primal Load Transfer x_struct0 24702.433470988755
MELD Primal Load Transfer x_aero0 23658.198184194116
MELD Primal Load Transfer u_struct 0.0
MELD Primal Load Transfer f_aero 8100.573476032763
MELD Primal Load Transfer f_struct 1700.082296927264

However, the subsequent pass through the displacement transfer receives a different displacement vector in each run, leading to a slightly different converged aerostructural solution:

Run 1: 
MELD Primal Disp Transfer x_struct0 24702.433470988755
MELD Primal Disp Transfer x_aero0 23658.198184194116
MELD Primal Disp Transfer u_struct 47.00897156793725
MELD Primal Disp Transfer u_aero 33.24244380757927

NL: NLBGS 5 ; 219.13182 1.42895947e-06 

Run 2:
MELD Primal Disp Transfer x_struct0 24702.433470988755
MELD Primal Disp Transfer x_aero0 23658.198184194116
MELD Primal Disp Transfer u_struct 47.00897157165467
MELD Primal Disp Transfer u_aero 33.24244381025362

NL: NLBGS 5 ; 219.131819 1.42895946e-06
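
For reference, the norms above were computed roughly as in the sketch below. The helper and the allreduce pattern are illustrative (the actual prints lived inside the MPhys transfer components), but this is the standard way to take the 2-norm of a distributed vector:

import numpy as np
from mpi4py import MPI

def global_norm(comm, local_vec):
    # 2-norm of a distributed vector: sum the squared local entries,
    # reduce the partial sums across ranks, then take the square root.
    local_sq = np.dot(local_vec.ravel(), local_vec.ravel())
    return np.sqrt(comm.allreduce(local_sq, op=MPI.SUM))

# e.g., inside the load transfer component's compute():
# print("MELD Primal Load Transfer f_struct", global_norm(comm, f_struct))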

We then started looking at simpler cases and found that TACS-only primal solves are deterministic only to about 8 digits, especially when running in parallel. @A-CGray and I reproduced this on his computer as well and found that 3+ procs were typically problematic, but serial cases run with python <script>.py were also non-deterministic on one day and fine the next.

I tried out several of the examples to see which were deterministic, and @A-CGray scripted the same check on his computer using the following:

# `problem` is a pyTACS problem set up earlier in the script (not shown);
# `comm` is the MPI communicator it runs on.
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Generate a random, but fixed, state vector to use as the starting point.
randVec = problem.assembler.createVec()
randVec.setRand()

for _ in range(20):
    # Reset the state to the same starting point every iteration.
    problem.setVariables(randVec)
    # Solve the state and record the initial residual and solution norms.
    problem.getResidual(problem.res)
    initResNorm = problem.res.norm()
    problem.solve()
    uNorm = problem.u.norm()
    if comm.rank == 0:
        print(f"{problem.name:>10s}: {uNorm=:.16e}, {initResNorm=:.16e}")

The test runs the solver repeatedly from the same starting point to see whether the converged solution is consistent. This led to the following results:

Output from the slot_3d example:

[9] TACSBlockCyclicMat: (6423,6423) Initial density: 0.832 factor fill in: 0.119
load_set_000: uNorm=3.1102406253022369e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406123293758e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406123293758e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102405960886136e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406234748215e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406229065651e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406001658109e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406133390252e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406123291494e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406141400342e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406437929267e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406079585327e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406018139010e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406079068340e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406123293758e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406045341430e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406196805662e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406217764444e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406022091790e-03, initResNorm=4.5878651244647644e+10
load_set_000: uNorm=3.1102406003855588e-03, initResNorm=4.5878651244647644e+10

The initial residual is consistent, but the norm of the solution is not. We tried many of the examples and found that:

Deterministic:

  • Spring mass
  • Beam
  • Annulus (floating point exception with 16 procs)

Non-deterministic when run on >3-4 cores:

  • CRM
  • Fuselage
  • MACH tutorial wing
  • Plate
  • Slot_3d

Out of curiosity, I ran the CRM example under Valgrind and found that with 16 procs there were two invalid reads. Lower proc counts did not show this issue, but Valgrind can also miss invalid reads depending on whether the memory around the read happens to be allocated.

==796863== Invalid read of size 4
==796863== at 0x6099B8B9: TACSBlockCyclicMat::get_block_owner(int, int) const (TACSBlockCyclicMat.h:142)
==796863== by 0x60992954: TACSBlockCyclicMat::merge_nz_pattern(int, int*, int*, int) (TACSBlockCyclicMat.cpp:554)
==796863== by 0x60990A07: TACSBlockCyclicMat::TACSBlockCyclicMat(ompi_communicator_t*, int, int, int, int const*, int, int const*, int const*, int, int, int) (TACSBlockCyclicMat.cpp:177)
==796863== by 0x609ABE04: TACSSchurPc::TACSSchurPc(TACSSchurMat*, int, double, int) (TACSSchurMat.cpp:1096)
==796863== by 0x6077EA3D: __pyx_pf_4tacs_4TACS_2Pc___cinit__ (TACS.cpp:34395)
==796863== by 0x6077EA3D: __pyx_pw_4tacs_4TACS_2Pc_1__cinit__ (TACS.cpp:34155)
==796863== by 0x6077EA3D: __pyx_tp_new_4tacs_4TACS_Pc(_typeobject*, _object*, _object*) (TACS.cpp:89770)
==796863== by 0x2589C6: _PyObject_MakeTpCall (in /usr/bin/python3.10)
==796863== by 0x25214F: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==796863== by 0x2629FB: _PyFunction_Vectorcall (in /usr/bin/python3.10)
==796863== by 0x24B45B: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==796863== by 0x257C13: _PyObject_FastCallDictTstate (in /usr/bin/python3.10)
==796863== by 0x26CB04: ??? (in /usr/bin/python3.10)
==796863== by 0x258A1B: _PyObject_MakeTpCall (in /usr/bin/python3.10)

==796863== Address 0xc66f0b6c is 4 bytes before a block of size 60 alloc'd
==796863== at 0x484A2F3: operator new[](unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==796863== by 0x6099385F: TACSBlockCyclicMat::init_proc_grid(int) (TACSBlockCyclicMat.cpp:777)
==796863== by 0x609902AA: TACSBlockCyclicMat::TACSBlockCyclicMat(ompi_communicator_t*, int, int, int, int const*, int, int const*, int const*, int, int, int) (TACSBlockCyclicMat.cpp:87)
==796863== by 0x609ABE04: TACSSchurPc::TACSSchurPc(TACSSchurMat*, int, double, int) (TACSSchurMat.cpp:1096)
==796863== by 0x6077EA3D: __pyx_pf_4tacs_4TACS_2Pc___cinit__ (TACS.cpp:34395)
==796863== by 0x6077EA3D: __pyx_pw_4tacs_4TACS_2Pc_1__cinit__ (TACS.cpp:34155)
==796863== by 0x6077EA3D: __pyx_tp_new_4tacs_4TACS_Pc(_typeobject*, _object*, _object*) (TACS.cpp:89770)
==796863== by 0x2589C6: _PyObject_MakeTpCall (in /usr/bin/python3.10)
==796863== by 0x25214F: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==796863== by 0x2629FB: _PyFunction_Vectorcall (in /usr/bin/python3.10)
==796863== by 0x24B45B: _PyEval_EvalFrameDefault (in /usr/bin/python3.10)
==796863== by 0x257C13: _PyObject_FastCallDictTstate (in /usr/bin/python3.10)
==796863== by 0x26CB04: ??? (in /usr/bin/python3.10)
==796863== by 0x258A1B: _PyObject_MakeTpCall (in /usr/bin/python3.10)

To summarize:

  • The initial residual is deterministic even when the solution is not
  • The non-determinism is independent of element type / constitutive model
  • Tightening the solver tolerance has no effect, nor does lowering compiler optimization
  • It happens reliably with large, parallel problems (nproc > 2), but has also happened in serial at times

Other notes:

  • We looked into the preconditioner and stiffness matrices and computed their min / max / mean values; these were deterministic even when the solution was not (a sketch of the reduction pattern follows)
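
A rough sketch of the kind of reduction used, assuming a TACS vector (the matrix checks were analogous); vec_stats is an illustrative helper, not a TACS API:

import numpy as np
from mpi4py import MPI

def vec_stats(comm, tacs_vec):
    # Reduce min / max / mean of a distributed TACS vector across ranks.
    arr = tacs_vec.getArray()  # local entries owned by this rank
    vmin = comm.allreduce(np.min(arr), op=MPI.MIN)
    vmax = comm.allreduce(np.max(arr), op=MPI.MAX)
    total = comm.allreduce(np.sum(arr), op=MPI.SUM)
    n = comm.allreduce(arr.size, op=MPI.SUM)
    return vmin, vmax, total / n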