Unify linear solvers #2706

Closed · wants to merge 9 commits into master from unify-linsolve
Conversation

@atgeirr (Member) commented Jul 10, 2020

This tries to reduce the linear solver code paths to essentially two: using the GPU solver or the FlexibleSolver.

@atgeirr force-pushed the unify-linsolve branch 2 times, most recently from 1b4cb20 to 85de88a on July 31, 2020 14:25
@atgeirr marked this pull request as ready for review on August 4, 2020 10:34
@atgeirr (Member Author) commented Aug 4, 2020

This is now ready for review. I have tried to eliminate a lot of complexity (and a decent bit of compilation time), and to that end I have made two perhaps controversial choices:

  • Made the FlexibleSolver the default and only alternative to the GPU solver. There is no change in results with the default (ilu0) option, as already established.
  • Require "--owner-cells-first=true" (this is already the default). However, the possibility of having a general cell order may be good to have in the future (adaptive grids, perhaps?), so I think it is worth discussing.

I also had to re-add the makeOverlapRowsInvalid() method and run it for all parallel runs. I was a little confused about this: I assumed it would not be necessary, at least with ILU0 and --owner-cells-first=true, but apparently I was wrong. I did not quite understand how (and if) this modification was done in the master branch version. A sketch of what the method does follows below.
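Since this method comes up repeatedly below, here is a minimal sketch of what makeOverlapRowsInvalid() amounts to, assuming a Dune::BCRSMatrix of dense blocks and a precomputed list of overlap-row indices (the signature and names are illustrative, not the PR's exact code):

    #include <dune/istl/bcrsmatrix.hh>
    #include <cstddef>
    #include <vector>

    // Sketch: turn each overlap (ghost) row into an identity row, so that
    // preconditioners such as ILU0 treat ghost unknowns as trivial equations.
    template <class Matrix>
    void makeOverlapRowsInvalid(Matrix& matrix,
                                const std::vector<std::size_t>& overlapRows)
    {
        for (const auto row : overlapRows) {
            for (auto col = matrix[row].begin(); col != matrix[row].end(); ++col) {
                *col = 0.0;                               // zero the whole block
                if (col.index() == row) {
                    for (int i = 0; i < Matrix::block_type::rows; ++i) {
                        (*col)[i][i] = 1.0;               // unit diagonal block
                    }
                }
            }
        }
    }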

There is also quite a bit of refactoring here, so the diffs may not be all that useful for reviewing. If requested, I can try to split this into several PRs to simplify the process.

@atgeirr (Member Author) commented Aug 4, 2020

jenkins build this please

@blattms (Member) commented Aug 4, 2020 via email

Please post benchmark results before and after the changes.

@atgeirr (Member Author) commented Aug 4, 2020

> Please post benchmark results before and after the changes.

Sure! Did you mean compile times or simulation times? Or both?

@atgeirr (Member Author) commented Aug 4, 2020

benchmark please

@atgeirr (Member Author) commented Aug 4, 2020

Build times (16 HW threads, building with make -j 15, not including cmake times) for release mode:

Build variant                                          Master   This PR
Clean build                                            4m24s    3m59s
touch opm/simulators/wells/StandardWell.hpp            3m34s    3m02s
touch opm/simulators/linalg/PreconditionerFactory.hpp  1m21s    1m20s

Build times (16 HW threads, building with make -j 15, not including cmake times) for debug mode:

Build variant                                          Master   This PR
Clean build                                            2m05s    1m53s
touch opm/simulators/wells/StandardWell.hpp            1m29s    1m21s
touch opm/simulators/linalg/PreconditionerFactory.hpp  41s      41s

@atgeirr (Member Author) commented Aug 4, 2020

Results from running Norne with default settings and 8 processes. Master branch:

Number of MPI processes:      8
Threads per MPI process:      1
Total time (seconds):         210.217
Solver time (seconds):        210.165
 Assembly time (seconds):     62.3946 (Failed: 1.35474; 2.17125%)
 Linear solve time (seconds): 117.127 (Failed: 2.32797; 1.98756%)
 Linear solve setup time (seconds): 17.5982 (Failed: 0.478168; 2.71714%)
 Update time (seconds):       3.31908 (Failed: 0.0760609; 2.29163%)
 Output write time (seconds): 2.18266
Overall Well Iterations:      928 (Failed: 5; 0.538793%)
Overall Linearizations:       1953 (Failed: 42; 2.15054%)
Overall Newton Iterations:    1618 (Failed: 42; 2.5958%)
Overall Linear Iterations:    25444 (Failed: 463; 1.81968%)

This PR:

Number of MPI processes:      8
Threads per MPI process:      1
Total time (seconds):         208.417
Solver time (seconds):        208.366
 Assembly time (seconds):     62.2715 (Failed: 1.34684; 2.16285%)
 Linear solve time (seconds): 115.238 (Failed: 2.20138; 1.91029%)
 Linear solve setup time (seconds): 17.1001 (Failed: 0.434218; 2.53928%)
 Update time (seconds):       3.29661 (Failed: 0.0742074; 2.25102%)
 Output write time (seconds): 2.17215
Overall Well Iterations:      928 (Failed: 5; 0.538793%)
Overall Linearizations:       1953 (Failed: 42; 2.15054%)
Overall Newton Iterations:    1618 (Failed: 42; 2.5958%)
Overall Linear Iterations:    25444 (Failed: 463; 1.81968%)

Results are identical. I think the time differences are just noise, and do not indicate any advantage for the PR (it should not have any).

@alfbr (Member) commented Aug 4, 2020

Seems we still have issues with the benchmarks; hopefully they will be resolved tomorrow.

@blattms (Member) left a comment:

It seems to me like all the parallel optimizations done by @andrthu are lost with this. Unless I am mistaken, these actually gave quite a performance boost on parallel production runs, hence his changes must be retained. E.g., the default should be ownersFirst_==true, which is what it is in master, but in this branch it is the opposite.

It seems like easily choosing CPR or AMG with one option is gone now; I think this is a decrease in usability. Maybe "cpr" (defaulting to true IMPES) and "amg" should be added to --linear-solver-configuration?

#endif
}
else
{
std::string conf = p.linear_solver_configuration_;
// Support old UseCpr if no configuration was set
if (!EWOMS_PARAM_IS_SET(TypeTag, std::string, LinearSolverConfiguration) && p.use_cpr_)

Member:
How does a user select CPR now (formerly --use-cpr=true was all it took)?


Member Author:

I am trying to make everything follow --linear-solver-configuration (or at least move in that direction).

@@ -191,14 +194,14 @@ namespace Opm
newton_use_gmres_ = EWOMS_GET_PARAM(TypeTag, bool, UseGmres);
require_full_sparsity_pattern_ = EWOMS_GET_PARAM(TypeTag, bool, LinearSolverRequireFullSparsityPattern);
ignoreConvergenceFailure_ = EWOMS_GET_PARAM(TypeTag, bool, LinearSolverIgnoreConvergenceFailure);
linear_solver_use_amg_ = EWOMS_GET_PARAM(TypeTag, bool, UseAmg);

Member:

How do I select AMG as the preconditioner (formerly --use-amg)? I think that proved good for some two-phase problems of @totto82.


Member Author (@atgeirr), Aug 6, 2020:

Right now, by using the JSON file option. I was not aware this actually had good use cases; I suggest we make any such particular choices available via --linear-solver-configuration=some_string, just like for "ilu0", "cpr_quasiimpes" and "cpr_trueimpes".

Edit: I see this is the same as what you suggested.
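For concreteness, here is a sketch of how such string presets could be dispatched to FlexibleSolver settings via a boost::property_tree; the function name and the property keys are assumptions for illustration, not the PR's actual code:

    #include <boost/property_tree/json_parser.hpp>
    #include <boost/property_tree/ptree.hpp>
    #include <string>

    // Sketch: map --linear-solver-configuration values to solver settings;
    // anything unrecognized is treated as the name of a JSON configuration file.
    boost::property_tree::ptree setupSolverConfiguration(const std::string& conf)
    {
        boost::property_tree::ptree prm;
        if (conf == "ilu0") {
            prm.put("solver", "bicgstab");
            prm.put("preconditioner.type", "ILU0");
        } else if (conf == "cpr_quasiimpes" || conf == "cpr_trueimpes") {
            prm.put("solver", "bicgstab");
            prm.put("preconditioner.type", "cpr");
            prm.put("preconditioner.weight_type",
                    conf == "cpr_trueimpes" ? "trueimpes" : "quasiimpes");
        } else {
            // Fall back to a user-supplied JSON file (the existing option).
            boost::property_tree::read_json(conf, prm);
        }
        return prm;
    }

Adding a plain "cpr" value could then simply alias one of the presets, which would address the typing concern raised below.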


Member:

Seems like this is still missing?

If we skip --use-cpr like in this PR, then at the very least I would suggest having "cpr" here. But "--linear-solver-configuration=cpr" is still a lot more to type than "--use-cpr".

Comment on lines +135 to +137
// cpr_ilu_milu_ = MILU_VARIANT::ILU;
// cpr_ilu_redblack_ = false;
// cpr_ilu_reorder_sphere_ = true;

Member:

Are all these really supported by the flexible solvers?

Comment on lines -196 to -197
noGhostAdjacency();
setGhostsInNoGhost(*noGhostMat_);

Member:

I think there was a performance reason in parallel for doing this. Are you sure it does not apply anymore? Please back this up with some numbers.

Are we perhaps now always using the OwnerCellsFirst approach? In that case we need to remove the option from opm-grid, too.


Member Author:

This is indeed trying to always use OwnerCellsFirst; however, I am unsure if I should reverse that, as discussed briefly in the description.

OpmLog::warning("OwnerCellsFirst option is true, but ignored.");
const bool ownersFirst = EWOMS_GET_PARAM(TypeTag, bool, OwnerCellsFirst);
if (!ownersFirst) {
const std::string msg = "The linear solver no longer supports --owner-cells-first=false.";

Member:

Sorry, this is a no-go for performance reasons. I might have said before that ownersFirst==true should be the only option in the long term.


Member Author:

This confuses me: with this PR ownersFirst==true is the only option, unless I made some 180 degree mistake somewhere.


Member:

Indeed, I misread this.
But if we only support true, shouldn't the option be deleted? Having it confuses me.


Member Author:

I did not go as far as removing the option, as I became unsure if that would be acceptable. But if there are no uses for it (adaptivity struck me as a possible use) I also think it is better to remove it.

boost::property_tree::write_json(os, prm_, true);
OpmLog::note(os.str());
}
interiorCellNum_ = detail::numMatrixRowsToUseInSolver(simulator_.vanguard().grid(), true);

Member:

This is now actually doing grid.numCells()


Member Author (@atgeirr), Aug 6, 2020:

I do not think so. This is the function called, with the second argument being true:

    template <class Grid>
    size_t numMatrixRowsToUseInSolver(const Grid& grid, bool ownerFirst)
    {
        size_t numInterior = 0;
        if (!ownerFirst || grid.comm().size()==1)
            return grid.numCells();
        const auto& gridView = grid.leafGridView();
        auto elemIt = gridView.template begin<0>();
        const auto& elemEndIt = gridView.template end<0>();

        // loop over cells in mesh
        for (; elemIt != elemEndIt; ++elemIt) {

            // Count only the interior cells.
            if (elemIt->partitionType() == Dune::InteriorEntity) {
                numInterior++;
            }
        }

        return numInterior;
    }

Looks fine to me? It could be simplified if we go with this, though, as we would never call it with a false argument.
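For what it is worth, here is a sketch of the simplification mentioned above, assuming ownerFirst is always true so the flag argument disappears (illustrative only, not the PR's code):

    #include <dune/grid/common/gridenums.hh>
    #include <dune/grid/common/rangegenerators.hh>
    #include <cstddef>

    // Sketch: with owners-first ordering mandatory, only interior cells are
    // counted in parallel runs; sequential runs use all cells.
    template <class Grid>
    std::size_t numMatrixRowsToUseInSolver(const Grid& grid)
    {
        if (grid.comm().size() == 1) {
            return grid.numCells();
        }
        std::size_t numInterior = 0;
        for (const auto& elem : elements(grid.leafGridView())) {
            if (elem.partitionType() == Dune::InteriorEntity) {
                ++numInterior;
            }
        }
        return numInterior;
    }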

@@ -678,87 +331,34 @@ DenseMatrix transposeDenseMatrix(const DenseMatrix& M)

void prepareFlexibleSolver()
{
// Decide if we should recreate the solver or just do

Member:

Since the scaling stuff (DRS et al.) is from your colleagues, I trust you here. I was wondering myself whether this is used/useful.

Comment on lines -749 to -750
OPM_THROW(std::runtime_error, "In parallel, the flexible solver requires "
"--owner-cells-first=true when --matrix-add-well-contributions=false is used.");

Member:

Why isn't this needed anymore? ownersFirst is now always false with your changes.


Member Author:

See above comment, I meant for it to be true always.

@blattms (Member) commented Aug 6, 2020

According to #2220 the savings (besides noise) started from 16 processes.

Are we still doing #2205?

@atgeirr (Member Author) commented Aug 6, 2020

> Are we still doing #2205?

I have to admit I was confused by this code, and may have deleted too much. I assumed that the no-ghost matrix was only created and used when the ordering was not OwnerCellsFirst, so it would no longer be needed, but I may have misread. So will the manipulations made with the no-ghost matrix still have any effect with OwnerCellsFirst always true?

@blattms (Member) commented Aug 6, 2020

If ownerFirst is always true, then the additions of #2205 can go, and in addition both findOverlapAndInterior and makeOverlapRowsInvalid should not be needed anymore. I am not 100% sure, so we should test that. Overlap rows are not treated by the operator anymore; I am not sure about the preconditioner.

Concerning that: the ILU had knowledge of whether this was ownerFirst_ or not and adapted accordingly. Is that still true for the flexible solvers?

@atgeirr (Member Author) commented Aug 6, 2020

> makeOverlapRowsInvalid should not be needed anymore. I am not 100% sure, so we should test that. Overlap rows are not treated by the operator anymore; I am not sure about the preconditioner.

I found that I needed this to get identical results to the current master. I think it makes sense: even if the operator does not see the rows, the preconditioner will.

> Concerning that: the ILU had knowledge of whether this was ownerFirst_ or not and adapted accordingly. Is that still true for the flexible solvers?

Yes. Although it is not elegant: I added a method to the PreconditionerFactory

    template <class CommArg>
    static size_t interiorIfGhostLast(const CommArg& comm)

that returns the number of interior cells or all cells, which is just what the constructor of the ParallelOverlappingILU0 class needs. So this is (re-)calculated every time we (re-)construct the preconditioner. It did not seem to cause any performance penalty that I could discover (it is a local operation, no communication). We could add another argument to PreconditionerFactory::create(), just like the weight function, but I am thinking about ways to communicate such information to the preconditioner construction in a more generic way. One way could be to pass, for example, a const std::map<std::string, std::any>& extra_info through to the preconditioner; then CPR could look for extra_info["weightfunction"], the parallel ILU0 could look for extra_info["interior_size"], etc. If something needed is not found, we can give a helpful error message.
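A minimal sketch of that idea, assuming C++17 (the helper name and map keys are illustrative only):

    #include <any>
    #include <map>
    #include <stdexcept>
    #include <string>

    using ExtraInfo = std::map<std::string, std::any>;

    // Sketch: typed lookup with a helpful error message when a preconditioner
    // needs a piece of information that the caller did not provide.
    template <class T>
    T getExtraInfo(const ExtraInfo& info, const std::string& key)
    {
        const auto it = info.find(key);
        if (it == info.end()) {
            throw std::invalid_argument("Preconditioner setup requires missing entry: " + key);
        }
        return std::any_cast<T>(it->second);
    }

    // Usage (illustrative): CPR could call
    //   getExtraInfo<WeightFunction>(extra_info, "weightfunction");
    // and the parallel ILU0
    //   getExtraInfo<std::size_t>(extra_info, "interior_size");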

@atgeirr (Member Author) commented Aug 11, 2020

benchmark please

@alfbr (Member) commented Aug 12, 2020

There were some issues related to MariaDB and the GitHub API. Hopefully they are resolved now, so let's try again.

@alfbr (Member) commented Aug 12, 2020

benchmark please

@alfbr (Member) commented Aug 12, 2020

There seems to be a build failure for this PR preventing the benchmark.

@atgeirr (Member Author) commented Aug 12, 2020

> There seems to be a build failure for this PR preventing the benchmark.

A bit strange, since the Jenkins test passed, but perhaps things have changed on master in incompatible ways. I will rerun Jenkins to see if that gives a build error.

@atgeirr (Member Author) commented Aug 12, 2020

jenkins build this serial please

@blattms (Member) commented Aug 12, 2020

In file included from /home/mblatt/src/dune/opm-2.6/opm-simulators/ebos/ebos.hh:35:0,
                 from /home/mblatt/src/dune/opm-2.6/opm-simulators/ebos/ebos_blackoil.cc:30:
/home/mblatt/src/dune/opm-2.6/opm-simulators/opm/simulators/linalg/ISTLSolverEbos.hpp: In member function ‘bool Opm::ISTLSolverEbos<TypeTag>::solve(Opm::ISTLSolverEbos
<TypeTag>::Vector&)’:
/home/mblatt/src/dune/opm-2.6/opm-simulators/opm/simulators/linalg/ISTLSolverEbos.hpp:258:36: error: ‘simulator’ was not declared in this scope
                     if (use_gpu && simulator.gridView().comm().rank() == 0) {
                                    ^~~~~~~~~
/home/mblatt/src/dune/opm-2.6/opm-simulators/opm/simulators/linalg/ISTLSolverEbos.hpp:258:36: note: suggested alternative: ‘simulator_’
                     if (use_gpu && simulator.gridView().comm().rank() == 0) {
                                    ^~~~~~~~~
                                    simulator_

@atgeirr (Member Author) commented Aug 12, 2020

Thanks for the error message, a trivial bug for sure! (But hard to find without GPU compilation.) I'll update.

@atgeirr (Member Author) commented Aug 12, 2020

Is it not strange that Jenkins did not complain? I assumed that it also compiled the GPU version?

@atgeirr (Member Author) commented Aug 12, 2020

jenkins build this please

@atgeirr (Member Author) commented Aug 12, 2020

benchmark please

@atgeirr (Member Author) commented Aug 13, 2020

Seems that the benchmark did not go through? Could someone with a CUDA or OpenCL setup test this PR in case I have more silly mistakes in the code parts I do not compile locally?

@alfbr (Member) commented Aug 13, 2020

> Seems that the benchmark did not go through? Could someone with a CUDA or OpenCL setup test this PR in case I have more silly mistakes in the code parts I do not compile locally?

Actually it did go through, it was just not reported. During testing the reporting was turned off to avoid spamming. Sorry for the confusion. These were the results:

Test     Configuration                                Relative
opm-git  OPM Benchmark: flow_mpi_extra - Threads: 1   1.03
opm-git  OPM Benchmark: flow_mpi_extra - Threads: 8   0.984
opm-git  OPM Benchmark: flow_mpi_norne - Threads: 1   1.002
opm-git  OPM Benchmark: flow_mpi_norne - Threads: 8   0.978

Speed-up = Total time master / Total time pull request. Above 1.0 is an improvement.

@atgeirr (Member Author) commented Aug 13, 2020

Thanks for the results! The slight performance reduction for 8 threads could perhaps be noise, but it might not be. After discussion with @andrthu I think I have a better understanding of what is and is not removed here: results should be identical (as also indicated by testing), but there is one performance optimization lost: the "noGhostMatrix" would be built with no off-diagonal entries on ghost rows. However, this optimization would only affect the (no longer default) OwnerCellsFirst==false case. After discussion, the conclusion was that it is better to never put these entries in the Jacobian in the first place, instead of manipulating the matrix afterwards. That should save the time for makeOverlapRowsInvalid() and also save a tiny bit of time for assembly. I'll experiment a little.
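To make the idea concrete, here is a sketch of building a sparsity pattern in which non-interior (ghost) rows receive only a diagonal entry, so nothing has to be erased afterwards; the names are illustrative assumptions, not the actual opm-models change:

    #include <dune/grid/common/gridenums.hh>
    #include <dune/grid/common/rangegenerators.hh>
    #include <cstddef>
    #include <set>
    #include <vector>

    // Sketch: ghost rows get a diagonal entry only; interior rows also get
    // their neighbor couplings.
    template <class GridView, class Mapper>
    std::vector<std::set<std::size_t>> buildSparsityPattern(const GridView& gridView,
                                                            const Mapper& mapper)
    {
        std::vector<std::set<std::size_t>> pattern(gridView.size(/*codim=*/0));
        for (const auto& elem : elements(gridView)) {
            const auto row = mapper.index(elem);
            pattern[row].insert(row);             // diagonal entry, always
            if (elem.partitionType() != Dune::InteriorEntity) {
                continue;                         // ghost row: diagonal only
            }
            for (const auto& is : intersections(gridView, elem)) {
                if (is.neighbor()) {
                    pattern[row].insert(mapper.index(is.outside()));
                }
            }
        }
        return pattern;
    }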

@alfbr (Member) commented Aug 13, 2020

> The slight performance reduction for 8 threads could perhaps be noise, but it might not be.

Indeed, the deviation is within the noise band. Then again, it may be real, so please do the experiments.

@blattms (Member) commented Aug 13, 2020 via email

I think I said this before: There is no reason for OwnerCellsFirst and if possible the option should be removed and the code cleaned accordingly.

@alfbr (Member) commented Aug 13, 2020

> I think I said this before: There is no reason for OwnerCellsFirst and if possible the option should be removed and the code cleaned accordingly.

Good point, this is maybe a good opportunity to clean it up.

@atgeirr (Member Author) commented Aug 13, 2020

> Good point, this is maybe a good opportunity to clean it up.

Will amend. I think we should keep the possibility in opm-grid, but remove the option from Flow. Perhaps also add some assertions to guard against future mistakes.

@bska (Member) commented Aug 13, 2020

> I think we should keep the possibility in opm-grid

What is the cost/benefit ratio of doing so? I can easily imagine that keeping the possibility in place will incur at least some maintenance cost (e.g., one more branch that must be tested whenever we touch the code), and I don't expect the benefit to be particularly large if we never run that code.

@blattms (Member) commented Aug 13, 2020 via email

@ytelses commented Aug 13, 2020

Benchmark result overview:

Test     Configuration                                Relative
opm-git  OPM Benchmark: flow_mpi_extra - Threads: 1   1.03
opm-git  OPM Benchmark: flow_mpi_extra - Threads: 8   0.984
opm-git  OPM Benchmark: flow_mpi_norne - Threads: 1   1.002
opm-git  OPM Benchmark: flow_mpi_norne - Threads: 8   0.978

Speed-up = Total time master / Total time pull request. Above 1.0 is an improvement.

View result details @ https://www.ytelses.com/opm/?page=result&id=

@blattms (Member) commented Aug 14, 2020

Unfortunately, this PR seems to be 5% slower on model 2 with 16 processors. I will run further tests.

@atgeirr (Member Author) commented Aug 14, 2020

I have now done some experiments with eliminating off-diagonal nonzeros on ghost rows, and it worked quite well! I see a minor speedup, and it should allow this PR to be merged without fear of losing any optimization. See OPM/opm-models#623.

There is one detail worth mentioning: when using --matrix-add-well-contributions=true (still recommended to get good CPR performance), the well contributions can still result in ghost-row off-diagonal elements. I looked briefly at eliminating this as well, but it would be a little trickier than the opm-models change, so I postponed that.

> Unfortunately, this PR seems to be 5% slower on model 2 with 16 processors. I will run further tests.

I assume that is with the default solver and options? Consider testing with the above-mentioned opm-models PR.

@blattms (Member) commented Aug 14, 2020 via email

IMHO all wells should only perforate interior cells, as each well is contained in the interior.

@atgeirr (Member Author) commented Aug 14, 2020

> IMHO all wells should only perforate interior cells, as each well is contained in the interior.

That sounds good, my mistake then. However, my testing seemed to indicate that with CPR the makeOverlapRowsInvalid() call still made a difference. I interpreted that to mean it was needed due to off-diagonal ghost elements from the wells. However, if those do not exist, the only effect of that call is setting the diagonal element to identity. So perhaps that is still a good idea!

@blattms (Member) commented Aug 17, 2020

I retract my statement about the 5%. I did more runs, and on average the runtimes are the same.

@@ -309,7 +309,9 @@ public:
auto wellDofIt = dofVariables_.begin();
const auto& wellDofEndIt = dofVariables_.end();
for (; wellDofIt != wellDofEndIt; ++ wellDofIt) {
neighbors[wellGlobalDof].insert(wellDofIt->first);
if (wellDofIt->second.element.partitionType() == Dune::PartitionType::InteriorEntity) {

Member:

Maybe this will be gone after rebasing. Is it needed for this PR?

@atgeirr (Member Author) commented Oct 10, 2020

Closing in favour of #2848.

@atgeirr closed this on Oct 10, 2020
@atgeirr deleted the unify-linsolve branch on October 15, 2020 08:08