Make use of HYPRE for more linear algebra on GPU #193

sebastiangrimberg · 2024-02-09T02:30:29Z

NOTE: Builds on #184

Use HYPRE's inner-product routine (wraps cuBLAS/equivalent) instead of MFEM's custom kernel
Use HYPRE's sequential hypre_CSRMatrix data structure instead of mfem::SparseMatrix to speed up CPU-based CSR assembly with OpenMP

hughcars

Modulo the Mac M1 and threading issue with SuperLU, which I agree should just be a bug report for SuperLU, LGTM.

hughcars · 2024-02-26T20:16:36Z

palace/models/waveportoperator.cpp

+  pr.UseDevice(false);
+  pi.UseDevice(false);
  pr.Assemble();
  pi.Assemble();
+  pr.UseDevice(true);
+  pi.UseDevice(true);


So if I am understanding correctly, this is performing the assembly on host and then transferring the assembled operators to device?

sebastiangrimberg · 2024-02-29

The flag set by UseDevice is just an execution flag, rather than a memory location flag. So this is a minor optimization: operations like pr = 0.0 inside of LinearForm::Assemble happen on the host, since we know assembly is going to happen on the host anyway. No need to move the Vector to the device only to pull it back after setting it equal to a value.

hughcars · 2024-02-26T20:26:06Z

palace/linalg/hypre.cpp

+namespace
+{
+
+static HypreVector X, Y;


I'm sure you've considered this, but for posterity: using static like this is going to be thread-safe because these are basically just pointers to hypre variables, and hypre is handling the threads itself, yes?

sebastiangrimberg · 2024-02-29

It's expected that you would not call any of the HypreCSRMatrix routines which use these variables from multiple threads simultaneously. So I think the answer to your question is no, this isn't thread-safe, but it doesn't have to be given the expected use case.

hughcars · 2024-02-26T20:43:19Z

palace/fem/bilinearform.cpp

@@ -49,15 +49,9 @@ BilinearForm::PartialAssemble(const FiniteElementSpace &trial_fespace,
  // This should work fine if some threads create an empty operator (no elements or boundary
  // elements).
  const std::size_t nt = ceed::internal::GetCeedObjects().size();
-  PalacePragmaOmp(parallel for schedule(static))
-  for (std::size_t i = 0; i < nt; i++)
+  PalacePragmaOmp(parallel if (nt > 1))


This new conditional OpenMP parallel region is there to remove the overhead of starting a parallel region when there's only one thread, correct?

sebastiangrimberg · 2024-02-29

Overhead, yes, but also in the case where nt = 1 but OMP_NUM_THREADS > 1 (running on GPU), we need to make sure this is serial.
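The effect of the if clause described above can be sketched in isolation. This is a minimal illustration that uses a plain `#pragma omp` rather than Palace's PalacePragmaOmp macro; the function name `TeamSize` is hypothetical and not part of the PR. When the condition is false, OpenMP runs the region with a team of one thread, regardless of OMP_NUM_THREADS.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: reports how many threads execute a parallel region that is
// guarded by the same kind of if clause as in the diff above.
inline int TeamSize(std::size_t nt)
{
  int team = 1;  // Value returned when compiled without OpenMP
#ifdef _OPENMP
#pragma omp parallel if (nt > 1)
  {
    // One thread records the team size; the implicit barrier at the end of the
    // single construct makes the write visible before the region ends.
#pragma omp single
    team = omp_get_num_threads();
  }
#endif
  return team;
}
```

With nt == 1 the region is serial even if OMP_NUM_THREADS is larger; with nt > 1 the team size is whatever the runtime provides.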

hughcars · 2024-02-28T17:55:30Z

palace/linalg/vector.cpp

+void ComplexVector::MakeRef(Vector &y, int offset, int size)
 {
-  Set(py, n, on_dev);
+  MFEM_ASSERT(y.Size() <= 2 * size,
+              "Insufficient storage for ComplexVector alias reference of the given size!");
+  data.MakeRef(y, offset, 2 * size);
+  xr.MakeRef(data, offset, size);
+  xi.MakeRef(data, offset + size, size);
 }


If I'm reading this correctly, MakeRef treats a vector as a ComplexVector, stacked [xr; xi], starting at offset, with size 2 * size. Then I think the assertion should be offset + 2 * size <= y.Size(), to check that the accessed block doesn't run off the end of y. As is, an offset greater than y.Size() wouldn't trigger the assertion, and I think if size == 10 and y.Size() == 1, this would also go off the end.

sebastiangrimberg · 2024-02-29

That's correct; this was actually fixed in ff7e163.
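The difference between the two conditions can be checked with plain integers. The sketch below is only an illustration of the review comment: `OriginalCheck` and `SuggestedCheck` are hypothetical names, sizes are modeled as int, and the actual fix in ff7e163 is not shown in this thread and may differ.

```cpp
#include <cassert>

// Condition as written in the diff: compares only the total size of y against 2 * size
// and ignores the offset entirely.
inline bool OriginalCheck(int y_size, int /*offset*/, int size)
{
  return y_size <= 2 * size;
}

// Condition suggested in the review: the aliased block [offset, offset + 2 * size)
// must end at or before the end of y.
inline bool SuggestedCheck(int y_size, int offset, int size)
{
  return offset + 2 * size <= y_size;
}

// The reviewer's example: y.Size() == 1 and size == 10.
inline void ReviewerExample()
{
  assert(OriginalCheck(1, 0, 10));    // Passes, so the overrun goes unnoticed
  assert(!SuggestedCheck(1, 0, 10));  // Fails, so the overrun is caught
}
```

The original condition also rejects valid inputs, for example a y of size 30 with offset 0 and size 10, which the suggested condition accepts.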

hughcars · 2024-02-28T18:11:31Z

palace/fem/libceed/restriction.cpp

          {
-            nnz++;
+            if (k < j - 1 && k > j + 1 && el_trans_j(k) != 0.0)


Shouldn't that be (k < j - 1 || k > j + 1) && el_trans_j(k) != 0.0? k < j - 1 && k > j + 1 should return false for any k, j.

sebastiangrimberg · 2024-02-29

Yep, good catch. This is an old bug in a debug check.
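The logic error can be confirmed with a small standalone check. This sketch covers only the index part of the condition (the el_trans_j(k) != 0.0 test is omitted), and the function names are hypothetical.

```cpp
#include <cassert>

// Index condition as written in the diff: no integer k can be both smaller than
// j - 1 and larger than j + 1, so this is false for every k and j.
inline bool OutsideBandAsWritten(int k, int j)
{
  return k < j - 1 && k > j + 1;
}

// Index condition suggested in the review: k lies outside the band [j - 1, j + 1].
inline bool OutsideBandSuggested(int k, int j)
{
  return k < j - 1 || k > j + 1;
}

// Scans all pairs (k, j) in [-n, n] x [-n, n] and reports whether the condition as
// written ever evaluates to true.
inline bool AsWrittenEverTrue(int n)
{
  for (int j = -n; j <= n; j++)
  {
    for (int k = -n; k <= n; k++)
    {
      if (OutsideBandAsWritten(k, j))
      {
        return true;
      }
    }
  }
  return false;
}
```

Because the condition as written never holds, the debug check it guards could never fire, which is consistent with the bug going unnoticed.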

hughcars

Approved as part of testing on #204

sebastiangrimberg added enhancement New feature or request performance Related to performance labels Feb 9, 2024

sebastiangrimberg force-pushed the sjg/gpu-build-system-dev branch from c614668 to d1567c8 Compare February 16, 2024 01:03

sebastiangrimberg force-pushed the sjg/hypre-interface-dev branch from f6ea580 to 2b7f73b Compare February 16, 2024 01:04

sebastiangrimberg force-pushed the sjg/gpu-build-system-dev branch from d1567c8 to 0700248 Compare February 16, 2024 01:32

sebastiangrimberg force-pushed the sjg/hypre-interface-dev branch from 2b7f73b to 35cad58 Compare February 16, 2024 01:32

sebastiangrimberg force-pushed the sjg/gpu-build-system-dev branch 3 times, most recently from bf79e54 to dd42790 Compare February 16, 2024 21:27

sebastiangrimberg force-pushed the sjg/hypre-interface-dev branch 2 times, most recently from 4798f29 to de2db6f Compare February 17, 2024 01:04

sebastiangrimberg force-pushed the sjg/gpu-build-system-dev branch from 23a099a to 4e1010f Compare February 17, 2024 02:01

sebastiangrimberg force-pushed the sjg/hypre-interface-dev branch 2 times, most recently from 41491c7 to d70f023 Compare February 17, 2024 02:04

sebastiangrimberg force-pushed the sjg/gpu-build-system-dev branch from 4e1010f to 93f7f6b Compare February 17, 2024 04:16

sebastiangrimberg force-pushed the sjg/hypre-interface-dev branch from d70f023 to 6c7d418 Compare February 19, 2024 16:41

sebastiangrimberg marked this pull request as ready for review February 19, 2024 16:44

sebastiangrimberg requested a review from hughcars February 19, 2024 16:45

sebastiangrimberg force-pushed the sjg/hypre-interface-dev branch from 9c6e91d to 9115ebc Compare February 20, 2024 01:12

sebastiangrimberg force-pushed the sjg/gpu-build-system-dev branch from 93f7f6b to a6eb633 Compare February 22, 2024 01:27

sebastiangrimberg force-pushed the sjg/hypre-interface-dev branch 2 times, most recently from 74e35d9 to 83f60dd Compare February 22, 2024 02:13

sebastiangrimberg force-pushed the sjg/gpu-build-system-dev branch from a228960 to 9e002ef Compare February 28, 2024 00:03

sebastiangrimberg mentioned this pull request Feb 28, 2024

Add CUDA/HIP support and GPU builds #184

Merged

2 tasks

sebastiangrimberg force-pushed the sjg/hypre-interface-dev branch 3 times, most recently from 4962970 to 1ed1123 Compare February 29, 2024 01:00

hughcars reviewed Feb 29, 2024

sebastiangrimberg force-pushed the sjg/hypre-interface-dev branch 2 times, most recently from 19f9618 to 36b3cfb Compare March 1, 2024 19:31

sebastiangrimberg mentioned this pull request Mar 1, 2024

Add palace::GridFunction to unify mfem::ParGridFunction and mfem::ParComplexGridFunction #204

Merged

hughcars approved these changes Mar 4, 2024

sebastiangrimberg force-pushed the sjg/gpu-build-system-dev branch 2 times, most recently from 8132f54 to b802a98 Compare March 4, 2024 19:51

sebastiangrimberg force-pushed the sjg/hypre-interface-dev branch from 36b3cfb to 4932816 Compare March 4, 2024 19:51

Base automatically changed from sjg/gpu-build-system-dev to main March 4, 2024 21:01

sebastiangrimberg added 10 commits March 4, 2024 13:04

Accelerate CeedElemRestriction construction when using OpenMP on GPU

4afad87

Upgrade vector inner products to use Hypre instead of MFEM's custom GPU code

5dc0d46

Updates to models/, including linear form assembly on CPU, some missed dot product improvements, and quadrature data assembly (now always disabled)

3171cdb

Replace inner products with ones vector with sums

f7c068d

Remove unneeded parallel check (OpenMP disables nested parallelism by default)

da6a2bb

Update OpenMP parallelism and Hypre improvements

279fe4a

Fix aliasing issue for ComplexVector

d5393d2

Add HypreCSRMatrix wrapper class to replace mfem::SparseMatrix

59aa00a

This allows OpenMP and better GPU acceleration for ceed::Operator full assembly into a CSR matrix using Hypre's functionality.

Fix bug which surfaced with wave ports running on many processors (BC elimination is intended only for square matrices)

cb3ead1

Fix bug in old debug check

c2c5ae9

sebastiangrimberg force-pushed the sjg/hypre-interface-dev branch from 4932816 to c2c5ae9 Compare March 4, 2024 21:04

sebastiangrimberg enabled auto-merge March 4, 2024 21:16

sebastiangrimberg merged commit e9670fc into main Mar 4, 2024
17 checks passed

sebastiangrimberg deleted the sjg/hypre-interface-dev branch March 4, 2024 21:45

sebastiangrimberg mentioned this pull request Mar 22, 2024

AMS without Multigrid fails #216

Closed
