-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make use of HYPRE for more linear algebra on GPU #193
Conversation
c614668
to
d1567c8
Compare
f6ea580
to
2b7f73b
Compare
d1567c8
to
0700248
Compare
2b7f73b
to
35cad58
Compare
bf79e54
to
dd42790
Compare
4798f29
to
de2db6f
Compare
23a099a
to
4e1010f
Compare
41491c7
to
d70f023
Compare
4e1010f
to
93f7f6b
Compare
d70f023
to
6c7d418
Compare
9c6e91d
to
9115ebc
Compare
93f7f6b
to
a6eb633
Compare
74e35d9
to
83f60dd
Compare
a228960
to
9e002ef
Compare
4962970
to
1ed1123
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modulo the mac m1 and threading issue with SuperLU, which I agree should just be a bug report for SuperLU, lgtm.
pr.UseDevice(false); | ||
pi.UseDevice(false); | ||
pr.Assemble(); | ||
pi.Assemble(); | ||
pr.UseDevice(true); | ||
pi.UseDevice(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I am understanding correctly, this is performing the assembly on host and then transferring the assembled operators to device?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flag set by UseDevice
is just an execution flag, rather than a memory location flag. So this is just a minor optimization so that operations like pr = 0.0
inside of LinearForm::Assemble
happen on the host since we know assembly is going to happen on the host anyway. No need to move the Vector
to the device only to pull it back after setting it equal to a value.
namespace | ||
{ | ||
|
||
static HypreVector X, Y; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure you've considered this, but for posterity, using static
like this is going to be thread safe because these are basically just pointers to hypre
variables, and hypre
is handling the threads itself yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's expected that you would not call any of the HypreCSRMatrix
routines which use these variables from multiple threads simultaneously. So I think the answer to your question is no this isn't thread-safe but it doesn't have to be based on the expected use case.
@@ -49,15 +49,9 @@ BilinearForm::PartialAssemble(const FiniteElementSpace &trial_fespace, | |||
// This should work fine if some threads create an empty operator (no elements or boundary | |||
// elements). | |||
const std::size_t nt = ceed::internal::GetCeedObjects().size(); | |||
PalacePragmaOmp(parallel for schedule(static)) | |||
for (std::size_t i = 0; i < nt; i++) | |||
PalacePragmaOmp(parallel if (nt > 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new optional omp
looping, this is to remove some overhead of starting an omp loop if there's only 1 thread correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overhead yes but also in the case where nt = 1
but OMP_NUM_THREADS > 1
(running on GPU), we need to make sure this is serial.
palace/linalg/vector.cpp
Outdated
void ComplexVector::MakeRef(Vector &y, int offset, int size) | ||
{ | ||
Set(py, n, on_dev); | ||
MFEM_ASSERT(y.Size() <= 2 * size, | ||
"Insufficient storage for ComplexVector alias reference of the given size!"); | ||
data.MakeRef(y, offset, 2 * size); | ||
xr.MakeRef(data, offset, size); | ||
xi.MakeRef(data, offset + size, size); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm getting this correctly, this is MakeRef
is to treat a vector as a ComplexVector, stacked [xr;xi]
, starting at offset
, with size 2*size
. Then I think the assertion should be offset + 2 *size <= y.Size()
, to check the accessed block isn't going off the end of y
. As is, offset
greater than y.Size()
, wouldn't be triggered, and I think if size == 10
and y.size() == 1
, this would also go off the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, this was actually fixed in ff7e163.
palace/fem/libceed/restriction.cpp
Outdated
{ | ||
nnz++; | ||
if (k < j - 1 && k > j + 1 && el_trans_j(k) != 0.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't that be (k < j - 1 || k > j + 1) && el_trans_j(k) != 0.0
? k < j - 1 && k > j + 1
should return false for any k
, j
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good catch. This is an old bug in a debug check.
19f9618
to
36b3cfb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved as part of testing on #204
8132f54
to
b802a98
Compare
36b3cfb
to
4932816
Compare
…d dot product improvements, and quadrature data assembly (now always disabled)
This allows OpenMP and better GPU acceleration for ceed::Operator full assembly into a CSR matrix using Hypre's functionality.
… elimination is intended only for square matrices)
4932816
to
c2c5ae9
Compare
NOTE: Builds on #184
hypre_CSRMatrix
data structure instead ofmfem::SparseMatrix
to speed up CPU-based CSR assembly with OpenMP