Feature/mrhs misc #1515
base: develop
…ocation overheads when memory pool is disabled
This reverts commit 8f72560.
…imization can optimize out infinities, rendering the use of heterogeneous atomics with an infinity sentinel unreliable. As an alternative we can use negative zero as the sentinel. The default remains infinity; disable with QUDA_HETEROGENEOUS_ATOMIC_INF_INIT=OFF
…nvc++ allowing for the removal of the WARs deployed previously
…PrintMatrix routine. Apply same patch to genericPrintVector for future proofing
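The negative-zero sentinel mentioned in the commit messages above can be illustrated with a small host-side sketch. This is not QUDA's implementation: the helper names `bits_of`, `make_sentinel` and `is_sentinel` are hypothetical, and the sketch assumes 64-bit IEEE 754 doubles. The idea is that the check is an integer comparison of the bit pattern, so it does not depend on floating-point assumptions the compiler may make under aggressive math flags.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helpers for illustration only; none of these names come from QUDA.

// Bit pattern of -0.0 for an IEEE 754 double: sign bit set, everything else clear.
constexpr std::uint64_t neg_zero_bits = 0x8000000000000000ull;

// Copy the raw bits of a double into an integer (well-defined, unlike pointer casts).
inline std::uint64_t bits_of(double x)
{
  std::uint64_t b;
  std::memcpy(&b, &x, sizeof b);
  return b;
}

// Build the sentinel from its bit pattern rather than from a -0.0 literal,
// so the result does not rely on how the compiler treats signed zeros.
inline double make_sentinel()
{
  double x;
  std::memcpy(&x, &neg_zero_bits, sizeof x);
  return x;
}

// True while the slot still holds the sentinel, i.e. no result has been written yet.
// This is an integer comparison, so +0.0 and infinity are both distinguished from it.
inline bool is_sentinel(double x) { return bits_of(x) == neg_zero_bits; }
```

One caveat of this sketch: a genuine result that happens to be exactly -0.0 would be indistinguishable from the sentinel, so the approach only fits cases where that value cannot be produced.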
@@ -104,13 +103,18 @@ namespace quda
if (doHalo<kernel_type>(d) && ghost) {
const int ghost_idx = ghostFaceIndexStaggered<1>(coord, arg.dim, d, 1);
const Link U = arg.improved ? arg.U(d, coord.x_cb, parity) : arg.U(d, coord.x_cb, parity, StaggeredPhase(coord, d, +1, arg));
Vector in = arg.halo.Ghost(d, 1, ghost_idx + src_idx * arg.nFace * arg.dc.ghostFaceCB[d], their_spinor_parity);
out = mv_add(U, in, out);
for (auto s = 0; s < n_src_tile; s++) {
Should #pragma unroll be added here? Although I would assume the compiler does this by itself already.
Probably a good idea. I'll add that on next push.
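For readers unfamiliar with the suggestion, here is a minimal sketch of a tile loop with a compile-time trip count and an explicit unroll hint. It is not the kernel code from this PR: `axpy_tile` and its arguments are hypothetical, and the template parameter `N` stands in for a compile-time tile size such as `n_src_tile`. `#pragma unroll` is honored by nvcc and clang; GCC does not recognize it and ignores it, possibly with a warning.

```cpp
#include <array>

// Hypothetical example, not taken from the PR: y[s] += a * x[s] over a small tile.
// N is a compile-time constant, so the loop can be fully unrolled.
template <int N>
std::array<double, N> axpy_tile(double a, const std::array<double, N> &x, std::array<double, N> y)
{
  // Explicit hint; compilers will often unroll a short constant-trip-count
  // loop on their own, in which case the pragma only makes the intent visible.
#pragma unroll
  for (auto s = 0; s < N; s++) { y[s] += a * x[s]; }
  return y;
}
```

Whether the hint changes the generated code depends on the compiler and optimization level, so it is best treated as documentation of intent plus a safeguard.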
This PR is a bit of a catch-all:
… QUDA_MAX_MULTI_RHS_TILE, with the default left at size 1 for now.
FieldTmp now supports creating temporaries using parameters as opposed to another field instance.
… (QUDA_HETEROGENEOUS_ATOMIC_INF_INIT=OFF). Not a problem by default, but it is with the latest clang with -Ofast.
… printGenericMatrix