Out of bounds error when running multi-GPU/partitioned HISQ MG with long links dropped #1512
When running with
In brief, there is an out-of-bounds (OOB) error when running HISQ MG with long links dropped, and it can be triggered without ever dropping to a true coarser level. It only appears with non-zero partitioning; I haven't tested whether true multi-GPU runs are affected. There are no issues when "normal" HISQ MG is run (improved staggered on the pseudo-fine level as well), which suggests that something is going awry when switching between the improved staggered (outer level) and unimproved staggered (inner level) operators.
The error does not hit until the first solve, i.e., after setup has completed. More specifically, it triggers when returning to the fine level from the pseudo-fine level, that is, when going from applying the unimproved operator to applying the improved operator. When it hits depends on the local volume: there is no error at ~16^4, but it hits on the first iteration at ~24^4 and larger. It does at least seem to be deterministic for a fixed command, including volume.
This error hits regardless of whether tuning is enabled.
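To make the improved/unimproved distinction concrete, here is a minimal Python sketch of how halo (ghost zone) sizes differ between the two operators. It is illustrative only: it is not QUDA code, `halo_sites` is a hypothetical helper, it counts full sites (ignoring even/odd checkerboarding), and it assumes all four dimensions are partitioned. The only physics input is that the improved staggered stencil reaches three hops (long links) while the unimproved one reaches one hop.

```python
# Illustrative sketch only: not QUDA code, and not a diagnosis of this bug.
from math import prod


def halo_sites(local_dims, partitioned, depth):
    """Return {dim: sites in one ghost face} for each partitioned dimension.

    `depth` is how many hops the stencil reaches across the boundary.
    Counts full sites; even/odd checkerboarding is ignored for simplicity.
    """
    volume = prod(local_dims)
    return {
        mu: depth * (volume // local_dims[mu])
        for mu in range(len(local_dims))
        if partitioned[mu]
    }


UNIMPROVED_DEPTH = 1  # one-hop links only (long links dropped)
IMPROVED_DEPTH = 3    # three-hop long links present

# Assumption: all four dimensions partitioned, local volumes as in the report.
for extent in (16, 24):
    dims = (extent,) * 4
    part = (True,) * 4
    unimproved = halo_sites(dims, part, UNIMPROVED_DEPTH)
    improved = halo_sites(dims, part, IMPROVED_DEPTH)
    print(f"{extent}^4: unimproved halo {unimproved[0]} sites/face, "
          f"improved halo {improved[0]} sites/face")
```

The improved operator needs three times the halo of the unimproved one in every partitioned dimension. If a buffer or index range sized for one operator were reused for the other, accesses would land outside the allocation; whether that is what actually happens here is not established.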
A command that triggers it is as follows:
This is roughly trimmed down as much as possible; the various combinations of `mat` and `direct` are non-default but required for HISQ MG as it is currently implemented. As noted above, you never actually need to enter a true coarse solve to trigger the error, but you do still need to compile with Nc = 24 for the KD-operator construction.
A representative error message is:
My cmake command was: