-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add HYP smearing. Also add 3D/4D version of STOUT and APE smearing. #1426
Conversation
Fix bugs in 3D HYP smearing.
@cpviolator any chance you can review this? |
@Jenkins test this please |
@maddyscientist Sure thing! |
to get the default tol for SU(3) projection.
This is excellent work! I spent some time revising the HYP algorithm and your implementation here in QUDA is very much in line with QUDA coding style. I very much like the implementation of arbitrary orthogonal dimension for smearing. I tried to find ways to reduce code repetition by using the existing staple computation, but the differences between APE style and Stout style means there's very little overlap in the code. The only code change I would like to request for the moment is that you place the templates with name structure Before merging with develop I'd like to do a performance test of the kernels to see if we're not missing any obvious speed-up we can take advantage of with minimal code changes. I'll post here with the results. |
OK, using the command: for SMEAR in ape stout ovrimp-stout hyp wilson symanzik ; do
./su3_test --verbosity verbose --su3-smear-type ${SMEAR} --verify false --dim 32 32 32 32 --prec double --su3-smear-steps 5 > log_${SMEAR}_perf.log
wait
./su3_test --verbosity verbose --su3-smear-type ${SMEAR} --verify false --dim 32 32 32 32 --prec double --su3-smear-steps 500 > log_${SMEAR}_WC.log
wait
done I collected the following data:
So I think there may be some improvements for the future, but certainly nothing holding up a PR. It may require a slightly different approach to kernel launches and memory transfer, or something as simple as revising the flop/byte count in statistics. Either way, this is a great addition to the QUDA library! |
@cpviolator Thank you for the testing. I'll check the performance issue. |
The performance regression comes from (using
If I use Please let me know if you have any ideas! |
I had the same I idea and I've flattened out the loops as best as I can, but now I'm chasing a numerical error. @SaltyChiang may I ask that you join us on slack? @maddyscientist can arrange the invite if so. |
@SaltyChiang can you share the Chroma code that you are using to call this? It would be great to have an example of how to call the smearing routines in QUDA with Chroma. Thanks. |
@maddyscientist Oh, I don't use Chroma to call the smearing routines in QUDA.
The checking follows the steps:
|
@SaltyChiang there are a few convention changes I want to put into your PR, plus reconcile the merge conflicts and perform a |
@weinbe2 Sure, this sounds good to me. Glad to hear some progress on the performance issue. |
…gister count getting maxed out
Using the same command as Dean in my branch---on a different GPU (A100-80GB-PCIe), I got:
HYP time is now ~66% of Symanzik time as opposed to ~75%, but it's hard to say if that's due to optimization or because A100 has a larger cache... in any case, it's relatively faster. |
@SaltyChiang I'm done with my updates---can you please merge my branch ( Assuming it passes your Chroma comparison and everything else looks good, I imagine we can get this merged in. |
@weinbe2 I checked the performance again before/after merging your additions on Tesla P100.
And yes, the new code returns the same result as Chroma. Again, they differ by 1e-6~1e-7, which comes from the different SU(3) projection algorithms. I plan to implement a similar projection as Chroma in quda just for checking, do you think this is acceptable to push such a thing upstream? @cpviolator Do you have any further ideas about the performance issue? |
Thanks @SaltyChiang , I'm glad it passes the correctness check on your end. I'd like to get this merged in sooner as opposed to later, further optimizations can go into a future PR. As for the Chroma algorithm---I like the idea of checking it locally, and if you see a deviation file after that please file an issue and we can go from there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved (pending Jenkins), we'll still need a second review before a merge which @maddyscientist promised she'd get to soon.
@Jenkins ok to test @SaltyChiang I expect once we fix #1436, this will give a boost to double-precision performance of all the smearing routines that use SU(3) projection. We'll handle this in a follow up PR though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving, subject to CI completing successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approve cmake changes
@SaltyChiang I had to put in a few quick updates to address the CI issues---can you please merge my branch back into yours again? |
@SaltyChiang @weinbe2 I think these improvements are great and an excellent contribution! As we can all see from the algorithm, there are some unavoidable aspects when it comes to memory that prohibit HYP from performing as well as STOUT based algorithms or vanilla APE. |
They might not happy with implicit conversion to `int2`.
@weinbe2 @maddyscientist Done with merge. The default value of |
It might be best to preserve that behavior, yes. Thoughts @cpviolator ? You're the real-world smearing workflow expert here |
@weinbe2 yeah, I think that most users of non analytical smearing in QUDA are using it for spectroscopic measurement so defaulting to excluding the temporal dimension would be best in my opinion. A wiki addition on this behaviour would be good. |
Thanks @cpviolator --- @SaltyChiang , you should change the default to 3 per that recommendation. Creating a wiki page can be done orthogonal to this PR. |
3D for APE/STOUT and 4D for OVRIMP_STOUT/HYP.
@weinbe2 @cpviolator Change the default value of |
Use
QudaGaugeSmearParam::dir_ignore
to tell QUDA not to perform smearing in this direction.I've checked the result with Chroma, which looks good. There are some residuals caused by different SU(3) projection implementations.
We might should set different default values of
dir_ignore
for different smearing types to make the interface compatible with before.