Add support for Half dtype and mixed precision training. #77
base: master
Conversation
Amazing!!! Thank you so much for contributing, this is a really needed feature. Tagging @CCInc so that he can take a look as well.
@maskjp Thanks for the contribution! I'm curious which version of pytorch you compiled against; did they change the tensor namespace? I'm guessing the models use ~50% of the memory compared to full precision? Did you notice any training speed increase too?
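The ~50% figure follows directly from element sizes: half precision stores 2 bytes per element instead of 4. A quick sanity check (NumPy stands in here for tensor storage; the exact savings in a real model also depend on which buffers autocast actually keeps in half):

```python
import numpy as np

# A point cloud of 100k xyz points, stored in full vs. half precision.
n_points = 100_000
full = np.zeros((n_points, 3), dtype=np.float32)  # 4 bytes per element
half = np.zeros((n_points, 3), dtype=np.float16)  # 2 bytes per element

print(full.nbytes)  # 1200000
print(half.nbytes)  # 600000
```
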
@nicolas-chaulet we should add precommit.ci to this repo so that we can ensure consistent formatting on PRs, what do you think? Also, maybe we could add a pytorch version matrix to the unit tests? Maybe 1.7.0 to latest?
The code looks good at first glance! I will look at it more closely later on; in the meantime, can you clean up all the comments and install/run pre-commit for the code formatting?
cuda/src/ball_query_gpu.cu
Outdated
int b = xyz.size(0);
int n = xyz.size(1);
int m = new_xyz.size(1);
torch::Tensor idx =
we can use auto on all tensor types I think, to make it a little cleaner
cuda/src/interpolate.cpp
Outdated
// three_nn_kernel_wrapper(unknowns.size(0), unknowns.size(1), knows.size(1),
//                         unknowns.DATA_PTR<float>(), knows.DATA_PTR<float>(),
//                         dist2.DATA_PTR<float>(), idx.DATA_PTR<int>());
we can delete all the commented-out lines from the files, I think
Hi, @CCInc,
I removed the comments and made the changes.
Hi @CCInc, I used torch 1.8.1+cu111 to compile it. About the tensor namespace, yes. The modification saves memory, but I didn't see a training speed increase; the speed also depends on the model architecture, the ops, and I/O.
Looks good to me. @CCInc, could you please verify that the GPU tests pass on your machine? I don't have access to GPUs anymore... Thanks! And yes to pre-commit CI!
@maskjp I'm getting an issue with the testing; it seems like the CPU and GPU fps results are not matching up?
@CCInc, yes, test_gpu in the test_fps.py file was created by me. The original test didn't test the GPU version of fps. I found that the outputs of the CPU and GPU versions are different even before my modification.
That would make sense, since the seed point might be different. In that case, the test should be modified so that it passes while still checking for some level of correctness. Having a test that fails can be a bit confusing.
Thanks for clarifying! In addition to what Nicolas suggested, would you mind also adding a test that verifies the functionality of the CUDA fps on its own? The rest of the PR works well!
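One way to make such a test pass regardless of which seed point an implementation picks is to assert seed-agnostic properties of the output rather than comparing exact index sequences. A sketch, with a minimal NumPy farthest-point-sampling reference standing in for the library's CPU/CUDA kernels (fps_reference and check_fps_output are hypothetical names, not the repo's API):

```python
import numpy as np

def fps_reference(points, n_samples, start_idx=0):
    """Minimal farthest-point-sampling reference implementation."""
    n = points.shape[0]
    idx = np.empty(n_samples, dtype=np.int64)
    idx[0] = start_idx
    min_dist = np.full(n, np.inf)
    for i in range(1, n_samples):
        # Update each point's distance to the nearest selected point,
        # then pick the point farthest from the current selection.
        d = np.linalg.norm(points - points[idx[i - 1]], axis=1)
        min_dist = np.minimum(min_dist, d)
        idx[i] = int(np.argmax(min_dist))
    return idx

def check_fps_output(points, idx, n_samples):
    # Properties that hold no matter which seed point was chosen, so a
    # CPU and a GPU implementation can both pass even when their exact
    # index sequences differ.
    assert len(idx) == n_samples
    assert len(set(idx.tolist())) == n_samples        # no duplicates
    assert all(0 <= int(i) < len(points) for i in idx)  # valid indices

rng = np.random.default_rng(0)
pts = rng.standard_normal((128, 3)).astype(np.float32)
sel = fps_reference(pts, 16)
check_fps_output(pts, sel, 16)
```

A stronger variant could also bound the covering radius of the selection, which any correct fps should keep small relative to random sampling.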
Hi,
Thank you for this great library and torch-points3d.
I made some modifications to support mixed-precision training. The major changes are as follows:
- Changed AT_DISPATCH_FLOATING_TYPES to AT_DISPATCH_FLOATING_TYPES_AND_HALF;
- Changed atomicAdd to gpuAtomicAdd (from pytorch's THCAtomics.cuh);
- Added custom_fwd and custom_bwd to the torch.autograd.Function subclasses to allow autocast.
The modified version passes the tests in the test folder. I didn't see any effect on full-precision training. I also tried training the PointNet2 model from the torch-points3d library in mixed precision, and it works.
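The custom_fwd/custom_bwd part of these changes can be sketched on the Python side as follows. ScaledGather is a hypothetical op standing in for the extension's real kernels, and index_add_ plays the accumulation role that gpuAtomicAdd plays in the CUDA code; under torch.cuda.amp.autocast, custom_fwd casts inputs to the autocast dtype and custom_bwd keeps the backward pass consistent:

```python
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

class ScaledGather(torch.autograd.Function):
    # Hypothetical op: gathers rows by index and scales them, mimicking
    # the shape of the library's gather/interpolate kernels.
    @staticmethod
    @custom_fwd
    def forward(ctx, features, idx):
        ctx.save_for_backward(idx)
        ctx.n = features.shape[0]
        return features[idx] * 2.0

    @staticmethod
    @custom_bwd
    def backward(ctx, grad_out):
        (idx,) = ctx.saved_tensors
        grad = torch.zeros(ctx.n, grad_out.shape[1],
                           dtype=grad_out.dtype, device=grad_out.device)
        # Several output rows may map to the same input row, so gradients
        # must accumulate (the role of gpuAtomicAdd in the CUDA kernels).
        grad.index_add_(0, idx, grad_out * 2.0)
        return grad, None

feats = torch.randn(8, 4, requires_grad=True)
idx = torch.tensor([0, 3, 3, 7])
out = ScaledGather.apply(feats, idx)
out.sum().backward()
```

On a CUDA machine, wrapping the forward call in `with torch.cuda.amp.autocast():` would run it in half precision while custom_bwd keeps the gradients well-defined; on CPU the decorators are inert and the op runs in full precision.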