Add support for Half dtype and mixed precision training. #77

Open
maskjp wants to merge 9 commits into master
Conversation


@maskjp maskjp commented Jul 27, 2021

Hi,

Thank you for this great library and torch-points3d.

I made some modifications to support mixed-precision training.

The major changes are as follows:

  • Templated the kernel functions in interpolate_gpu.cu, sampling_gpu.cu, and ball_query_gpu.cu;
  • Changed AT_DISPATCH_FLOATING_TYPES to AT_DISPATCH_FLOATING_TYPES_AND_HALF;
  • Changed atomicAdd to gpuAtomicAdd (from PyTorch's THCAtomics.cuh);
  • Added custom_fwd and custom_bwd to the torch.autograd.Function subclasses to allow autocast;
  • Fixed a bug in sampling_gpu where the first element of the idx output was always 0. (However, I found that the outputs of the GPU and CPU versions still differ; I haven't fixed that.)

The modified version passes the tests in the test folder. I didn't see any effect on full-precision training. I also trained the PointNet2 model from the torch-points3d library with mixed precision, and it works.
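The custom_fwd/custom_bwd change above follows PyTorch's documented autocast pattern for custom ops. A minimal Python sketch of that pattern (the ScaleByTwo op here is hypothetical, not one of the library's kernels):

```python
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

class ScaleByTwo(torch.autograd.Function):
    """Hypothetical op showing the autocast-friendly Function pattern."""

    @staticmethod
    @custom_fwd  # lets autocast regions handle this op's input dtypes
    def forward(ctx, x):
        # A real extension would launch a templated CUDA kernel here,
        # dispatched with AT_DISPATCH_FLOATING_TYPES_AND_HALF.
        return x * 2

    @staticmethod
    @custom_bwd  # backward runs with autocast disabled, matching forward's dtypes
    def backward(ctx, grad_out):
        return grad_out * 2

x = torch.randn(4, requires_grad=True)
y = ScaleByTwo.apply(x)
y.sum().backward()
```

With this decoration, calling the op under torch.cuda.amp.autocast() no longer fails on a float32-only kernel; autocast manages the casts.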

@nicolas-chaulet
Member

Amazing!!! Thank you so much for contributing; this is a really needed feature. Tagging @CCInc so that he can take a look as well.

@clee-ai
Member

clee-ai commented Jul 28, 2021

@maskjp Thanks for the contribution!

I'm curious which version of PyTorch you compiled against; did they change the tensor namespace to torch at some point?

I'm guessing the models use ~50% of the memory compared to full precision? Did you notice any training speed increase too?

@clee-ai
Member

clee-ai commented Jul 28, 2021

@nicolas-chaulet we should add pre-commit.ci to this repo so we can ensure consistent formatting on PRs, what do you think? Also, maybe we could add a PyTorch version matrix to the unit tests? Maybe 1.7.0 to latest?
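The suggested version matrix could look something like this as a GitHub Actions job (a hypothetical config sketch; versions and steps would need adjusting to the repo's actual CI setup):

```yaml
# Hypothetical CI sketch: run the unit tests across several PyTorch versions.
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        torch-version: ["1.7.0", "1.8.1", "1.9.0"]
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.8"
      - run: pip install torch==${{ matrix.torch-version }}
      - run: python -m unittest discover -s test
```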

Member

@clee-ai clee-ai left a comment

The code looks good at first glance! I will look at it closer later on, in the meantime can you clean up all the comments and install/run pre-commit for the code formatting?

int b = xyz.size(0);
int n = xyz.size(1);
int m = new_xyz.size(1);
torch::Tensor idx =
Member


we can use auto on all tensor types I think, to make it a little cleaner


// three_nn_kernel_wrapper(unknowns.size(0), unknowns.size(1), knows.size(1),
// unknowns.DATA_PTR<float>(), knows.DATA_PTR<float>(),
// dist2.DATA_PTR<float>(), idx.DATA_PTR<int>());
Member


we can delete all the commented lines from the files I think

Author


Hi, @CCInc,

I removed the comments and made the changes.

@maskjp
Author

maskjp commented Jul 28, 2021

I'm guessing the models use ~50% of the memory compared to full precision? Did you notice any training speed increase too?

Hi, @CCInc,

I compiled against torch 1.8.1+cu111. About the tensor namespace: torch::Tensor is the higher-level type, and at::Tensor can also be used. I noticed that chamfer_dist.cpp and cubic_feature_sampling.cpp use the torch namespace, while interpolate.cpp, sampling.cpp, and ball_query.cpp use at::Tensor. Following the tutorials from PyTorch and torch_geometric, I decided to use torch::Tensor.

Yes, the modification saves memory, but I didn't see a training speed increase. The speed also depends on the model architecture, the ops, and I/O.
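The ~50% memory figure follows directly from element sizes: half precision stores 2 bytes per element versus 4 for float32. A quick check with NumPy (assuming NumPy is available; the array size below is just illustrative) also shows the precision cost of half:

```python
import numpy as np

n = 1_000_000
full = np.zeros(n, dtype=np.float32)
half = np.zeros(n, dtype=np.float16)

# Half precision halves the raw tensor storage.
print(full.nbytes, half.nbytes)  # 4000000 2000000

# But float16 has only a 10-bit stored mantissa, so integers
# above 2048 are no longer all exactly representable.
print(np.float16(2049) == np.float16(2048))  # True
```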

@nicolas-chaulet
Member

Looks good to me! @CCInc, could you please verify that the GPU tests pass on your machine? I don't have access to GPUs anymore... Thanks! And yes to pre-commit CI!

@nicolas-chaulet nicolas-chaulet requested a review from clee-ai August 2, 2021 09:23
@clee-ai
Member

clee-ai commented Aug 2, 2021

@maskjp I'm getting an issue with the testing; it seems like the CPU and GPU fps outputs are not matching up:

FAIL: test_gpu (test.test_fps.TestFps)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/f/data/PC/torch-points-kernels/test/__init__.py", line 7, in wrapped_func
    return func(*args, **kwargs)
  File "/mnt/f/data/PC/torch-points-kernels/test/test_fps.py", line 35, in test_gpu
    torch.testing.assert_allclose(sorted_idx,sorted_idx_cpu)
  File "/home/chris/miniconda3/envs/tpk/lib/python3.7/site-packages/torch/testing/_core.py", line 270, in assert_allclose
    raise AssertionError(msg)
AssertionError: Found 27 different element(s) (out of 32), with the greatest difference of 63 (82 vs. 19) occuring at index (8, 1).

@maskjp
Author

maskjp commented Aug 3, 2021

@CCInc Yes, test_gpu in test_fps.py was created by me; the original tests didn't cover the GPU version of fps. I found that the outputs of the CPU and GPU versions differed even before my modification.
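One plausible source of such mismatches is tie-breaking: points equidistant from the current sample set can be picked in different orders by a sequential CPU loop and a parallel GPU reduction, after which every subsequent index diverges. A pure-Python reference sketch of farthest point sampling (an illustration, not the library's kernel) makes the convention and the tie visible:

```python
def farthest_point_sample(points, k):
    """Reference fps over a list of (x, y) tuples; returns k indices."""
    idx = [0]  # common convention: seed with the first point (cf. the idx[0] bug fix)
    dist = [float("inf")] * len(points)
    for _ in range(k - 1):
        last = points[idx[-1]]
        for i, p in enumerate(points):
            d = (p[0] - last[0]) ** 2 + (p[1] - last[1]) ** 2
            dist[i] = min(dist[i], d)
        # argmax of min-distances; ties are where implementations can diverge
        idx.append(max(range(len(points)), key=dist.__getitem__))
    return idx

# Points 1 and 2 are equidistant from point 0: this sequential argmax picks
# index 1, but a parallel reduction could legitimately pick index 2 instead.
pts = [(0, 0), (10, 0), (0, 10), (1, 1)]
print(farthest_point_sample(pts, 3))  # [0, 1, 2]
```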

@nicolas-chaulet
Member

nicolas-chaulet commented Aug 3, 2021 via email

@clee-ai
Member

clee-ai commented Aug 3, 2021

Thanks for clarifying! In addition to what Nicolas suggested, would you mind also adding a test that verifies the functionality of CUDA fps on its own, similar to test_simplecpu?

The rest of the PR works well!
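A standalone GPU test along those lines could check invariants of the CUDA result itself rather than exact agreement with the CPU kernel. A sketch (the fps import path and call signature are assumptions and would need adjusting to the repo's actual API):

```python
import unittest

import torch


class TestFpsGpuStandalone(unittest.TestCase):
    @unittest.skipUnless(torch.cuda.is_available(), "needs a CUDA device")
    def test_gpu_standalone(self):
        # Assumed import and signature; replace with the library's real fps API.
        from torch_points_kernels import fps

        xyz = torch.randn(2, 100, 3, device="cuda")
        idx = fps(xyz, 25)
        for row in idx:
            row = row.tolist()
            # Sampled indices must be unique and in range, regardless of
            # which of several equally valid orderings the GPU produces.
            self.assertEqual(len(set(row)), len(row))
            self.assertTrue(all(0 <= i < 100 for i in row))
```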
