Make it optional to build CUDA extension for SAM 2; also fallback to all available kernels if Flash Attention fails #155
Conversation
Force-pushed from 509f0b1 to 268ad1c
Force-pushed from 8522a19 to 6943cf6
Do you think that a Kornia-like pure PyTorch connected components implementation would be too numerically misaligned?

@bhack Thanks for the suggestion! We have also tried this kornia implementation before, but it was too slow for video applications (as it uses an iteration loop in Python and its algorithm has not been carefully optimized for GPUs), so we added a custom CUDA kernel in connected_components.cu instead, which is much faster.
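For reference, here is a minimal sketch of the pure-PyTorch route discussed above, built on `kornia.contrib.connected_components`. The wrapper name, input shapes, and iteration count are illustrative assumptions, not anything taken from SAM 2:

```python
# Sketch of a pure-PyTorch connected components labeling via kornia.
# kornia.contrib.connected_components propagates labels through repeated
# max-pooling steps; num_iterations=100 is kornia's default, used here only
# for illustration.
import torch
import kornia

def label_masks_with_kornia(masks: torch.Tensor, num_iterations: int = 100) -> torch.Tensor:
    """Label connected regions in a batch of binary masks of shape (B, 1, H, W)."""
    return kornia.contrib.connected_components(masks.float(), num_iterations=num_iterations)

# Example usage on random binary masks.
masks = (torch.rand(2, 1, 256, 256) > 0.5).float()
labels = label_masks_with_kornia(masks)
```

Because the label propagation runs as a Python loop over pooling steps, its cost grows with the number of iterations, which matches the motivation for the custom CUDA kernel mentioned above.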
…all available kernels if Flash Attention fails

In this PR, we make it optional to build the SAM 2 CUDA extension, as we have observed that many users encounter difficulties with the CUDA compilation step.

1. During installation, we catch build errors and print a warning message. We also allow explicitly turning off the CUDA extension build with `SAM2_BUILD_CUDA=0` (see the setup.py sketch below).
2. At runtime, we catch CUDA kernel errors from connected components and print a warning before skipping the post-processing step. We also fall back to all available kernels if the Flash Attention kernel fails.
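A minimal sketch of how an optional CUDA extension build can be wired up in setup.py. The environment variable name matches the PR (`SAM2_BUILD_CUDA`), but the helper names, extension name, source path, and error handling here are illustrative assumptions, not the repository's actual code:

```python
# setup.py sketch: build the CUDA extension only when requested and possible.
# SAM2_BUILD_CUDA matches the flag described in the PR; everything else
# (extension name, source path, messages) is illustrative.
import os
from setuptools import setup

def get_extensions():
    if os.environ.get("SAM2_BUILD_CUDA", "1") == "0":
        print("Skipping the SAM 2 CUDA extension (SAM2_BUILD_CUDA=0).")
        return []
    try:
        from torch.utils.cpp_extension import CUDAExtension
        return [
            CUDAExtension(
                name="sam2._C",
                sources=["sam2/csrc/connected_components.cu"],
            )
        ]
    except Exception as e:  # e.g. torch missing or CUDA toolchain unavailable
        print(f"Skipping the SAM 2 CUDA extension due to: {e}")
        return []

def get_cmdclass():
    try:
        from torch.utils.cpp_extension import BuildExtension
        return {"build_ext": BuildExtension.with_options(no_python_abi_suffix=True)}
    except Exception:
        return {}

setup(
    name="SAM-2",
    ext_modules=get_extensions(),
    cmdclass=get_cmdclass(),
)
```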
Force-pushed from 6943cf6 to 1757177
Yes, I know that it has loops; it is not easy to implement with pure PyTorch ops. Have you benchmarked how the PyTorch compiler behaves with these loops?
Quite funny that...
@bhack In our internal benchmarking, the custom CUDA kernel is much (~100x) faster than the kornia implementation even if we try to optimize the latter (e.g. via torch compilation). Another user also reported similar observations (prittt/YACCLAB#28 (comment)).
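A rough sketch of the kind of comparison being discussed, timing kornia's connected components eagerly and under `torch.compile` on GPU. The shapes, iteration count, and timing loop are illustrative assumptions, not the setup of SAM 2's internal benchmarks:

```python
# Rough benchmark sketch: eager vs. torch.compile for kornia's connected
# components. Shapes and iteration counts are arbitrary illustrative choices.
import torch
import kornia

def bench(fn, masks, warmup=3, iters=10):
    for _ in range(warmup):
        fn(masks)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(masks)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

masks = (torch.rand(4, 1, 512, 512, device="cuda") > 0.5).float()
eager = lambda m: kornia.contrib.connected_components(m, num_iterations=200)
compiled = torch.compile(eager)

print(f"eager:    {bench(eager, masks):.2f} ms")
print(f"compiled: {bench(compiled, masks):.2f} ms")
```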
… into facebookresearch-main

* 'main' of github.com:facebookresearch/segment-anything-2: (40 commits)
  open `README.md` with unicode (to support Hugging Face emoji); fix various typos (facebookresearch#218)
  accept kwargs in auto_mask_generator
  Fix HF image predictor
  improving warning message and adding further tips for installation (facebookresearch#204)
  better support for non-CUDA devices (CPU, MPS) (facebookresearch#192)
  Update hieradet.py
  add Colab support to the notebooks; pack config files in `sam2_configs` package during installation (facebookresearch#176)
  also catch errors during installation in case `CUDAExtension` cannot be loaded (facebookresearch#175)
  Add interface for box prompt in SAM 2 video predictor (facebookresearch#174)
  Address comment
  Update hieradet.py
  Update docstrings
  Revert code snippet
  Updated INSTALL.md with CUDA_HOME-related troubleshooting (facebookresearch#140)
  Format using ufmt
  Update INSTALL.md (facebookresearch#156)
  Update README
  Make it optional to build CUDA extension for SAM 2; also fallback to all available kernels if Flash Attention fails (facebookresearch#155)
  Clean up
  Address comment
  ...
…all available kernels if Flash Attention fails (facebookresearch#155)

In this PR, we make it optional to build the SAM 2 CUDA extension, as we have observed that many users encounter difficulties with the CUDA compilation step.

1. During installation, we catch build errors and print a warning message. We also allow explicitly turning off the CUDA extension build with `SAM2_BUILD_CUDA=0`.
2. At runtime, we catch CUDA kernel errors from connected components and print a warning before skipping the post-processing step. We also fall back to all available kernels if the Flash Attention kernel fails (see the sketch below).
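A minimal sketch of what a "fall back to all available kernels" strategy for scaled dot-product attention can look like, using `torch.backends.cuda.sdp_kernel`. The function and argument names here are illustrative, not SAM 2's actual attention code; only the torch APIs are real:

```python
# Sketch of falling back from Flash Attention to all available SDPA kernels.
# `attention_with_fallback` is an illustrative name; the torch APIs
# (scaled_dot_product_attention, sdp_kernel) are real.
import warnings
import torch
import torch.nn.functional as F

def attention_with_fallback(q, k, v, dropout_p=0.0):
    try:
        # Prefer the Flash Attention kernel when it is supported.
        with torch.backends.cuda.sdp_kernel(
            enable_flash=True, enable_math=False, enable_mem_efficient=False
        ):
            return F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
    except Exception as e:
        warnings.warn(
            f"Flash Attention kernel failed ({e}); falling back to all available kernels."
        )
        # Let PyTorch pick among the flash / memory-efficient / math kernels.
        with torch.backends.cuda.sdp_kernel(
            enable_flash=True, enable_math=True, enable_mem_efficient=True
        ):
            return F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
```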