
Updating Healpix CUDA primitive #290


Open · wants to merge 11 commits into main

Conversation

@ASKabalan (Collaborator) commented Mar 26, 2025

Adding a few updates:

  • Updating to the newest custom call API (API version 4) using the FFI
  • Implementing a gradient rule for the HEALPix CUDA FFT
  • Implementing a batching rule

A batching rule is important for two reasons: it enables jacrev/jacfwd, and, while a single HEALPix map usually fits on one GPU, we sometimes want to batch the spherical transform.

I will be doing that next.
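
A minimal sketch (my own illustration, not code from this PR) of what the batching and gradient rules enable, assuming the healpix_fft_cuda signature used in the reproduction script later in this thread; the batch size and loss are placeholders:

import jax
import jax.numpy as jnp
import numpy
from s2fft.utils.healpix_ffts import healpix_fft_cuda

jax.config.update("jax_enable_x64", True)

nside = 4
L = 2 * nside
rng = numpy.random.default_rng(0)

# A batch of HEALPix maps, shape (batch, 12 * nside**2)
f_batch = rng.standard_normal((8, 12 * nside**2))
fft_fn = lambda f: healpix_fft_cuda(f=f, L=L, nside=nside, reality=False)

# The batching rule lets vmap map the primitive over the leading axis
ftm_batch = jax.vmap(fft_fn)(f_batch)

# The transpose/gradient rule lets jacrev (and hence grad) differentiate through it
loss = lambda f: jnp.sum(jnp.abs(fft_fn(f)) ** 2)
g = jax.grad(loss)(f_batch[0])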

@ASKabalan marked this pull request as draft March 26, 2025 16:25
@ASKabalan marked this pull request as ready for review March 28, 2025 16:08
@ASKabalan (Collaborator, Author)

Hello @matt-graham @jasonmcewen @CosmoMatt

Just a quick PR to wrap up a few things

  1. Updated the binding API to the newest FFI
  2. Added a vmap implementation of the CUDA primitive
  3. Added a transpose rule which allows jacfwd and jacrev (and consequently grad as well)
  4. Added more tests: https://github.com/astro-informatics/s2fft/blob/ASKabalan/tests/test_healpix_ffts.py#L100
  5. Removed two files (the kernel helpers) which are no longer needed with the FFI API, so perhaps they should also be removed from the license section
  6. Constrained nanobind to >=2.0,<2.6 because of a regression: [BUG]: Regression when using scikit build tools and nanobind wjakob/nanobind#982

And finally, I added cudastreamhandler, which is used to split the XLA-provided stream for the vmap lowering (this header is my own work).

There is an issue with building pyssht; I am not sure it is caused by this PR.

I will check the failing workflows when I get the chance, but in the meantime a review is appreciated.
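
For context, a hedged sketch of the kind of jax.ffi registration and call pattern the new binding API implies; the target name, extension module, output shape, and attribute names below are placeholders rather than the PR's actual code:

import jax
import jax.numpy as jnp
import numpy

# _healpix_ffi stands in for the nanobind extension that exposes a PyCapsule
# for the CUDA handler; module and attribute names here are hypothetical.
# jax.ffi.register_ffi_target("healpix_fft_cuda", _healpix_ffi.registration(), platform="CUDA")

def healpix_fft_via_ffi(f, L, nside):
    # Placeholder output shape/dtype; the real primitive defines its own abstract eval.
    out_type = jax.ShapeDtypeStruct((4 * nside - 1, 2 * L), jnp.complex128)
    call = jax.ffi.ffi_call("healpix_fft_cuda", out_type)
    # Scalar attributes are forwarded to the C++/CUDA handler as FFI attributes
    return call(f, L=numpy.int64(L), nside=numpy.int64(nside))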

@matt-graham (Collaborator) left a comment


Hi @ASKabalan, sorry for the delay in getting back to you.

This all sounds great - thanks for picking up #237 in particular and for the updates to use the newer FFI interface.

With regards to the failing workflows - this was probably due to #292 which was fixed in #293. If you merge in latest main here that should hopefully resolve the upstream dependency build problems that were causing the test workflows to fail.

I've added some initial review comments below. I will have a closer look next week and try testing this out, but I don't currently have access to a GPU machine.

Comment on lines 150 to 151
flm_hp = samples.flm_2d_to_hp(flm, L)
f = hp.sphtfunc.alm2map(flm_hp, nside, lmax=L - 1)
I think we could use s2fft.inverse(flm, L=L, reality=False, method="jax", sampling="healpix") here instead of going via healpy? The rationale is that I have a slight preference for minimising the number of additional tests that depend on healpy, as we no longer require it as a direct dependency of the package, and in the long run it might be possible to also remove it as a test dependency.
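
For illustration, assuming flm and L are defined as in the surrounding test, the quoted lines could then become (a sketch of the suggestion, not committed code):

import s2fft

# Generate the HEALPix-sampled map directly with s2fft instead of going via healpy
f = s2fft.inverse(flm, L=L, reality=False, method="jax", sampling="healpix")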

@matt-graham (Collaborator)

I've tried building, installing and running this on a system with CUDA 12.6 + an NVIDIA A100, and running the HEALPix FFT tests with

pytest tests/test_healpix_ffts.py

the tests consistently hang when trying to run the first test_healpix_fft_cuda instance.

Running just the IFFT tests with

pytest tests/test_healpix_ffts.py::test_healpix_ifft_cuda

the tests for both sets of test parameters pass.

Trying to dig into this a bit, running the following locally

import healpy
import jax
import s2fft
import numpy

jax.config.update("jax_enable_x64", True)

seed = 20250416
nside = 4
L = 2 * nside
reality = False

rng = numpy.random.default_rng(seed)
flm = s2fft.utils.signal_generator.generate_flm(rng=rng, L=L, reality=reality)
flm_hp = s2fft.sampling.s2_samples.flm_2d_to_hp(flm, L)
f = healpy.sphtfunc.alm2map(flm_hp, nside, lmax=L - 1)
flm_cuda = s2fft.utils.healpix_ffts.healpix_fft_cuda(f=f, L=L, nside=nside, reality=reality).block_until_ready()

raises an error

jaxlib.xla_extension.XlaRuntimeError: INTERNAL: CUDA error: : CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

so it looks like there is some memory addressing issue somewhere in the healpix_fft_cuda implementation?
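
One possible way to localize such an error (a hedged sketch, not something suggested in the thread) is to force synchronous CUDA kernel launches so the failure is reported at the offending launch rather than at a later synchronization point:

import os

# Must be set before JAX initializes the CUDA context, hence before importing jax
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import jax
# ...then run the reproduction script above.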

@ASKabalan (Collaborator, Author)

Thank you.

I was able to reproduce this with CUDA 12.4.1 but not locally with 12.4.

I will take a look.

Linked issue (may be closed by merging this pull request): Check autodiff and batching support for healpix_fft_cuda primitive and add if needed