Plans for new sparse compilation backend #618
Replies: 5 comments · 16 replies
-
I'm interested in the compilation backend library. Would this be an independently usable library? If it does not use numba, would it still be based on llvmlite or LLVM, or will it be an alternative to those somehow? I would be interested in something like llvmlite but that can construct and compile the IR directly, without the need to generate and parse the code as text.
-
One thing that would be quite nice is to ensure that PyData Sparse plus its runtime dependencies (which may be only the new backend) use only the CPython limited C API. That would improve the packaging a lot (no issues with new or not-yet-released Python versions), and it should be feasible. Somewhere there must be some compiled code, because function calls will go from Python to some compiled target, but there is probably no reason that would need anything special from the Python C API. Maybe an additional goal?
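(A minimal sketch of what building against the limited C API involves, using standard setuptools options; the extension module name and source path below are hypothetical:)

```python
from setuptools import Extension, setup

setup(
    ext_modules=[
        Extension(
            "sparse._compiled_backend",  # hypothetical extension module
            sources=["src/backend.c"],   # hypothetical source file
            # Restrict the extension to the stable ABI of CPython 3.10+ ...
            define_macros=[("Py_LIMITED_API", "0x030A0000")],
            # ... and tag the built wheel as abi3, so a single binary
            # covers every later CPython release.
            py_limited_api=True,
        )
    ]
)
```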
-
To what extent are y'all interested in specifying a purely C ABI for describing different kinds of sparse data and graphs, along with how they can compose together? Given that TACO has historically been focused on source generation, I can't tell what (if anything) it uses as a runtime data-description ABI; it certainly seems to provide a compute engine for this kind of data, though. This functionality is present in NumPy for dense arrays, but it's a bit entangled in the C API; PEP 3118 is a more condensed form of it. Back in 2020, I did some prototyping work on separating the underlying ABI out of libdynd and checking that it could support graphs with arbitrary node data. The underlying ABI is one of the more innovative parts of that library. Ideally the ABI design there would be able to serve as a metaobject protocol for dense data, sparse data, and compositions of the two. I can put together a writeup on what I learned in the process if that'd be helpful; I just wanted to check whether it'd even be of interest before spending time writing it up.
-
I'm very glad to see that MLIR is under consideration. The sparse_tensor dialect has some amazing capabilities for code generation over n-dimensional sparse objects. While writing an MLIR-based version of the GraphBLAS spec for Python, I found the Python MLIR bindings very easy to work with for code generation. As an example, here is my implementation of applying a binary operation to the overlapping entries of 2-D sparse matrices. It generates efficient iteration code regardless of the orientation of the two incoming matrices (CSR x CSR, CSR x CSC, etc.). The magic all happens in the MLIR sparse_tensor dialect.
-
Hey all, I'm very curious about the compiled approach for sparse operations. I'm a maintainer of SimPEG, an open-source library for geophysical simulation and inversion, and we make heavy use of sparse operations when solving numerical PDEs. We construct everything piecewise internally (i.e. we create gradient, divergence and curl matrices, mass matrices dependent on properties, interpolation matrices afterwards, etc.), so I imagine we could make significant use of fusing these sparse operations together. Is there anywhere I can look for progress on this?
-
The stated goal for `sparse` is to provide a NumPy-like API with a sparse representation of arrays. To this end, Quansight and I have been collaborating with researchers at MIT CSAIL - in particular Prof. Amarasinghe's group and the TACO team - to develop a performant and production-ready package for N-dimensional sparse arrays. There were several attempts made to explore this over the last couple of years, including an LLVM back-end for TACO and a pure-C++ template-metaprogramming approach called XSparse.

We at Quansight are happy to announce that we have received funding from DARPA, together with our partners from MIT, under their Small Business Innovation Research (SBIR) program to build out `sparse` using state-of-the-art just-in-time compilation strategies to boost performance for users. Additionally, as an interface, we'll adopt the Array API standard, which is championed by major libraries like NumPy, PyTorch and CuPy.

The key differentiator for this library will be N-dimensional sparse array support, including operations that mix sparse and dense operands, across common architectures. Users will see a number of new interfaces. We are considering ways to introduce JIT compilation to the user gently, while minimizing impact on existing key users of this library. Please leave a comment if you have thoughts on the plans outlined below.
## Goals of the Project
### Array API support

A big goal for this library will be to achieve (near-)complete support of the Array API standard. This will allow many library users, such as SciPy and scikit-learn, to have library-neutral versions of their algorithms automagically work with `sparse`. It will also give those who want to work with sparse arrays access to a familiar interface.

Previously, much of this functionality was available via the `__array_function__` or `__array_ufunc__` protocols from the `numpy` namespace when called with sparse inputs. However, this meant that it didn't show up in the API documentation and was not discoverable. These functions will now be added to the main namespace as well.

Note that SciPy's sparse arrays also aim for Array API compatibility (1-D/2-D only); see this discussion, which also goes in depth on how applicable the Array API standard is to sparse arrays.
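As a rough sketch of what this enables for downstream libraries (assuming `sparse` arrays expose the standard `__array_namespace__` hook; the `normalize` function is illustrative):

```python
import sparse

def normalize(x):
    # Library-neutral code: obtain the Array API namespace from the array
    # itself instead of importing a specific backend such as numpy.
    xp = x.__array_namespace__()
    return x / xp.max(xp.abs(x))

# The same function would then work unchanged on NumPy, CuPy, or sparse
# arrays, since all of them implement the same standard namespace.
s = sparse.random((100, 100), density=0.1)
normalize(s)
```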
### Stable in-memory representation

We intend to target a stable in-memory representation of common formats (most notably CSR/CSC), so that interoperability with `scipy.sparse.linalg` and `scipy.sparse.csgraph` is zero-copy whenever possible.
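To illustrate what zero-copy means here: today's `scipy.sparse` already lets a CSR matrix be rebuilt around existing buffers without duplicating them, and a stable representation would let `sparse` and SciPy share buffers the same way. This sketch uses only the current SciPy API:

```python
import numpy as np
import scipy.sparse as sps

a = sps.random(5, 5, density=0.4, format="csr")

# Rebuilding from the raw (data, indices, indptr) triple with copy=False
# reuses the underlying buffers instead of duplicating them.
b = sps.csr_matrix((a.data, a.indices, a.indptr), shape=a.shape, copy=False)
assert np.shares_memory(a.data, b.data)  # same buffer, no copy
```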
### Ecosystem integration

We plan to integrate more tightly with Dask, SciPy, CuPy, and other popular libraries using established protocols.
### GPU support

We plan to add support for NVIDIA GPUs, at a minimum, in our compilation back-end. Additionally, we plan interoperability with `cupy` and `cupyx`.

## API Design Alternatives for JIT Compilation and Lazy Execution
This section is intended to put some ideas out there and get some early feedback from the community on which pattern they prefer to work with, while minimizing impact on existing users.
### Need for a sparse compilation back-end
There are asymptotic benefits to fusing kernels for sparse array operations. A lot of work has been done in this field; I would suggest watching this YouTube video of a talk presented at the ACM. However, one cannot fuse kernels in an eager computation mode. We propose a task-graph-based approach, which will compile a kernel and materialize the array in a just-in-time fashion.
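To make the fusion point concrete, here is a small example using the current eager `sparse` API; the fused single-pass kernel described in the comments is what the new back-end would generate, not something available today:

```python
import sparse

A = sparse.random((10_000, 10_000), density=1e-4)
B = sparse.random((10_000, 10_000), density=1e-4)

# Eager evaluation: (A + B) materializes an intermediate array whose
# sparsity pattern is the union of A's and B's before the sum runs.
row_sums = (A + B).sum(axis=1)

# A fused kernel could instead stream over the nonzeros of A and B once,
# accumulating row sums directly and never allocating the intermediate.
```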
Because Numba relies on Python bytecode, it has historically been late to support new Python releases. For this reason, we will work with the team at MIT CSAIL to write our own compilation back-end library and drop Numba as a dependency, allowing us to support newer Python versions much faster.
### Dask-like `.compute` pattern

We plan on adopting a Dask-like pattern where a computation graph is built up and calling `.compute(...)` materializes the underlying array. In addition, there will not be one sparse array type but a collection of them, owing to the multiplicity of supported formats. These formats will include, at the very least, the `COO` and `DOK` formats that are already present and used in this library.

One reason to choose this pattern is the flexibility it allows in how the computation is done. For example, a different kernel might be compiled based on the schedule or output format; specifying that in the compute call is simple.
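A rough sketch of this pattern (the `compute` signature and its `format=` option are illustrative, not a committed API):

```python
import sparse

x = sparse.random((1_000, 1_000), density=0.01)
y = sparse.random((1_000, 1_000), density=0.01)

# Hypothetical lazy mode: operations record a task graph instead of running.
z = (x @ y) + x

# Fusion, compilation, and materialization all happen here; the output
# format could be selected at the call site.
result = z.compute(format="csr")
```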
### Decorator pattern
Another pattern that could potentially be adopted is a Numba-style decorator pattern, where functions or types could be decorated to enable compilation when called.
The advantage of this pattern is its simplicity; however, one cannot manipulate the options at the point of the function call, only at the function definition.
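A minimal sketch, assuming a hypothetical `sparse.compiled` decorator (the name and its options are illustrative):

```python
import sparse

# Options are fixed here, at definition time, and cannot vary per call.
@sparse.compiled(output_format="csr")
def fused_op(a, b):
    return (a + b) * a  # compiled into a single fused kernel on first call

x = sparse.random((100, 100), density=0.1)
y = sparse.random((100, 100), density=0.1)
out = fused_op(x, y)
```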
### Compiled-version pattern

Yet another pattern that could be adopted is a PyTorch-like option where the compiled version of the function is used separately from the function itself. This is functionally equivalent to the `.compute` pattern; which one to choose is a matter of preference. A sketch follows at the end of this section.

These three patterns are by no means mutually exclusive. For example, it would be relatively simple to support both the second and third options at the same time, with the first being reserved as an internal API used for building these options.
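A minimal sketch of the compiled-version pattern, modeled on `torch.compile` (the `sparse.compile` name is illustrative):

```python
import sparse

def fused_op(a, b):
    return (a + b) * a

# The original function stays usable eagerly; a separately compiled
# version is obtained explicitly, as with torch.compile.
fast_op = sparse.compile(fused_op)

x = sparse.random((100, 100), density=0.1)
y = sparse.random((100, 100), density=0.1)
out_eager = fused_op(x, y)
out_fast = fast_op(x, y)  # same result, produced by a fused kernel
```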