Plans for new sparse compilation backend #618
Replies: 5 comments · 16 replies
-
I'm interested in the compilation backend library. Would this be an independently usable library? If it does not use numba, would it still be based on llvmlite or LLVM, or will it be an alternative to those somehow? I would be interested in something like llvmlite but that can construct and compile the IR directly, without the need to generate and parse the code as text.
-
One thing that would be quite nice is to ensure that PyData Sparse plus its runtime dependencies (which may be only the new backend) use only the CPython limited C API. That would improve the packaging a lot (no issues with new or not-yet-released Python versions), and it should be feasible. Somewhere there must be some compiled code, because function calls will go from Python to some compiled target, but there is probably no reason that would need anything special from the Python C API. Maybe an additional goal?
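(A minimal sketch of what building against the limited C API involves, using standard setuptools options; the extension module name and source path below are hypothetical:)

```python
from setuptools import Extension, setup

setup(
    ext_modules=[
        Extension(
            "sparse._compiled_backend",  # hypothetical extension module
            sources=["src/backend.c"],   # hypothetical source file
            # Restrict the extension to the stable ABI of CPython 3.10+ ...
            define_macros=[("Py_LIMITED_API", "0x030A0000")],
            # ... and tag the built wheel as abi3, so a single binary
            # covers every later CPython release.
            py_limited_api=True,
        )
    ]
)
```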
-
To what extent are y'all interested in specifying a purely C ABI for describing different kinds of sparse data and graphs, along with how they can compose together? Given that TACO has historically been focused on source generation, I can't tell what (if anything) it uses as a runtime data-description ABI; it certainly seems to provide a compute engine for this kind of data, though. This functionality is present in NumPy for dense arrays, but it's a bit entangled in the C API; PEP 3118 is a more condensed form of it. Back in 2020, I did some prototyping work on separating the underlying ABI out of libdynd and checking that it could support graphs with arbitrary node data. The underlying ABI is one of the more innovative parts of that library. Ideally the ABI design there would be able to serve as a metaobject protocol for dense data, sparse data, and compositions of the two. I can put together a writeup on what I learned in the process if that'd be helpful; I just wanted to check whether it'd even be of interest before spending time writing it up.
-
I'm very glad to see that MLIR is under consideration. The sparse_tensor dialect has some amazing capabilities for code generation over n-dimensional sparse objects. While writing an MLIR-based version of the GraphBLAS spec for Python, I found the Python MLIR bindings very easy to work with for code generation. As an example, here is my implementation of applying a binary operation to the overlapping entries of 2-D sparse matrices. It generates efficient iteration code regardless of the orientation of the two incoming matrices (CSR x CSR, CSR x CSC, etc.). The magic all happens in the MLIR sparse_tensor dialect.
-
Hey all, I'm very curious about the compiled approach for sparse operations. I'm a maintainer of SimPEG, an open-source library for geophysical simulation and inversion, and we make heavy use of sparse operations when solving numerical PDEs. We construct everything piecewise internally (i.e. we create gradient, divergence and curl matrices, mass matrices dependent on properties, interpolation matrices afterwards, etc.), so I imagine we could make significant use of fusing these sparse operations together. Is there anywhere I can look for progress on this?
-
The stated goal for `sparse` is to provide a NumPy-like API with a sparse representation of arrays. To this end, Quansight and I have been collaborating with researchers at MIT CSAIL - in particular Prof. Amarasinghe's group and the TACO team - to develop a performant and production-ready package for N-dimensional sparse arrays. There were several attempts made to explore this over the last couple of years, including an LLVM back-end for TACO and a pure-C++ template-metaprogramming approach called XSparse.

We at Quansight are happy to announce that we have received funding from DARPA, together with our partners from MIT, under their Small Business Innovation Research (SBIR) program to build out `sparse` using state-of-the-art just-in-time compilation strategies to boost performance for users. Additionally, as an interface, we'll adopt the Array API standard, which is championed by major libraries like NumPy, PyTorch and CuPy.

The key differentiator for this library will be N-dimensional sparse array support, including operations that mix sparse and dense operands, across common architectures. Users will see a number of new interfaces. We are considering ways to introduce JIT compilation to the user gently, while minimizing impact on existing key users of this library. Please leave a comment if you have thoughts on the plans outlined below.
## Goals of the Project
### Array API support

A big goal for this library will be to achieve (near-)complete support of the Array API standard. This will allow many library users, such as SciPy and scikit-learn, to have library-neutral versions of their algorithms automagically work with `sparse`. It will also give those who want to work with sparse arrays access to a familiar interface.

Previously, much of this functionality was available via the `__array_function__` or `__array_ufunc__` protocols from the `numpy` namespace when called with sparse inputs. However, this meant that it didn't show up in the API documentation and was not discoverable. These functions will now be added to the main namespace as well.

Note that SciPy's sparse arrays also aim for Array API compatibility (1-D/2-D only); see this discussion, which also goes in depth on how applicable the Array API standard is to sparse arrays.
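As a rough sketch of what this enables for downstream libraries (assuming `sparse` arrays expose the standard `__array_namespace__` hook; the `normalize` function is illustrative):

```python
import sparse

def normalize(x):
    # Library-neutral code: obtain the Array API namespace from the array
    # itself instead of importing a specific backend such as numpy.
    xp = x.__array_namespace__()
    return x / xp.max(xp.abs(x))

# The same function would then work unchanged on NumPy, CuPy, or sparse
# arrays, since all of them implement the same standard namespace.
s = sparse.random((100, 100), density=0.1)
normalize(s)
```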
### Stable in-memory representation

We intend to target a stable in-memory representation of common formats (most notably CSR/CSC), so that interoperability with `scipy.sparse.linalg` and `scipy.sparse.csgraph` is zero-copy whenever possible.
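To illustrate what zero-copy means here: today's `scipy.sparse` already lets a CSR matrix be rebuilt around existing buffers without duplicating them, and a stable representation would let `sparse` and SciPy share buffers the same way. This sketch uses only the current SciPy API:

```python
import numpy as np
import scipy.sparse as sps

a = sps.random(5, 5, density=0.4, format="csr")

# Rebuilding from the raw (data, indices, indptr) triple with copy=False
# reuses the underlying buffers instead of duplicating them.
b = sps.csr_matrix((a.data, a.indices, a.indptr), shape=a.shape, copy=False)
assert np.shares_memory(a.data, b.data)  # same buffer, no copy
```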
### Ecosystem integration

We plan to integrate more tightly with Dask, SciPy, CuPy, and other popular libraries using established protocols.
### GPU support

We plan to add support for NVIDIA GPUs, at a minimum, in our compilation back-end. Additionally, we plan interoperability with `cupy` and `cupyx`.

## API Design Alternatives for JIT Compilation and Lazy Execution
This section is intended to put some ideas out there and get some early feedback from the community on which pattern they prefer to work with, while minimizing impact on existing users.
### Need for a sparse compilation back-end
There are asymptotic benefits to fusing kernels for sparse array operations. A lot of work has been done in this field; I would suggest watching this YouTube video of a talk presented at the ACM. However, one cannot fuse kernels in an eager computation mode. We propose a task-graph-based approach, which will compile a kernel and materialize the array in a just-in-time fashion.
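To make the fusion point concrete, here is a small example using the current eager `sparse` API; the fused single-pass kernel described in the comments is what the new back-end would generate, not something available today:

```python
import sparse

A = sparse.random((10_000, 10_000), density=1e-4)
B = sparse.random((10_000, 10_000), density=1e-4)

# Eager evaluation: (A + B) materializes an intermediate array whose
# sparsity pattern is the union of A's and B's before the sum runs.
row_sums = (A + B).sum(axis=1)

# A fused kernel could instead stream over the nonzeros of A and B once,
# accumulating row sums directly and never allocating the intermediate.
```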
Because Numba relies on Python bytecode, it has historically been late to support new Python releases. For this reason, we will work with the team at MIT CSAIL to write our own compilation back-end library and drop Numba as a dependency, allowing us to support newer Python versions much faster.
### Dask-like `.compute` pattern

We plan on adopting a Dask-like pattern where a computation graph is built up and calling `.compute(...)` materializes the underlying array. In addition, there will not be one sparse array type but a collection of them, owing to the multiplicity of supported formats. These formats will include, at the very least, the `COO` and `DOK` formats that are already present and used in this library.

One reason to choose this pattern is the flexibility it allows in how the computation is done. For example, a different kernel might be compiled based on the schedule or output format; specifying that in the compute call is simple.
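A rough sketch of this pattern (the `compute` signature and its `format=` option are illustrative, not a committed API):

```python
import sparse

x = sparse.random((1_000, 1_000), density=0.01)
y = sparse.random((1_000, 1_000), density=0.01)

# Hypothetical lazy mode: operations record a task graph instead of running.
z = (x @ y) + x

# Fusion, compilation, and materialization all happen here; the output
# format could be selected at the call site.
result = z.compute(format="csr")
```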
### Decorator pattern
Another pattern that could potentially be adopted is a Numba-style decorator pattern, where functions or types could be decorated to enable compilation when called.
The advantage of this pattern is its simplicity; however, one cannot manipulate the options at the point of the function call, only at the function definition.
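A minimal sketch, assuming a hypothetical `sparse.compiled` decorator (the name and its options are illustrative):

```python
import sparse

# Options are fixed here, at definition time, and cannot vary per call.
@sparse.compiled(output_format="csr")
def fused_op(a, b):
    return (a + b) * a  # compiled into a single fused kernel on first call

x = sparse.random((100, 100), density=0.1)
y = sparse.random((100, 100), density=0.1)
out = fused_op(x, y)
```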
### Compiled-version pattern

Yet another pattern that could be adopted is a PyTorch-like option where the compiled version of the function is used separately from the function itself. This is functionally equivalent to the `.compute` pattern; which one to choose is a matter of preference. A sketch follows at the end of this section.

These three patterns are by no means mutually exclusive. For example, it would be relatively simple to support both the second and third options at the same time, with the first being reserved as an internal API used for building these options.
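A minimal sketch of the compiled-version pattern, modeled on `torch.compile` (the `sparse.compile` name is illustrative):

```python
import sparse

def fused_op(a, b):
    return (a + b) * a

# The original function stays usable eagerly; a separately compiled
# version is obtained explicitly, as with torch.compile.
fast_op = sparse.compile(fused_op)

x = sparse.random((100, 100), density=0.1)
y = sparse.random((100, 100), density=0.1)
out_eager = fused_op(x, y)
out_fast = fast_op(x, y)  # same result, produced by a fused kernel
```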