
Builtin Matrix type #4960

Open
data-man opened this issue Apr 6, 2020 · 24 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

@data-man
Contributor

data-man commented Apr 6, 2020

LLVM 10 introduced nice Matrix intrinsics.

Possible syntax:

@Matrix(rows, cols, type)

Related issue: #903
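
A minimal usage sketch of the proposed builtin (hypothetical; neither @Matrix nor any operators on it exist in Zig today):

const Mat4 = @Matrix(4, 4, f32); // 4 rows, 4 columns, f32 elements

fn example(a: Mat4, b: Mat4) Mat4 {
    // Element-wise operators could mirror @Vector semantics.
    return a + b;
}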

@Vexu Vexu added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Apr 6, 2020
@Sobeston
Contributor

Sobeston commented Apr 6, 2020

Would @Vector(len, T) be equivalent to @Matrix(len, 1, T) (or @Matrix(1, len, T))? If their codegen and memory layout are the same, then we might as well drop @Vector for @Matrix.

@data-man
Contributor Author

data-man commented Apr 6, 2020

Would @Vector(len, T) be equivalent to @Matrix(len, 1, T) (or @Matrix(1, len, T))?

Yes.

we might as well drop @Vector for @Matrix

Oh, no! Vector's operators are already perfect. :)

@Sobeston
Contributor

Sobeston commented Apr 6, 2020

Perhaps, if we keep both Vector and Matrix, it would be a good opportunity to consider operations between them.

@andrewrk andrewrk added this to the 0.7.0 milestone Apr 7, 2020
@BarabasGitHub
Contributor

Do they have anything other than transpose, multiply, load, and store? Is that useful to add to a language? How much magic will that hide?

To be honest, I'm not a big fan. It's already not clear what @Matrix(rows, cols, type) would look like in memory without reading the LLVM documentation.

@SpexGuy
Contributor

SpexGuy commented Apr 7, 2020

The RFC [1] referenced by the LLVM commit [2] has this to say:

The main motivation for the matrix support on the Clang side is to give users a way to

  • Guarantee generating high-quality code for matrix operations and trees of matrix operations. For isolated operations, we can guarantee vector code generation suitable for the target. For trees of operations, the proposed value type helps with eliminating temporary loads & stores.
  • Make use of specialized matrix ISA extensions, like the new matrix instructions in ARM v8.6 or various proprietary matrix accelerators, in their C/C++ code.
  • Move optimisations from matrix wrapper libraries into the compiler. We use it internally to simplify an Eigen-style matrix library, by relying on LLVM for generating tiled & fused loops for matrix operations.

Clearly the members of the LLVM community (or at least the ones backing this extension) believe that the optimizer can perform better here with the additional information about matrix representation, which to me seems like a valid argument that this should be included in the language. As long as we don't care about being bound more tightly to LLVM (which we don't seem to, given zig c++), I don't see a strong reason not to expose this.

But that still leaves a lot of free space in terms of how it should be exposed. At the LLVM level, there is no Matrix type [3]; the matrix intrinsics operate on Vectors with additional information supplied at the call site to describe the matrix dimensions. I do think that there would be concrete benefits to having a Matrix type abstraction for these intrinsics in Zig though. It would make it much easier to specify the dimensions in one place, and would allow for dimension inference when the compiler determines the result type of a matrix multiply. As long as the language supports a cast between matrices of the same size but different dimensions (which could just be @bitCast), and between matrix types and vector types of the same size, I think a specialized Matrix type is a net win. This also mirrors the decision made by the clang authors, who exposed these intrinsics via typedef float m4x4_t __attribute__((matrix_type(4, 4)));

It's already not clear what @Matrix(rows, cols, type) would look like in memory without reading the LLVM documentation.

I agree that this is a potential issue. We could make it easier by documenting the layout in the Zig documentation of the @Matrix intrinsic. The LLVM notes seem to suggest that they are considering adding support for multiple layouts, so we could alternatively change the builtin to specify layout explicitly, e.g. @Matrix(rows, cols, type, .COL_MAJOR).

Since a * b is more complex than element-wise multiply, operates on inputs of different types, and may return a third type, I would advise introducing an intrinsic @matrixMultiply(a, b) instead of overloading the * operator. This would also give us a place to specify the other information that can be attached to the LLVM intrinsic, like fast-math flags.
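
As a hypothetical sketch of how such an intrinsic could compose with the dimension inference mentioned above (again, none of these builtins exist today):

const A = @Matrix(4, 3, f32); // 4x3
const B = @Matrix(3, 2, f32); // 3x2

fn mul(a: A, b: B) @Matrix(4, 2, f32) {
    // The 4x2 result type is fully determined by the operand dimensions, so
    // the compiler could infer it and reject mismatched inner dimensions.
    return @matrixMultiply(a, b);
}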

Perhaps, if we keep both Vector and Matrix, it would be a good opportunity to consider operations between them.

Looking at the LLVM documentation, the Matrix type is internally backed by a Vector type, so @bitCast support (or some specialized cast) definitely makes sense. But for the same reasons I stated above, I don't think we should implement matrix * vector. Since @Vector is semantically a SIMD type, not a mathematical vector type, I also don't think we should make @matrixVectorMultiply(matrix, vector), unless LLVM makes a specialized intrinsic for this specific operation. Instead, if this is needed, @matrixMultiply(matrix, @bitCast(@Matrix(4, 1, f32), vector)) should give all of the code generation benefits without introducing an operator to the language that has unexpected nontrivial cost, or encouraging treating simd vectors like mathematical vectors.

Overall I think our investment in this feature should be parallel to LLVM's. If they start making large improvements to the codegen from these intrinsics, or supporting more new hardware with them, it becomes more worthwhile for us to add support.

[1] RFC: Matrix Math Support http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html
[2] LLVM Code review for Matrix intrinsics https://reviews.llvm.org/D70456
[3] LLVM documentation, matrix intrinsics https://llvm.org/docs/LangRef.html#matrix-intrinsics

@BarabasGitHub
Contributor

* We use it internally to simplify an Eigen-style matrix library, by relying on LLVM for generating tiled & fused loops for matrix operations.

Ehm... OK. I'm not sure what to think of this. This is going the way of Fortran. That doesn't mean it's bad, but I'm also not sure whether implementing matrix multiplication algorithms is a compiler's job. Maybe I'm overestimating the extent of tiled & fused loops?

* Make use of specialized matrix ISA extensions, like the new matrix instructions in ARM v8.6 or various proprietary matrix accelerators, in their C/C++ code.

This is a valid argument for specialized matrix operations.

* For trees of operations, the proposed value type helps with eliminating temporary loads & stores.

I don't know enough here to have an opinion.

The LLVM notes seem to suggest that they are considering adding support for multiple layouts, so we could alternatively change the builtin to specify layout explicitly, e.g. @Matrix(rows, cols, type, .COL_MAJOR).

Seems like a good solution.

Since a * b is more complex than element-wise multiply, operates on inputs of different types, and may return a third type, I would advise introducing an intrinsic @matrixMultiply(a, b) instead of overloading the * operator.

Agreed. That makes it a lot clearer already than I originally imagined. 👍

@ghost

ghost commented Nov 21, 2020

Given the variation in matrix memory layout between architectures (row-major or column-major? Is [|c|]*[|r|]T a matrix? Do we allow multiplies between different index orderings? If so, what's the index order of the result? Where is it stored?), and the implicit memory management inherent to some of them, I really don't think a separate matrix type is wise. If processors ever implement dedicated matrix multiply instructions (not just SIMD instructions to make matrix multiply easier), this can be revisited -- until then, I think the best course of action is to tighten the guarantees around auto-tiling and -fusing of loops.

@kyle-github

If processors ever implement dedicated matrix multiply instructions (not just SIMD instructions to make matrix multiply easier), this can be revisited

Intel AMX is a new addition to x86 to support matrix operations; see the Intel Instruction Set Reference (PDF), chapter 3.

Personally, I think this kind of thing is an edge case and should wait until the rest of the language is finished. Also, with the rise of Arm CPUs, perhaps a saner way of dealing with vector and matrix data will become more common. We can only hope, at any rate.

@kyle-github

One final comment: to be fair to Intel, AMX is a lot more sane than the ever-changing set of SIMD instructions from MMX to AVX-512. But, wow, is that a lot of state. Task switching is going to be painful with the addition of that much state.

@topolarity
Contributor

Relating this to an idea from #6771: packed [N][M]T might replace the need for a @Matrix type

Even if Zig supports none of the standard math operators (+ - * / etc.) for packed [N][M]T, it would still be very useful for encoding functions or builtins that implement matrix-level intrinsics:

// SIMD Matrix-Multiply-Accumulate on ARMv8.6-A
// Computes A * B' + C, storing the result in C
inline fn arm_ummla(
    A: packed(u128) [2][8]u8,
    B: packed(u128) [2][8]u8,
    C: *packed(u128) [2][2]u32,
) void {
    asm volatile ("ummla %[C], %[A], %[B]"
        : [C] "=*w" (C)
        : [A] "w" (A),
          [B] "w" (B)
    );
}

The above notation is a bit of shorthand: @TypeOf(A[0]) is actually packed(u64) [8]u8

The layout of the matrices in memory is implied directly by the existing rules about how packed elements are stored contiguously (specifically, packed matrices are row-major from LSB to MSB in the backing integer).
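
As a concrete anchor for that rule: today's packed structs already pin a layout from LSB to MSB in a backing integer, which is the same convention the hypothetical packed matrices above would reuse. A small, runnable illustration:

const std = @import("std");

// Field `a` occupies the least significant bits of the backing u16.
const Quad = packed struct(u16) { a: u4, b: u4, c: u4, d: u4 };

pub fn main() void {
    const q = Quad{ .a = 1, .b = 2, .c = 3, .d = 4 };
    const bits: u16 = @bitCast(q);
    std.debug.print("0x{x}\n", .{bits}); // prints 0x4321
}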

@topolarity
Contributor

topolarity commented Jul 15, 2022

FWIW, I don't think we really need a dedicated representation for column-major matrices, even to take advantage of hardware SIMD operations that are defined in terms of column-wise inputs/outputs. Any operation involving column-major matrices is equivalent to an operation on row-major matrices with a few transposes added and some arguments shuffled around.

(row-major) A * (column-major) B is the same thing as row-major A * transpose(B). Similarly, column-major matmul(A, B) is the same thing as row-major matmul(B, A) (this means that one can implement @matrixMultiply for packed row-major matrices in terms of "llvm.matrix.multiply" without overhead)
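
To spell out the identity behind this: reinterpreting a buffer from row-major to column-major transposes the matrix it denotes. So if two buffers hold $A$ and $B$ column-major, the same bytes read row-major denote $A^\top$ and $B^\top$, and

$$\operatorname{matmul}_{\mathrm{row}}(B^\top, A^\top) = B^\top A^\top = (AB)^\top,$$

whose row-major bytes are exactly the column-major bytes of $AB$.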

Column-major indexing still has its conveniences (you don't have to manually reverse indices: x[i][j][k] becomes x[k][j][i]), but it's not necessary for describing accelerated matrix ops.

@ryoppippi
Contributor

I really want this feature! (Matrix, and maybe Tensor.)

@AdamGoertz
Contributor

AdamGoertz commented Mar 9, 2023

Just adding a comment similar to my comment in #7295 supporting this proposal. Matrix operations are extremely important in robotics, so having support for them in the language is a big plus for me.

For ergonomic reasons I'd prefer to have all (meaningful) arithmetic operators defined on matrices (+, -, *, and something equivalent to .* from MATLAB or other languages for element-wise multiplication), though I could understand if matrix multiplication were split into a separate compiler builtin (I'd suggest @Matmul over @matrixMultiply purely for length).

@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023
@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Jul 9, 2023
@anderflash

Perhaps it is better for Matrix to be in a library instead of being a language feature. Taking inspiration from scientific computing libraries like NumPy, Blitz, Armadillo, and Eigen, we could suggest/develop a math/statistics library for all those operations. Some people in academia are even replacing NumPy arrays with PyTorch/TensorFlow tensors to do their work on the GPU and run numerical optimizations by taking advantage of autograd. Internally, some operations might convert subarrays into @Vector when needed.
Do we have BLAS/OpenBLAS/LAPACK alternatives in Zig? Do we have libraries for numerical solvers?

Some matrix operations can be done in-place, which is not possible with the '*' syntax. For example, numpy.matmul can receive an "out" argument, which can even be one of the inputs (many NumPy operations have "out"), so you can reuse memory during operations. I wonder whether the matmul chain "m1 * m2 * ... * mn" (all being @Matrix) might reuse memory between the steps.
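
A minimal sketch of what a library-level API with an explicit output buffer could look like (hypothetical names; this is the numpy-style "out" idea, not an existing Zig API):

const std = @import("std");

// Hypothetical library function: row-major matmul writing into a
// caller-provided buffer, so memory can be reused across a chain of products.
fn matmul(
    comptime T: type,
    comptime m: usize,
    comptime n: usize,
    comptime p: usize,
    a: *const [m][n]T,
    b: *const [n][p]T,
    out: *[m][p]T, // must not alias a or b in this naive version
) void {
    for (0..m) |i| {
        for (0..p) |j| {
            var acc: T = 0;
            for (0..n) |k| acc += a[i][k] * b[k][j];
            out[i][j] = acc;
        }
    }
}

pub fn main() void {
    const a: [2][3]f32 = .{ .{ 1, 2, 3 }, .{ 4, 5, 6 } };
    const b: [3][2]f32 = .{ .{ 1, 0 }, .{ 0, 1 }, .{ 1, 1 } };
    var c: [2][2]f32 = undefined;
    matmul(f32, 2, 3, 2, &a, &b, &c);
    std.debug.print("{any}\n", .{c}); // c == { { 4, 5 }, { 10, 11 } }
}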

@ilayn

ilayn commented Sep 24, 2024

Hi, SciPy maintainer and a generic linalg code person here;

We have been chewing on the obnoxious idea of taking LAPACK out of Fortran 77 by rewriting it all in another language, so we are looking for some sort of new home for it. Zig and Rust are always mentioned anytime such a rewrite is discussed. I take this as a success story for Zig, so congratulations to you and the Rust folks.

I've written a brief (it could be much, much longer) introduction to this idea at scientific-python/faster-scientific-python-ideas#7, but an array type and a complex type are indispensable for any scientific work to be pulled off in one language. Vectors, swizzling, and other niceties are always great, but the real workhorses are typically ndarrays: how syntax and slicing work on them, and which dtypes we can generate. Hence it is almost a binary decision whether or not to do scientific code in a language.

So, external library or not, what I am wondering is how multi-indexing such as A[x, y, z] and slicing would work, if they ever do. I have some questions for a general understanding:

  • From the help section, I take it that Zig went ahead with C-based A[x][y][z]; is this correct?
  • Alternatively, could a library ever manage to overload the bracket notation to switch to the former?
  • Is slicing supported on these matrix/array types?

I have more questions about the SIMD parts and about complex numbers, but they can wait. Since array operations are going to create the biggest comfort or pain points for a possible rewrite, I wanted to get your opinion on this. You can imagine the importance of these basic parts, as they are going to make the most difference; array ergonomics are indeed the go/no-go deciders.

@BratishkaErik
Contributor

BratishkaErik commented Sep 24, 2024

* From the help section, I take it that Zig went ahead with C-based `A[x][y][z]`; is this correct?

Yes, if you mean accessing an element by index. If you mean the type of the array, it's var a: [x][y][z]u32, for example.

* Alternatively, could a library ever manage to overload the bracket notation to switch to the former?

Not in the language, but we have packages like comath, which let you use custom syntax by passing an expression and its arguments to a context, in a style similar to the std.fmt functions. It has some compile-time cost, but (thankfully) LLVM can optimize its runtime performance very well.

* Is slicing supported on these matrix/array types?

If you mean a matrix/array declared as a structure, then no; you would create a custom function like get or slice which returns a value or another struct. As an example, you can check std.PackedIntArray.

If it's tolerable to use the more verbose comath syntax instead of (nonexistent) operator overloading or custom operators, then Zig might be a good fit for you; but if that's too much, IMHO it won't fit scientific code itself. I think you could still use it comfortably for low-level math routines, with another language for the logic, though.

UPD: all of the above describes the current situation, i.e. without custom operators or matrix proposals, since those are not accepted (nor rejected) yet.

@ilayn

ilayn commented Sep 24, 2024

Let me give you a NumPy example

In [1]: import numpy as np

In [2]: A = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]])

In [3]: A
Out[3]:
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15]])

Here element access is done with tuple indexing

In [4]: A[1, 3]
Out[4]: 9

Slicing is for picking up a particular part of the array

In [5]: A[1:, 3:]
Out[5]:
array([[ 9, 10],
       [14, 15]])

In [6]: A[1:, 1:4]
Out[6]:
array([[ 7,  8,  9],
       [12, 13, 14]])

In [7]: A[0, ::2]
Out[7]: array([1, 3, 5])

I know these seem esoteric, but life becomes quite miserable if they are not there. From the look of it, comath won't help in this situation. But thank you for the pointers anyway.

@ethernetsellout

If you mean a matrix/array declared as a structure, then no; you would create a custom function like get or slice which returns a value or another struct. As an example, you can check std.PackedIntArray.

I think they were referring to the SIMD vector types and to this proposal. In the case of the SIMD types, these cannot be sliced currently. I don't see why this has to be the case.

It's interesting that many of the questions asked by @ilayn have to do with ergonomics; that is, scientific programmers prefer succinct and obvious ways to express basic operations on linear algebra types and complex numbers.

Not in the language, but we have packages like comath, which let you use custom syntax by passing an expression and its arguments to a context, in a style similar to the std.fmt functions. It has some compile-time cost, but (thankfully) LLVM can optimize its runtime performance very well.

As an aside, this seems like a bad compromise. It amounts to just using another language.

@BratishkaErik
Contributor

Let me give you a NumPy example […]

Here's my attempt:

const std = @import("std");

pub fn main() void {
    const A: [3][5]usize = .{ .{ 1, 2, 3, 4, 5 }, .{ 6, 7, 8, 9, 10 }, .{ 11, 12, 13, 14, 15 } };

    std.log.debug("{any}", .{A});
    // { { 1, 2, 3, 4, 5 }, { 6, 7, 8, 9, 10 }, { 11, 12, 13, 14, 15 } }

    std.log.debug("{}", .{A[1][3]});
    // 9

    // Different code, sorry, couldn't re-create like in NumPy.
    const second, const third = A[1..].*;
    std.log.debug("{any}", .{.{ second[3..], third[3..] }});
    // { { 9, 10 }, { 14, 15 } }

    std.log.debug("{any}", .{.{ second[1..4], third[1..4] }});
    // { { 7, 8, 9 }, { 12, 13, 14 } }

    // No slicing step in the language, but you can use standard library:
    std.log.debug("Slicing...", .{});
    var it = std.mem.window(usize, A[0][0..], 1, 2);
    while (it.next()) |window| std.log.debug("{any}", .{window});
    // { 1 }
    // { 3 }
    // { 5 }
}

@BratishkaErik
Contributor

I think they were referring to the SIMD vector types and to this proposal. In the case of the SIMD types, these cannot be sliced currently. I don't see why this has to be the case.

Vectors and arrays are easily converted in both directions, so as a workaround you can convert a vector to an array and slice it (or loop over the result): https://zig.godbolt.org/z/3f3TdPbE3. For the looping issue there's #17886, but IIRC a proposal for slicing vectors does not exist.
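
For illustration, the coercion looks like this in today's Zig (a small, self-contained example):

const std = @import("std");

pub fn main() void {
    const v: @Vector(4, i32) = .{ 1, 2, 3, 4 };

    // A vector coerces to a fixed-size array of the same length and element type...
    const a: [4]i32 = v;

    // ...and the array can be sliced (or iterated) like any other.
    const s: []const i32 = a[1..3];
    std.debug.print("{any}\n", .{s}); // { 2, 3 }

    // The coercion also works in the other direction.
    const v2: @Vector(4, i32) = a;
    _ = v2;
}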

@ilayn

ilayn commented Sep 24, 2024

Here's my attempt:

Much appreciated. Initial impressions are always wrong, but it seems like we are using builtins to achieve some of these tasks, which is perfectly fine; no language needs to answer everything. I'll play around with this a bit more to get a better feeling for it. As @ethernetsellout mentioned, we (the royal "we", on behalf of the number-crunching community) are indeed trying to arrive at better ergonomics for doing many operations in a procedural manner.

So let's say the planets aligned and * is the matrix-multiplication infix operator for array objects; then we do stuff like

  • invert the top-left n x n block and multiply it with the top-right n x n block
  • take the even-numbered columns and subtract them from the odd-numbered columns
  • transpose, or transpose and conjugate, a complex array
  • take the absolute values of all entries and return their max
  • change the memory layout from C to Fortran order in a cache-friendly manner, etc.

and many, many other things that require shapes and slice manipulations to be part of the multidimensional array syntax out of the box.

There is no problem if we don't have this in Zig; it is a massive undertaking to create a language, so please take these as introductory remarks and not feature requests. I am looking for both ergonomics and access to SSE optimizations, so I am also confused a lot :)

I also don't know how the end product should look like hence my question. Thanks again.

@BarabasGitHub
Contributor

BarabasGitHub commented Sep 25, 2024 via email

@ilayn

ilayn commented Sep 25, 2024

I am a maintainer of SciPy, so yes, NumPy borrowed all of this from Fortran. I am definitely not claiming these are trivial, but they are essential for multidimensional arrays, because the languages in the C tree make this so difficult (I have written way too many lines of C for SciPy) and pointless to pull off; think of all the unnecessary pointer arithmetic.

However, a library definitely won't cut it. In fact, that's why I am looking around for a language that treats these as first-class citizens. If you want an example: complex.h is a major historical screw-up and still does not have good, compiler-agnostic ergonomics.

Like I said, this is not a feature request, I am just asking.

@opzap

opzap commented Nov 9, 2024

@Vector is a language-level guarantee that a group of elements is safe to vectorize. It ensures that the group satisfies all of the invariants needed for SIMD operations and grants the compiler leeway to vectorize and perform optimizations that it would otherwise have to skip. That is why it is useful and unique enough to warrant its own builtin.
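
For example, this is what @Vector already gives you today (real Zig, not part of this proposal):

const std = @import("std");

pub fn main() void {
    const a: @Vector(4, f32) = .{ 1, 2, 3, 4 };
    const b: @Vector(4, f32) = .{ 4, 3, 2, 1 };

    // Element-wise multiply: the compiler may lower this to a single SIMD
    // instruction because @Vector guarantees lane layout and independence.
    const c = a * b;

    // Horizontal reductions are expressed explicitly.
    const dot = @reduce(.Add, c);
    std.debug.print("{d}\n", .{dot}); // 20
}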

There is currently no other way to communicate that kind of information to the compiler. Even if you implement all of the matrix intrinsics yourself, you are probably not going to get optimal machine code because your assembly is opaque to the compiler. This is not something a library can solve. But I do understand why the Zig team might be hesitant to add @Matrix as a builtin.

The restrict keyword was introduced in C99 as a way to emulate Fortran's aliasing rules and improve auto-vectorization in C. Maybe Zig needs a keyword or a scope-level builtin (like @optimizeFor) that allows us to communicate when code is safe to vectorize?
