The X-HEEP Matrix Extension is closely derived from the T-HEAD RVM proposal. However, a number of features are omitted to keep the hardware simple and efficient. This document discusses these details.
There are 8 tile registers, `m0`-`m7`, each holding a 4x4 matrix of 32-bit values. Smaller datatypes can be packed into the same space: 16-bit elements yield a 4x8 matrix and 8-bit elements a 4x16 matrix. The same load instructions are used for every datatype; only the arithmetic instructions differ (see below).
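As an illustration of the packing, the C sketch below models one tile register as a union of four views of the same 64 bytes (4 rows of 16 bytes). The names `tile_t`, `tile_cols` and `tile_regfile_t` are hypothetical and not part of any X-HEEP software API. The sketch relies only on element sizes and row boundaries, not on how packed elements are ordered within a 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of one tile register (m0-m7): 4 rows of 16 bytes.
 * The same 64 bytes are viewed differently depending on the element size;
 * only the arithmetic instruction decides which view applies. */
typedef union {
    float   f32[4][4];   /* fmmacc.s operands, S = 4 */
    int32_t i32[4][4];   /* mmasa.w operands,  S = 4 */
    int16_t i16[4][8];   /* mmada.h operands,  S = 2 */
    int8_t  i8[4][16];   /* mmaqa.b operands,  S = 1 */
} tile_t;

/* Hypothetical register file: 8 tiles, m0-m7. */
typedef struct {
    tile_t m[8];
} tile_regfile_t;

/* Elements per row for an element size of s bytes (16 bytes per row). */
static inline int tile_cols(int s) {
    assert(s == 1 || s == 2 || s == 4);
    return 16 / s;
}
```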
The accelerator does not guarantee memory coherence with the CPU. That is, the effects of a matrix accelerator store may not be visible to a CPU load from the same address, even when the load comes later in program order. The reverse is also true: a CPU store may not be immediately visible to a subsequent matrix load.
Note that this is not modeled in the Spike simulator, where all instructions appear to be sequentially consistent.
Let `S` represent the size of individual matrix elements in bytes. Let `ms1` hold matrix `A[4][16/S]`, `ms2` hold `B[4][16/S]`, and `md` hold `C[4][4]`. The pseudocode for the matrix multiplication instruction `<mnemonic> md, ms1, ms2` is:
```c
// C (4x4, accumulator type) += A (4 x 16/S) * B^T, with B also stored as 4 x 16/S
for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 4; j++) {
        for (int k = 0; k < (16/S); k++) {
            C[i][j] += A[i][k] * B[j][k];
        }
    }
}
```
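That is, the instruction computes `C += A * B^T`: because `B` is indexed as `B[j][k]`, each output element `C[i][j]` accumulates the dot product of row `i` of `A` with row `j` of `B`. The available mnemonics for each datatype are listed below.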
operand type | accumulator type | S | mnemonic |
---|---|---|---|
fp32 | fp32 | 4 | fmmacc.s |
int32 | int32 | 4 | mmasa.w |
int16 | int32 | 2 | mmada.h |
int8 | int32 | 1 | mmaqa.b |
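For checking accelerator results on the host, the pseudocode translates directly into a reference model. The sketch below covers `mmaqa.b` only and makes two assumptions not stated in this document: the int8 operands are signed, and the int32 accumulator never overflows (signed overflow is undefined in C, and the hardware's overflow behaviour is not specified here). `ref_mmaqa_b` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of: mmaqa.b md, ms1, ms2   (S = 1)
 *   A = ms1, a 4x16 tile of int8 (assumed signed)
 *   B = ms2, a 4x16 tile of int8 (assumed signed)
 *   C = md,  a 4x4 tile of int32 accumulators
 * Computes C += A * B^T: C[i][j] accumulates the dot product of row i of A
 * with row j of B. Assumes the int32 sums do not overflow. */
static void ref_mmaqa_b(int32_t C[4][4], int8_t A[4][16], int8_t B[4][16]) {
    assert(C != NULL && A != NULL && B != NULL);
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            int32_t acc = C[i][j];
            for (int k = 0; k < 16; k++) {
                /* widen before multiplying so each product is exact */
                acc += (int32_t)A[i][k] * (int32_t)B[j][k];
            }
            C[i][j] = acc;
        }
    }
}
```

Replacing `int8_t [4][16]` with `int16_t [4][8]` or `int32_t [4][4]` (and the `k` bound with `16/S`) gives the corresponding models for `mmada.h` and `mmasa.w`, under the same overflow caveat.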
Clear register `md` with zeros:
```
mzero md
```
Load tile register `md` from, or store tile register `ms1` to, the address in `rs1`, with the row stride in bytes in `rs2`:
```
mld.w md, (rs1), rs2
mst.w ms1, (rs1), rs2
```
The element stride is fixed at 4 bytes: only 32-bit word memory instructions are available, so 8-bit and 16-bit matrices are loaded and stored as packed 32-bit words.
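The addressing of `mld.w` and `mst.w` can be modeled on the host as below: row `r` of the tile occupies the 16 bytes starting at `base + r * row_stride`, read or written as four consecutive 32-bit words. This is a sketch only; `ref_mld_w` and `ref_mst_w` are hypothetical names, and alignment requirements are not modeled because this document does not state them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host model of: mld.w md, (rs1), rs2
 * base = rs1, row_stride = rs2 (in bytes); element stride is fixed at 4 bytes. */
static void ref_mld_w(uint32_t tile[4][4], const uint8_t *base, size_t row_stride) {
    assert(base != NULL);
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            /* memcpy sidesteps host alignment and strict-aliasing issues */
            memcpy(&tile[r][c], base + (size_t)r * row_stride + 4u * (size_t)c,
                   sizeof(uint32_t));
        }
    }
}

/* Hypothetical host model of: mst.w ms1, (rs1), rs2 (the inverse of ref_mld_w). */
static void ref_mst_w(uint32_t tile[4][4], uint8_t *base, size_t row_stride) {
    assert(base != NULL);
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            memcpy(base + (size_t)r * row_stride + 4u * (size_t)c, &tile[r][c],
                   sizeof(uint32_t));
        }
    }
}
```

For a row-major matrix of 32-bit elements with `N` columns, passing `N * 4` as the stride selects a 4x4 block. For an 8-bit matrix, the stride is its row length in bytes and each tile row holds 16 packed elements.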
All instructions share `7'b0101011` (CUSTOM-1) as the major opcode, and `func3` is `3'b000`.
31:27 | 26:25 | 24 | 23:21 | 20:18 | 17:15 | 14:12 | 11:10 | 9:7 | 6:0 | mnemonic |
---|---|---|---|---|---|---|---|---|---|---|
00001 | 00 | 0 | ms2 | ms1 | md | func3 | 10 | 000 | major opcode | fmmacc.s |
11110 | 00 | 0 | ms2 | ms1 | md | func3 | 10 | 000 | major opcode | mmasa.w |
11100 | 00 | 0 | ms2 | ms1 | md | func3 | 01 | 000 | major opcode | mmada.h |
00010 | 00 | 0 | ms2 | ms1 | md | func3 | 00 | 000 | major opcode | mmaqa.b |
11111 | 00 | 0 | 000 | 000 | md | func3 | 00 | 000 | major opcode | mzero |
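To cross-check the field layout, the table above can be turned into a small encoder. This is a sketch: `encode_mm` and the `enc_*` wrappers are hypothetical helpers, bits 26:24 and 9:7 are zero as in the table, and bits 11:10, whose meaning is not described here, are simply copied from the table for each mnemonic.

```c
#include <assert.h>
#include <stdint.h>

#define XHEEP_MAJOR_OPCODE 0x2Bu   /* 7'b0101011, CUSTOM-1 */
#define XHEEP_FUNC3        0x0u    /* 3'b000 */

/* Hypothetical encoder for the tile-to-tile format.
 * funct5 -> bits 31:27, ms2 -> 23:21, ms1 -> 20:18, md -> 17:15,
 * b11_10 -> bits 11:10 (value copied from the table). */
static uint32_t encode_mm(uint32_t funct5, uint32_t b11_10,
                          uint32_t md, uint32_t ms1, uint32_t ms2) {
    assert(funct5 < 32 && b11_10 < 4);
    assert(md < 8 && ms1 < 8 && ms2 < 8);   /* tile registers m0-m7 */
    return (funct5 << 27)
         | (ms2 << 21)
         | (ms1 << 18)
         | (md << 15)
         | (XHEEP_FUNC3 << 12)
         | (b11_10 << 10)
         | XHEEP_MAJOR_OPCODE;
}

/* Operand order follows "<mnemonic> md, ms1, ms2". */
static uint32_t enc_fmmacc_s(uint32_t md, uint32_t ms1, uint32_t ms2) { return encode_mm(0x01, 0x2, md, ms1, ms2); }
static uint32_t enc_mmasa_w(uint32_t md, uint32_t ms1, uint32_t ms2)  { return encode_mm(0x1E, 0x2, md, ms1, ms2); }
static uint32_t enc_mmada_h(uint32_t md, uint32_t ms1, uint32_t ms2)  { return encode_mm(0x1C, 0x1, md, ms1, ms2); }
static uint32_t enc_mmaqa_b(uint32_t md, uint32_t ms1, uint32_t ms2)  { return encode_mm(0x02, 0x0, md, ms1, ms2); }
static uint32_t enc_mzero(uint32_t md)                                { return encode_mm(0x1F, 0x0, md, 0, 0); }
```

For example, `fmmacc.s m2, m0, m1` encodes to `0x0821082B` and `mzero m1` to `0xF800802B`.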
31:27 | 26:25 | 24:20 | 19:15 | 14:12 | 11:10 | 9:7 | 6:0 | mnemonic |
---|---|---|---|---|---|---|---|---|
00000 | 10 | rs2 | rs1 | func3 | 10 | md | major opcode | mld.w |
00001 | 10 | rs2 | rs1 | func3 | 10 | ms1 | major opcode | mst.w |
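The load/store format can be encoded the same way; here `rs1` and `rs2` are integer registers (`x0`-`x31`) and the tile register occupies bits 9:7. `encode_mldst`, `enc_mld_w` and `enc_mst_w` are hypothetical helpers, and the fixed `2'b10` values in bits 26:25 and 11:10 are copied from the table above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoder for mld.w / mst.w.
 * funct5 -> bits 31:27, rs2 (row stride) -> 24:20, rs1 (base) -> 19:15,
 * tile register -> 9:7; bits 26:25 and 11:10 are 2'b10 per the table. */
static uint32_t encode_mldst(uint32_t funct5, uint32_t mreg, uint32_t rs1, uint32_t rs2) {
    assert(funct5 < 32 && mreg < 8 && rs1 < 32 && rs2 < 32);
    return (funct5 << 27)
         | (0x2u << 25)
         | (rs2 << 20)
         | (rs1 << 15)
         | (0x0u << 12)      /* func3 = 3'b000 */
         | (0x2u << 10)
         | (mreg << 7)
         | 0x2Bu;            /* major opcode 7'b0101011 (CUSTOM-1) */
}

static uint32_t enc_mld_w(uint32_t md, uint32_t rs1, uint32_t rs2)  { return encode_mldst(0x00, md, rs1, rs2); }
static uint32_t enc_mst_w(uint32_t ms1, uint32_t rs1, uint32_t rs2) { return encode_mldst(0x01, ms1, rs1, rs2); }
```

For example, with `a0` = `x10` and `a1` = `x11`, `mld.w m0, (a0), a1` encodes to `0x04B5082B` and `mst.w m1, (a0), a1` to `0x0CB508AB`. If an assembler does not know these mnemonics, such a word could be emitted with the GNU assembler's `.word` directive.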
See BUILDING.md for instructions.