Add support for new opcodes to AS #1

deanm1278 · 2018-02-22T00:08:26Z

Add with carry, subtract with borrow for better 64-bit arithmetic.

Rd = Rx + Ry + AC0

Rd = Rx - Ry + AC0 - 1

Rd = Rx + Ry + AC0 (S)

Rd = Rx - Ry + AC0 - 1 (S)
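To illustrate why a carry-in add helps with 64-bit arithmetic, here is a minimal Python model of the formulas above. It is a sketch under assumptions: AC0 is taken to hold the carry out of the preceding 32-bit operation, and after a subtract AC0 is assumed to be 1 when no borrow occurred (which is what the `+ AC0 - 1` form implies). The function names are hypothetical, and the saturating (S) variants are not modelled.

```python
MASK32 = 0xFFFFFFFF

def add32(rx, ry, ac0=0):
    """Model Rd = Rx + Ry + AC0; returns (Rd, carry_out)."""
    total = (rx & MASK32) + (ry & MASK32) + ac0
    return total & MASK32, total >> 32

def sub32(rx, ry, ac0=1):
    """Model Rd = Rx - Ry + AC0 - 1; returns (Rd, carry_out).

    Assumption: carry_out is 1 when no borrow occurred, so ac0=1
    behaves like a plain subtract.
    """
    total = (rx & MASK32) - (ry & MASK32) + ac0 - 1
    return total & MASK32, 1 if total >= 0 else 0

def add64(a, b):
    """64-bit add built from two 32-bit adds chained through the carry."""
    lo, carry = add32(a, b)                  # low words, no carry in
    hi, _ = add32(a >> 32, b >> 32, carry)   # high words plus carry
    return (hi << 32) | lo

def sub64(a, b):
    """64-bit subtract built from two 32-bit subtracts chained through the carry."""
    lo, carry = sub32(a, b)                  # low words, plain subtract
    hi, _ = sub32(a >> 32, b >> 32, carry)   # high words minus borrow
    return (hi << 32) | lo

print(hex(add64(0xFFFFFFFF, 1)))    # 0x100000000
print(hex(sub64(0x100000000, 1)))   # 0xffffffff
```

Without the carry-in forms, the high-word step would need a separate test of the carry flag and a conditional increment or decrement.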

Single cycle 32-bit multiplies and multiply accumulates.

~~A 32-bit result can be produced in an R-register.~~

~~A 64-bit result in an R-register pair, e.g. R3:2, with high bits in odd register.~~

~~Accumulation is done in A1:0. The accumulator is actually 72 bits because A0.X is not used.~~

~~Rd = Rx * Ry mode~~

~~Re:d = Rx * Ry mode~~

~~A1:0 = Rx * Ry mode~~

~~A1:0 += Rx * Ry mode~~

~~A1:0 -= Rx * Ry mode~~

~~Rd = (A1:0 = Rx * Ry) mode~~

~~Rd = (A1:0 += Rx * Ry) mode~~

~~Rd = (A1:0 -= Rx * Ry) mode~~

~~Rd = A1:0 mode~~

~~Re:d = (A1:0 = Rx * Ry) mode~~

~~Re:d = (A1:0 += Rx * Ry) mode~~

~~Re:d = (A1:0 -= Rx * Ry) mode~~

~~Re:d = A1:0 mode~~

~~These modes are supported for the 32-bit multiplies and multiply accumulates.~~

~~default fractional rounding~~

~~(T) signed fractional truncating~~

~~(IS) signed integer~~

~~(IS,NS) signed integer, non saturating~~

~~(FU) unsigned fractional rounding~~

~~(TFU) unsigned fractional truncating~~

~~(IU) unsigned integer saturating~~

~~(IU,NS) unsigned integer non-saturating~~

~~(M) mixed, signed fractional rounding~~

~~(M,T) mixed, signed fractional truncating~~

~~(M,IS) mixed, signed integer saturating~~

~~(M,IS,NS) mixed, signed integer non-saturating~~

~~The fractional rounding modes cannot be used with accumulate and extract instructions.~~

~~So "R1:0 = (A1:0 = R3 * R4)" will give an error but "R1:0 = (A1:0 = R3 * R4) (T)" will not.~~

~~The existing *= operation is single cycle too in Blackfin+.~~

~~## Single cycle Complex multiplication.~~

~~Operands are R-registers with the 16-bit imaginary part in the high bits and the 16-bit real part in the low bits.~~

~~Results can be the same format or R-register pairs containing 32-bit imaginary in odd register and 32-bit real in the even register.~~

~~Accumulation is done in A1:0 with the imaginary part in the full 40-bits of A1 and the real part the full 40-bits of A0.~~

~~Rd = CMUL(Rx, Ry) mode~~

~~Rd = CMUL(Rx, Ry*) mode~~

~~Rd = CMUL(Rx*, Ry*) mode~~

~~Re:d = CMUL(Rx, Ry) mode~~

~~Re:d = CMUL(Rx, Ry*) mode~~

~~Re:d = CMUL(Rx*, Ry*) mode~~

~~A1:0 = CMUL(Rx, Ry) mode~~

~~A1:0 = CMUL(Rx, Ry*) mode~~

~~A1:0 = CMUL(Rx*, Ry*) mode~~

~~A1:0 += CMUL(Rx, Ry) mode~~

~~A1:0 += CMUL(Rx, Ry*) mode~~

~~A1:0 += CMUL(Rx*, Ry*) mode~~

~~A1:0 -= CMUL(Rx, Ry) mode~~

~~A1:0 -= CMUL(Rx, Ry*) mode~~

~~A1:0 -= CMUL(Rx*, Ry*) mode~~

~~Rd = (A1:0 = CMUL(Rx, Ry)) mode~~

~~Rd = (A1:0 = CMUL(Rx, Ry*)) mode~~

~~Rd = (A1:0 = CMUL(Rx*, Ry*)) mode~~

~~Rd = (A1:0 += CMUL(Rx, Ry)) mode~~

~~Rd = (A1:0 += CMUL(Rx, Ry*)) mode~~

~~Rd = (A1:0 += CMUL(Rx*, Ry*)) mode~~

~~Rd = (A1:0 -= CMUL(Rx, Ry)) mode~~

~~Rd = (A1:0 -= CMUL(Rx, Ry*)) mode~~

~~Rd = (A1:0 -= CMUL(Rx*, Ry*)) mode~~

~~Re:d = (A1:0 = CMUL(Rx, Ry)) mode~~

~~Re:d = (A1:0 = CMUL(Rx, Ry*)) mode~~

~~Re:d = (A1:0 = CMUL(Rx*, Ry*)) mode~~

~~Re:d = (A1:0 += CMUL(Rx, Ry)) mode~~

~~Re:d = (A1:0 += CMUL(Rx, Ry*)) mode~~

~~Re:d = (A1:0 += CMUL(Rx*, Ry*)) mode~~

~~Re:d = (A1:0 -= CMUL(Rx, Ry)) mode~~

~~Re:d = (A1:0 -= CMUL(Rx, Ry*)) mode~~

~~Re:d = (A1:0 -= CMUL(Rx*, Ry*)) mode~~

~~A * after an operand indicates that the operand to the multiply is the complex conjugate of the value in the register.~~

~~These modes are supported for complex multiplies and multiply accumulates.~~

~~default signed fractional saturating rounding~~

~~(T) signed fractional saturating truncating (extract accumulator to single R-register operations only.)~~

~~(IS) signed integer saturating~~

~~## New Accumulator Loads.~~

~~A couple of new options have been added to the DSP32ALU instruction to help initialize A1:0 for complex and 32-bit multiply accumulates.~~

~~A1 = Rx (X), A0 = Ry (Z) // sign extend Rx:y into A1:0~~

~~A1 = Rx (X), A0 = Ry (X)~~

~~A1 = Rx (Z), A0 = Ry (Z) // zero extend Rx:y into A1:0~~

~~A1 = Rx (Z), A0 = Ry (X)~~

~~Initializing the accumulator pair to zero is supported by the existing A1 = A0 = 0 instruction.~~

New hardware loop instructions for zero-trip and known iteration loops.

LSETUPZ (lab) LCx=Py // Jumps over loop if Py==0 when executed

LSETUPZ (lab) LCx=Py>>1 // Jumps over loop if Py==0 when executed

LSETUPLEZ (lab) LCx=Py // Jumps over loop if Py<=0 when executed

LSETUPLEZ (lab) LCx=Py>>1 // Jumps over loop if Py<=0 when executed

LSETUP (lab) LCx=imm // Loop with immediate trip count
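The skip conditions in the comments above can be summarised in a small Python model. This is only a sketch of the stated behaviour: `loop_setup` and its parameters are hypothetical names, Py is treated as a signed 32-bit value for the `<= 0` test, and `>>1` is assumed to be a logical shift. What the hardware does when the loop is not skipped but the shifted count is zero is not described here, so the model simply reports the count.

```python
def to_signed32(value):
    """Interpret a 32-bit pattern as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def loop_setup(variant, py, halve=False):
    """Return (skipped, lc) for LSETUPZ / LSETUPLEZ.

    halve=True models the LCx=Py>>1 form (assumed logical shift).
    """
    py &= 0xFFFFFFFF
    if variant == "LSETUPZ":
        skipped = py == 0                  # jumps over loop if Py == 0
    elif variant == "LSETUPLEZ":
        skipped = to_signed32(py) <= 0     # jumps over loop if Py <= 0
    else:
        raise ValueError("unknown variant: " + variant)
    lc = py >> 1 if halve else py
    return skipped, lc

print(loop_setup("LSETUPZ", 0))               # (True, 0)
print(loop_setup("LSETUPZ", 8, halve=True))   # (False, 4)
print(loop_setup("LSETUPLEZ", -3)[0])         # True
```

The practical effect is that a loop whose trip count may be zero no longer needs an explicit compare and branch in front of the loop setup.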

Jumps and calls with 32-bit immediate target

JUMP.A imm32 // absolute address
JUMP.XL imm32 // PC-relative
CALL.A imm32
CALL.XL imm32

As always, the assembler and linker will choose the right call or jump for you if you use a label and do not specify an extension.

Move 32-bit value to register

~~Rd = imm32~~

~~Pd = imm32~~

ureg = imm32 // works with all registers allowed in the register move instruction

Loads and stores with 32-bit immediate address

Rd = [ imm32 ]

Pd = [ imm32 ]

[ imm32 ] = Rx

[ imm32 ] = Px

Rd = W[ imm32 ] (Z)

Rd = W[ imm32 ] (X)

W[ imm32 ] = Rx

Rd = B[ imm32 ] (Z)

Rd = B[ imm32 ] (X)

B[ imm32 ] = Rx

Rd_hi = W[ imm32 ]

Rd_lo = W[ imm32 ]

W[ imm32 ] = Rx_hi
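The (Z) and (X) suffixes on the word and byte loads select zero extension and sign extension of the loaded value into the 32-bit destination register. A minimal Python sketch of that difference follows; `extend` is a hypothetical helper name, not part of any toolchain.

```python
def extend(value, bits, signed):
    """Extend a `bits`-wide value to a 32-bit register pattern.

    signed=False models (Z), zero extension.
    signed=True models (X), sign extension.
    """
    value &= (1 << bits) - 1
    if signed and value & (1 << (bits - 1)):
        value -= 1 << bits          # top bit set, so the value is negative
    return value & 0xFFFFFFFF

print(hex(extend(0x8001, 16, signed=False)))  # 0x8001      Rd = W[ imm32 ] (Z)
print(hex(extend(0x8001, 16, signed=True)))   # 0xffff8001  Rd = W[ imm32 ] (X)
print(hex(extend(0xFF, 8, signed=True)))      # 0xffffffff  Rd = B[ imm32 ] (X)
```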

A few changes that improve orthogonality

Rd = !CC // filled in an encoding hole

ureg = ureg // restrictions on which registers can be copied to which have been removed

compute || preg-access || preg-load // P-register addressing allowed in both DAG slots

The last of these is quite pervasive as there is now no need to load an I-register to enable dual load/store. I find when writing assembler for BF70x I only use I- and M-registers for circular buffering.

The remaining changes are confined to the assembler: it accepts alternative syntax for immediate shift instructions and performs new error checks.

System instructions

STI IDLE Rx // combines STI and IDLE to avoid a race condition
