Add with carry, subtract with borrow for better 64-bit arithmetic.
Rd = Rx + Ry + AC0
Rd = Rx - Ry + AC0 - 1
Rd = Rx + Ry + AC0 (S)
Rd = Rx - Ry + AC0 - 1 (S)
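To show how the carry-in forms chain into 64-bit arithmetic, here is a small C model (not Blackfin+ code). The helper names `add32`, `sub32`, `add64` and `sub64` are made up for illustration. The sketch assumes that AC0 after a subtract is the carry out of `Rx + ~Ry + 1`, i.e. 1 when there is no borrow, which is what the `+ AC0 - 1` form suggests. The saturating `(S)` variants are not modelled.

```c
#include <stdint.h>

/* Model of Rd = Rx + Ry + carry_in; *ac0 receives the carry out of bit 31. */
static uint32_t add32(uint32_t rx, uint32_t ry, unsigned carry_in, unsigned *ac0)
{
    uint64_t wide = (uint64_t)rx + ry + carry_in;
    *ac0 = (unsigned)(wide >> 32);      /* 0 or 1 */
    return (uint32_t)wide;
}

/* Model of Rd = Rx - Ry + carry_in - 1, computed as Rx + ~Ry + carry_in.
 * Assumption: carry out == 1 means "no borrow". */
static uint32_t sub32(uint32_t rx, uint32_t ry, unsigned carry_in, unsigned *ac0)
{
    return add32(rx, (uint32_t)~ry, carry_in, ac0);
}

/* 64-bit add: plain add on the low words, add with carry on the high words. */
static uint64_t add64(uint64_t x, uint64_t y)
{
    unsigned ac0;
    uint32_t lo = add32((uint32_t)x, (uint32_t)y, 0, &ac0);
    uint32_t hi = add32((uint32_t)(x >> 32), (uint32_t)(y >> 32), ac0, &ac0);
    return ((uint64_t)hi << 32) | lo;
}

/* 64-bit subtract: plain subtract on the low words (carry_in = 1),
 * subtract with borrow on the high words. */
static uint64_t sub64(uint64_t x, uint64_t y)
{
    unsigned ac0;
    uint32_t lo = sub32((uint32_t)x, (uint32_t)y, 1, &ac0);
    uint32_t hi = sub32((uint32_t)(x >> 32), (uint32_t)(y >> 32), ac0, &ac0);
    return ((uint64_t)hi << 32) | lo;
}
```

In this model the second `add32` / `sub32` call of each pair stands in for the new instruction; the first call is the ordinary 32-bit add or subtract that sets AC0.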
Single cycle 32-bit multiplies and multiply accumulates.

A 32-bit result can be produced in an R-register.
A 64-bit result is produced in an R-register pair, e.g. R3:2, with the high bits in the odd register.
Accumulation is done in A1:0. The accumulator is actually 72 bits because A0.X is not used.

Rd = Rx * Ry mode
Re:d = Rx * Ry mode
A1:0 = Rx * Ry mode
A1:0 += Rx * Ry mode
A1:0 -= Rx * Ry mode
Rd = (A1:0 = Rx * Ry) mode
Rd = (A1:0 += Rx * Ry) mode
Rd = (A1:0 -= Rx * Ry) mode
Rd = A1:0 mode
Re:d = (A1:0 = Rx * Ry) mode
Re:d = (A1:0 += Rx * Ry) mode
Re:d = (A1:0 -= Rx * Ry) mode
Re:d = A1:0 mode

These modes are supported for the 32-bit multiplies and multiply accumulates.

default: fractional rounding
(T): signed fractional truncating
(IS): signed integer
(IS,NS): signed integer, non-saturating
(FU): unsigned fractional rounding
(TFU): unsigned fractional truncating
(IU): unsigned integer saturating
(IU,NS): unsigned integer non-saturating
(M): mixed, signed fractional rounding
(M,T): mixed, signed fractional truncating
(M,IS): mixed, signed integer saturating
(M,IS,NS): mixed, signed integer non-saturating

The fractional rounding modes cannot be used with accumulate and extract instructions.
So "R1:0 = (A1:0 = R3 * R4)" will give an error but "R1:0 = (A1:0 = R3 * R4) (T)" will not.
The existing *= operation is single cycle too in Blackfin+.

## Single cycle Complex multiplication.

Operands are R-registers with the 16-bit imaginary part in the high bits and the 16-bit real part in the low bits.
Results can be in the same format, or in R-register pairs with the 32-bit imaginary part in the odd register and the 32-bit real part in the even register.
Accumulation is done in A1:0, with the imaginary part in the full 40 bits of A1 and the real part in the full 40 bits of A0.

Rd = CMUL(Rx, Ry) mode
Rd = CMUL(Rx, Ry*) mode
Rd = CMUL(Rx*, Ry*) mode
Re:d = CMUL(Rx, Ry) mode
Re:d = CMUL(Rx, Ry*) mode
Re:d = CMUL(Rx*, Ry*) mode
A1:0 = CMUL(Rx, Ry) mode
A1:0 = CMUL(Rx, Ry*) mode
A1:0 = CMUL(Rx*, Ry*) mode
A1:0 += CMUL(Rx, Ry) mode
A1:0 += CMUL(Rx, Ry*) mode
A1:0 += CMUL(Rx*, Ry*) mode
A1:0 -= CMUL(Rx, Ry) mode
A1:0 -= CMUL(Rx, Ry*) mode
A1:0 -= CMUL(Rx*, Ry*) mode
Rd = (A1:0 = CMUL(Rx, Ry)) mode
Rd = (A1:0 = CMUL(Rx, Ry*)) mode
Rd = (A1:0 = CMUL(Rx*, Ry*)) mode
Rd = (A1:0 += CMUL(Rx, Ry)) mode
Rd = (A1:0 += CMUL(Rx, Ry*)) mode
Rd = (A1:0 += CMUL(Rx*, Ry*)) mode
Rd = (A1:0 -= CMUL(Rx, Ry)) mode
Rd = (A1:0 -= CMUL(Rx, Ry*)) mode
Rd = (A1:0 -= CMUL(Rx*, Ry*)) mode
Re:d = (A1:0 = CMUL(Rx, Ry)) mode
Re:d = (A1:0 = CMUL(Rx, Ry*)) mode
Re:d = (A1:0 = CMUL(Rx*, Ry*)) mode
Re:d = (A1:0 += CMUL(Rx, Ry)) mode
Re:d = (A1:0 += CMUL(Rx, Ry*)) mode
Re:d = (A1:0 += CMUL(Rx*, Ry*)) mode
Re:d = (A1:0 -= CMUL(Rx, Ry)) mode
Re:d = (A1:0 -= CMUL(Rx, Ry*)) mode
Re:d = (A1:0 -= CMUL(Rx*, Ry*)) mode

A * after an operand indicates that the operand to the multiply is the complex conjugate of the value in the register.

These modes are supported for complex multiplies and multiply accumulates.

default: signed fractional saturating rounding
(T): signed fractional saturating truncating (extract accumulator to single R-register operations only)
(IS): signed integer saturating

## New Accumulator Loads.

A couple of new options have been added to the DSP32ALU instruction to help initialize A1:0 for complex and 32-bit multiply accumulates.

A1 = Rx (X), A0 = Ry (Z) // sign extend Rx:y into A1:0
A1 = Rx (X), A0 = Ry (X)
A1 = Rx (Z), A0 = Ry (Z) // zero extend Rx:y into A1:0
A1 = Rx (Z), A0 = Ry (X)

Initializing the accumulator pair to zero is supported by the existing A1 = A0 = 0 instruction.

New hardware loop instructions for zero-trip and known iteration loops.
LSETUPZ (lab) LCx=Py // Jumps over loop if Py==0 when executed
LSETUPZ (lab) LCx=Py>>1 // Jumps over loop if Py==0 when executed
LSETUPLEZ (lab) LCx=Py // Jumps over loop if Py<=0 when executed
LSETUPLEZ (lab) LCx=Py>>1 // Jumps over loop if Py<=0 when executed
LSETUP (lab) LCx=imm // Loop with immediate trip count
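The zero-trip forms can be pictured with a short C model (again, not Blackfin+ code). It assumes a plain counted hardware loop behaves like a `do { } while (--LC)` loop whose body runs at least once, so the new forms act as the guard placed in front of it. The function names are hypothetical, and the `LCx=Py>>1` variants are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* LSETUPZ: the loop is jumped over when Py == 0. */
static bool lsetupz_skips(uint32_t py)
{
    return py == 0;
}

/* LSETUPLEZ: the loop is jumped over when Py <= 0, with Py read as signed. */
static bool lsetuplez_skips(int32_t py)
{
    return py <= 0;
}

/* Sum n samples. The guard plays the role of LSETUPLEZ; the do/while plays
 * the role of the hardware loop counting LC down to zero (an assumption
 * about how the counted loop behaves). */
static int32_t sum_samples(const int16_t *buf, int32_t n)
{
    int32_t acc = 0;
    if (lsetuplez_skips(n))
        return acc;                 /* zero-trip: body never runs */

    uint32_t lc = (uint32_t)n;      /* loop counter, LCx = Py */
    do {
        acc += *buf++;
    } while (--lc != 0);
    return acc;
}
```

In this picture the guard is folded into the loop setup itself, so code whose count may be zero or negative does not need a separate compare and branch around the loop.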
Jumps and calls with 32-bit immediate target
JUMP.A imm32 // absolute address
JUMP.XL imm32 // PC-relative
CALL.A imm32
CALL.XL imm32
As always, the assembler and linker will choose the right call or jump for you if you use a label and do not specify an extension.
Move 32-bit value to register
Rd = imm32
Pd = imm32
ureg = imm32 // works with all registers allowed in the register move instruction
Loads and stores with 32-bit immediate address
Rd = [ imm32 ]
Pd = [ imm32 ]
[ imm32 ] = Rx
[ imm32 ] = Px
Rd = W[ imm32 ] (Z)
Rd = W[ imm32 ] (X)
W[ imm32 ] = Rx
Rd = B[ imm32 ] (Z)
Rd = B[ imm32 ] (X)
B[ imm32 ] = Rx
Rd_hi = W[ imm32 ]
Rd_lo = W[ imm32 ]
W[ imm32 ] = Rx_hi
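The difference between the (Z), (X) and half-register load forms can be sketched in C. This is only a model: memory is a byte array indexed by a hypothetical address, little-endian byte order is assumed, and the half-register loads are assumed to leave the other half of Rd unchanged. All function names are invented for the example.

```c
#include <stdint.h>

/* Rd = W[ addr ] (Z): 16-bit load, zero-extended to 32 bits. */
static uint32_t load_w_z(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
}

/* Rd = W[ addr ] (X): 16-bit load, sign-extended to 32 bits. */
static uint32_t load_w_x(const uint8_t *mem, uint32_t addr)
{
    uint32_t v = load_w_z(mem, addr);
    return (v & 0x8000u) ? (v | 0xFFFF0000u) : v;
}

/* Rd = B[ addr ] (Z): 8-bit load, zero-extended. */
static uint32_t load_b_z(const uint8_t *mem, uint32_t addr)
{
    return mem[addr];
}

/* Rd = B[ addr ] (X): 8-bit load, sign-extended. */
static uint32_t load_b_x(const uint8_t *mem, uint32_t addr)
{
    uint32_t v = mem[addr];
    return (v & 0x80u) ? (v | 0xFFFFFF00u) : v;
}

/* Rd_hi = W[ addr ]: replace the high half, keep the low half (assumed). */
static uint32_t load_hi(uint32_t rd, const uint8_t *mem, uint32_t addr)
{
    return (rd & 0x0000FFFFu) | (load_w_z(mem, addr) << 16);
}

/* Rd_lo = W[ addr ]: replace the low half, keep the high half (assumed). */
static uint32_t load_lo(uint32_t rd, const uint8_t *mem, uint32_t addr)
{
    return (rd & 0xFFFF0000u) | load_w_z(mem, addr);
}
```

For example, with the bytes `0x34, 0x92` at address 0, the (Z) word load gives `0x00009234` while the (X) word load gives `0xFFFF9234`.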
A few changes that improve orthogonality
Rd = !CC // filled in an encoding hole
ureg = ureg // restrictions on which registers can be copied to which have been removed
compute || preg-access || preg-load // P-register addressing allowed in both DAG slots
The last of these has far-reaching effects, as there is no longer any need to load an I-register to enable dual load/store. When writing assembler for BF70x, I find I only use I- and M-registers for circular buffering.
The assembler also accepts alternative syntax for immediate shift instructions and adds new error checks.
System instructions
STI IDLE Rx // combines STI and IDLE to avoid a race condition