This high-level overview describes some of the details of using inline assembly with ARM and the GNU toolchain. It isn't intended as a primer on ARM Thumb assembly, but rather an aid to understanding code snippets and functions written in ARM assembly in C programs.
When connecting inline assembly and C with the GNU toolchain, the assembly code generally takes the following form:
__asm__ volatile (
code
: output operand list
: input operand list
: clobber list
);
The first section, `code`, is mandatory and contains the assembler instructions as string literals.
- Each line of assembly should be enclosed in double quotes and terminated with the `\n\t` sequence, which ensures proper formatting of the assembly code generated by the compiler.
The second section in the inline assembly is the output operand list, which allows the C and inline assembly code to share output operands.
- Multiple output operands should be comma separated.
- The symbolic name of the operand should be enclosed in square brackets `[]`, followed by the constraint string enclosed in double quotes `""`, followed by the C expression enclosed in parentheses `()`.
For a list of possible constraint string values, see ARM Thumb Constraint Modifiers/Codes further below.
For example, the output operand list:
: [dptr] "+l" (result)
- Declares the asm symbolic name `dptr` (accessible as `%[dptr]` in asm code).
- Adds a read-write constraint (`+l`) that uses the Thumb state core registers R0-R7.
- Associates it with the `result` C expression.
The third section in the inline assembly is the input operand list, which allows the C and inline assembly code to share input operands.
This uses the same syntax as the list of output operands.
For example, the input operand list:
: [value] "l" (x)
- Declares the asm symbolic name `value` (accessible as `%[value]` in the asm code).
- Indicates that the Thumb state general purpose registers should be used.
- Associates it with the `x` C expression.
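The two operand lists above can be combined into one small function. The following is a minimal sketch rather than code from any particular project: the helper name `add_named` is hypothetical, and the plain C fallback is an assumption added so that the snippet also builds on non-ARM hosts.

```c
#include <stdint.h>

/* Hypothetical helper: adds x to result using named asm operands. */
static inline uint32_t add_named(uint32_t result, uint32_t x)
{
#if defined(__arm__)
    __asm__ volatile (
        "add %[dptr], %[dptr], %[value]\n\t"
        : [dptr] "+l" (result)   /* read-write operand in a low register */
        : [value] "l" (x)        /* read-only operand in a low register */
        : "cc"                   /* the add may update flags in Thumb-1 */
    );
#else
    /* Assumed fallback so the sketch still compiles on non-ARM hosts. */
    result += x;
#endif
    return result;
}
```

Because `result` is declared with `+l`, the compiler loads it into a register before the block and treats the same register as holding the updated value afterwards.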
The fourth and final section in the inline assembly is the clobber list.
When the compiler selects which registers to use to represent input and output operands (`%0`, `%1`, etc.), it does not use any of the clobbered registers.
As a result, clobbered registers can be freely used in the inline code.
In the comma-separated clobber list, you can include:
- Specific core or VFP registers (`"r12"`)
- The condition register (`"cc"`)
- The `"memory"` keyword, which tells the compiler that the assembler instructions may change memory locations, forcing the compiler to store all cached values before and reload them after executing the assembler instructions.
For example, the following clobber list entry indicates that we change the condition register, and that we likely alter memory locations:
: "cc", "memory"
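As a rough illustration of both kinds of clobber, the sketch below shows an empty asm statement whose only effect is its `"memory"` clobber, and a block that names `r3` as a scratch register. All function and variable names here are hypothetical, and the C fallback for non-ARM hosts is an assumption made so the snippet builds anywhere. Note that the `"memory"` clobber constrains the compiler only; it does not emit a hardware barrier instruction.

```c
#include <stdint.h>

/* Hypothetical flag, e.g. shared with an interrupt handler. */
static uint32_t ready_flag;

static inline void compiler_barrier(void)
{
    /* No instructions are emitted. The "memory" clobber alone stops the
     * compiler from caching values in registers or moving memory accesses
     * across this point. */
    __asm__ volatile ("" : : : "memory");
}

static inline uint32_t publish(uint32_t *slot, uint32_t value)
{
    *slot = value;        /* must be stored before the flag is set */
    compiler_barrier();
    ready_flag = 1u;
    return ready_flag;
}

static inline uint32_t double_via_scratch(uint32_t in)
{
    uint32_t out;
#if defined(__arm__)
    __asm__ volatile (
        "mov r3, %[in]\n\t"          /* r3 is used as a scratch register */
        "add %[out], r3, r3\n\t"
        : [out] "=l" (out)
        : [in] "l" (in)
        : "r3", "cc"                 /* so r3 is never picked for an operand */
    );
#else
    /* Assumed fallback so the sketch still compiles on non-ARM hosts. */
    out = in + in;
#endif
    return out;
}
```

Listing `"r3"` is what makes the hard-coded register safe: without it, the compiler could place `in` or `out` in `r3` and the block would overwrite its own operand.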
Note that `r7` has special meaning in the Thumb Procedure Call Standard, where it's defined as "work register in function entry/exit" and typically used as the frame pointer (`fp`). As such, it can't be included in the clobber list, and some care needs to be taken when using it directly.
You should generally name outputs/inputs rather than relying on the default convention of `%0`, `%1`, etc.
The benefit of naming them is maintainability. If someone inherits your inline asm at a later date and changes the output/input list, but doesn't update the default `%n` values, you can end up with non-obvious errors at run time. Using named outputs/inputs avoids this issue.
In this example:
- `%0` will resolve to the address of `ztest_thread_callee_saved_regs_container` as an INPUT operand.
- `"memory"`, `"r0"` tells the compiler that we alter memory locations and use `r0` internally.
GCC will pick the register to use for `%0`, but it's generally one of the callee-saved registers (`r4`, `r5`, etc.).
__asm__ volatile (
"push {v1-v8};\n\t"
"push {r0, r1};\n\t"
"mov r0, r7;\n\t"
"ldmia %0, {v1-v8};\n\t"
"mov r7, r0;\n\t"
"pop {r0, r1};\n\t"
: /* No outputs */
: "r" (&ztest_thread_callee_saved_regs_container)
: "memory", "r0"
);
NOTE: Multiple input/output operands should be comma-separated, i.e.:
: : "r" (&ztest_thread_callee_saved_regs_container), "r" (&other_container)
Any registers that cannot be used for input/output assignment need to be added to the clobber list, so that GCC is aware that those registers shouldn't be used when assigning registers to `%0`, etc.
In this example:
- `memory` indicates we change memory locations.
- `r0`, `r2`, `r3` indicate that we use these registers internally, causing `%0` to be assigned a register outside this range.
__asm__ volatile (
/* Stash r4-r11 in stack, they will be restored much later in
* another inline asm -- that should be reworked since stack
* must be balanced when we leave any inline asm. We could
* use simply an alternative stack for storing them instead
* of the function's stack.
*/
"push {r4-r7};\n\t"
"mov r2, r8;\n\t"
"mov r3, r9;\n\t"
"push {r2, r3};\n\t"
"mov r2, r10;\n\t"
"mov r3, r11;\n\t"
"push {r2, r3};\n\t"
/* Save r0 and r7 since we want to preserve them but they
* are used below: r0 is used as a copy of struct pointer
* we don't want to mess and r7 is the frame pointer which
* we must not clobber it.
*/
"push {r0, r7};\n\t"
/* Load struct into r4-r11 */
"mov r0, %0;\n\t"
"add r0, #16;\n\t"
"ldmia r0!, {r4-r7};\n\t"
"mov r8, r4;\n\t"
"mov r9, r5;\n\t"
"mov r10, r6;\n\t"
"mov r11, r7;\n\t"
"sub r0, #32;\n\t"
"ldmia r0!, {r4-r7};\n\t"
/* Restore r0 and r7 */
"pop {r0, r7};\n\t"
: /* no output */
: "r" (&ztest_thread_callee_saved_regs_container)
: "memory", "r0", "r2", "r3"
);
Constraint modifiers and codes are used to control which registers are used when compiling asm code for the ARM core, and the type of access they have.
An exhaustive list of ARM constraints for GNU is available in the GCC documentation: Constraints for Particular Machines
Possible ARM Thumb constraint modifiers are:
- `+` readwrite: This operand is both read from and written to.
- `=` write: This operand is only written to, and only after all input operands have been read for the last time. Output only.
- `=&` write: This operand is only written to. It might be modified before the assembly block finishes reading the input operands. Therefore the compiler cannot use the same register to store this operand and an input operand. Operands with the `=&` constraint modifier are known as early-clobber operands. Output only.
Use the `&` constraint modifier on all output operands that must not overlap an input. Otherwise, GCC may allocate the output operand in the same register as an unrelated input operand, on the assumption that the assembler code consumes its inputs before producing outputs. This assumption may be false if the assembler code actually consists of more than one instruction.
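The early-clobber case can be illustrated with a two-instruction block that writes its output before it has finished reading its inputs. This is a sketch only: the helper name `sum_early_clobber` is hypothetical, and the C fallback for non-ARM hosts is an assumption added so the snippet builds anywhere.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b, writing the output before b is read. */
static inline uint32_t sum_early_clobber(uint32_t a, uint32_t b)
{
    uint32_t out;
#if defined(__arm__)
    __asm__ volatile (
        "mov %[out], %[a]\n\t"           /* output is written here ... */
        "add %[out], %[out], %[b]\n\t"   /* ... while b is still needed */
        : [out] "=&l" (out)              /* early clobber: no sharing with inputs */
        : [a] "l" (a), [b] "l" (b)
        : "cc"
    );
#else
    /* Assumed fallback so the sketch still compiles on non-ARM hosts. */
    out = a + b;
#endif
    return out;
}
```

With a plain `=l` constraint, the compiler would be free to place `out` and `b` in the same register. The `mov` would then overwrite `b`, and the block would return `a + a` instead of `a + b`.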
Possible ARM Thumb constraint codes are:
- `l`: Thumb state general purpose registers
  - Operand must be an integer or floating-point type.
  - For T32 state, the compiler can use R0-R7.
  - For A32 state, the compiler can use R0-R12, or R14.
- `h`: Thumb state upper general purpose registers
  - Operand must be an integer or floating-point type.
  - For T32 state, the compiler can use R8-R12, or R14.
  - Not valid for A32 state!
- `t`: Vector floating point registers
  - Operand must be a 32-bit floating-point or integer type.
  - The compiler can use S0-S31.
- `w`: Vector floating point registers
  - Operand must be a 64-bit floating-point or vector type, or a 64-bit integer.
  - The compiler can use S0-S31, D0-D31, or Q0-Q15, depending on the size of the operand type.
  - Not valid for Thumb1.
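As a sketch of a floating-point constraint code in use, the function below asks for single-precision VFP registers with `t`. The helper name `add_f32` is hypothetical. The guard assumes the ACLE feature macro `__ARM_FP` (bit `0x4` signalling single-precision hardware support) is provided by the toolchain, and the C fallback is an assumption so the snippet also builds for soft-float or non-ARM targets.

```c
/* Hypothetical helper: adds two floats in VFP single-precision registers. */
static inline float add_f32(float a, float b)
{
    float out;
#if defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 0x4)
    __asm__ volatile (
        "vadd.f32 %[out], %[a], %[b]\n\t"
        : [out] "=t" (out)           /* write-only, allocated from S0-S31 */
        : [a] "t" (a), [b] "t" (b)   /* inputs, allocated from S0-S31 */
    );
#else
    /* Assumed fallback when no single-precision FPU is available. */
    out = a + b;
#endif
    return out;
}
```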
TODO: Add link to ARM document on function argument usage in assembly.
The table below lists core registers for ARM devices and their usage:
Register | Usage |
---|---|
R0 | First function argument, Integer function result, Scratch reg. |
R1 | Second function argument, Scratch reg. |
R2 | Third function argument, Scratch reg. |
R3 | Fourth function argument, Scratch reg. |
R4 | Register variable |
R5 | Register variable |
R6 | Register variable |
R7 | Register variable (Thumb `fp`) |
R8 | Register variable |
R9 (`rfp`) | Register variable, Real frame pointer |
R10 (`sl`) | Stack limit |
R11 (`fp`) | Argument pointer |
R12 (`ip`) | Temporary workspace |
R13 (`sp`) | Stack pointer |
R14 (`lr`) | Link register, Workspace |
R15 (`pc`) | Program counter |
The function below demonstrates how to work with a C function with a parameter in GNU inline assembly, assigning the parameter as an input to a general-purpose register:
`vstmia` stores multiple SIMD or FP registers to consecutive memory, using the address from a general-purpose register. The `ia` suffix indicates increment-after, meaning the address is incremented after each register is stored. The general-purpose register itself is only updated once the operation is completed if writeback (`!`) is requested.
static void ALWAYS_INLINE z_arm_fpu_caller_save(struct __fpu_sf *fpu)
{
__asm__ volatile (
"vstmia %0, {s0-s15};\n"
: : "r" (&fpu->s[0])
: "memory"
);
#if CONFIG_VFP_FEATURE_REGS_S64_D32
__asm__ volatile (
"vstmia %0, {d16-d31};\n\t"
:
: "r" (&fpu->d[0])
: "memory"
);
#endif
}
The non-inline version of this code might resemble something like this:
`r2` includes register writeback (`!`), meaning its value will be updated after the operation. Without `!`, the register would not be updated despite the `ia` in the mnemonic.
mov r2, sp
vstmia r2!, {s0-s15}
#ifdef CONFIG_VFP_FEATURE_REGS_S64_D32
vstmia r2!, {d16-d31}
#endif
- ARM inline assembly: Somewhat dated, but very useful summary of GNU inline ARM assembly.
- GNU Extended ASM: Details the format used for inline assembler instructions with C expression operands.