Inline ARM Assembly with GNU

This high-level overview describes some of the details of using inline assembly with ARM and the GNU toolchain. It isn't intended as a primer on ARM Thumb assembly, but rather an aid to understanding code snippets and functions written in ARM assembly in C programs.

GNU Inline Assembly Format

When connecting inline assembly and C using GNU, the assembly code generally takes the following form:

__asm__ volatile (
    code
    : output operand list
    : input operand list
    : clobber list
);

code

This mandatory section includes the assembler instructions as string literals.

  • Each line of assembly should be enclosed in double-quotes, and terminated with the \n\t sequence, which ensures proper formatting of the assembly code generated by the compiler.
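As a minimal sketch of this layout, the hypothetical helper `add_two` below places each instruction in its own string literal ending in `\n\t`. The name, the choice of instruction, and the `__arm__` guard with a plain C fallback are assumptions made for illustration, and the asm branch assumes an A32 or Thumb-2 target.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original text): returns x + 2. */
static inline uint32_t add_two(uint32_t x)
{
#if defined(__arm__)
	__asm__ volatile (
		/* Adjacent string literals are concatenated by the compiler;
		 * the \n\t at the end of each one keeps every instruction on
		 * its own, indented line in the generated assembly.
		 */
		"add %[x], %[x], #1\n\t"
		"add %[x], %[x], #1\n\t"
		: [x] "+l" (x)
		:
		: "cc" /* conservative: some Thumb encodings of add set flags */
	);
#else
	x += 2; /* portable fallback so the sketch builds on non-ARM hosts */
#endif
	return x;
}
```

Compiling with `-S` and inspecting the output is a quick way to see how the template text is pasted into the generated assembly.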

output operand list

The second section in the inline assembly is the output operand list, which allows the C and inline assembly code to share output operands.

  • Multiple output operands should be comma separated.
  • The symbolic name of the operand should be enclosed in square brackets [], followed by the constraint string enclosed in double quotes "", followed by the C expression enclosed in parentheses ().

For a list of possible constraint string values, see ARM Thumb Constraint Modifiers/Codes further below.

For example, the output operand list:

    : [dptr] "+l" (result)
  • Declares the asm symbolic name dptr (accessible as %[dptr] in asm code).
  • Applies the read-write modifier (+) together with the l constraint code, which limits the operand to the Thumb state low registers R0-R7.
  • Associates it with the result C expression.
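The output operand list above could be used as in the following sketch. The function name `double_it` is hypothetical, and the `__arm__` guard with a C fallback is an assumption added so the example also builds on a non-ARM host.

```c
#include <stdint.h>

/* Hypothetical helper: doubles `result` in place through a "+l" operand. */
static inline uint32_t double_it(uint32_t result)
{
#if defined(__arm__)
	__asm__ volatile (
		/* %[dptr] is both read and written, hence the + modifier. */
		"add %[dptr], %[dptr], %[dptr]\n\t"
		: [dptr] "+l" (result)
		: /* no separate inputs */
		: "cc"
	);
#else
	result += result; /* portable fallback */
#endif
	return result;
}
```

Because the operand is read-write, the compiler loads `result` into the chosen register before the asm runs and reads the updated value back afterwards.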

input operand list

The third section in the inline assembly is the input operand list, which allows the C and inline assembly code to share input operands.

This uses the same syntax as the list of output operands.

For example, the input operand list:

    : [value] "l" (x)
  • Declares the asm symbolic name value (accessible as %[value] in the asm code).
  • Indicates that the Thumb state general-purpose registers (l) should be used.
  • Associates it with the x C expression.
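A short sketch combining this input operand with the earlier output operand might look as follows. The function name `accumulate` is hypothetical, and the `__arm__` guard with a C fallback is an assumption for portability.

```c
#include <stdint.h>

/* Hypothetical helper: adds x to result using one output and one input. */
static inline uint32_t accumulate(uint32_t result, uint32_t x)
{
#if defined(__arm__)
	__asm__ volatile (
		"add %[dptr], %[dptr], %[value]\n\t"
		: [dptr] "+l" (result)   /* read-write output operand */
		: [value] "l" (x)        /* read-only input operand */
		: "cc"
	);
#else
	result += x; /* portable fallback */
#endif
	return result;
}
```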

clobber list

The fourth and final section in the inline assembly is the clobber list.

When the compiler selects which registers to use to represent input and output operands (%0, %1, etc.), it does not use any of the clobbered registers. As a result, clobbered registers can be freely used in the inline code.

In the comma-separated clobber list, you can include:

  • Specific core or VFP registers ("r12")
  • The condition register ("cc")
  • The "memory" keyword to tell the compiler that the assembler instructions may change memory locations, forcing the compiler to store all cached values before and reload them after executing the assembler instructions.

For example, the following clobber list entry indicates that we change the condition register, and that we likely alter memory locations:

    : "cc", "memory"

Note that r7 has special meaning in the Thumb Procedure Call Standard, where it's defined as "work register in function entry/exit" and typically used as the frame pointer (fp). As such, it can't be included in the clobber list, and some care needs to be taken using it directly.
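To illustrate all three kinds of clobber entry together, here is a hedged sketch. The helper `increment_word` is hypothetical, r3 is an arbitrary choice of scratch register, and the `__arm__` guard with a C fallback is an assumption.

```c
#include <stdint.h>

/* Hypothetical helper: increments the word at addr using r3 as scratch. */
static inline void increment_word(uint32_t *addr)
{
#if defined(__arm__)
	__asm__ volatile (
		"ldr r3, [%[addr]]\n\t"
		"add r3, r3, #1\n\t"
		"str r3, [%[addr]]\n\t"
		: /* no outputs: the result is written through memory */
		: [addr] "l" (addr)
		: "r3",     /* used directly, so it must not hold %[addr] */
		  "cc",     /* the add may update the condition flags */
		  "memory"  /* *addr is read and written behind the compiler's back */
	);
#else
	*addr += 1; /* portable fallback */
#endif
}
```

Listing r3 keeps the compiler from placing `%[addr]` in it, and "memory" makes it reload `*addr` after the block instead of reusing a cached value.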

Best Practices

Name all outputs/inputs

You should generally name outputs/inputs rather than relying on the default convention of %0, %1, etc.

The benefit of naming them is maintainability. If someone inherits your inline ASM at a later date and changes the output/input list, but doesn't update the default %n values, you can end up with non-obvious errors at run time. Using named outputs/inputs avoids this issue.
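The difference can be sketched with two equivalent helpers. Both function names are hypothetical, and the `__arm__` guard with a C fallback is an assumption. Subtraction is used because swapping its operands silently changes the result.

```c
#include <stdint.h>

/* Positional form: %1 and %2 depend on the order of the operand lists. */
static inline uint32_t sub_positional(uint32_t a, uint32_t b)
{
	uint32_t d;
#if defined(__arm__)
	__asm__ ("sub %0, %1, %2\n\t"
		 : "=l" (d)
		 : "l" (a), "l" (b)
		 : "cc");
#else
	d = a - b; /* portable fallback */
#endif
	return d;
}

/* Named form: reordering or extending the lists cannot change meaning. */
static inline uint32_t sub_named(uint32_t a, uint32_t b)
{
	uint32_t d;
#if defined(__arm__)
	__asm__ ("sub %[diff], %[lhs], %[rhs]\n\t"
		 : [diff] "=l" (d)
		 : [lhs] "l" (a), [rhs] "l" (b)
		 : "cc");
#else
	d = a - b; /* portable fallback */
#endif
	return d;
}
```

If a new operand were later inserted ahead of `a` in the positional version, `%1` would refer to it and the subtraction would quietly use the wrong value.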

Examples

Input Operand

In this example:

  • %0 will resolve to the address of ztest_thread_callee_saved_regs_container as an INPUT operand
  • "memory", "r0" tells the compiler that we alter memory locations, and use r0 internally.

GCC picks the register for %0 itself: it can be any general-purpose register allowed by the "r" constraint, except those named in the clobber list (r0 here).

__asm__ volatile (
	"push {v1-v8};\n\t"
	"push {r0, r1};\n\t"
	"mov r0, r7;\n\t"
	"ldmia %0, {v1-v8};\n\t"
	"mov r7, r0;\n\t"
	"pop {r0, r1};\n\t"
	: /* No outputs */ 
	: "r" (&ztest_thread_callee_saved_regs_container)
	: "memory", "r0"
);

NOTE: Multiple input/output operands should be comma-separated, for example:

: : "r" (&ztest_thread_callee_saved_regs_container), "r" (&other_container)

Clobber List

Any register that the assembly code uses directly, and that therefore cannot be used for input/output assignment, needs to be added to the clobber list, so that GCC knows not to use it when assigning registers to %0, etc.

In this example:

  • memory indicates that we change memory locations.
  • r0, r2 and r3 indicate that we use these registers internally, causing %0 to be assigned a register outside this set.
__asm__ volatile (
    /* Stash r4-r11 on the stack. They are restored much later in
     * another inline asm. This should be reworked, since the stack
     * must be balanced when we leave any inline asm; an alternative
     * stack could be used for storing them instead of the
     * function's stack.
     */
    "push {r4-r7};\n\t"
    "mov r2, r8;\n\t"
    "mov r3, r9;\n\t"
    "push {r2, r3};\n\t"
    "mov r2, r10;\n\t"
    "mov r3, r11;\n\t"
    "push {r2, r3};\n\t"

    /* Save r0 and r7, since we want to preserve them but they
     * are used below: r0 holds a copy of the struct pointer, which
     * we don't want to modify, and r7 is the frame pointer, which
     * we must not clobber.
     */
    "push {r0, r7};\n\t"

    /* Load struct into r4-r11 */
    "mov r0, %0;\n\t"
    "add r0, #16;\n\t"
    "ldmia r0!, {r4-r7};\n\t"
    "mov r8, r4;\n\t"
    "mov r9, r5;\n\t"
    "mov r10, r6;\n\t"
    "mov r11, r7;\n\t"
    "sub r0, #32;\n\t"
    "ldmia r0!, {r4-r7};\n\t"

    /* Restore r0 and r7 */
    "pop {r0, r7};\n\t"

    : /* no output */
    : "r" (&ztest_thread_callee_saved_regs_container)
    : "memory", "r0", "r2", "r3"
);

ARM Thumb Constraint Modifiers/Codes

Constraint modifiers and codes are used to control which registers are used when compiling asm code for the ARM core, and the type of access they have.

An exhaustive list of ARM constraints for GNU is available in the GCC documentation: Constraints for Particular Machines

Constraint modifiers

Possible ARM Thumb constraint modifiers are:

  • + readwrite: This operand is both read from and written to.
  • = write: This operand is only written to, and only after all input operands have been read for the last time. Output only.
  • =& write: This operand is only written to. It might be modified before the assembly block finishes reading the input operands. Therefore the compiler cannot use the same register to store this operand and an input operand. Operands with the =& constraint modifier are known as early-clobber operands. Output only.

Use the & constraint modifier on all output operands that must not overlap an input. Otherwise, GCC may allocate the output operand in the same register as an unrelated input operand, on the assumption that the assembler code consumes its inputs before producing outputs. This assumption may be false if the assembler code actually consists of more than one instruction.
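The hazard can be sketched as follows. The helper `sum_plus_one` is hypothetical, and the `__arm__` guard with a C fallback is an assumption. The first instruction writes the output while input `b` has not yet been read, which is exactly the case `=&` is meant for.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + 1 + b using two instructions. */
static inline uint32_t sum_plus_one(uint32_t a, uint32_t b)
{
	uint32_t out;
#if defined(__arm__)
	__asm__ (
		/* %[out] is written here, before %[b] has been consumed. */
		"add %[out], %[a], #1\n\t"
		"add %[out], %[out], %[b]\n\t"
		/* =& stops the compiler from placing out in the same
		 * register as a or b.
		 */
		: [out] "=&l" (out)
		: [a] "l" (a), [b] "l" (b)
		: "cc"
	);
#else
	out = a + 1 + b; /* portable fallback */
#endif
	return out;
}
```

With a plain `=l`, the compiler would be free to give `out` and `b` the same register, and the second add would then read the partially computed result instead of `b`.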

Constraint codes

Possible ARM Thumb constraint codes are:

  • l : Thumb state general purpose registers
    • Operand must be an integer or floating-point type.
    • For T32 state, the compiler can use R0-R7.
    • For A32 state, the compiler can use R0-R12, or R14.
  • h : Thumb state upper general purpose registers
    • Operand must be an integer or floating-point type.
    • For T32 state, the compiler can use R8-R12, or R14.
    • Not valid for A32 state!
  • t : Vector floating point registers
    • Operand must be a 32-bit floating-point or integer type.
    • The compiler can use S0-S31.
  • w : Vector floating point registers
    • Operand must be a 64-bit floating-point or vector type, or a 64-bit integer.
    • The compiler can use S0-S31, D0-D31, or Q0-Q15, depending on the size of the operand type.
    • Not valid for Thumb1.
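As a hedged sketch of the t constraint code, the hypothetical helper `add_f32` below asks for single-precision VFP registers. The `__ARM_FP` check (bit 0x4 for single precision, per the ACLE feature macros) and the C fallback are assumptions, so that the example still builds when no hardware floating point is available.

```c
/* Hypothetical helper: adds two floats in VFP registers when available. */
static inline float add_f32(float a, float b)
{
	float out;
#if defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 0x4)
	__asm__ (
		/* "t" places each 32-bit operand in one of S0-S31. */
		"vadd.f32 %[out], %[a], %[b]\n\t"
		: [out] "=t" (out)
		: [a] "t" (a), [b] "t" (b)
	);
#else
	out = a + b; /* fallback: no single-precision VFP, or not ARM */
#endif
	return out;
}
```

The l and h codes are used the same way, but select integer core registers rather than VFP registers.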

Register Usage

TODO: Add link to ARM document on function argument usage in assembly.

The table below lists core registers for ARM devices and their usage:

| Register | Usage |
|----------|-------|
| R0 | First function argument, integer function result, scratch register |
| R1 | Second function argument, scratch register |
| R2 | Third function argument, scratch register |
| R3 | Fourth function argument, scratch register |
| R4 | Register variable |
| R5 | Register variable |
| R6 | Register variable |
| R7 | Register variable (Thumb fp) |
| R8 | Register variable |
| R9 (rfp) | Register variable, real frame pointer |
| R10 (sl) | Stack limit |
| R11 (fp) | Argument pointer |
| R12 (ip) | Temporary workspace |
| R13 (sp) | Stack pointer |
| R14 (lr) | Link register, workspace |
| R15 (pc) | Program counter |

Examples

vstmia stores multiple SIMD or FP registers to consecutive memory, starting at the address held in a general-purpose register. The ia suffix stands for increment-after: the transfer address is incremented after each access. The base register itself is only updated once the operation has completed if writeback (!) is requested.

The function below demonstrates how to work with a C function parameter in GNU inline assembly, passing the parameter as an input operand in a general-purpose register:

static void ALWAYS_INLINE z_arm_fpu_caller_save(struct __fpu_sf *fpu)
{
	__asm__ volatile (
		"vstmia %0, {s0-s15};\n\t"
		: : "r" (&fpu->s[0])
		: "memory"
		);
#if CONFIG_VFP_FEATURE_REGS_S64_D32
	__asm__ volatile (
		"vstmia %0, {d16-d31};\n\t"
		:
		: "r" (&fpu->d[0])
		: "memory"
		);
#endif
}

The non-inline version of this code might resemble the following. Here r2 uses register writeback (!), meaning its value is updated after each operation. Without the !, the register would not be updated, despite the ia in the mnemonic:

mov r2, sp
vstmia r2!, {s0-s15}
#ifdef CONFIG_VFP_FEATURE_REGS_S64_D32
vstmia r2!, {d16-d31}
#endif
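The effect of writeback can also be observed from C by making the base register a read-write operand. In the sketch below, the helper `sum_pair`, the use of the integer ldmia instruction instead of vstmia, and the fixed scratch registers r2 and r3 are all assumptions chosen to keep the example small. The `__arm__` guard with a C fallback is likewise an assumption.

```c
#include <stdint.h>

/* Hypothetical helper: sums two consecutive words and returns the
 * advanced pointer, as updated by the ! writeback.
 */
static inline const uint32_t *sum_pair(const uint32_t *p, uint32_t *sum)
{
	uint32_t s;
#if defined(__arm__)
	__asm__ volatile (
		/* Load two words; the ! writes p + 8 bytes back to %[p]. */
		"ldmia %[p]!, {r2, r3}\n\t"
		"add %[s], r2, r3\n\t"
		: [p] "+l" (p), [s] "=l" (s)
		:
		: "r2", "r3", "cc", "memory"
	);
#else
	s = p[0] + p[1]; /* portable fallback */
	p += 2;
#endif
	*sum = s;
	return p;
}
```

Because `%[p]` is declared with +, the compiler knows the pointer changes inside the block. Without the !, the returned pointer would still equal the original `p`.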

Sources and Further Reading

  • ARM inline assembly: Somewhat dated, but very useful summary of GNU inline ARM assembly.
  • GNU Extended ASM: Details the format used for inline assembler instructions with C expression operands.