While I’m sure everyone here probably knows what functions are, you might be wondering what a "Calling Convention" is. In short, it is an agreement between the caller and callee about how to treat/use certain registers. We’ll get to the why and how later.
In assembly, a function is simply a label with a return instruction associated with it; because this is far more ambiguous than a function in a higher level language, it is good practice to only have a single return instruction associated with a function.[1] A comment above the label is also helpful. Together those help you quickly see the start and end of the function.
void func1() {}
would be
# void func1()
func1:
# body goes here
ret
As you can see my policy is to put a single line comment of the C prototype above label.
But how do you call a function in assembly? You use the instruction Jump and Link:
jal func_label
.
Let’s change the hello world program from chapter 0 to call a function:
.data
hello: .asciz "Hello World!\n"
.text
main:
jal hello_world
li a7, 10 # exit ecall
ecall
# void hello_world()
hello_world:
li a7, 4 # print string ecall
la a0, hello # load address of string to print into a0
ecall
ret
What jal
actually does, is save the address of the next instruction to ra
and then do an unconditional jump to the function label. So you could achieve
the same results with the following:
jal func # is equivalent to la ra, next_instr j func next_instr:
That would get tiring and ugly fast though, having to come up with unique labels
for the next instruction every time. You also might be confused about why the
greensheet says jal
saves PC+4 in an arbitrary register R[rd] instead of ra
specifically (which would be R[1]). The instruction does actually take a register argument but
since it’s most commonly used to call a function if you don’t specify a register
it will use ra
as if you did jal ra, func
. This works in conjunction with
the pseudoinstruction ret
which does PC=R[1] using the instruction jalr
(specifically jalr x0, ra, 0
) to easily return from functions.
You might also see another way of returning using jr
which stands for
"Jump Register" and jumps to the address in the register, so to return
from a function you’d do jr ra
. It is also a pseudoinstruction
that uses jalr
. Unless your professor insists on something else, prefer
ret
; not only is it the shortest, returning from functions is its sole
purpose.
We’ve gone as far as we can without starting to talk about registers and their purposes in functions. You can think of registers as variables[2] that are part of the CPU. In this case, since we’re dealing with a 32-bit RISC-V architecture, they are 32-bit (aka 4 bytes, 1 word) variables.[3] Since they’re part of the CPU, they exist for the life of the program and the whole program shares the same registers.
But how does that work? If all parts of the program use the same
32 registers, how does one function not stomp all over what another was doing
when it uses them? In fact, how do functions communicate at all? How do they
pass arguments or return results? All these questions are solved by deciding
on a "Calling Convention". It’s different for different architectures and even
different operating systems on the same architecture. This is because different
architectures have different numbers of registers, and some registers like ra
have
semi-hardcoded uses. The the pseudoinstruction ret
uses ra
, and x0
is a
constant 0 and there’s no way to change either of those facts. That still
leaves a lot of flexibility when designing a calling convention. While they
mostly match, you can probably find several variations of RISC-V calling
conventions online. They usually differ in how they setup a stack frame.
The convention covered in this chapter is consistent with, and sufficient for,
almost every college course I’ve ever heard of.
Regardless, what matters is that the calling convention works by setting rules (and guidelines) for register use, and when/how to use the stack.
If you’re unfamiliar with the
runtime stack, it’s exactly what it sounds like. It’s a Last-In-First-Out (LIFO)
data structure that you can use to store smaller values in a program. It grows
in a negative direction, so to allocate 12 bytes, you would subtract 12 from the
stack pointer (in RISC-V that’s sp
).
RISC-V specifically designates certain registers to be used for passing arguments
(at least the first 8), a couple for return values, and others for misc. temporary
or saved values. The rest are special use registers like ra
.
The quickest way to summarize is to look at the table on the greensheet which is reproduced below:
Register | Name | Use | Preserved Across a Call |
---|---|---|---|
x0 |
zero |
Constant 0 |
N.A. |
x1 |
ra |
Return address |
No |
x2 |
sp |
Stack pointer |
Yes |
x3 |
gp |
Global pointer |
— |
x4 |
tp |
Thread pointer |
— |
x5-x7 |
t0-t2 |
Temporaries |
No |
x8 |
s0/fp |
Saved register/Frame pointer |
Yes |
x9 |
s1 |
Saved register |
Yes |
x10-x11 |
a0-a1 |
Function arguments/Return values |
No |
x12-x17 |
a2-a7 |
Function arguments |
No |
x18-x27 |
s2-s11 |
Saved registers |
Yes |
x28-x31 |
t3-t6 |
Temporaries |
No |
To summarize, you have 15 registers that can be used anytime for temporary
values, though some have special uses too (the a and t registers).
You have 12 s registers that have to be saved on the stack if you use
them, plus ra
as well. The zero
register is obviously a special case.
The sp
register is technically preserved but not in the same way. Basically
what you allocate (subtract) you have to deallocate (add) before returning
from a function, thus preserving the original value.
You can ignore gp, tp
, and most of the time fp
too. Also, with 8 registers
to pass arguments, you’ll almost never need to pass arguments on the stack.
Let’s start with something simple that doesn’t use the stack.
int hello_name_number(char* name, int number) { printf("Hello %s!\n", name); return number + 10; }
According to the convention that becomes:
.data
hello_space: .asciz "Hello "
exclaim_nl: .asciz "!\n"
.text
# int hello_name_number(char* name, int number)
hello_name_number:
mv t0, a0 # save name in t0 since we need a0 for the ecall
li a7, 4 # print string
la a0, hello_space
ecall
mv a0, t0 # print name (a7 is still 4)
ecall
la a0, exclaim_nl # print "!\n"
ecall
addi a0, a1, 10 # return number + 10
ret
Some things to note, ecalls are not function calls so we can "save" a0
in
a t register and know that it’ll still be there when the ecall is done. In the
same way, we know that a7
is still the same so we don’t have to keep setting
it to 4 for print string. Lastly, to return a value, we make sure that value
is in a0
before returning.
First, let’s establish the rules on when you have to use the stack (You can always use it for arbitrary local variables, like a local array for example, but generally don’t if you don’t have a good reason).
-
You call another function, ie you’re a non-leaf function.
This means you have to save
ra
on the stack at the very least, otherwise when you do yourret
you’d jump back into yourself (right after the lastjal
instruction). This does not apply to main because you don’t/shouldn’t return from main, you should call the exit (or exit2) ecall (10 or 93). -
You need to save values across a function call (automatically includes reason 1).
This is fairly common for non-trivial functions. Obvious examples are calling a function in a loop or loops (you’d have to preserve the iterator(s)), and many recursive functions.
-
You run out of temporary registers and overflow into the s registers.
This is very rare. The most common reason this "happens" is people forget they have 8 a registers, in addition to the 7 t registers, that they can also use for temporaries. 15 is more than enough to handle pretty much any function because you rarely need 16 discrete values at the same time.
Let’s look at an example for the first two. Any example for the last rule would be prohibitively large and complicated.
int non_leaf()
{
func1();
return 42
}
This calls the empty function discussed at the top of this chapter.
#int non_leaf()
non_leaf:
addi sp, sp, -4 # space to save 1 register, ra
sw ra, 0(sp) # store ra in the newly allocated stack space
jal func1
li a0, 42 # return 42
lw ra, 0(sp) # restore original ra
addi sp, sp, 4 # pop the stack
ret
The bit of code at the top and bottom of the function are called the prologue
and epilogue respectively for obvious reasons. We allocate 4 bytes on the stack
by subtracting 4 (I add a negative rather than subtract
because I can copy-paste the line with a single character change for the
epilogue). Then we store the current ra
in that space at the new top of the
stack. Then before we exit we have to load it back and pop the stack.
If we didn’t save and restore ra
we would jump to line 7 when we do our
ret
and then we’d be in an infinite loop.
Next we have the second case, where we need to preserve regular local values across a function call.
void print_letters(char letter, int count)
{
for (int i=0; i<count; i++) {
putchar(letter);
}
putchar('\n');
}
int save_vals()
{
for (int i=0; i<10; i++) {
print_letters('A'+i, i+1);
}
return 8;
}
That becomes this in RISC-V:
#void print_letters(char letter, int count)
print_letters:
ble a1, x0, exit_pl # if (count <= 0) goto exit_pl
li a7, 11 # print character
pl_loop:
ecall
addi a1, a1, -1 # count--
bgt a1, x0, pl_loop # while (count > 0)
li a0, 10 # '\n'
ecall
exit_pl:
ret
#int save_vals()
save_vals:
addi sp, sp, -12
sw ra, 0(sp)
sw s0, 4(sp)
sw s1, 8(sp)
li s0, 0 # i = 0
li s1, 10
sv_loop:
addi a0, s0, 65 # i + 'A'
addi a1, s0, 1 # i + 1
jal print_letters
addi s0, s0, 1 # i++
blt s0, s1, sv_loop # while (i < 10)
lw ra, 0(sp)
lw s0, 4(sp)
lw s1, 8(sp)
addi sp, sp, 12
ret
Notice that for print_letters, we not only convert the loop to a do-while
, but
we also use the parameter count
as the iterator to count down to 0. It saves
us an instruction initializing an i
.
Second, for save_vals
, we save not only ra
because we call another function,
but also two s registers to save i
and our stopping point. The second is not
actually necessary; because it’s a constant, we could load 10 into a register
right before the check every iteration of the loop. Which version is better depends on
several factors, like how long or complex the loop is, how many times it executes, and
of course personal preference.
Let’s do a classic recursive function, the fibonacci sequence.
int fib(int n)
{
if (n <= 1)
return n;
return fib(n-2) + fib(n-1);
}
You can see how, at the very least, we’ll have to save ra
and n
, because we
need the original even after the first recursive call. It’s not as
obvious, but we’ll also have to save the return value of the first call so
we’ll still have it to do the addition after the second. You might think
this would require using two s regs, but does it? Let’s see…
#int fib(int n)
fib:
addi sp, sp, -8
sw ra, 0(sp)
sw s0, 4(sp)
# n already in a0 for immediate return
li t0, 1
ble a0, t0, exit_fib # if (n <= 1) goto exit_fib (ie return n)
mv s0, a0 # save n
addi a0, a0, -2
jal fib # fib(n-2)
addi t0, s0, -1 # calc n-1 first so we can use s0 to save fib(n-2)
mv s0, a0 # save return of fib(n-2) in s0
mv a0, t0 # copy n-1 to a0
jal fib # fib(n-1)
add a0, a0, s0 # a0 = fib(n-1) + fib(n-2)
exit_fib:
lw ra, 0(sp)
lw s0, 4(sp)
addi sp, sp, 8
ret
Notice how we don’t have to save n
any sooner than necessary, ie right before
we have to use a0
to setup the first recursive call. Also, the ordering of
lines 16-18 is important. We needed the original n
to calculate n-1
but
once that’s in a0
ready for the call, because we won’t need n
again afterward,
we can now use s0
to preserve the return value of the first call.
Some of you, if you were paying attention, might point out that you could save a few instructions of performance if you moved the base case testing before the prologue as long as you put the exit label after the epilogue. This is true, but I’d recommend against it unless you were really trying to eke out every last microsecond. It’s nicer/cleaner to keep the prologue and epilogue as the first and last things; they’re one more thing to catch your eye and help delineate where functions start and end. Regardless, if you’re curious, you can see that version, along with every other function in this chapter in the included program calling.s.
While grasping the basics of a calling convention is not too difficult, it takes
practice to get used to it. There are many things that we haven’t covered
in this chapter, like how to pass more than 8 arguments, or use fp
, or handle
floating point arguments or return values. The latter at least, will be covered in
the next chapter.