Skip to content

Latest commit

 

History

History
447 lines (340 loc) · 14.7 KB

ch5.adoc

File metadata and controls

447 lines (340 loc) · 14.7 KB

Chapter 5: Functions and the RISC-V Calling Convention

While I’m sure everyone here probably knows what functions are, you might be wondering what a "Calling Convention" is. In short, it is an agreement between the caller and callee about how to treat/use certain registers. We’ll get to the why and how later.

Functions

In assembly, a function is simply a label with a return instruction associated with it; because this is far more ambiguous than a function in a higher level language, it is good practice to only have a single return instruction associated with a function.[1] A comment above the label is also helpful. Together those help you quickly see the start and end of the function.

void func1() {}

would be

# void func1()
func1:
	# body goes here
	ret

As you can see my policy is to put a single line comment of the C prototype above label.

But how do you call a function in assembly? You use the instruction Jump and Link: jal func_label. Let’s change the hello world program from chapter 0 to call a function:

.data
hello:   .asciz "Hello World!\n"

.text
main:
	jal   hello_world

	li    a7, 10     # exit ecall
	ecall


# void hello_world()
hello_world:
	li    a7, 4      # print string ecall
	la    a0, hello  # load address of string to print into a0
	ecall

	ret

What jal actually does, is save the address of the next instruction to ra and then do an unconditional jump to the function label. So you could achieve the same results with the following:

	jal    func

	# is equivalent to

	la     ra, next_instr
	j      func
next_instr:

That would get tiring and ugly fast though, having to come up with unique labels for the next instruction every time. You also might be confused about why the greensheet says jal saves PC+4 in an arbitrary register R[rd] instead of ra specifically (which would be R[1]). The instruction does actually take a register argument but since it’s most commonly used to call a function if you don’t specify a register it will use ra as if you did jal ra, func. This works in conjunction with the pseudoinstruction ret which does PC=R[1] using the instruction jalr (specifically jalr x0, ra, 0) to easily return from functions. You might also see another way of returning using jr which stands for "Jump Register" and jumps to the address in the register, so to return from a function you’d do jr ra. It is also a pseudoinstruction that uses jalr. Unless your professor insists on something else, prefer ret; not only is it the shortest, returning from functions is its sole purpose.

The Convention

We’ve gone as far as we can without starting to talk about registers and their purposes in functions. You can think of registers as variables[2] that are part of the CPU. In this case, since we’re dealing with a 32-bit RISC-V architecture, they are 32-bit (aka 4 bytes, 1 word) variables.[3] Since they’re part of the CPU, they exist for the life of the program and the whole program shares the same registers.

But how does that work? If all parts of the program use the same 32 registers, how does one function not stomp all over what another was doing when it uses them? In fact, how do functions communicate at all? How do they pass arguments or return results? All these questions are solved by deciding on a "Calling Convention". It’s different for different architectures and even different operating systems on the same architecture. This is because different architectures have different numbers of registers, and some registers like ra have semi-hardcoded uses. The the pseudoinstruction ret uses ra, and x0 is a constant 0 and there’s no way to change either of those facts. That still leaves a lot of flexibility when designing a calling convention. While they mostly match, you can probably find several variations of RISC-V calling conventions online. They usually differ in how they setup a stack frame. The convention covered in this chapter is consistent with, and sufficient for, almost every college course I’ve ever heard of.

Regardless, what matters is that the calling convention works by setting rules (and guidelines) for register use, and when/how to use the stack.

If you’re unfamiliar with the runtime stack, it’s exactly what it sounds like. It’s a Last-In-First-Out (LIFO) data structure that you can use to store smaller values in a program. It grows in a negative direction, so to allocate 12 bytes, you would subtract 12 from the stack pointer (in RISC-V that’s sp).

RISC-V specifically designates certain registers to be used for passing arguments (at least the first 8), a couple for return values, and others for misc. temporary or saved values. The rest are special use registers like ra.

The quickest way to summarize is to look at the table on the greensheet which is reproduced below:

Table 1. RISC-V Registers and Uses
Register Name Use Preserved Across a Call

x0

zero

Constant 0

N.A.

x1

ra

Return address

No

x2

sp

Stack pointer

Yes

x3

gp

Global pointer

 — 

x4

tp

Thread pointer

 — 

x5-x7

t0-t2

Temporaries

No

x8

s0/fp

Saved register/Frame pointer

Yes

x9

s1

Saved register

Yes

x10-x11

a0-a1

Function arguments/Return values

No

x12-x17

a2-a7

Function arguments

No

x18-x27

s2-s11

Saved registers

Yes

x28-x31

t3-t6

Temporaries

No

To summarize, you have 15 registers that can be used anytime for temporary values, though some have special uses too (the a and t registers). You have 12 s registers that have to be saved on the stack if you use them, plus ra as well. The zero register is obviously a special case.

The sp register is technically preserved but not in the same way. Basically what you allocate (subtract) you have to deallocate (add) before returning from a function, thus preserving the original value.

You can ignore gp, tp, and most of the time fp too. Also, with 8 registers to pass arguments, you’ll almost never need to pass arguments on the stack.

Basic example

Let’s start with something simple that doesn’t use the stack.

int hello_name_number(char* name, int number)
{
	printf("Hello %s!\n", name);
	return number + 10;
}

According to the convention that becomes:

.data
hello_space:  .asciz "Hello "
exclaim_nl:   .asciz "!\n"

.text
# int hello_name_number(char* name, int number)
hello_name_number:
	mv       t0, a0   # save name in t0 since we need a0 for the ecall

	li       a7, 4        # print string
	la       a0, hello_space
	ecall

	mv       a0, t0    # print name (a7 is still 4)
	ecall

	la       a0, exclaim_nl  # print "!\n"
	ecall


	addi     a0, a1, 10  # return number + 10
	ret

Some things to note, ecalls are not function calls so we can "save" a0 in a t register and know that it’ll still be there when the ecall is done. In the same way, we know that a7 is still the same so we don’t have to keep setting it to 4 for print string. Lastly, to return a value, we make sure that value is in a0 before returning.

Using the Stack

First, let’s establish the rules on when you have to use the stack (You can always use it for arbitrary local variables, like a local array for example, but generally don’t if you don’t have a good reason).

  1. You call another function, ie you’re a non-leaf function.

    This means you have to save ra on the stack at the very least, otherwise when you do your ret you’d jump back into yourself (right after the last jal instruction). This does not apply to main because you don’t/shouldn’t return from main, you should call the exit (or exit2) ecall (10 or 93).

  2. You need to save values across a function call (automatically includes reason 1).

    This is fairly common for non-trivial functions. Obvious examples are calling a function in a loop or loops (you’d have to preserve the iterator(s)), and many recursive functions.

  3. You run out of temporary registers and overflow into the s registers.

    This is very rare. The most common reason this "happens" is people forget they have 8 a registers, in addition to the 7 t registers, that they can also use for temporaries. 15 is more than enough to handle pretty much any function because you rarely need 16 discrete values at the same time.

Let’s look at an example for the first two. Any example for the last rule would be prohibitively large and complicated.

int non_leaf()
{
	func1();
	return 42
}

This calls the empty function discussed at the top of this chapter.

#int non_leaf()
non_leaf:
	addi     sp, sp, -4  # space to save 1 register, ra
	sw       ra, 0(sp)   # store ra in the newly allocated stack space

	jal      func1

	li       a0, 42       # return 42

	lw       ra, 0(sp)   # restore original ra
	addi     sp, sp, 4   # pop the stack
	ret

The bit of code at the top and bottom of the function are called the prologue and epilogue respectively for obvious reasons. We allocate 4 bytes on the stack by subtracting 4 (I add a negative rather than subtract because I can copy-paste the line with a single character change for the epilogue). Then we store the current ra in that space at the new top of the stack. Then before we exit we have to load it back and pop the stack.

If we didn’t save and restore ra we would jump to line 7 when we do our ret and then we’d be in an infinite loop.

Next we have the second case, where we need to preserve regular local values across a function call.

void print_letters(char letter, int count)
{
	for (int i=0; i<count; i++) {
		putchar(letter);
	}
	putchar('\n');
}

int save_vals()
{
	for (int i=0; i<10; i++) {
		print_letters('A'+i, i+1);
	}
	return 8;
}

That becomes this in RISC-V:

#void print_letters(char letter, int count)
print_letters:
	ble      a1, x0, exit_pl   # if (count <= 0) goto exit_pl
	li       a7, 11            # print character
pl_loop:
	ecall
	addi     a1, a1, -1        # count--
	bgt      a1, x0, pl_loop   # while (count > 0)

	li       a0, 10            # '\n'
	ecall

exit_pl:
	ret


#int save_vals()
save_vals:
	addi     sp, sp, -12
	sw       ra, 0(sp)
	sw       s0, 4(sp)
	sw       s1, 8(sp)

	li       s0, 0  # i = 0
	li       s1, 10
sv_loop:
	addi     a0, s0, 65   # i + 'A'
	addi     a1, s0, 1    # i + 1
	jal      print_letters

	addi     s0, s0, 1        # i++
	blt      s0, s1, sv_loop  # while (i < 10)

	lw       ra, 0(sp)
	lw       s0, 4(sp)
	lw       s1, 8(sp)
	addi     sp, sp, 12
	ret

Notice that for print_letters, we not only convert the loop to a do-while, but we also use the parameter count as the iterator to count down to 0. It saves us an instruction initializing an i.

Second, for save_vals, we save not only ra because we call another function, but also two s registers to save i and our stopping point. The second is not actually necessary; because it’s a constant, we could load 10 into a register right before the check every iteration of the loop. Which version is better depends on several factors, like how long or complex the loop is, how many times it executes, and of course personal preference.

Recursive Functions

Let’s do a classic recursive function, the fibonacci sequence.

int fib(int n)
{
	if (n <= 1)
		return n;

	return fib(n-2) + fib(n-1);
}

You can see how, at the very least, we’ll have to save ra and n, because we need the original even after the first recursive call. It’s not as obvious, but we’ll also have to save the return value of the first call so we’ll still have it to do the addition after the second. You might think this would require using two s regs, but does it? Let’s see…​

#int fib(int n)
fib:
	addi    sp, sp, -8
	sw      ra, 0(sp)
	sw      s0, 4(sp)

	# n already in a0 for immediate return
	li      t0, 1
	ble     a0, t0, exit_fib  # if (n <= 1) goto exit_fib (ie return n)

	mv      s0, a0        # save n

	addi    a0, a0, -2
	jal     fib           # fib(n-2)

	addi    t0, s0, -1    # calc n-1 first so we can use s0 to save fib(n-2)
	mv      s0, a0        # save return of fib(n-2) in s0
	mv      a0, t0        # copy n-1 to a0
	jal     fib           # fib(n-1)

	add     a0, a0, s0    #  a0 = fib(n-1) + fib(n-2)

exit_fib:
	lw      ra, 0(sp)
	lw      s0, 4(sp)
	addi    sp, sp, 8
	ret

Notice how we don’t have to save n any sooner than necessary, ie right before we have to use a0 to setup the first recursive call. Also, the ordering of lines 16-18 is important. We needed the original n to calculate n-1 but once that’s in a0 ready for the call, because we won’t need n again afterward, we can now use s0 to preserve the return value of the first call.

Some of you, if you were paying attention, might point out that you could save a few instructions of performance if you moved the base case testing before the prologue as long as you put the exit label after the epilogue. This is true, but I’d recommend against it unless you were really trying to eke out every last microsecond. It’s nicer/cleaner to keep the prologue and epilogue as the first and last things; they’re one more thing to catch your eye and help delineate where functions start and end. Regardless, if you’re curious, you can see that version, along with every other function in this chapter in the included program calling.s.

Conclusion

While grasping the basics of a calling convention is not too difficult, it takes practice to get used to it. There are many things that we haven’t covered in this chapter, like how to pass more than 8 arguments, or use fp, or handle floating point arguments or return values. The latter at least, will be covered in the next chapter.


1. I do not agree with an ironclad "one return" policy in higher level languages. Sometimes returning early results in cleaner code, sometimes not. Similarly, `goto` is not evil and there are rare cases where using it creates the best code.
2. Obviously the zero register is not really a variable. I never understood how people could say "const variable" with a straight face, it’s literally an oxymoron.
3. RARS does support 64 bit I think TODO