Chapter 13. Assembly: Subroutines

Programmers need to break their programs into pieces. In Java, the pieces are called methods; in C, they are called functions; and in assembly language, they are called subroutines. In this chapter we look at how to write subroutines, and in learning about subroutines we'll also see how methods and functions in higher-level languages are actually executed.

13.1. The link register

We've already seen that each processor has a register, called its program counter, that tracks the address of the instruction it is about to execute. The ARM processor uses R15 for this purpose. Assembly programs typically write PC rather than R15; the two names are synonymous.

When invoking a subroutine, a program must record the address to which the processor should return after the subroutine completes. The ARM processor uses R14 for this purpose. It is called the link register, and programs usually refer to it as LR.

Below is an example illustrating how this works, using our strcpy example from the previous chapter. Note that this is not how you should typically call a subroutine; we'll soon see the right way. For now, we just want to illustrate how one could write subroutines using the instructions we've already seen.

        ; Note: This way of calling subroutines is stylistically poor.
        MOV LR, PC          ; PC reads 8 bytes ahead, so LR gets the address of after
        B strcpy
after    ; main code continues its work here after subroutine completes

strcpy  LDRB R2, [R1], #1
        STRB R2, [R0], #1
        TST R2, R2          ; repeat if R2 is nonzero
        BNE strcpy
        MOV PC, LR          ; return back into the code calling strcpy

This fragment first loads into LR the address of the instruction following the call to strcpy, labeled after in the above program. The next instruction branches into strcpy, and the subroutine starts its work. Once it is done, the subroutine copies LR back into PC, so the processor resumes execution at the after label.

Calling subroutines is common enough that ARM's designers found the two-instruction sequence in the above illustration too cumbersome. So they included a single instruction, BL (branch with link), that places the address of the following instruction into LR and then branches. Instead of the first two instructions above, we would simply write BL strcpy.

13.2. The program stack

Often we want subroutines that themselves call subroutines. Our subroutines will of course use some registers to perform their work. But that brings up a problem: How should we save registers that contain useful information, so that subroutines we call can use those registers for their own purposes?

The solution is naturally to save the registers' values in memory. In fact, we'll use a stack — called the program stack — to save information that a subroutine needs. When it starts, the subroutine will allocate a new block of memory on the top of the stack; and when it returns, the subroutine will release the block from the top, leaving on the stack's top the block of memory for the subroutine that called it.

We implement the stack within our programs by using a large block of memory, and we use a register called the stack pointer to point to the location of the top of the stack. In most processors, including the ARM processor, the stack pointer typically starts at a high address and decreases as we push more values onto it.

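In the ARM processor, R13 is conventionally used for the stack pointer; in assembly programs, we'd typically reference it as SP. We'll push values onto the stack using STMDB (store multiple, decrement before) with SP as the base register and the ! suffix so that SP is updated, as in STMDB SP!, { R4, R5 }. We'll pop values off the stack using LDMIA (load multiple, increment after), as in LDMIA SP!, { R4, R5 }.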

An example of such a subroutine is below. This subroutine is an adaptation of our earlier fragment for adding the numbers of an array, and it needs two registers beyond those holding its parameters. The code calling this subroutine may be keeping important values in those registers. So we write the subroutine to save both registers onto the stack as soon as it is called and to restore them to their previous values just before returning.

; sumArray: Places sum of entries in array into R0. On entry, R0
; should be address of first array element (each a 4-byte word), and
; R1 should be array length, which must be at least 1.
sumArray STMDB SP!, { R4, R5 }  ; push R4 and R5 onto stack
         MOV R4, #0
sumLoop  LDR R5, [R0], #4      ; load next word and advance R0 by 4 bytes
         ADD R4, R4, R5
         SUBS R1, R1, #1
         BNE sumLoop
         MOV R0, R4
         LDMIA SP!, { R4, R5 }  ; restore R4 and R5 from stack
         MOV PC, LR             ; return back to after sumArray call

One of the registers a subroutine most commonly needs to save is the link register, since any BL it executes to call another subroutine overwrites LR. This enables a handy trick. When we restore the saved registers at the subroutine's end, we can load the link register's saved value directly into the program counter, which returns from the subroutine. As a result, we don't need a separate MOV PC, LR instruction.

subName STMDB SP!, { R4-R5,LR }
        ; code within subroutine goes here, with perhaps some calls
        ; to other subroutines (thus changing LR)
        LDMIA SP!, { R4-R5,PC } ; loading into PC returns out of subroutine

13.3. Calling conventions

To write large assembly programs, we need a standard system for passing parameters and return values and for allocating registers between subroutines. After all, if each subroutine created its own system, things would quickly get confusing as we tried to remember, for each subroutine, how to hand parameters to it and which registers it uses. Such a standard system is called a calling convention, and there is often a standard calling convention associated with each processor.

For the ARM processor, we'll follow the standard calling convention. Parameters are passed by placing their values into registers R0 through R3 before calling the subroutine, and a subroutine returns a value by placing it into R0 before returning. In the rare situation that a subroutine takes more than four parameters, the caller places the additional parameters onto the stack before entering the subroutine. It pushes the later parameters first, so that the fifth parameter ends up on the stack's top, at the address held in SP.

Each subroutine may alter R0 through R3 as it wishes, but if it uses R4 through R12, it must restore them to their previous values before returning. It must also restore the stack pointer R13 to its value on entry, so that everything the subroutine pushed has been removed from the stack. It may change the link register R14.

Assembly programmers divide the registers into caller-save registers and callee-save registers. Caller-save registers are those that a subroutine may change, such as R0 through R3 in the ARM convention described above. They are called caller-save because a caller of a subroutine must save their values if it needs them after the subroutine completes. Callee-save registers are those that a subroutine must leave unchanged, like R4 through R12 in the convention described above. Upon being called, the subroutine (the callee) must save their values if it wishes to use them.

It's beneficial for a calling convention to designate both caller-save registers and callee-save registers. Suppose the convention designated all registers as callee-save. Then subroutines could not use any register at all without first saving it onto the stack. That would be wasteful, since some of the saved registers would hold transient values that the calling subroutine did not care about long-term. And if the convention designated all registers as caller-save, programmers would be forced to save many registers before every call to a subroutine and restore them afterwards, increasing the time each call takes.