Chapter 12. Assembly: Memory
We've seen how to build assembly programs that perform basic numerical computation. We'll now turn to examining how assembly programs can access memory.
12.1. Basic memory instructions
The ARM supports memory access via the LDR
instruction for loading and the STR instruction for storing.
Each takes two arguments. The first argument is the register
into which data should be loaded or from which data should be stored.
The second is a memory address as found in a register. The register in
the second argument should be in brackets.
For an example of how these instructions work, let's suppose we want an assembly program fragment that adds the integers in an array. We imagine that R0 holds the address of the first integer of the array, and R1 holds the number of integers in the array.
addInts MOV R4, #0
addLoop LDR R2, [R0]
        ADD R4, R4, R2
        ADD R0, R0, #4
        SUBS R1, R1, #1
        BNE addLoop
In this fragment, we use R4 to hold the sum of the integers so far.
In the LDR instruction, we look into R0 for a memory address
and load the data found at that address into R2. We then add this value
into R4. Then, we move R0 so that it contains the memory address of the
next integer in the array; we increase R0 by four because each integer
consumes four bytes of memory. Finally, we decrement R1, which is the
number of integers left to read from the array, and we repeat the
process if there are integers remaining.
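To make the correspondence concrete, here is a minimal C sketch of the same loop. The name add_ints is our own, and we assume int32_t stands in for the 4-byte integers that LDR loads. Each statement is labeled with the instruction it mirrors. Unlike ADD, C's signed addition is undefined on overflow, so the sketch assumes the total fits.

```c
#include <stdint.h>

/* Hypothetical C counterpart of the addInts fragment.
   p plays the role of R0, count of R1, value of R2, sum of R4.
   Like the assembly, it assumes count is at least 1. */
int32_t add_ints(const int32_t *p, uint32_t count) {
    int32_t sum = 0;          /* MOV  R4, #0                            */
    do {
        int32_t value = *p;   /* LDR  R2, [R0]                          */
        sum += value;         /* ADD  R4, R4, R2                        */
        p += 1;               /* ADD  R0, R0, #4 (one int32_t = 4 bytes) */
        count -= 1;           /* SUBS R1, R1, #1                        */
    } while (count != 0);     /* BNE  addLoop                           */
    return sum;
}
```

Advancing the C pointer by one element moves it four bytes, which is exactly what ADD R0, R0, #4 does by hand.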
LDR and STR each transfer a 32-bit value.
There are also instructions for working with 8-bit values, LDRB
and STRB; these are useful primarily for working with strings.
Below is an implementation of C's strcpy function; we imagine
that R0 holds the address of the first character of the destination
array, and that R1 holds the address of the first character of the
source string. We want to keep copying until we copy the terminating NUL
character (ASCII 0).
strcpy LDRB R2, [R1]
        STRB R2, [R0]
        ADD R0, R0, #1
        ADD R1, R1, #1
        TST R2, R2 ; repeat if R2 is nonzero
        BNE strcpy
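The same loop in C, as a sketch: copy_string is a hypothetical name (we avoid strcpy itself so as not to clash with the C library). The do-while structure matters here: like the assembly, it tests the byte only after storing it, so the terminating NUL is copied too.

```c
/* Hypothetical C counterpart of the strcpy fragment.
   dst plays R0, src plays R1, c plays R2. */
void copy_string(char *dst, const char *src) {
    char c;
    do {
        c = *src;       /* LDRB R2, [R1]            */
        *dst = c;       /* STRB R2, [R0]            */
        dst += 1;       /* ADD  R0, R0, #1          */
        src += 1;       /* ADD  R1, R1, #1          */
    } while (c != 0);   /* TST R2, R2 ; BNE strcpy  */
}
```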
12.2. Addressing modes
The ARM instructions for loading and storing memory provide several ways of indicating the memory address. These are called addressing modes. So far, we have always accessed memory by simply enclosing in brackets the register holding the memory address we wish to access; that is one addressing mode.
Another addressing mode is scaled register offset, where we include in the brackets a register, another register, and a shift value. To compute the memory address to access, the processor takes the first register and adds to it the second register shifted according to the shift value. (Neither of the registers mentioned in brackets changes value.) This addressing mode is useful when accessing an array where you know the array index. We can modify our earlier routine for adding the integers in an array to take advantage of this addressing mode.
addInts MOV R4, #0
addLoop SUBS R1, R1, #1
        LDR R2, [R0, R1, LSL #2]
        ADD R4, R4, R2
        BNE addLoop
With each iteration of the loop, we first decrement our loop index R1. Then we retrieve the element at that entry of the array using a scaled register offset: We use R0 as our base, and we add to it R1 shifted left two places. We shift R1 left two places so that R1 is multiplied by four; after all, each integer in the array is four bytes long. After adding the loaded value into R4, which accumulates the total, we repeat the loop if R1 hasn't reached 0 yet.
This version of the code behaves somewhat differently from our earlier version. First, it loads the numbers in the array in reverse order: it loads the last number in the array first. Second, R0 remains unaltered in the course of the fragment. And finally, it should run somewhat faster, since it executes one fewer instruction per loop iteration (four instead of five).
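A C sketch of this second version follows, spelling out the address arithmetic that [R0, R1, LSL #2] performs. The helper scaled_offset and the function add_ints_indexed are hypothetical names, and we again assume int32_t for the 4-byte array elements; memcpy reads the word at the computed byte address.

```c
#include <stdint.h>
#include <string.h>

/* Address for [Rn, Rm, LSL #shift]: Rn + (Rm << shift). Neither input changes. */
static const unsigned char *scaled_offset(const unsigned char *rn,
                                          uint32_t rm, unsigned shift) {
    return rn + ((size_t)rm << shift);
}

/* Sums from the last element back to the first; the base pointer (R0)
   is never modified. Assumes count is at least 1. */
int32_t add_ints_indexed(const int32_t *base, uint32_t count) {
    const unsigned char *r0 = (const unsigned char *)base;
    int32_t sum = 0;                                   /* MOV  R4, #0     */
    do {
        count -= 1;                                    /* SUBS R1, R1, #1 */
        int32_t value;                                 /* LDR R2, [R0, R1, LSL #2] */
        memcpy(&value, scaled_offset(r0, count, 2), sizeof value);
        sum += value;                                  /* ADD  R4, R4, R2 */
    } while (count != 0);                              /* BNE  addLoop    */
    return sum;
}
```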
Immediate post-indexed addressing is another addressing mode. To indicate this mode in assembly language, we follow the brackets with a comma and a positive or negative immediate. In executing the instruction, the processor still accesses the memory address found in the register, but after accessing memory it increases or decreases the register according to the immediate.
Our strcpy implementation shows where immediate
post-indexed addressing helps: After we store to R0, we always
want R0 to increase by 1; and similarly, after we load from R1, we
always want R1 to increase by 1. We can use immediate post-indexed
addressing to avoid the two ADD instructions of our earlier
version.
strcpy LDRB R2, [R1], #1
        STRB R2, [R0], #1
        TST R2, R2 ; repeat if R2 is nonzero
        BNE strcpy
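In C, the postfix ++ operator behaves much like immediate post-indexed addressing: *src++ reads through the pointer's old value and then advances the pointer by one element, which is one byte for char. Here is a sketch of the shortened loop under the hypothetical name copy_string_postindex.

```c
/* Each postfix ++ uses the old pointer, then advances it by 1,
   just as [R1], #1 and [R0], #1 do. */
void copy_string_postindex(char *dst, const char *src) {
    char c;
    do {
        c = *src++;     /* LDRB R2, [R1], #1        */
        *dst++ = c;     /* STRB R2, [R0], #1        */
    } while (c != 0);   /* TST R2, R2 ; BNE strcpy  */
}
```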
All in all, the ARM processor supports ten addressing modes.
| Syntax | Mode | Address accessed |
| --- | --- | --- |
| [Rn, #±imm] | Immediate offset | Address accessed is imm more/less than the address found in Rn. Rn does not change. |
| [Rn] | Register | Address accessed is the value found in Rn. This is just shorthand for [Rn, #0]. |
| [Rn, ±Rm, shift] | Scaled register offset | Address accessed is the sum/difference of the value in Rn and the value in Rm shifted as specified. Rn and Rm do not change values. |
| [Rn, ±Rm] | Register offset | Address accessed is the sum/difference of the value in Rn and the value in Rm. Rn and Rm do not change values. This is just shorthand for [Rn, ±Rm, LSL #0]. |
| [Rn, #±imm]! | Immediate pre-indexed | Address accessed is as with immediate offset mode, but Rn's value updates to become the address accessed. |
| [Rn, ±Rm, shift]! | Scaled register pre-indexed | Address accessed is as with scaled register offset mode, but Rn's value updates to become the address accessed. |
| [Rn, ±Rm]! | Register pre-indexed | Address accessed is as with register offset mode, but Rn's value updates to become the address accessed. |
| [Rn], #±imm | Immediate post-indexed | Address accessed is the value found in Rn, and then Rn's value is increased/decreased by imm. |
| [Rn], ±Rm, shift | Scaled register post-indexed | Address accessed is the value found in Rn, and then Rn's value is increased/decreased by Rm shifted according to shift. |
| [Rn], ±Rm | Register post-indexed | Address accessed is the value found in Rn, and then Rn's value is increased/decreased by Rm. This is just shorthand for [Rn], ±Rm, LSL #0. |
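The three families in the table (offset, pre-indexed, post-indexed) differ only in which address is accessed and whether Rn is written back. The following C sketch models just that distinction; the enum and the function effective_address are our own illustration, and offset stands for the already-computed ±imm or ±(shifted Rm) value.

```c
#include <stdint.h>

/* Illustrative model: which address an access uses, and how Rn changes. */
enum index_kind { IDX_OFFSET, IDX_PRE, IDX_POST };

/* Returns the address accessed and updates *rn as the mode would. */
uint32_t effective_address(uint32_t *rn, int32_t offset, enum index_kind kind) {
    uint32_t base = *rn;
    uint32_t moved = base + (uint32_t)offset;  /* wraps modulo 2^32, like the hardware */
    switch (kind) {
    case IDX_OFFSET:                 /* [Rn, off]  : Rn unchanged            */
        return moved;
    case IDX_PRE:                    /* [Rn, off]! : Rn becomes the address  */
        *rn = moved;
        return moved;
    case IDX_POST:                   /* [Rn], off  : use old Rn, then update */
        *rn = moved;
        return base;
    }
    return base;                     /* not reached for valid kinds */
}
```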
For those addressing modes involving a shift, the shift types are the same as for the arithmetic instructions (LSL, LSR, ASR, ROR, RRX). But the shift distance cannot come from a register: The distance must be an immediate.
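For reference, here is a C sketch of what each shift type does to a 32-bit value. The function names are hypothetical. To keep every C shift well defined, the sketch only handles distances from 1 to 31, and RRX takes the incoming carry flag as an explicit argument.

```c
#include <stdint.h>

/* Hypothetical helpers; n must be in 1..31 (rrx always shifts by one). */
uint32_t shift_lsl(uint32_t x, unsigned n) { return x << n; }
uint32_t shift_lsr(uint32_t x, unsigned n) { return x >> n; }

/* Arithmetic shift right: copies of the sign bit fill the vacated bits.
   Done on unsigned values because >> on negative signed ints is
   implementation-defined in C. */
uint32_t shift_asr(uint32_t x, unsigned n) {
    uint32_t result = x >> n;
    if (x & 0x80000000u)
        result |= ~(0xFFFFFFFFu >> n);
    return result;
}

/* Rotate right: bits shifted out at the bottom re-enter at the top. */
uint32_t shift_ror(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Rotate right by one through the carry flag: carry enters bit 31. */
uint32_t shift_rrx(uint32_t x, unsigned carry) {
    return (x >> 1) | ((uint32_t)(carry & 1u) << 31);
}
```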
12.3. Initializing memory
We often want to reserve memory for holding data in a program. To do
this, we use directives: directions for the
assembler to do something that don't correspond to instructions that the
computer might execute at run-time. One useful directive
is DCD, after which you list the 32-bit values to be stored in
memory. (DCD cryptically stands for
Define Constant Double-words.)
primes DCD 2, 3, 5, 7, 11, 13, 17, 19
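In this example, we've created the label primes, which will
correspond to the address of a 2 in memory. In the next four bytes is
the integer 3, then 5, and so on. In our program, we would want to load
the address of the array into a register. Since the program counter PC
(which is synonymous with R15) holds an address near the current
instruction (in ARM state, the instruction's own address plus 8), we can
do this by adding to PC the distance from the instruction to primes.
The ADR pseudo-instruction computes that distance for us and assembles
into just such an ADD (or SUB) from PC. The below fragment loads the
fifth prime (11) into R1.
        ADR R0, primes      ; load address of primes[0] into R0
        LDR R1, [R0, #16]   ; load primes[4] into R1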
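The arithmetic behind LDR R1, [R0, #16] can be checked in C: 16 bytes past the start of an array of 4-byte words is element 4. In this sketch, the array mirrors the DCD line, int32_t is our stand-in for a 32-bit word, and load_word_at_offset is a hypothetical helper that reads the word at a given byte offset.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors: primes DCD 2, 3, 5, 7, 11, 13, 17, 19 */
const int32_t primes[] = { 2, 3, 5, 7, 11, 13, 17, 19 };

/* Like LDR R1, [R0, #byte_offset]: read the 32-bit word that starts
   byte_offset bytes past the address r0. */
int32_t load_word_at_offset(const void *r0, uint32_t byte_offset) {
    int32_t r1;
    memcpy(&r1, (const unsigned char *)r0 + byte_offset, sizeof r1);
    return r1;
}
```

With this, load_word_at_offset(primes, 16) yields 11, the fifth prime.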
Another directive worth mentioning is DCB, for
placing bytes into memory. For example, we could write the following.
primes DCB 2, 3, 5, 7, 11, 13, 17, 19
However, we are using just one byte for each number, so each number must fit in 8 bits: −128 to 127 if we treat the bytes as signed, or 0 to 255 if unsigned. We can also include a string in the list; each character of the string will occupy one byte of memory.
greet DCB "hello world\n", 0
Notice how we included 0 after the string. Without this, the string won't be terminated by the NUL character.
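Seen from C, a DCB list is just an array of bytes. This sketch spells out the greet bytes one by one to mirror the DCB line, including the explicit 0. A C string literal differs from DCB on this point: the compiler appends the NUL for us, so the literal "hello world\n" also occupies 13 bytes. The array names are our own.

```c
/* Mirrors: primes DCB 2, 3, 5, 7, 11, 13, 17, 19  (one byte each) */
const signed char small_primes[] = { 2, 3, 5, 7, 11, 13, 17, 19 };

/* Mirrors: greet DCB "hello world\n", 0  (explicit NUL at the end) */
const unsigned char greet[] = {
    'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\n', 0
};
```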
One more directive worth noting here is the percent sign %.
This is useful when you wish to reserve a block of memory, but you
don't care about the memory's initial value.
array % 120 ; reserve 120 bytes of memory, which can hold 30 ints
12.4. Multiple-register memory instructions
The ARM ISA also includes instructions allowing several values to be
loaded or stored in the same instruction. The LDMIA instruction
is one of these instructions: It allows loading into multiple registers starting
at an address named in another register. In the below example of
its usage, we rewrite our fragment to add the integers in an array so that it
actually processes four integers with each iteration of the loop; this strategy
allows the program to run using fewer instructions, at the expense of a bit more
complexity.
; R0 holds address of first integer in array
; R1 holds array's length; fragment works only if length is multiple of 4
addInts MOV R4, #0
addLoop LDMIA R0!, { R5-R8 }
        ADD R5, R5, R6
        ADD R7, R7, R8
        ADD R4, R4, R5
        ADD R4, R4, R7
        SUBS R1, R1, #4
        BNE addLoop
In executing the LDMIA instruction above, the ARM processor looks
into the R0 register for an address. It loads into R5 the four bytes starting
at that address, into R6 the next four bytes, into R7 the next four bytes,
and into R8 the next four bytes. Meanwhile, R0 is stepped forward by 16
bytes, so with the next iteration the LDMIA instruction will load
the next four words into the registers.
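A C sketch of the four-at-a-time loop, keeping the same pairing of additions as the assembly. The name add_ints_by4 is hypothetical; as with the assembly, it assumes the count is a positive multiple of 4 and that int32_t matches the 4-byte words.

```c
#include <stdint.h>

/* Hypothetical C counterpart of the LDMIA-based addInts.
   r0 plays R0, count plays R1, sum plays R4. */
int32_t add_ints_by4(const int32_t *r0, uint32_t count) {
    int32_t sum = 0;                                    /* MOV R4, #0 */
    do {
        int32_t r5 = r0[0], r6 = r0[1], r7 = r0[2], r8 = r0[3];
        r0 += 4;                        /* LDMIA R0!, { R5-R8 }: R0 moves 16 bytes */
        r5 += r6;                                       /* ADD R5, R5, R6  */
        r7 += r8;                                       /* ADD R7, R7, R8  */
        sum += r5;                                      /* ADD R4, R4, R5  */
        sum += r7;                                      /* ADD R4, R4, R7  */
        count -= 4;                                     /* SUBS R1, R1, #4 */
    } while (count != 0);                               /* BNE addLoop     */
    return sum;
}
```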
Inside the braces can be any list of registers, using dashes to indicate
ranges of registers, and using commas to separate ranges.
Thus, the instruction LDMIA R0!, { R1-R4, R8, R11-R12 } will load
seven words from memory. The order in which the registers are listed is not
significant; even if we write LDMIA R0!, { R11-R12, R8, R1-R4 },
R1 will receive the first word loaded from memory.
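The listing order doesn't matter because the machine instruction stores the register list as a 16-bit mask, one bit per register R0 through R15. The C sketch below, under the hypothetical name ldmia_model, shows how LDMIA hands out words under that encoding: it walks the mask from R0 upward, giving each listed register the next word from memory. It ignores writeback.

```c
#include <stdint.h>

/* Bit i of list set means Ri is in the register list.
   Returns the number of words loaded. */
unsigned ldmia_model(uint32_t regs[16], uint16_t list, const uint32_t *mem) {
    unsigned loaded = 0;
    for (unsigned i = 0; i < 16; i++) {
        if (list & (1u << i)) {
            regs[i] = mem[loaded];   /* lowest register gets lowest address */
            loaded++;
        }
    }
    return loaded;
}
```

For { R1-R4, R8, R11-R12 }, the mask has seven bits set, so seven words are loaded and R1 receives the first.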
The exclamation point following R0 in our example may be omitted; if omitted, then the address register is not altered by the instruction. That is, R0 would continue pointing to the first integer in the array.
Another instruction is STMIA, which stores several registers into
memory. In the following example, we shift every number in an array into
the next spot; thus, the array <2,3,5,7> becomes
<0,2,3,5>.
; R0 holds address of first integer in array
; R1 holds array's length; fragment works only if length is multiple of 4
shift MOV R4, #0
shLoop LDMIA R0, { R5-R8 }
        STMIA R0!, { R4-R7 }
        MOV R4, R8
        SUBS R1, R1, #4
        BNE shLoop
Notice how the LDMIA instruction omits the exclamation point
so that R0 isn't modified. This is so that STMIA stores into
the same range of addresses that were just loaded into the registers.
The STMIA instruction has the exclamation point because R0 must
be modified in preparation for any next iteration of the loop.
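Here is the shLoop fragment rendered in C as a sketch, with carry playing the role of R4. The name shift_right_one is hypothetical, and like the assembly, it assumes the length is a positive multiple of 4. Reading all four words of a block before storing any of them is what lets each block be overwritten in place.

```c
#include <stdint.h>

/* Hypothetical C counterpart of shLoop: moves every element one slot
   later, putting 0 in the first slot and dropping the last element. */
void shift_right_one(int32_t *r0, uint32_t count) {
    int32_t carry = 0;                                  /* MOV R4, #0 */
    do {
        int32_t r5 = r0[0], r6 = r0[1], r7 = r0[2], r8 = r0[3];
                                        /* LDMIA R0, { R5-R8 } (R0 unchanged) */
        r0[0] = carry; r0[1] = r5; r0[2] = r6; r0[3] = r7;
        r0 += 4;                        /* STMIA R0!, { R4-R7 }               */
        carry = r8;                     /* MOV R4, R8                         */
        count -= 4;                     /* SUBS R1, R1, #4                    */
    } while (count != 0);               /* BNE shLoop                         */
}
```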
The ARM processor includes four modes of the multiple-load and multiple-store instructions.
| Instructions | Mode | Addresses accessed |
| --- | --- | --- |
| LDMIA, STMIA | Increment after | We start at the named address and proceed to increasing addresses. |
| LDMIB, STMIB | Increment before | We start at four more than the named address and proceed to increasing addresses. |
| LDMDA, STMDA | Decrement after | We start at the named address and proceed to decreasing addresses. |
| LDMDB, STMDB | Decrement before | We start at four less than the named address and proceed to decreasing addresses. |
Across all four modes, the highest-numbered register always
corresponds to the highest address in memory. Thus, the instruction
LDMDA R0, { R1-R4 } will load R4 from the address named by R0,
R3 from R0 − 4, and so on.
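The four modes can be summarized by two numbers: the lowest address accessed, and the value Rn would receive with the exclamation point. The C sketch below computes both for a list of n registers; the enum and ldm_lowest_address are our own illustrative names. The registers then occupy consecutive words upward from the lowest address, lowest-numbered register first.

```c
#include <stdint.h>

enum ldm_mode { MODE_IA, MODE_IB, MODE_DA, MODE_DB };

/* For n registers and base address rn, returns the lowest address
   accessed and stores in *writeback the value Rn gets with '!'. */
uint32_t ldm_lowest_address(uint32_t rn, unsigned n, enum ldm_mode mode,
                            uint32_t *writeback) {
    uint32_t bytes = 4u * n;
    switch (mode) {
    case MODE_IA: *writeback = rn + bytes; return rn;             /* rn .. rn+4n-4  */
    case MODE_IB: *writeback = rn + bytes; return rn + 4;         /* rn+4 .. rn+4n  */
    case MODE_DA: *writeback = rn - bytes; return rn - bytes + 4; /* rn-4n+4 .. rn  */
    case MODE_DB: *writeback = rn - bytes; return rn - bytes;     /* rn-4n .. rn-4  */
    }
    *writeback = rn;
    return rn;                          /* not reached for valid modes */
}
```

For LDMDA R0, { R1-R4 } with R0 = 0x1000, it gives 0x0FF4 as the lowest address, so R1 comes from 0x0FF4 and R4 from 0x1000, matching the description above.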
As we'll see in the next chapter, the different choices of modes are particularly useful when we want to use a block of unused memory as a stack.
