Chapter 11. Assembly: Introduction

Our introduction to digital circuits is now complete. There is much else to learn — particularly how one can put the pieces together to design a CPU — but that will have to be deferred for another place, like a course on computer architecture.

Recall that we can think of the modern computing system as being structured in a four-level hierarchy.

Programming language
Operating system
Machine language
Digital circuit

We now advance to studying machine language, the system for expressing the instructions that a computer can perform. Each line of processors supports a completely different form of machine language, specific to that line of processors. The design of the machine language is called the instruction set architecture (ISA).

11.1. ISA varieties

IA32 is handily the most widely recognized ISA. Intel originally designed it in 16-bit form in 1978 and has extended it over the years. (IA32 stands for 32-bit Intel Architecture.) Today, processors supporting IA32 are manufactured by Intel, AMD, and VIA, and they can be found in most personal computers.

Another well-known ISA is the PowerPC. This was used most prominently by Macintosh computers for many years, until Apple switched their line of computers to IA32 in 2006. PowerPC still retains an important niche, particularly in gaming consoles: The Wii, Playstation 3, and XBox 360 all incorporate PowerPC chips.

But the ISA that we'll study is one designed by a company called ARM. (Like other successful ISAs, ARM's ISA has grown over the years. We'll examine version 4T.) Though not as widely recognized as IA32 or PowerPC, processors supporting ARM's ISA are actually distributed much more widely. The ISA is designed specifically for the simple processors required for low-power devices such as cellphones, digital music players, and handheld game systems. The iPhone, Blackberry, and Nintendo DS are all prominent examples of devices that incorporate an ARM processor.

There are several reasons for examining ARM's ISA rather than IA32.

11.2. A simple program

Let's start our introduction to ARM's ISA using a simple example. Imagine that for some reason we wanted to write code to add the numbers from 1 to 10. We might do this in C as follows.

int total;
int i;

total = 0;
for(i = 10; i > 0; i--) total += i;

This can easily be translated into the instructions supported by ARM's ISA as follows.

        MOV  R0, #0         ; R0 accumulates total
        MOV  R1, #10        ; R1 counts from 10 down to 1
again   ADD  R0, R0, R1
        SUBS R1, R1, #1
        BNE  again
halt    B    halt           ; infinite loop to stop computation

This is an example of assembly language. Each line of an assembly language program has a straightforward translation into the machine language that the processor actually executes. Machine language is a sequence of 0's and 1's that is very difficult for human programmers to manipulate. So instead, we write in assembly language, and we use an assembler to automatically translate it into machine language.

You'll notice the mentions of R0 and R1 in the assembly language program. These are references to registers, which are places in a processor for storing data during computation. The ARM processor includes 16 easily accessible registers, numbered R0 through R15. Each stores a single 32-bit number. Note that though registers store data, they are very separate from the notion of memory: Memory is typically much larger (at least several kilobytes of data, and often even gigabytes) and so must exist outside of the processor. Because of memory's size, accesses to memory tend to take about 10 times longer than accesses to registers; thus, assembly language programming tends to focus on using registers when possible.

Because each line of an assembly language program corresponds directly to machine language, the lines are highly restricted in their format. You can see that each consists of an abbreviation, called the op code, indicating the type of operation, followed by a list of arguments. Each op code severely restricts the arguments: For example, a MOV instruction must identify a register for its first argument and either a constant (prefixed by a '#') or a register for its second argument. A constant placed directly in an instruction is called an immediate, since it is immediately available to the processor.

In the above assembly language program, we first use the MOV instruction to initialize R0 at 0 and R1 at 10. The ADD instruction computes the sum of R0 and R1 (the second and third arguments) and places the result into R0 (the first argument); this corresponds to the total += i; line of the equivalent C program. The subsequent SUBS instruction decreases R1 by 1.

To understand the next instruction, we need to understand that in addition to the registers R0 through R15, the ARM processor also incorporates a set of four flags, labeled the zero flag (Z), the negative flag (N), the carry flag (C), and the overflow flag (V). Whenever an arithmetic instruction has an S at its end, as SUBS does, these flags will be updated based on the result of the computation. In this case, if decreasing R1 by 1 yields 0, the Z flag will become 1; the N, C, and V flags are also updated, but they're not useful to what follows.

The following instruction, BNE, will check the Z flag. If the Z flag is not set (i.e., the previous subtraction gives a nonzero result), then BNE arranges the processor so that the next instruction executed is the ADD instruction, labeled again; this leads to repeating the loop with a smaller value of R1. If the Z flag is set, the processor will simply continue on to the next instruction. (BNE stands for Branch if Not Equal. The name comes from imagining that we want to check whether two numbers are equal. One way to do this using ARM's ISA would be to first tell the processor to subtract the two numbers; if the difference is zero, then the two numbers must be equal, and the zero flag will be 1.)

The final instruction, B, always branches back to the named instruction. In this program, the instruction names itself, effectively halting the program by putting the computer into a tight infinite loop.
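The walk-through above can be traced in C. The following is only a sketch: the struct and function names are hypothetical, and the model keeps just the two registers and the one flag that the program actually uses.

```c
#include <stdint.h>

/* Hypothetical model of the processor state used by the example program. */
struct cpu {
    uint32_t r0;   /* accumulates the total */
    uint32_t r1;   /* counts from 10 down to 1 */
    int z;         /* zero flag */
};

/* Steps through the summing program and returns the final value of R0. */
uint32_t run_sum_loop(void) {
    struct cpu c = { 0, 10, 0 };   /* MOV R0, #0 ; MOV R1, #10 */
    do {
        c.r0 = c.r0 + c.r1;        /* ADD  R0, R0, R1 */
        c.r1 = c.r1 - 1;           /* SUBS R1, R1, #1 ... */
        c.z = (c.r1 == 0);         /* ... the S suffix also updates Z */
    } while (!c.z);                /* BNE again: branch while Z is clear */
    return c.r0;                   /* the program would now spin at halt */
}
```

Calling run_sum_loop() yields 55, the sum of the numbers from 1 to 10.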

11.3. Another assembly example

Let's look at another example. Here, suppose that we want to add the digits of a positive number; for example, given the number 1,024, we would want to compute 1 + 0 + 2 + 4, which is 7.

The obvious way to express this in C is as follows.

total = 0;
while(i > 0) {
    total += i % 10;
    i /= 10;
}

It's difficult to translate this into ARM's ISA, though, since the ARM lacks any instruction for dividing values. However, there is a multiply instruction UMULL, which multiplies two 32-bit numbers and yields a 64-bit result. If we take a number and multiply by 2^32 / 10, the upper 32 bits of the product tell us the result of dividing the original number by 10. This insight leads to the following alternative way of summing the digits in a number.

base = 0x1999999A;
total = 0;
while(i > 0) {
    iDiv10 = ((unsigned long long) i * base) >> 32;  /* needs the full 64-bit product */
    total += i - iDiv10 * 10;
    i = iDiv10;
}
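A self-contained version of this idea might look as follows. The function names are hypothetical. The sketch also makes one assumption explicit: because 0x1999999A is 2^32 / 10 rounded up, the shortcut is only trusted here for inputs below 2^30, beyond which the quotient can come out one too large.

```c
#include <assert.h>
#include <stdint.h>

/* 2^32 / 10, rounded up (the constant from the text). */
#define RECIPROCAL_OF_10 0x1999999Au

/* Divides n by 10 using a multiplication and a shift.
 * Assumption: n < 2^30, so the rounding error in the constant
 * cannot push the quotient past the true value. */
uint32_t div10(uint32_t n) {
    assert(n < (1u << 30));
    uint64_t product = (uint64_t) n * RECIPROCAL_OF_10;
    return (uint32_t) (product >> 32);   /* keep the upper 32 bits */
}

/* Sums the decimal digits of n without using / or %. */
uint32_t digit_sum(uint32_t n) {
    uint32_t total = 0;
    while (n > 0) {
        uint32_t q = div10(n);
        total += n - q * 10;   /* same value as n % 10 */
        n = q;
    }
    return total;
}
```

For example, digit_sum(1024) returns 7.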

In translating this into assembly code, we have to confront two issues. The more obvious is determining which instruction to use to perform the multiplication. Here, we want to use the UMULL instruction (Unsigned MULtiply Long), which interprets two registers as unsigned 32-bit numbers, and places the 64-bit product of the registers' values into two different registers. The below example illustrates.

UMULL R4, R5, R0, R2  ; computes R0 * R2, placing lower 32 bits in R4, upper 32 in R5

The less obvious issue we have to confront is that of placing 0x1999999A into a register. You might be tempted at first to use MOV, but this has a major limitation: Any immediate value must be expressible as an eight-bit value rotated by an even number of places. For numbers between 0 and 255, this is not a problem; nor is it a problem for 1,024, since 0x400 can be achieved by rotating 1 right 22 places. But there's no way to do this for 0x1999999A. The solution we'll use is to load each byte separately, joining them using the ORR instruction, which computes the bitwise OR of two values.

        MOV R0, #1024          ; R0 is input, decreases by factors of 10
        MOV R1, #0             ; R1 is sum of digits
        MOV R2, #0x19000000    ; R2 is constantly 0x1999999A
        ORR R2, R2, #0x00990000
        ORR R2, R2, #0x00009900
        ORR R2, R2, #0x0000009A
        MOV R3, #10            ; R3 is constantly 10
loop    UMULL R4, R5, R0, R2   ; R5 is R0 / 10
        UMULL R4, R6, R5, R3   ; R4 is now 10 * (R0 / 10)
        SUB R4, R0, R4         ; R4 is now one's digit of R0
        ADD R1, R1, R4         ; add it into R1
        MOVS R0, R5
        BNE loop
halt    B halt
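The restriction on immediates described before the program can be checked mechanically. The sketch below, with hypothetical function names, tests whether a constant is an eight-bit value rotated right by an even number of places, and then rebuilds 0x1999999A byte by byte as the program does.

```c
#include <stdint.h>

/* Rotates a 32-bit value left by n places (n taken modulo 32). */
static uint32_t rotate_left(uint32_t x, unsigned n) {
    n &= 31;
    return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* Returns 1 if value is an 8-bit number rotated right by an even
 * number of places, 0 otherwise. */
int is_encodable_immediate(uint32_t value) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a right rotation of rot places; see if 8 bits remain. */
        if (rotate_left(value, rot) <= 0xFFu) return 1;
    }
    return 0;
}

/* Builds 0x1999999A one byte at a time, as the program does in R2. */
uint32_t build_reciprocal(void) {
    uint32_t r2 = 0x19000000u;   /* MOV R2, #0x19000000 */
    r2 |= 0x00990000u;           /* ORR R2, R2, #0x00990000 */
    r2 |= 0x00009900u;           /* ORR R2, R2, #0x00009900 */
    r2 |= 0x0000009Au;           /* ORR R2, R2, #0x0000009A */
    return r2;
}
```

Under this check, 1024 and each of the four single-byte pieces are accepted, while 0x1999999A itself is rejected.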

By the way, you may sometimes want to place a small negative number like −10 into a register. You can't use MOV to accomplish this, because its two's-complement representation is 0xFFFFFFF6, which can't be rotated into an 8-bit number. If you happen to know that some register holds the number 0, then you could use SUB. But if you don't, then the MVN (MoVe Not) instruction is useful: It places the bitwise NOT of its argument into the destination register. So to get −10 into R0, we can use MVN R0, #0x9.
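The MVN trick rests on a two's-complement identity: the bitwise NOT of k − 1 is −k. A small sketch, with hypothetical function names, shows the arithmetic.

```c
#include <stdint.h>

/* MVN: the destination receives the bitwise NOT of the argument. */
uint32_t mvn(uint32_t arg) {
    return ~arg;
}

/* Produces the 32-bit two's-complement pattern for -k by applying
 * MVN to k - 1 (so -10 comes from MVN with #0x9). */
uint32_t negative_via_mvn(uint32_t k) {
    return mvn(k - 1);
}
```

Here mvn(0x9) gives 0xFFFFFFF6, the same bit pattern as −10.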

11.4. Summary of instructions so far

The ARM includes sixteen basic arithmetic instructions, numbered 0 through 15. This doesn't include UMULL, since that's considered a non-basic instruction. All sixteen are listed below, with the functionality summarized by the relevant C operator. (The numbers are just for identifying the instruction in the machine language translation; there's no reason for us as programmers to memorize the correspondence: This, after all, is why we have assemblers.)

Figure 11.1: ARM's basic arithmetic instructions

 0. AND regd, rega, argb    regd ← rega & argb
 1. EOR regd, rega, argb    regd ← rega ^ argb
 2. SUB regd, rega, argb    regd ← rega - argb
 3. RSB regd, rega, argb    regd ← argb - rega
 4. ADD regd, rega, argb    regd ← rega + argb
 5. ADC regd, rega, argb    regd ← rega + argb + carry
 6. SBC regd, rega, argb    regd ← rega - argb - !carry
 7. RSC regd, rega, argb    regd ← argb - rega - !carry
 8. TST rega, argb          set flags for rega & argb
 9. TEQ rega, argb          set flags for rega ^ argb
10. CMP rega, argb          set flags for rega - argb
11. CMN rega, argb          set flags for rega + argb
12. ORR regd, rega, argb    regd ← rega | argb
13. MOV regd, arg           regd ← arg
14. BIC regd, rega, argb    regd ← rega & ~argb
15. MVN regd, arg           regd ← ~arg

Except for TST, TEQ, CMP, and CMN, all instructions may have an S postfixed to the op code to signify that the operation should set the flags. For TST, TEQ, CMP, and CMN, the S is implicit: The instructions don't change any general-purpose registers, so the only point in performing the instruction is to set the flags.

Each ARM instruction may incorporate a condition code specifying that the operation should take place only when certain combinations of the flags hold. You can specify the condition code by including it as part of the op code (but for arithmetic instructions, the condition code precedes the optional S). The names of the condition codes are based on the supposition that the flags were set by a CMP or SUBS instruction.

Figure 11.2: ARM's condition codes

 0. EQ             equal                                  Z
 1. NE             not equal                              !Z
 2. CS or HS       carry set / unsigned higher or same    C
 3. CC or LO       carry clear / unsigned lower           !C
 4. MI             minus / negative                       N
 5. PL             plus / positive or zero                !N
 6. VS             overflow set                           V
 7. VC             overflow clear                         !V
 8. HI             unsigned higher                        C && !Z
 9. LS             unsigned lower or same                 !C || Z
10. GE             signed greater than or equal           N == V
11. LT             signed less than                       N != V
12. GT             signed greater than                    !Z && (N == V)
13. LE             signed less than or equal              Z || (N != V)
14. AL or omitted  always                                 true
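To see why these flag combinations match their names, it can help to model the flags left behind by a comparison. The sketch below uses hypothetical names and makes one assumption explicit: for a subtraction, the carry flag is taken to be 1 exactly when no borrow was needed.

```c
#include <stdint.h>

/* The four flags described in the text. */
struct flags { int n, z, c, v; };

/* Models the flags left behind by CMP a, b (or by SUBS). */
struct flags cmp(uint32_t a, uint32_t b) {
    uint32_t diff = a - b;
    struct flags f;
    f.n = (diff >> 31) & 1;                    /* result has its sign bit set */
    f.z = (diff == 0);                         /* result is zero */
    f.c = (a >= b);                            /* assumption: set when no borrow */
    f.v = (((a ^ b) & (a ^ diff)) >> 31) & 1;  /* signed overflow occurred */
    return f;
}

/* A few condition codes, written exactly as in Figure 11.2. */
int cond_hi(struct flags f) { return f.c && !f.z; }
int cond_ls(struct flags f) { return !f.c || f.z; }
int cond_ge(struct flags f) { return f.n == f.v; }
int cond_lt(struct flags f) { return f.n != f.v; }
int cond_gt(struct flags f) { return !f.z && (f.n == f.v); }
int cond_le(struct flags f) { return f.z || (f.n != f.v); }
```

Comparing 0xFFFFFFFF with 1 illustrates the signed and unsigned split: cond_hi holds because the first value is the larger unsigned number, and cond_lt also holds because the same bit pattern is −1 when read as signed.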

There are two other instructions that we have seen, UMULL and B. We will see more in the following chapters.

11.5. Another assembly language example

So far, we have applied the condition codes only to B instructions, but the ARM instruction set is somewhat unusual in that the condition codes can actually be applied to any instruction. For example, ADDEQ is a valid op code: The addition will only be performed if the zero flag is set.

One common example of when this might be useful is in computing the greatest common divisor of two numbers using Euclid's GCD algorithm.

a = 40;
b = 25;
while(a != b) {
    if(a > b) a -= b;
    else      b -= a;
}

The traditional way one would translate this to assembly language would be to use condition codes only on branch instructions.

        MOV R0, #40      ; R0 is a
        MOV R1, #25      ; R1 is b
again   CMP R0, R1
        BEQ halt
        BLT isLess
        SUB R0, R0, R1
        B again
isLess  SUB R1, R1, R0
        B again
halt    B halt

However, the following is a much shorter and more efficient translation.

        MOV R0, #40      ; R0 is a
        MOV R1, #25      ; R1 is b
again   CMP R0, R1
        SUBGT R0, R0, R1
        SUBLT R1, R1, R0
        BNE again
halt    B halt

This is more efficient not only by virtue of having fewer instructions: Modern processors pre-fetch the next instruction while executing the current instruction, but branches can disrupt the process since the location of the next instruction can't be known with certainty. The second translation involves far fewer branch instructions.
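The behavior of the second translation can be mirrored in C. This is a sketch with a hypothetical function name. The key detail it tries to capture is that both conditional subtractions consult the outcome of the single CMP, not the values as updated partway through the pass.

```c
#include <assert.h>

/* Mirrors the conditional-execution translation of Euclid's algorithm:
 * one comparison per pass, then two subtractions that each fire only
 * if the saved comparison result permits. */
int gcd_predicated(int a, int b) {
    assert(a > 0 && b > 0);   /* assumption: inputs are positive */
    int gt, lt;
    do {
        gt = (a > b);         /* CMP R0, R1 records the outcome once */
        lt = (a < b);
        if (gt) a -= b;       /* SUBGT R0, R0, R1 */
        if (lt) b -= a;       /* SUBLT R1, R1, R0 */
    } while (gt || lt);       /* BNE again */
    return a;
}
```

With the values used in the text, gcd_predicated(40, 25) returns 5.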