Chapter 11. Assembly: Introduction
Our introduction to digital circuits is now complete. There is much else to learn (particularly how one can put the pieces together to design a CPU), but that will have to be deferred to another place, like a course on computer architecture.
Recall that we can think of the modern computing system as being structured in a four-level hierarchy.
Programming language
Operating system
Machine language
Digital circuit
We now advance to studying machine language, the system for expressing the instructions that a computer can perform. Each line of processors supports a completely different form of machine language, specific to that line of processors. The design of the machine language is called the instruction set architecture (ISA).
11.1. ISA varieties
IA32 is handily the most widely recognized ISA. It was designed originally in 16-bit form by Intel in 1978 and has been extended over the years. (IA32 stands for 32-bit Intel Architecture.) Today, processors supporting IA32 are manufactured by Intel, AMD, and VIA, and they can be found in most personal computers.
Another well-known ISA is the PowerPC. This was used most prominently by Macintosh computers for many years, until Apple switched their line of computers to IA32 in 2006. PowerPC still retains an important niche, particularly in gaming consoles: The Wii, Playstation 3, and XBox 360 all incorporate PowerPC chips.
But the ISA that we'll study is one designed by a company called ARM. (Like other successful ISAs, ARM's ISA has grown over the years. We'll examine version 4T.) Though not as widely recognized as IA32 or PowerPC, processors supporting ARM's ISA are actually distributed much more widely. The ISA is designed specifically for the simple processors required for low-power devices such as cellphones, digital music players, and handheld game systems. The iPhone, Blackberry, and Nintendo DS are all prominent examples of devices that incorporate an ARM processor.
There are several reasons for examining ARM's ISA rather than IA32.
Assembly language programming is rarely used for more powerful computing systems, since it's far more efficient to program in a high-level programming language. But for small devices, assembly language programming remains important: Due to power and price constraints, the devices have very few resources, and so developers use assembly language to use these resources as efficiently as possible.
The multiple extensions to the IA32 architecture make it far too complicated for us to really understand thoroughly.
IA32 dates from the 1970's, which was a completely different era in computing. ARM is more representative of other, more modern ISA designs.
11.2. A simple program
Let's start our introduction to ARM's ISA using a simple example. Imagine that for some reason we wanted to write code to add the numbers from 1 to 10. We might do this in C as follows.
int total;
int i;
total = 0;
for(i = 10; i > 0; i--) total += i;
This can easily be translated into the instructions supported by ARM's ISA as follows.
MOV R0, #0 ; R0 accumulates total
MOV R1, #10 ; R1 counts from 10 down to 1
again ADD R0, R0, R1
SUBS R1, R1, #1
BNE again
halt B halt ; infinite loop to stop computation
This is an example of assembly language. Each line of an assembly language program has a straightforward translation into the machine language that the processor actually executes. Machine language is a sequence of 0's and 1's that is very difficult for human programmers to manipulate. So instead, we write in assembly language, and we use an assembler to automatically translate it into machine language.
You'll notice the mentions of R0 and R1 in the assembly language program. These are references to registers, which are places in a processor for storing data during computation. The ARM processor includes 16 easily accessible registers, numbered R0 through R15. Each stores a single 32-bit number. Note that though registers store data, they are very separate from the notion of memory: Memory is typically much larger (at least several kilobytes of data, and often even gigabytes) and so must exist outside of the processor. Because of memory's size, accesses to memory tend to take about 10 times longer than accesses to registers; thus, assembly language programming tends to focus on using registers when possible.
Because each line of an assembly language program corresponds
directly to machine language, the lines are highly restricted in their
format. You can see that each consists of an abbreviation, called the
op code, indicating the type of operation, followed by
a list of arguments. Each op code severely restricts the arguments: For
example, a MOV instruction must identify a register for its
first argument and either a constant (prefixed by a '#') or
a register for its second argument. A constant placed directly in an
instruction is called an immediate, since it is
immediately available to the processor.
In the above assembly language program, we first use the
MOV instruction to initialize R0 at 0 and R1 at 10.
The ADD instruction computes the sum of R0 and R1 (the
second and third arguments) and places the result into R0 (the first
argument); this corresponds to the line total += i;
of the equivalent C program.
The subsequent SUBS instruction decreases R1 by 1.
To understand the next instruction, we need to understand that in
addition to the registers R0 through R15, the ARM processor also
incorporates a set of four flags,
labeled the zero flag (Z), the
negative flag (N), the carry flag (C), and the overflow flag (V).
Whenever an arithmetic instruction has an S at its end, as
SUBS does, these
flags will be updated based on the result of the computation.
In this case, if decreasing R1 by 1 results in 0, the Z
flag will become 1; the N, C, and V flags are also updated, but they're
not relevant to what follows.
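To make the flag updates concrete, here is a minimal C sketch of what SUBS computes. The names struct flags and subs are our own inventions for illustration, not part of ARM's ISA, and the sketch assumes the usual ARM convention that the carry flag after a subtraction is 1 exactly when no borrow was needed.

```c
#include <stdint.h>

/* The four condition flags, each held as 0 or 1. */
struct flags {
    int n;  /* negative: bit 31 of the result */
    int z;  /* zero: the result is 0 */
    int c;  /* carry: for a subtraction, 1 when no borrow was needed */
    int v;  /* overflow: the signed result did not fit in 32 bits */
};

/* Hypothetical helper modeling SUBS: returns a - b and fills in *f. */
uint32_t subs(uint32_t a, uint32_t b, struct flags *f)
{
    uint32_t result = a - b;            /* wraps modulo 2^32 */

    f->n = (int)(result >> 31);
    f->z = (result == 0);
    f->c = (a >= b);                    /* unsigned comparison: no borrow */
    /* Signed overflow: the operands differ in sign, and the result's
       sign differs from a's. */
    f->v = (int)(((a ^ b) & (a ^ result)) >> 31);
    return result;
}
```

Calling subs(1, 1, &f) models the final pass through the loop, where R1 drops from 1 to 0 and the Z flag becomes 1.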
The following instruction, BNE, will check the Z flag.
If the Z flag is not set (i.e., the previous subtraction gives a nonzero
result), then BNE arranges the processor so that the next
instruction executed is the ADD instruction, labeled
again; this leads to repeating the loop with a smaller
value of R1. If the Z flag is set, the processor will
simply continue on to the next instruction.
(BNE stands for Branch if Not Equal.
The name comes from imagining that we want to check whether two numbers
are equal. One way to do this using ARM's ISA would be to first
tell the processor to subtract the two numbers; if the difference is
zero, then the two numbers must be equal, and the zero flag will be 1.)
The final instruction, B, always branches back to
the named instruction. In this program, the instruction names itself,
effectively halting the program by putting the computer into a tight
infinite loop.
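One way to check our reading of the program is to trace it in C, one statement per instruction. The following is only a sketch: the function name sum_down_from is hypothetical, the variables r0 and r1 stand in for the registers, and z stands in for the zero flag.

```c
#include <stdint.h>

/* Hypothetical model of the program above: r0 and r1 stand in for the
   registers R0 and R1, and z stands in for the zero flag. */
uint32_t sum_down_from(uint32_t start)
{
    uint32_t r0 = 0;        /*        MOV  R0, #0                    */
    uint32_t r1 = start;    /*        MOV  R1, #10 (in the original) */
    int z;

    do {
        r0 = r0 + r1;       /* again  ADD  R0, R0, R1                */
        r1 = r1 - 1;        /*        SUBS R1, R1, #1                */
        z = (r1 == 0);      /*        ...which also updates Z        */
    } while (!z);           /*        BNE  again                     */

    return r0;              /* the value left in R0 at halt */
}
```

Note that the test comes at the bottom of the loop, just as in the assembly: starting from 0 would wrap around to a very large number rather than stop immediately.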
11.3. Another assembly example
Let's look at another example. Here, suppose that we want to add the digits of a positive number; for example, given the number 1,024, we would want to compute 1 + 0 + 2 + 4, which is 7.
The obvious way to express this in C is as follows.
total = 0;
while(i > 0) {
total += i % 10;
i /= 10;
}
It's difficult to translate this into ARM's ISA, though, since the
ARM lacks any instruction for dividing values. However, there is a
multiply instruction UMULL, which multiplies two 32-bit
numbers and yields a 64-bit result. If we take a number and multiply by
2^32 / 10 (rounded up to an integer, 0x1999999A), the upper 32 bits of
the product tell us the result of
dividing the original number by 10. This insight leads to the following
alternative way of summing the digits in a number.
base = 0x1999999A;
total = 0;
while(i > 0) {
iDiv10 = ((unsigned long long) i * base) >> 32; /* upper 32 bits of 64-bit product */
total += i - iDiv10 * 10;
i = iDiv10;
}
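For readers who want to try the arithmetic, here is a self-contained C sketch of the same idea. The names RECIP10, div10, and digit_sum are our own. One caveat worth stating: because 0x1999999A is slightly larger than 2^32 / 10, the quotient is exact for inputs below 2^30 but can be one too large for some bigger inputs (1,073,741,829 is one such value), so the sketch assumes inputs below 2^30.

```c
#include <stdint.h>

/* 2^32 / 10, rounded up; the same constant as base in the text. */
#define RECIP10 0x1999999Au

/* Approximates n / 10 by keeping the upper 32 bits of a 64-bit product.
   Assumption: n < 2^30, where the result is exact. */
uint32_t div10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * RECIP10) >> 32);
}

/* Sums the decimal digits of n without using / or %. */
uint32_t digit_sum(uint32_t n)
{
    uint32_t total = 0;
    while (n > 0) {
        uint32_t q = div10(n);
        total += n - q * 10;    /* the one's digit of n */
        n = q;
    }
    return total;
}
```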
In translating this into assembly code, we have to confront two
issues. The more obvious is determining which instruction to use to
perform the multiplication. Here, we want to use the UMULL
instruction (Unsigned MULtiply Long), which
interprets two registers as unsigned 32-bit numbers,
and places the 64-bit product of the registers' values into two
different registers. The below example illustrates.
UMULL R4, R5, R0, R2 ; computes R0 * R2, placing lower 32 bits in R4, upper 32 in R5
The less obvious issue we have to confront is that of placing
0x1999999A into a register. You might be tempted at first to use
MOV, but this has a major limitation: Any immediate value
must be an eight-bit value rotated by an even number of places.
For numbers between 0 and 255, this is not a problem; nor is it a
problem for 1,024, since 0x400 can be achieved by rotating 1 right 22
places. But there's no way to do this for 0x1999999A. The solution we'll
use is to load each byte separately, joining them using the
ORR instruction, which computes the bitwise OR of two
values.
MOV R0, #1024 ; R0 is input, decreases by factors of 10
MOV R1, #0 ; R1 is sum of digits
MOV R2, #0x19000000 ; R2 is constantly 0x1999999A
ORR R2, R2, #0x00990000
ORR R2, R2, #0x00009900
ORR R2, R2, #0x0000009A
MOV R3, #10 ; R3 is constantly 10
loop UMULL R4, R5, R0, R2 ; R5 is R0 / 10
UMULL R4, R6, R5, R3 ; R4 is now 10 * (R0 / 10)
SUB R4, R0, R4 ; R4 is now one's digit of R0
ADD R1, R1, R4 ; add it into R1
MOVS R0, R5
BNE loop
halt B halt
By the way, you may sometimes want to place a small negative number
like −10 into a register. You can't use MOV to
accomplish this, because its two's-complement representation is
0xFFFFFFF6, which can't be rotated into an 8-bit number. If you happen
to know that some register holds the number 0, then you could use
SUB to subtract 10 from it. But if you don't, then the MVN
(MoVe Not) instruction is useful: It places the
bitwise NOT of its argument into the destination register. So to get
−10 into R0, we can use MVN R0, #0x9.
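The rule for immediates is easy to check mechanically. The following C sketch tries every even rotation to see whether a value reduces to eight bits; the function names are hypothetical, and this only models the rule as described in this section rather than what any particular assembler does.

```c
#include <stdint.h>

/* Rotates x left by r bits (0 <= r < 32). */
static uint32_t rotate_left(uint32_t x, unsigned r)
{
    return r == 0 ? x : (x << r) | (x >> (32 - r));
}

/* Returns 1 if value is some eight-bit number rotated right by an even
   number of places, and 0 otherwise. */
int is_arm_immediate(uint32_t value)
{
    for (unsigned r = 0; r < 32; r += 2) {
        /* Undo a right rotation by r; an encodable value fits in 8 bits. */
        if (rotate_left(value, r) <= 0xFFu)
            return 1;
    }
    return 0;
}

/* Returns 1 if a single MOV or a single MVN can load value. */
int loadable_in_one_instruction(uint32_t value)
{
    return is_arm_immediate(value) || is_arm_immediate(~value);
}
```

With these definitions, is_arm_immediate accepts 1024 and each of the four pieces of 0x1999999A used above, but it rejects 0x1999999A itself as well as 0xFFFFFFF6. The second function accepts 0xFFFFFFF6 because its bitwise NOT is 9.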
11.4. Summary of instructions so far
The ARM includes sixteen basic arithmetic instructions, numbered 0
through 15. This doesn't include UMULL, since that's
considered a non-basic
instruction. All sixteen are listed below,
with the functionality summarized by the relevant C operator.
(The numbers are just for identifying the instruction in the machine
language translation; there's no reason for us as programmers to memorize
the correspondence: This, after all, is why we have assemblers.)
Figure 11.1: ARM's basic arithmetic instructions
 0. AND regd, rega, argb    regd ← rega & argb
 1. EOR regd, rega, argb    regd ← rega ^ argb
 2. SUB regd, rega, argb    regd ← rega - argb
 3. RSB regd, rega, argb    regd ← argb - rega
 4. ADD regd, rega, argb    regd ← rega + argb
 5. ADC regd, rega, argb    regd ← rega + argb + carry
 6. SBC regd, rega, argb    regd ← rega - argb - !carry
 7. RSC regd, rega, argb    regd ← argb - rega - !carry
 8. TST rega, argb          set flags for rega & argb
 9. TEQ rega, argb          set flags for rega ^ argb
10. CMP rega, argb          set flags for rega - argb
11. CMN rega, argb          set flags for rega + argb
12. ORR regd, rega, argb    regd ← rega | argb
13. MOV regd, arg           regd ← arg
14. BIC regd, rega, argb    regd ← rega & ~argb
15. MVN regd, arg           regd ← ~arg
Except for TST, TEQ, CMP, and
CMN, all instructions may have an S postfixed to
the op code to signify that the operation should set the flags. For
TST, TEQ, CMP, and
CMN, the S is implicit: The instructions don't
change any general-purpose registers, so the only point in performing
the instruction is to set the flags.
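Since Figure 11.1 describes each instruction using a C operator, the whole table can be written as one C function. This is only a sketch of the arithmetic: the name alu_result and its parameters are our own, and it ignores how the flags themselves are updated.

```c
#include <stdint.h>

/* Hypothetical helper: the value computed by basic arithmetic instruction
   number op (0 through 15) on operands a and b, where carry is the
   current carry flag (0 or 1). For TST, TEQ, CMP, and CMN this is the
   value used to set the flags; no register receives it. */
uint32_t alu_result(unsigned op, uint32_t a, uint32_t b, unsigned carry)
{
    switch (op) {
    case 0:  return a & b;              /* AND */
    case 1:  return a ^ b;              /* EOR */
    case 2:  return a - b;              /* SUB */
    case 3:  return b - a;              /* RSB */
    case 4:  return a + b;              /* ADD */
    case 5:  return a + b + carry;      /* ADC */
    case 6:  return a - b - !carry;     /* SBC */
    case 7:  return b - a - !carry;     /* RSC */
    case 8:  return a & b;              /* TST */
    case 9:  return a ^ b;              /* TEQ */
    case 10: return a - b;              /* CMP */
    case 11: return a + b;              /* CMN */
    case 12: return a | b;              /* ORR */
    case 13: return b;                  /* MOV (a is ignored) */
    case 14: return a & ~b;             /* BIC */
    case 15: return ~b;                 /* MVN (a is ignored) */
    default: return 0;                  /* not a basic arithmetic op */
    }
}
```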
Each ARM instruction may incorporate a condition
code specifying that the operation should take place only
when certain combinations of the flags hold. You can specify the
condition code by including it as part of the op code (but for
arithmetic instructions, the condition code precedes the
optional S). The names of the condition codes assume
that the flags were set by a CMP or
SUBS instruction.
Figure 11.2: ARM's condition codes
 0. EQ             equal                                 Z
 1. NE             not equal                             !Z
 2. CS or HS       carry set / unsigned higher or same   C
 3. CC or LO       carry clear / unsigned lower          !C
 4. MI             minus / negative                      N
 5. PL             plus / positive or zero               !N
 6. VS             overflow set                          V
 7. VC             overflow clear                        !V
 8. HI             unsigned higher                       C && !Z
 9. LS             unsigned lower or same                !C || Z
10. GE             signed greater than or equal          N == V
11. LT             signed less than                      N != V
12. GT             signed greater than                   !Z && (N == V)
13. LE             signed less than or equal             Z || (N != V)
14. AL or omitted  always                                true
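The conditions in Figure 11.2 translate directly into C. In the sketch below, struct flags and condition_holds are hypothetical names of our own; each flag is held as 0 or 1, and the case numbers match the numbering in the figure.

```c
/* The four flags, each 0 or 1. */
struct flags { int n, z, c, v; };

/* Hypothetical helper: returns 1 if condition code number cond (0 through
   14, as in Figure 11.2) holds for the given flags, and 0 otherwise. */
int condition_holds(unsigned cond, struct flags f)
{
    switch (cond) {
    case 0:  return f.z;                    /* EQ */
    case 1:  return !f.z;                   /* NE */
    case 2:  return f.c;                    /* CS / HS */
    case 3:  return !f.c;                   /* CC / LO */
    case 4:  return f.n;                    /* MI */
    case 5:  return !f.n;                   /* PL */
    case 6:  return f.v;                    /* VS */
    case 7:  return !f.v;                   /* VC */
    case 8:  return f.c && !f.z;            /* HI */
    case 9:  return !f.c || f.z;            /* LS */
    case 10: return f.n == f.v;             /* GE */
    case 11: return f.n != f.v;             /* LT */
    case 12: return !f.z && f.n == f.v;     /* GT */
    case 13: return f.z || f.n != f.v;      /* LE */
    case 14: return 1;                      /* AL */
    default: return 0;                      /* not a listed code */
    }
}
```

For instance, comparing 40 with 25 computes 40 - 25 = 15, which leaves N, Z, and V at 0 and sets C to 1 (assuming the convention that C after a subtraction means no borrow occurred). With those flags, NE, HI, GE, and GT hold while EQ, LT, and LE do not.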
There are two other instructions that we have seen,
UMULL and B. We will see more in the following
chapters.
11.5. Another assembly language example
So far, we have applied the condition codes only to B
instructions, but the ARM instruction set is somewhat unusual in that
the condition codes can actually be applied to any instruction. For
example, ADDEQ is a valid op code: The addition will only
be performed if the zero flag is set.
One common example of when this might be useful is in computing the greatest common divisor of two numbers using Euclid's GCD algorithm.
a = 40;
b = 25;
while(a != b) {
if(a > b) a -= b;
else b -= a;
}
The traditional way one would translate this to assembly language would be to use condition codes only on branch instructions.
MOV R0, #40 ; R0 is a
MOV R1, #25 ; R1 is b
again CMP R0, R1
BEQ halt
BLT isLess
SUB R0, R0, R1
B again
isLess SUB R1, R1, R0
B again
halt B halt
However, the following is a much shorter and more efficient translation.
MOV R0, #40 ; R0 is a
MOV R1, #25 ; R1 is b
again CMP R0, R1
SUBGT R0, R0, R1
SUBLT R1, R1, R0
BNE again
halt B halt
This is more efficient not only by virtue of having fewer
instructions: Modern processors pre-fetch the next instruction
while executing the current instruction, but branches can disrupt the
process since the location of the next instruction can't be known
with certainty. The second translation involves far fewer branch
instructions.
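To get a rough feel for the difference, we can model both translations in C and count how many branch instructions each one executes. This is only a sketch under stated assumptions: the function names are hypothetical, both inputs must be positive (otherwise neither version terminates), and the count includes every B, BEQ, BLT, or BNE reached before halt, whether or not the branch is taken.

```c
#include <stdint.h>

/* Models the first translation; *branches counts every branch instruction
   executed before reaching halt. Assumes a > 0 and b > 0. */
int32_t gcd_branching(int32_t a, int32_t b, unsigned *branches)
{
    *branches = 0;
    for (;;) {
        /* again  CMP R0, R1 */
        (*branches)++;              /*        BEQ halt       */
        if (a == b) return a;
        (*branches)++;              /*        BLT isLess     */
        if (a < b) b -= a;          /* isLess SUB R1, R1, R0 */
        else       a -= b;          /*        SUB R0, R0, R1 */
        (*branches)++;              /*        B again        */
    }
}

/* Models the second translation, with conditional SUBGT and SUBLT.
   Assumes a > 0 and b > 0. */
int32_t gcd_conditional(int32_t a, int32_t b, unsigned *branches)
{
    int ne;
    *branches = 0;
    do {
        /* again CMP R0, R1: the flags are set once, before either SUB. */
        int gt = a > b, lt = a < b;
        ne = (a != b);
        if (gt) a -= b;             /* SUBGT R0, R0, R1 */
        if (lt) b -= a;             /* SUBLT R1, R1, R0 */
        (*branches)++;              /* BNE again        */
    } while (ne);
    return a;
}
```

Each pass through the first version's loop executes three branch instructions, while each pass through the second executes only one.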
