Chapter 15. OS: Introduction

We have completed the first two parts of this class, concerning the programming language level and the instruction set level; we now enter the third part, wherein we study the operating system.

So let's start with the basics: What is the operating system's purpose?

It abstracts complex computer resources.

For example, a disk is a complex physical device that allows a system to read and write individual blocks of data. That's not too convenient for the typical program that wants to read or store a particular sequence of bytes. For that program, the notion of a file is more convenient. Note that the hardware is completely unaware of such a thing as a file — it is an abstraction created by the operating system to simplify how a program might deal with a disk.

The file isn't the only abstraction provided by an operating system. Other abstractions include the process for a running program, a window for access to a graphical display, or a connection for network communication. None of these abstractions has any basis in hard reality. But providing these convenient abstractions frees the programmer from worrying about the details of how the hardware actually works and from negotiating with other programs about who has what rights.

It provides hardware compatibility.

Everybody knows about the incompatibility issues surrounding operating systems, which force people to use different software versions on different operating systems. Operating systems actually reduce incompatibility problems, though; we don't notice this because they eliminate incompatibility problems so effectively. For example, there are many types of disks (hard disks, floppy disks, CD-ROMs); and even if you just look at hard disks, there are many standards for hard disks. Without an operating system, each program would have to include code to support each possible device. Some programs would work with some types of disks, while others would work with a different set of disks. With an operating system, the OS takes responsibility for supporting this variety of disk types, and any program can use any disk supported by the operating system. If somebody releases a new type of disk, only the operating system needs to be updated so that all programs can use the new disk.

It protects the system.

If every program ran natively on the computer, then each program would be able to wreak havoc with the system. One of the duties of the operating system is to stand guard over programs. It prevents individual programs from accessing the system directly, instead requiring any requests to go through the operating system. The operating system ensures that program requests are safe before executing them.

Part of this is to avoid malicious attacks, like those of a virus. But, just as significantly, it protects the system from permanent damage by errant programs, which perhaps haven't been tested fully yet.

You can think of an operating system as the adult in the computer, parenting the young user programs. An adult often has to explain events at the kid's level using metaphors (those are the abstractions), and the adult often performs tasks that the child can't handle on its own (buying a piece of candy).

15.1. Exceptions

To understand how operating systems work, we need to first go back to discuss an additional concept in the instruction set layer: the exception.

Note: When we talk about exceptions in the context of instruction sets, we are not referring to the exceptions that you find in programming languages like Java. That's a completely different topic, and any resemblance is basically superficial.

15.1.1. Categories

We're familiar with the normal control flow of a program: Usually, after a particular instruction, the CPU continues onto the following instruction. Sometimes, a branch instruction or a return from a subroutine will lead it to jump to a different instruction. All this is normal.

But there are exceptions to these rules, and that's what we're talking about here. There are basically five categories of these exceptions, summarized by the following table.

category            initiator     return location
hardware interrupt  I/O device    next instruction
software interrupt  program code  next instruction
traps               CPU           next instruction
faults              CPU           same instruction
aborts              CPU           no return

In all these cases, the computer will jump into a subroutine called an exception handler for handling the exception. Usually, but not necessarily, the exception handler will return to the instruction following where the CPU was at the time of the exception. The exception handler is part of the operating system.

A hardware interrupt is initiated by a device, like a hard disk or a keyboard. An OS can alternatively be designed to receive information from a device by periodically asking it whether it has anything to say; this is called polling. Though simple, polling is inefficient, wasting CPU cycles continually confirming that each device has nothing. This can be especially problematic when systems have a wide variety of devices attached, all of which must be polled. To save this cost, hardware designers design devices to send hardware interrupts to the CPU when they have information to send. The interrupt sends the CPU into the exception handler to execute code to communicate with the device. This uses the CPU more efficiently than polling, since the only computation spent on receiving communication from a device occurs when the device is ready to send information.
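
To get a feel for the cost difference, here is a rough sketch in C that counts how much work the CPU does under each scheme. The model is an assumption made purely for illustration: each device has data ready exactly once, at the tick stored in ready_at, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: device i has data ready exactly once, at tick ready_at[i]. */

/* Polling: on every tick the CPU asks every device whether it has data.
   Returns the total number of checks; *handled counts the checks that found data. */
long polling_checks(const int *ready_at, size_t ndev, int ticks, int *handled) {
    assert(ready_at != NULL && handled != NULL);
    long checks = 0;
    *handled = 0;
    for (int t = 0; t < ticks; t++) {
        for (size_t i = 0; i < ndev; i++) {
            checks++;                       /* CPU time spent asking */
            if (ready_at[i] == t)
                (*handled)++;               /* this check was actually useful */
        }
    }
    return checks;
}

/* Interrupts: the CPU enters the handler only when a device signals it.
   Returns the number of times the handler runs. */
long interrupt_handler_runs(const int *ready_at, size_t ndev, int ticks) {
    assert(ready_at != NULL);
    long runs = 0;
    for (size_t i = 0; i < ndev; i++) {
        if (ready_at[i] >= 0 && ready_at[i] < ticks)
            runs++;
    }
    return runs;
}
```

With three devices over 100 ticks, polling in this model performs 300 checks to find three pieces of data, whereas the interrupt-driven version runs its handler three times.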

A software interrupt is an exception initiated by the running program. The ARM instruction set, for example, includes an instruction SWI for a program to initiate an exception.

SWI #1

(SWI stands for SoftWare Interrupt, not switch as you might otherwise assume.)

This instruction is useful for transferring control into the operating system. For example, a program cannot save something to the disk directly; if it wants to do this, it must ask the operating system to do this. We'll see more about this in Section 15.2.

A trap is initiated by the CPU to signal that an instruction that is normally successful has failed utterly. Three examples of traps found in many computers are the following.

A trap allows the operating system to address the errors made by the running program. In some cases, the OS might attempt to continue running the offending program, but frequently it will abort the program.

A fault is also initiated by the CPU in response to an error in processing an instruction, but a fault signals that the operating system should be able to fix the problem. A prominent example comes from virtual memory — a program might try to load from an address that is actually stored on disk, not RAM. In this case, the OS needs to load the information into RAM, and then the OS can return to the same instruction of the program, which will then be successful. (We'll talk about virtual memory later.)

Finally, an abort is a very rare case that represents a disaster. For example, if the CPU senses that memory is not working, then it might cause an abort to occur, so that the computer can perhaps crash more gracefully than simply turning itself off.

15.1.2. Exception handling

A modern CPU can run in a variety of processor modes. Different modes have different access privileges. Most programs execute with the CPU in user mode. User mode is very restrictive, preventing the CPU from doing such things as communicating with I/O devices, so that a program can't talk directly with the disk and perhaps read data it shouldn't be allowed to read. It will also restrict the CPU so that it can only access a smaller portion of memory.

Other modes are privileged, allowing access to all memory and allowing direct communication with I/O devices. However, when a program is in user mode, it cannot enter a privileged mode. The only way the CPU will switch into a privileged mode is in response to an exception. But when an exception occurs, the CPU will simultaneously jump into code that is part of the operating system. Because this jump into the operating system occurs whenever the CPU enters a privileged mode, a user-mode program has no way of tricking the CPU into executing code in the privileged environment. (The CPU must also allow the operating system to indicate regions of memory that the CPU should not allow user-mode programs to access; this way, user-mode programs can't modify the operating system.)

The ARM processor supports six processor modes, but we'll only concern ourselves with two — user mode and supervisor mode. Supervisor mode is the mode entered via a SWI instruction. These two modes have different R13 and R14 registers; when the CPU is told to access R13 or R14, it looks at one or the other depending on which mode it is in.

When the CPU encounters a SWI instruction, it goes through the following steps.

  1. It places the program counter R15 into the supervisor mode's link register R14. This is so that when the operating system finishes processing the interrupt, it knows where to return.

  2. It places the current program status register CPSR into the saved program status register SPSR. We haven't seen the CPSR yet, but it holds information about the processor state. It is not among the 16 general-purpose registers R0 through R15 that are accessible by normal instructions. The information in the CPSR includes the four flags set by arithmetic instructions such as CMP. It also includes some bits indicating which of the six modes the processor is currently in; the processor looks into these bits whenever it gets to an instruction that involves privileged access.

    The SPSR is unavailable when in user mode. It is used to save the CPSR so that it can be restored when returning back into user mode.

  3. The lower five bits of the CPSR are changed to 10011, the code the ARM processor uses to indicate that it is in supervisor mode.

  4. One bit in the CPSR is the interrupt flag. It indicates whether the CPU is to ignore interrupts received from I/O devices. This bit is normally clear, but the SWI instruction will set the interrupt flag. This prevents the CPU from responding to other interrupts received while the operating system is processing the software interrupt.

  5. Finally, the address 8 is placed into R15, so that the next instruction executed by the CPU will be the instruction in address 8 of memory.
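
The five steps above can be sketched as a small C model of the processor state. This is only an illustration, not how any real simulator is written: the Cpu structure and the swi_enter name are made up, the model assumes that r[15] already holds the address of the instruction following the SWI, and it places the interrupt flag at bit 7 of the CPSR, which is where ARM keeps it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODE_MASK   0x1Fu  /* lower five bits of the CPSR hold the mode */
#define MODE_USER   0x10u  /* 10000: user mode */
#define MODE_SVC    0x13u  /* 10011: supervisor mode */
#define IRQ_DISABLE 0x80u  /* interrupt flag (bit 7 of the CPSR on ARM) */
#define SWI_VECTOR  8u     /* address where the SWI handler begins */

/* Hypothetical model of the registers involved in a SWI. */
typedef struct {
    uint32_t r[16];     /* R0..R15 as seen in user mode */
    uint32_t cpsr;      /* current program status register */
    uint32_t r14_svc;   /* supervisor mode's own link register */
    uint32_t spsr_svc;  /* supervisor mode's saved status register */
} Cpu;

void swi_enter(Cpu *cpu) {
    assert(cpu != NULL);
    cpu->r14_svc = cpu->r[15];                        /* step 1: save return address */
    cpu->spsr_svc = cpu->cpsr;                        /* step 2: save status register */
    cpu->cpsr = (cpu->cpsr & ~MODE_MASK) | MODE_SVC;  /* step 3: enter supervisor mode */
    cpu->cpsr |= IRQ_DISABLE;                         /* step 4: ignore device interrupts */
    cpu->r[15] = SWI_VECTOR;                          /* step 5: jump to address 8 */
}
```

Note that the arithmetic flags in the upper bits of the CPSR are untouched by the switch; only the mode bits and the interrupt flag change, and the old value survives in the SPSR for the return to user mode.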

15.2. System calls

A system call is a request by a user program to the operating system to perform some operation on the program's behalf. Examples of system calls in a typical operating system include a request to open a file, a request to start another program, a request to send a message to another computer, or a request to display a line on the screen.

As we'll study it here, we'll specify which system call we are making through the argument to the SWI instruction. Linux has assigned a unique identifier to each system call type. The table below shows some of these codes.

Linux system call codes

system call  identifier
exit         1
read         3
write        4
open         5
close        6

Thus to make the exit system call, we'd execute the instruction SWI #1. Recall from our earlier discussion of the SWI instruction that at no time does the CPU actually examine the argument: the processor ignores it when executing the instruction. However, the interrupt handler starting at memory address 8 will often load the SWI instruction itself into a register in order to determine what type of operation it should perform. Since the supervisor mode's link register holds the address just past the SWI instruction, the instruction sits 4 bytes before that address. The following code loads the argument into R3.

LDR R3, [LR, #-4]       ; load SWI instruction into R3
BIC R3, R3, #0xFF000000 ; clear top 8 bits, where SWI op code was
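
The same extraction can be written in C, which may make the bit manipulation easier to see. This is a sketch only: the function names are hypothetical, and it relies on the ARM encoding of SWI, in which bits 27 through 24 are 1111 and the lower 24 bits hold the argument (so SWI #1 executed unconditionally is the word 0xEF000001).

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the BIC instruction: clear the top 8 bits of the instruction
   word, leaving the 24-bit argument. */
uint32_t swi_argument(uint32_t instruction) {
    /* Bits 27..24 are 1111 in every SWI instruction. */
    assert(((instruction >> 24) & 0x0Fu) == 0x0Fu);
    return instruction & 0x00FFFFFFu;
}

/* Maps the identifiers from the table of Linux system call codes to names. */
const char *syscall_name(uint32_t code) {
    switch (code) {
    case 1:  return "exit";
    case 3:  return "read";
    case 4:  return "write";
    case 5:  return "open";
    case 6:  return "close";
    default: return "unknown";
    }
}
```

A handler written along these lines would decode the argument first and then branch to the code for that particular system call.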

System calls will usually have parameters; a program should place its arguments into the R0 through R3 registers, just as it does when calling subroutines. On completing the system call, the operating system leaves any return value in register R0.

As an example, let's look at the exit() system call with Linux. The exit() system call is for telling the operating system to remove the requesting process from the system entirely. It takes a single integer parameter, an integer code meant to be given to the process that started the program as a summary of whether the process was successful. Most often, this is simply 0, which conventionally means that the program completed its job successfully.

Below is a simple C program using the exit system call and its translation into ARM assembly using the system call conventions described here.

#include <stdlib.h>

int main() {
    exit(0);
}

main  MOV R0, #0  ; place parameter into R0
      SWI #1      ; enter OS with code 1 = exit

In the case of exit, there is no point in having additional code following the system call, since the function will not return to the user program. Note how the assembly translation places 0, the system call's parameter, into R0, and then it initiates the software interrupt using 1 for the system call code.
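
On a real Linux system we can make the same request from C without going through the usual exit() wrapper, using the syscall() function that Linux's C library provides. The sketch below does this in a child process so that the parent can observe the code handed back; the helper name is made up, and fork() and waitpid() are brought in only to make the effect visible. It uses the constant SYS_exit instead of a literal 1 because the numeric identifiers differ between architectures (on x86-64 Linux, for instance, exit is 60).

```c
#define _GNU_SOURCE        /* make syscall() visible in <unistd.h> */
#include <assert.h>
#include <sys/syscall.h>   /* SYS_exit */
#include <sys/types.h>
#include <sys/wait.h>      /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>        /* fork, syscall, _exit */

/* Runs the raw exit system call in a child process and returns the
   exit code that the parent observes, or -1 on failure. */
int exit_code_seen_by_parent(int code) {
    assert(code >= 0 && code <= 255);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        syscall(SYS_exit, code);  /* parameter plus system call identifier */
        _exit(127);               /* not reached: exit does not return */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling exit_code_seen_by_parent(0) should yield 0, the conventional code for success.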

15.3. Handling files

In Unix-based systems, a process can interact with files through file descriptors, integer identifiers of files that the process has open. For each process, the operating system maintains a table to track how file descriptors map to locations on the disk, but this table is not available for the process to see.

15.3.1. Default descriptors

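A process has three file descriptors by default.

  Descriptor 0 (standard input) usually refers to the keyboard, from which the program reads what the user types.

  Descriptor 1 (standard output) usually refers to the screen, where the program's normal output appears.

  Descriptor 2 (standard error) usually refers to the screen also, and is meant for error messages.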

Notice that I said usually in all of the above. When you use redirection, the system sets up the program's default file descriptors to have different meanings. Suppose we wrote the following at a Unix prompt.

unix% a.out < infile > outfile

The system will interpret this command as saying to run the a.out program, but make its 0 file descriptor refer to infile instead of the keyboard, and make its 1 file descriptor refer to outfile instead of the screen. (It would keep the 2 file descriptor referring to the screen, so any error messages sent to descriptor 2 by a.out would still appear for the user to see.)

The operating system handles the duty of placing the information into the file; in fact, the program doesn't even know about infile and outfile — it just reads from file descriptor 0 and writes to file descriptor 1 as normal, oblivious to the fact that it's actually reading from a file and writing to another file. Because the system handles redirection, redirection will work for any program.
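
How might the system rearrange a descriptor before the program starts? One plausible sketch, assuming a POSIX system, opens the named file and then uses the dup2() call to make the desired descriptor number refer to it. The redirect() helper is hypothetical and is not taken from any particular shell.

```c
#include <assert.h>
#include <fcntl.h>    /* open and the O_* flags */
#include <stddef.h>
#include <unistd.h>   /* dup2, close */

/* Makes descriptor target_fd refer to the file at path.
   Returns 0 on success, -1 on failure. */
int redirect(int target_fd, const char *path, int flags) {
    assert(path != NULL);
    int fd = open(path, flags, 0644);  /* 0644 matters only when creating */
    if (fd < 0)
        return -1;
    if (fd != target_fd) {
        if (dup2(fd, target_fd) < 0) { /* target_fd now refers to the file */
            close(fd);
            return -1;
        }
        close(fd);                     /* the temporary descriptor is done */
    }
    return 0;
}
```

For the command above, the setup would amount to redirect(0, "infile", O_RDONLY) and redirect(1, "outfile", O_WRONLY | O_CREAT | O_TRUNC) just before a.out begins running; a.out itself needs no changes.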

15.3.2. Managing descriptors

For creating a file descriptor, Linux has the open system call, which takes two parameters, the file name and an int representing options to the system call. The open system call returns the integer file descriptor it creates, or a negative number if the requested file can't be opened.

file_desc = open(filename, mode);

The file name would be a pointer to the first character of a C string. The mode is an integer identifying how the program will use the file; for reading through a file, the right parameter value is 0.

The close system call allows a process to deallocate a file descriptor.

close(file_desc);

Closing a file is important in Linux for two reasons.

15.3.3. Reading and writing

To get information from a file, we use the read system call.

nbytes = read(file_desc, buf, buf_len);

This takes three parameters: first the file descriptor (an int), then a pointer to an array of bytes (a char*), and then an integer saying how long the array is. It returns an int representing the number of bytes read from the file, 0 if it has reached the file's end, or a negative integer in the case of an error.

The write system call is quite similar.

write(file_desc, buf, nbytes);

It takes the file descriptor (an int), a pointer to an array of bytes (a char*), and an integer saying how many bytes to write to the file.
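
To see how open, read, and close fit together, here is a small sketch that counts the bytes in a file. The count_bytes name and the 80-byte buffer are arbitrary choices for illustration; the loop relies on read returning 0 at the file's end and a negative number on error, as described above.

```c
#include <assert.h>
#include <fcntl.h>    /* open, O_RDONLY */
#include <stddef.h>
#include <unistd.h>   /* read, close, ssize_t */

/* Returns the number of bytes in the file at path, or -1 on error. */
long count_bytes(const char *path) {
    assert(path != NULL);
    int fd = open(path, O_RDONLY);  /* O_RDONLY is 0, the mode for reading */
    if (fd < 0)
        return -1;                  /* the file couldn't be opened */

    char buf[80];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;                 /* read may return fewer bytes than asked */

    close(fd);                      /* release the descriptor */
    return n < 0 ? -1 : total;
}
```

Each call to read picks up where the previous one stopped, so the loop walks through the file one buffer at a time.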

15.3.4. Example

Below is a translation of a C program using some system calls.

#include <stdlib.h>
#include <unistd.h>

int main() {
    char buf[80];
    int nbytes;

    nbytes = read(0, buf, 80);
    write(1, buf, nbytes);
    exit(0);
}

main  MOV SP, #0x10000  ; set up stack
      SUB SP, SP, #80   ; and allocate 80 bytes on it

      MOV R0, #0        ; R4 = read(0, SP, 80)
      MOV R1, SP
      MOV R2, #80
      SWI #3
      MOV R4, R0

      MOV R0, #1        ; write(1, SP, R4);
      MOV R1, SP
      MOV R2, R4
      SWI #4

      MOV R0, #0        ; exit(0);
      SWI #1

Note the following about this program.

15.4. Library functions

When we write a C program, the system calls look mysteriously like calls to standard functions. Does this mean that all the functions we've learned about in C are system calls?

No. For example, printf() is a library function. This means that it is included in a library for the compiler to use, but it is not part of the operating system like a system call. When the compiler compiles the program, it finds whatever library functions the program uses and includes them in the executable file. Thus, printf() is not part of the operating system; it is part of the user program.

Library functions serve two main purposes.

They provide portability.

Programming language designers want programs written in their language to work across multiple platforms. Therefore, the designers choose to design their own functions, requiring the compiler for each platform to include an implementation of the functions. This way, a program written using these functions should work on many platforms. Thus, a program using printf() can work on a wider variety of systems than one using write().

They provide complex functionality.

Programming language designers and operating system designers have conflicting interests. The operating system designer wants to keep system calls as elementary as possible so that the operating system is reliable and secure, while the language designer wants to make tasks easy for the programmer. Thus, system calls tend to be very elementary, leaving it to the compiler to provide more sophisticated behavior through its libraries. The printf() function is an example of this, where the library function provides complex formatting functionality, such as displaying numbers, that the much simpler write system call does not.
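
The division of labor can be sketched in a few lines of C. The print_int function below is a hypothetical stand-in for a tiny piece of what printf() does: it formats a number inside the user program (here by borrowing the library's own snprintf), and only then hands the finished bytes to the write system call, which knows nothing about numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>    /* snprintf */
#include <unistd.h>   /* write */

/* Formats value as decimal text followed by a newline, then writes the
   bytes to descriptor fd. Returns the number of bytes written, or -1. */
int print_int(int fd, int value) {
    char buf[32];
    int len = snprintf(buf, sizeof buf, "%d\n", value);  /* library work */
    assert(len > 0 && (size_t)len < sizeof buf);
    return (int)write(fd, buf, (size_t)len);             /* system call */
}
```

Calling print_int(1, 42) would display 42 on the screen unless descriptor 1 has been redirected.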