Chapter 2. C: Introduction

In the 1970's, Dennis Ritchie developed C to help with developing the UNIX operating system at Bell Laboratories. [UNIX is a registered trademark. We will use the mixed-case version (Unix) to refer to the various descendants of AT&T's original UNIX. This will generally include Linux, although Linux is not strictly a descendant.] Through a variety of historical events, few intentional, UNIX grew from a minor research diversion into the popular industrial-strength operating system of today. And along with UNIX's success came C, since the operating system was designed so that C programs could access all of its features. As more programmers gained experience with C, they began to use it on other platforms too, so that it became one of the primary languages for developing software by the end of the 1980's.

In the early 1990's, a team at Sun Microsystems led by James Gosling was presented with the problem of developing software for embedded systems (that is, systems built for a specific purpose that incorporate a processor, as in modern microwaves, cars, and VCRs). After surveying the possible languages, the team decided that none of the existing languages were adequate, and they decided to develop their own, named Java. The group's purpose regarding embedded systems fell by the wayside; but Sun managed to get Java labeled as a Web development language when they released it in 1995, just as the Internet and the Web were becoming a popular phenomenon. Java has not lived up to its hype as a Web development language, but it has proved a useful general-purpose programming language, and it is now popular in industrial environments.

Many college computer science programs now teach Java to students, enabling them to teach several features not commonly available with other languages. But a lot of software development is still done in C. Often this is for historical reasons — if the first version of a piece of software was written in the 1980's, the initial language choice was likely C, and that initial development still forms the core of the system. But also C and Java fill different niches in the software world: While Java is well suited for application programming (like a word processor), C's more primitive style works better for systems programming (like a Web server), where C's low-level efficiency and high compatibility with most of today's popular operating systems are important features.

2.1. Basics

The good news is that when the folks from Sun designed Java, they used C as their starting point, since they wanted all those C programmers to feel that learning Java would be a small step. (In fact, Java programming turns out to be very different from C programming; the similarity can be misleading.) As a result, much of C is incorporated into Java, although Java adds quite a bit more.

The primary difference between the two languages is that C is a procedural language, while Java is an object-oriented language. While objects and classes are essential to Java programming, the concepts are nonexistent in C.

Instead, the basic unit of organization in C is the function, equivalent to a class method in Java, except of course it won't be part of a class in C, since C doesn't include classes. Whereas a Java program can be seen as a collection of interrelated class definitions, a C program is a collection of interrelated functions. It will be instructive to look at an example explicitly contrasting a C program with its Java equivalent, as in Figure 2.1.

Figure 2.1: A C program and its Java equivalent.

#include <stdio.h>

/* Returns the greatest common
   divisor of a and b. */

int gcd(int a, int b) {
    int r;

    while(b != 0) {
        r = a % b;
        a = b;
        b = r;
    }
    return a;
}

int main() {
    printf("GCD is %d\n", gcd(24, 40));
    return 0;
}

public class FindGCD {
    // Returns the greatest common
    // divisor of a and b.
    public static int gcd(int a, int b) {
        while(b != 0) {
            int r = a % b;
            a = b;
            b = r;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println("GCD is " + gcd(24, 40));
    }
}

At the statement level, the languages are nearly identical: They share similar expressions and control statements. Figure 2.1 demonstrates some minor differences, though.

2.2. The printf() function

The printf() function is among the most useful functions included in C's library of language-defined functions. As we've already seen, it allows you to display text for the user to see.

The way printf() works is a bit complicated: The first parameter is a string specifying the format of what to print, and the following parameters indicate the values to print. The easiest way to understand this is to look at our earlier example.

printf("GCD is %d\n", gcd(24, 40));

This line says to print GCD is %d\n. The printf() function goes through this format string, printing the characters GCD is before getting to the percent character (`%'). The printf() function regards this as a special character saying to print a value specified in a subsequent parameter. In this case, a `d' follows the percent character, further specifying to display the parameter as an int in decimal form. (The d stands for decimal.) So when printf() reaches %d, it looks at the value of the following parameter (it would be 8 in this example) and displays that value instead.

Like Java, C allows you to include escape characters in a string using a backslash. The \n represents the newline character — that is, the character that represents a line break. Similarly, \t represents the tab character, \" represents the double-quote character, and \\ represents the backslash character. These escape characters are part of C syntax, not part of the printf() function. [That is, the string the printf() function receives actually contains a newline, not a backslash followed by an n. Thus, the nature of the backslash is fundamentally different from the percent character, which printf() would see and interpret at run-time.]

Thus, when printf() displays the string GCD is %d\n, the program will print GCD is 8, followed by a line break.
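The difference between the backslash (handled by the compiler) and the percent character (handled by printf() at run-time) can be checked with a small sketch. The two helper functions below are hypothetical names invented for illustration, and snprintf() is used as a stand-in for printf() because it applies the same format rules but writes into a character buffer rather than onto the screen.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: number of characters stored in the format string
   from the GCD example. The compiler has already collapsed \n into one
   newline character, while % and d remain two ordinary characters. */
size_t format_length(void) {
    return strlen("GCD is %d\n");
}

/* Hypothetical helper: runs the same format through snprintf(), which
   replaces %d with the value 8 and stores the result in buf. Returns
   the number of characters produced. */
int formatted_length(char *buf, size_t size) {
    return snprintf(buf, size, "GCD is %d\n", 8);
}
```

Under these assumptions, format_length() reports 10 characters (the newline counts as one), while formatted_length() produces the 9-character string GCD is 8 followed by a newline.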

Let's look at another example.

#include <stdio.h>

int main() {
    int i = 5; int k = 4;
    printf("i is %d; k is %d;", i, k);
    printf("i + k is %d\n", i + k);
    return 0;
}

When run, this would display the following on the screen.

i is 5; k is 4;i + k is 9

The first call to printf() in this example illustrates how the function can print multiple parameter values. In fact, there's really no reason we couldn't have combined the two calls to printf() into one in this case.

printf("i is %d; k is %d; i + k is %d\n", i, k, i + k);

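A variety of characters can follow the percent character in the format string. Besides %d, common choices include %x (an int in hexadecimal form), %f (a double in decimal form), %c (a char), and %s (a string).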

You can also include a number between the percent character and the format descriptor as in %10d, which tells printf() to right-justify a decimal integer over ten columns.
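As a rough illustration of format descriptors and field widths, the sketch below formats several values in one call. The function name show_conversions and its parameters are made up for this example, and snprintf() is used in place of printf() only so that the result lands in a buffer where it can be inspected; the format string would behave the same way with printf().

```c
#include <stdio.h>

/* Hypothetical helper: formats n, d, c, and s into buf, each inside
   square brackets so that the padding is visible. Returns the number
   of characters produced. */
int show_conversions(char *buf, size_t size,
                     int n, double d, char c, const char *s) {
    /* %10d  an int in decimal, right-justified over ten columns
       %x    an int in hexadecimal
       %f    a double in decimal (six digits after the point by default)
       %c    a single character
       %s    a string */
    return snprintf(buf, size, "[%10d] [%x] [%f] [%c] [%s]", n, n, d, c, s);
}
```

Calling show_conversions(buf, sizeof buf, 255, 1.5, 'A', "ok") with a sufficiently large buffer leaves the text [       255] [ff] [1.500000] [A] [ok] in buf.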

2.3. Boolean operators

Although C includes most of the primitive types of Java (including char, int, and double), it does not include the boolean type. Instead, C treats the integer 0 value as false and all other integer values as true. This has major implications for its control statements, like if and while, for which the test expressions can have any value in C. (Java requires a boolean value for these conditions.) The following would be a legal C program.

int main() {
    int i = 5;
    if(i - 5) printf("yes\n");
    else printf("no\n");
    return 0;
}

This program would compile, and it would print no when run, since the value of the expression i - 5 turns out to be 0, and so the if condition fails.

The following program would compile too.

int main() {
    int i = 5;
    if(i = 4) printf("yes\n");  /* Using = instead of == leads to */
    else printf("no\n");        /* peculiar behavior!! */
    return 0;
}

This program would print yes: In the if condition, we have assigned the value 4 to i, and the value of the expression is the value assigned (4). Since this is non-zero, the if condition succeeds, and so the program prints yes.

This is almost never what you want. Never use an assignment operator in a control statement's condition. C is unforgiving with this error, and it's very difficult to find the error if you don't know to look for it.

Java has several operators that compute boolean values, like the comparison operators and ||, &&, and !. C includes these too, but they compute int values in C. In particular, they compute 1 to represent a true value and 0 to represent a false value.
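Because these operators produce ordinary int values, their results can be stored, returned, or even added together. The following sketch illustrates this; both function names are hypothetical and chosen only for the example.

```c
/* Hypothetical helper: counts how many of the three values are negative
   by adding up comparison results, each of which is the int 1 or 0. */
int count_negatives(int a, int b, int c) {
    return (a < 0) + (b < 0) + (c < 0);
}

/* Hypothetical helper: ! maps any non-zero value to 0 and zero to 1,
   so applying it twice reduces any "true" value to exactly 1. */
int as_truth_value(int n) {
    return !!n;
}
```

For instance, count_negatives(-1, 2, -3) yields 2, and as_truth_value(42) yields 1. Summing comparisons this way is shown only to make the int results visible; it is not offered as recommended style.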

This quirk — that C regards all non-zero integers as true — is generally regarded as a mistake. C introduced it because, on many computers, code to compare to zero takes two instructions (one to do the comparison and another to jump to the appropriate location), but many computers include a single instruction that will jump based on whether a value is zero (without the additional comparison operation). Today, good compilers would use the shorter version in either case, but at the time C was designed, compilers were not as sophisticated. Thus, today, most expert C programmers eschew using the shortcut, preferring instead to explicitly compare to zero as a matter of good programming style. But such avoidance doesn't fix the fact that this language quirk often leads to program errors. Java improved the situation tremendously by introducing the boolean type, required in if and while statements.
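To make the stylistic point concrete, here is a small sketch contrasting the shortcut with the explicit comparison. Both functions are hypothetical examples that count the decimal digits of n (treating 0 as having no digits), and they are intended to behave identically.

```c
/* Shortcut style: the loop continues while n is "true", that is, non-zero. */
int digits_shortcut(int n) {
    int count = 0;

    while (n) {
        n = n / 10;
        count++;
    }
    return count;
}

/* Explicit style: the same test, spelled out as a comparison with zero. */
int digits_explicit(int n) {
    int count = 0;

    while (n != 0) {
        n = n / 10;
        count++;
    }
    return count;
}
```

The second version states the intent of the test directly, which is the reason usually given for preferring it.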

2.4. Function prototypes

In C, a function must be declared above the location where you use it. In the Java program of Figure 2.1, we could define the gcd class method after defining main. Not so in C: If we swapped the gcd() and main() functions, the compiler would complain in main() that the gcd() function is undeclared. Even though it's defined later in the program, the compiler won't look ahead to it.

This raises a problem, especially in larger programs that span several files, where functions in one file will need to call functions in another. To get around this, C provides the notion of a function prototype, where we write down the function header but omit the body definition.

As an example, say we want to break our C program of Figure 2.1 into two files: The first file, math.c, will contain the gcd() function, and the second file, main.c, will contain the main() function. The problem with this is that, in compiling main.c, the compiler won't know about the gcd() function that it is attempting to call.

A solution is to include a function prototype in main.c.

#include <stdio.h>

int gcd(int a, int b);

int main() {
    printf("GCD is %d\n", gcd(24, 40));
    return 0;
}

The int gcd line is the function prototype. You can see that it begins the same as a function definition begins, but we simply put a semicolon where the body of the function would normally be. By doing this, we are declaring that the function will eventually be defined, but we are not defining it yet. The compiler accepts this and obediently compiles the program with no complaints.
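A prototype is just as useful within a single file when a function is called above its definition. The sketch below assumes a hypothetical lcm() function, which is not part of the original example and which expects positive arguments; it calls gcd() before the compiler has seen gcd()'s body.

```c
/* Prototype: declares gcd() so that code below may call it. */
int gcd(int a, int b);

/* Hypothetical helper, not from the text: least common multiple of two
   positive integers. The call to gcd() compiles only because of the
   prototype above. */
int lcm(int a, int b) {
    return a / gcd(a, b) * b;
}

/* The definition comes last; its header matches the prototype. */
int gcd(int a, int b) {
    int r;

    while (b != 0) {
        r = a % b;
        a = b;
        b = r;
    }
    return a;
}
```

With these definitions, lcm(24, 40) evaluates to 120. Removing the prototype line would leave gcd() undeclared at the point where lcm() calls it.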

2.5. Header files

Larger programs spanning several files frequently contain many functions that are used many times in many different files. It's a pain to have to repeat the function prototype in every file that happens to use the function. So C permits you to include a file within another using a line beginning #include. Usually, this file consists mostly of function prototypes. This file is called a header file, since it contains the heads of all these functions. Conventionally, header files use the .h suffix, rather than the .c suffix used for C source files.

For example, we might put the prototype for our gcd() function into a header file called math.h.

int gcd(int a, int b);

Now, we can #include this header file at the top of main.c.

#include <stdio.h>
#include "math.h"

int main() {
    printf("GCD is %d\n", gcd(24, 40));
    return 0;
}

This particular example isn't very convincing, but imagine a program consisting of hundreds of functions, split across dozens of files, and suddenly the time savings of having just one prototype for each function in a single header file begin to make sense.

The #include line is an example of a directive for what C calls its preprocessor. Before compiling a program, the C compiler feeds it through the preprocessor. The program can contain commands (directives) telling the preprocessor to manipulate the program text before it is given to the compiler proper. The #include directive tells the preprocessor to replace the #include line with the contents of the file specified.

You'll notice that we've placed stdio.h in angle brackets, while math.h is in double quotation marks. The angle brackets are for standard header files — files more or less built into the C system. The quotation marks are for custom-written header files that can be found in the same directory as the source files.

2.6. Constants

Another particularly useful preprocessor directive is the #define directive. It tells the preprocessor to replace all later occurrences of some word with something else.

#define PI 3.14159

In this fragment, we've told the preprocessor that, for the rest of the program, it should replace every occurrence of PI with 3.14159. So, if later in the program the preprocessor sees the following line:

printf("area is %f\n", PI * r * r);

the preprocessor would send this to the C compiler for processing instead:

printf("area is %f\n", 3.14159 * r * r);

This replacement happens behind the scenes, so the programmer never sees the substituted text.

C was designed without constants (like a static final variable in Java), so C programmers traditionally use the #define directive to simulate them. (Standard C later added a const qualifier, but #define remains a common idiom.)
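The sketch below shows a #define constant in use, along with one consequence of the replacement being purely textual. Apart from PI, every name here (BORDER, PADDED, PADDED_SAFE, and the three functions) is hypothetical and exists only for illustration.

```c
#define PI 3.14159
#define BORDER 2                   /* hypothetical constant */
#define PADDED 10 + BORDER         /* expands to the text 10 + 2 */
#define PADDED_SAFE (10 + BORDER)  /* parentheses keep the sum together */

/* The compiler sees 3.14159 * r * r here. */
double circle_area(double r) {
    return PI * r * r;
}

/* The compiler sees 10 + 2 * 2, so this returns 14, not 24. */
int doubled_unsafe(void) {
    return PADDED * 2;
}

/* The compiler sees (10 + 2) * 2, so this returns 24. */
int doubled_safe(void) {
    return PADDED_SAFE * 2;
}
```

Since the preprocessor pastes in text rather than a computed value, wrapping any replacement that contains an operator in parentheses is a common precaution.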

The #define directive is not restricted to this use, however. Because it uses textual replacement only, it can be used in other ways. For example, one might include the following.

#define forever while(1)

Subsequently, then, you could use forever as if it were a loop construct, and the preprocessor would replace it with while(1).

forever {
    printf("hello world\n");
}

Expert C programmers consider this very poor style, since it quickly leads to unreadable programs.