Chapter 2. C: Introduction
In the 1970's, Dennis Ritchie developed C to help with developing the UNIX operating system at Bell Laboratories. [UNIX is a registered trademark. We will use the mixed-case version (Unix) to refer to the various descendants of AT&T's original UNIX. This will generally include Linux, although Linux is not strictly a descendant.] Through a variety of historical events, few intentional, UNIX grew from a minor research diversion into the popular industrial-strength operating system of today. And along with UNIX's success came C, since the operating system was designed so that C programs could access all of its features. As more programmers gained experience with C, they began to use it on other platforms too, so that it became one of the primary languages for developing software by the end of the 1980's.
In the early 1990's, a team at Sun Microsystems led by James Gosling was presented with the problem of developing software for embedded systems (that is, systems built for a specific purpose that incorporate a processor, as in modern microwaves, cars, and VCRs). After surveying the possible languages, the team decided that none of the existing languages were adequate, and they decided to develop their own, named Java. The group's purpose regarding embedded systems fell by the wayside; but Sun managed to get Java labeled as a Web development language when they released it in 1995, just as the Internet and the Web were becoming a popular phenomenon. Java has not lived up to its hype as a Web development language, but it has proved to be a useful general programming language, and it is now popular in industrial environments.
Many college computer science programs now teach Java to students, enabling them to teach several features not commonly available with other languages. But a lot of software development is still done in C. Often this is for historical reasons — if the first version of a piece of software was written in the 1980's, the initial language choice was likely C, and that initial development still forms the core of the system. But also C and Java fill different niches in the software world: While Java is well suited for application programming (like a word processor), C's more primitive style works better for systems programming (like a Web server), where C's low-level efficiency and high compatibility with most of today's popular operating systems are important features.
2.1. Basics
The good news is that when the folks from Sun designed Java, they used C as their starting point, since they wanted all those C programmers to feel that learning Java would be a small step. (In fact, Java programming turns out to be very different from C programming; the similarity can be misleading.) As a result, much of C is incorporated into Java, although Java adds quite a bit more.
The primary difference between the two languages is that C is a procedural language, while Java is an object-oriented language. While objects and classes are essential to Java programming, the concepts are nonexistent in C.
Instead, the basic unit of organization in C is the function, equivalent to a class method in Java, except of course it won't be part of a class in C, since C doesn't include classes. Whereas a Java program can be seen as a collection of interrelated class definitions, a C program is a collection of interrelated functions. It will be instructive to look at an example explicitly contrasting a C program with its Java equivalent, as in Figure 2.1.
Figure 2.1: A C program and its Java equivalent.
#include <stdio.h>

/* Returns the greatest common
   divisor of a and b. */
int gcd(int a, int b) {
    int r;
    while(b != 0) {
        r = a % b;
        a = b;
        b = r;
    }
    return a;
}

int main() {
    printf("GCD is %d\n", gcd(24, 40));
    return 0;
}

public class FindGCD {
    // Returns the greatest common
    // divisor of a and b.
    public static int gcd(int a, int b) {
        while(b != 0) {
            int r = a % b;
            a = b;
            b = r;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println("GCD is " + gcd(24, 40));
    }
}
At the statement level, the languages are nearly identical: They share similar expressions and control statements. Figure 2.1 demonstrates some minor differences, though.
- The C program includes the line #include <stdio.h>. This is like an import line in Java. We'll talk more about it later, but for the moment think of it only as a required line to include in your program.
- The Java program includes several additional words relating to organizing the functions into class methods within a class. In addition to declaring the class with public class FindGCD, each class method is prefaced with the words public static to indicate that it is a class method accessible outside the class definition. C has none of this verbiage.
- In C, all of a function's variables should be defined at the beginning of the function, before any of the function's statements. In Figure 2.1, the local variable r in the gcd() function is defined first, whereas the Java equivalent defers the declaration to the first usage of r later. [C is somewhat more flexible than this: A variable declaration can occur directly after any opening brace, and since the C99 standard it can appear anywhere a statement can. Thus, we actually could push the declaration of r in Figure 2.1 to inside the while loop. But C programmers conventionally place all variable declarations at the top of the function.]
- Whereas the Java program calls the println() method on the System.out object, the C program calls the printf() function. These are parts of the respective languages' libraries, not part of the base syntax. We'll examine the printf() function in the next section.
- C originally permitted only one type of comment: It begins with /* and proceeds to the */ that occurs next in the code, possibly several lines later. Java also allows this comment style; but it also has the // comment style (introduced in C++). [Because of its usefulness and simplicity, the C99 standard adopted this commenting style, and most C compilers today accept it. Programs using it are not strictly C programs under the older C89 standard, however.] Neither does C have the documentation comment style beginning with /**, a feature peculiar to Java.
- The C program's main() function returns an int, while the Java class method main() returns void. Having the program return an integer is an oddity of C that derives from its association with UNIX, where a program can exit with an error code. Even under UNIX, this integer return value is rarely used. Just consider it a peculiarity of C that, for some reason, main() needs to return an int, and have your main() functions return 0, as in the above program.
2.2. The printf() function
The printf() function is among the most useful functions included
in C's library of language-defined functions. As we've already seen, it
allows you to display text for the user to see.
The way printf() works is a bit complicated: The first
parameter is a string specifying the format of what to print, and
the following parameters indicate the values to print. The easiest way
to understand this is to look at our earlier example.
printf("GCD is %d\n", gcd(24, 40));
This line says to print using GCD is %d\n as the format string. The printf() function goes through this format string, printing the characters GCD is before getting to the percent character ('%'). The printf() function regards this as a special character saying to print a value specified in a subsequent parameter. In this case, a 'd' follows the percent character, further specifying to display the parameter as an int in decimal form. (The d stands for decimal.) So when printf() reaches %d, it looks at the value of the following parameter (it would be 8 in this example) and displays that value instead.
Like Java, C allows you to include escape characters in a string using a backslash. The \n represents the newline character (that is, the character that represents a line break). Similarly, \t represents the tab character, \" represents the double-quote character, and \\ represents the backslash character. These escape characters are part of C syntax, not part of the printf() function. [That is, the string the printf() function receives actually contains a newline, not a backslash followed by an n. Thus, the nature of the backslash is fundamentally different from that of the percent character, which printf() would see and interpret at run-time.]
Thus, when printf() displays the string GCD is %d\n, the program will print GCD is 8, followed by a line break.
Let's look at another example.
#include <stdio.h>

int main() {
    int i = 5; int k = 4;
    printf("i is %d; k is %d;", i, k);
    printf("i + k is %d\n", i + k);
    return 0;
}
When run, this would display the following on the screen.
i is 5; k is 4;i + k is 9
The first call to printf() in this example illustrates how
the function can print multiple parameter values.
In fact, there's really no reason we couldn't have combined the two
calls to printf() into one in this case.
printf("i is %d; k is %d; i + k is %d\n", i, k, i + k);
There's a variety of things that can follow the percent character in the formatting string.
- We've already seen that %d says to print an int value in decimal form.
- %x says to print an int value in hexadecimal form.
- %f says to print a double value in decimal-point form.
- %e says to print a double value in scientific notation (for example, 3.000000e+08).
- %c says to print a char value.
- %s says to print a string. There's no variable type for representing a string, but C does support some string facilities using arrays of characters. We'll defer discussion of these facilities to a later section, as they involve some more complex concepts that we haven't seen yet.
You can also include a number between the percent character and the format descriptor as in %10d, which tells printf() to right-justify a decimal integer over ten columns.
2.3. Boolean operators
Although C includes most of the primitive types of Java
(including char, int, and double),
it does not include the boolean type.
Instead, C treats the integer 0 value as false
and all other integer values as true.
This has major implications for its control
statements, like if and while, for which the
test expressions can have any value in C.
(Java requires a boolean value for these conditions.)
The following would be a legal C program.
#include <stdio.h>

int main() {
    int i = 5;
    if(i - 5) printf("yes\n");
    else printf("no\n");
    return 0;
}
This program would compile, and it would print no when run, since the value of the expression i - 5 turns out to be 0, and so the if condition fails.
The following program would compile too.
#include <stdio.h>

int main() {
    int i = 5;
    if(i = 4) printf("yes\n"); /* Using = instead of == leads to */
    else printf("no\n");       /* peculiar behavior!! */
    return 0;
}
This program would print yes: In the if condition, we have assigned the value 4 to i, and the value of the expression is the value assigned (4). Since this is non-zero, the if condition succeeds, and so the program prints yes.
This is almost never what you want. Never use an assignment operator in a control statement's condition. C is unforgiving with this error, and it's very difficult to find the error if you don't know to look for it.
Java has several operators that compute boolean values,
like the comparison operators and ||, &&, and !.
C includes these too, but they compute int values in C.
In particular, they compute 1 to represent a true value and 0 to
represent a false value.
This quirk — that C regards all non-zero integers as
true —
is generally regarded as a mistake. C introduced it because, on many
computers, code to compare to zero takes two instructions (one
to do the comparison and another to jump to the appropriate
location), but many computers include a single instruction that
will jump based on whether a value is zero (without the
additional comparison operation).
Today, good compilers would use the shorter version
in either case, but at the time C was designed, compilers were
not as sophisticated.
Thus, today, most expert C programmers eschew using the shortcut,
preferring instead to explicitly compare to zero as a matter of good
programming style. But such avoidance
doesn't fix the fact that this language quirk often leads to program
errors.
Java improved the situation tremendously by introducing the
boolean type, required in if and while
statements.
2.4. Function prototypes
In C, a function must be declared above the location where you use
it. In the Java program of Figure 2.1, we could define
the gcd class method after defining main.
Not so in C: If we swapped the gcd() and main()
functions, the compiler would complain in main() that
the gcd() function is undeclared. Even though it's defined later
in the program, the compiler won't look ahead to it.
This raises a problem, especially in larger programs that span several files, where functions in one file will need to call functions in another. To get around this, C provides the notion of a function prototype, where we write down the function header but omit the body definition.
As an example, say we want to break our C program of Figure 2.1 into two files: The first file, math.c, will contain the gcd() function, and the second file, main.c, will contain the main() function. The problem with this is that, in compiling main.c, the compiler won't know about the gcd() function that it is attempting to call. A solution is to include a function prototype in main.c.
#include <stdio.h>

int gcd(int a, int b);

int main() {
    printf("GCD is %d\n", gcd(24, 40));
    return 0;
}
The int gcd… line is the function prototype. You can see that it begins the same way a function definition does, but we simply put a semicolon where the body of the function would normally be. By doing this, we are declaring that the function will eventually be defined, but we are not defining it yet. The compiler accepts this and obediently compiles the program with no complaints.
2.5. Header files
Larger programs spanning several files frequently contain many functions that are used many times in many different files. It's a pain to have to repeat the function prototype in every file that happens to use the function. So C permits you to include a file within another using a line beginning #include. Usually, this file consists mostly of function prototypes. This file is called a header file, since it contains the heads of all these functions. Conventionally, header files use the .h suffix, rather than the .c suffix used for C source files.
For example, we might put the prototype for our gcd() function
into a header file called math.h.
int gcd(int a, int b);
Now, we can #include this header file at the top of main.c.
#include <stdio.h>
#include "math.h"

int main() {
    printf("GCD is %d\n", gcd(24, 40));
    return 0;
}
This particular example isn't very convincing, but imagine a program consisting of hundreds of functions, split across dozens of files, and suddenly the time savings of having just one prototype for each function in a single header file begin to make sense.
The #include line is an example of a directive for what C calls its preprocessor. Before compiling a program, the C compiler will feed it through the preprocessor. The program can contain commands (directives) telling the preprocessor to manipulate the program text before it is given to the C compiler proper. The #include directive tells the preprocessor to replace the #include line with the contents of the file specified.
You'll notice that we've placed stdio.h in angle brackets, while math.h is in double quotation marks. The angle brackets are for standard header files (files more or less built into the C system). The quotation marks are for custom-written header files that can be found in the same directory as the source files.
2.6. Constants
Another particularly useful preprocessor directive is the
#define directive.
It tells the preprocessor to substitute all future occurrences of some
word with something else.
#define PI 3.14159
In this fragment, we've told the preprocessor that, for the rest of the
program, it should replace every occurrence of PI with
3.14159 instead. So, if later in the program the
preprocessor sees the following line:
printf("area is %f\n", PI * r * r);
the preprocessor would send this to the C compiler for processing
instead:
printf("area is %f\n", 3.14159 * r * r);
This replacement happens behind the scenes, so that the programmer
won't see the replacement.
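C doesn't include a direct counterpart to a static final variable in Java. (C does have a const qualifier, but a const variable is not accepted in every place that requires a constant, such as a case label.) Instead, C programmers use the #define directive to simulate constants.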
The #define directive is not restricted to this use, however. Because it uses textual replacement only, it can be used in other ways. For example, one might include the following.
#define forever while(1)
Subsequently, then, you could use forever as if it were a loop
construct, and the preprocessor would replace it with
while(1).
forever {
    printf("hello world\n");
}
Expert C programmers consider this very poor style, since it
quickly leads to unreadable programs.
