Chapter 4. C: Data structures
Although C doesn't have the notion of class as in object-oriented languages, it does have the structure, which allows you to define a new data type representing a conglomeration of data. The primary distinction between a class and a structure is that a class can have both variables and methods, while a structure can have only variables.
A structure type definition in C looks like the following.
struct Point {
int x;
int y;
}; /* The semicolon here is required! */
Here, we've defined a type called struct Point. In C,
struct is part of the type name for every structure defined.
(C++ makes this keyword optional, and so the type might just be called
Point. But we're talking about C now.)
Each struct Point variable has two sub-variables
(called
fields), x and y. So you can write
the following (which does nothing interesting except illustrate a C
structure's use).
int main() {
struct Point p;
p.x = 50;
p.y = 100;
return 0;
}
Notice how, in contrast to Java, a structure does not have to be allocated. It is automatically created at the beginning of the function, and it automatically goes away at the function's end.
You can easily make an array of structures, talk about pointers to structures, or place a structure within another structure.
When you pass a structure as a parameter, you should pass a pointer to the structure, not the structure itself. The reason is that structures tend to contain lots of data, and copying all of the fields into the parameter variable is inefficient.
#include <math.h>
double distanceToOrigin(struct Point *p) {
return sqrt((*p).x * (*p).x + (*p).y * (*p).y);
}
In fact, the (*ptr).field construct is so common that C includes a
shortcut for it: The arrow operator allows you to write
ptr->field in place of (*ptr).field. Thus, the following
definition is equivalent.
#include <math.h>
double distanceToOrigin(struct Point *p) {
return sqrt(p->x * p->x + p->y * p->y);
}
4.1. Dynamic memory
Sometimes you want to allocate memory in the course of running
a program, akin to the new operator in Java.
To do this, you can use the malloc() function included in the C
library.
The malloc() function takes a single integer, which represents
how many bytes the function should allocate, and it returns the memory
address of the first byte of memory allocated.
In computing how many bytes to allocate, the sizeof operator
is useful: Given a type, the sizeof operator computes how many
bytes a value of that type requires. So sizeof(int)
would be 4 on most computers (but on some computers it may be 2
or 8), and sizeof(struct Point) would have a value of 8 (or 4 or 16
depending on the computer).
So we could write the following, which allocates an array based on a length given by the user.
int main() {
int n;
int *arr;
printf("How long an array? ");
scanf("%d", &n);
arr = (int*) malloc(n * sizeof(int));
return 0;
}
In the malloc line, we've opted to allocate enough memory to
hold n integers. The casting operator in this line is
important — if we did not cast to an int*, the C
compiler would report that the type returned by
malloc() can't be assigned to an int* variable.
When you allocate memory in C, you should deallocate it
later to free the space. To accomplish this, you can use the
free() function, which takes a pointer as a parameter and
marks the memory to which it points as available.
free(arr);
After you deallocate memory, you should not use it again.
Neither should you deallocate the memory a second time.
Avoiding both of these issues can be pretty painful.
But it should be deallocated. If the program ends, all memory used by the program will be deallocated automatically. But, if the program continues for a long period and unused memory is never deallocated, then the program's memory requirements will grow steadily, leading to strange problems later once memory runs out. This is called a memory leak.
(Java avoids the issue of memory deallocation entirely by letting the computer automatically figure out when memory becomes available. This automatic process, called garbage collection, is complex, and efficient techniques for it were unknown at the time of C's development, so C opts to let the programmer control memory deallocation. For Java's designers, faster computers and more efficient collection techniques, coupled with a different target audience, tilted the scale toward automatic garbage collection instead. We'll examine garbage collection techniques later in this book.)
4.2. Example: Linked list
Figure 4.1 defines a List structure for
representing a linked list of numbers. Moreover,
Figure 4.1 contains
two functions, listCreate() to create a new, empty linked list
and listRemove() to remove a number from the list.
Figure 4.1: C implementation of a linked list
#include <stdlib.h>
struct Node {
int data;
struct Node *next;
};
struct List {
struct Node *first;
};
/* listCreate
* Creates an empty linked list.
*
* Returns:
* a newly allocated list, holding nothing.
*/
struct List* listCreate() {
struct List *ret;
ret = (struct List*) malloc(sizeof(struct List));
ret->first = NULL;
return ret;
}
/* listRemove
* Removes the first occurrence of a number from a linked list,
* returning 1 if successful
*
* Parameters:
* list - the list from which the item is to be removed.
* to_remove - the number to be removed from the list.
*
* Returns:
* 1 if item was found and removed, 0 otherwise.
*/
int listRemove(struct List *list, int to_remove) {
struct Node **cur_p;
struct Node *out;
cur_p = &(list->first);
while(*cur_p != NULL) {
if((*cur_p)->data == to_remove) {
out = *cur_p;
*cur_p = out->next;
free(out);
return 1;
}
cur_p = &((*cur_p)->next);
}
return 0;
}
The listRemove() function works with pointers in a peculiar
way: It uses cur_p to point to the pointer to the
current node; this way, once the desired number is found, we can alter
the pointer within the list referencing that node. The more intuitive
way of
writing this function is to have a cur variable stepping
through the list, but this necessitates tracking the previous node,
since that node's pointer to its successor will change once cur
points to the node to remove.
prev = NULL;
cur = list->first
while(cur != NULL) {
if(cur->data == to_remove) {
if(prev == NULL) list->first = cur->next;
else prev->next = cur->next;
free(cur);
return 1;
}
prev = cur;
cur = cur->next;
}
return 0;
By instead remembering the address of the pointer to the current node,
the implementation of Figure 4.1 avoids this additional
complexity. It's not really a better implementation, but it does
illustrate an interesting use of pointers. Incidentally, the
version used in Figure 4.1 would be impossible in a
language without pointers, such as Java.
Note that it would be tempting to replace lines 39–41 of
Figure 4.1 with the following instead.
(This change avoids using the out variable, and so we
could also delete line 35.)
39 free(*cur_p);
40 *cur_p = (*cur_p)->next; /* Bad!! */
This is a mistake, however, because this uses memory after it has
already been freed: Line 40 says to look into *cur_p to find
its next variable, but we freed *cur_p in
line 39,
so this memory is technically no longer available. (It looks like this
would be safe, since there are no intervening memory allocations between
lines 39 and 40, and indeed it would work on many systems; but
it's quite possible that the
free() function will write over the next field.)
Thus, this change results in a wrong program.
4.3. Analysis: Language and library levels
Most modern programming language systems draw an important distinction between the language level of the language system and the library level of the language system. The primary distinction between these two levels is whether the language incorporates special syntax for handling the feature.
Memory allocation is an example where different language designers make
differing decisions.
C opts to support memory allocation on the library level,
by requiring a malloc function to exist in the library.
In Java, however, memory allocation is done through
the new keyword — this is special-purpose syntax, and so we
would say that Java handles memory allocation on the language
level.
Another example where different languages make different decisions is in
exponentiation. In C and Java, this job is relegated to the
pow() function in the libraries. Other languages decide
to support exponentiation in the language through the addition of
another operator: In FORTRAN, for example,
represents 210.2 ** 10
In general, language designers want to push as much into the library level as possible, because special syntax both makes the language more complex for people to understand and because it leads to a more complex compiler development process. On the other hand, special syntax is often more convenient for programmers once they understand it. There's an subtle but important line between being convenient and being impossible to learn. No single language feature crosses a language over the line, but the conglomeration of many features does. C and Java are certainly on the simple side of the line; some argue that C++ crosses it. PL/I and Ada, two ambitious language design projects, failed to live up to expectations; many attribute this to buckling under the load of having too many features for programmers to learn.
The decision of whether a feature should be a language or a library feature is also guided by design principles. C, for example, works very hard to keep the language as simple as possible. The reason for this is that C is designed to be easy to implement on new systems — the original idea with UNIX was that you could write a simple C compiler for a new computer system, and from there you compile the UNIX source code. Another reason for C to strive to keep things simple was that the computational and memory limits at that time limited how much the compiler could do. Thus, C pushes much more into the library level than most languages. Design principles also guided FORTRAN to have an exponentiation operator: The language was designed with scientific computation in mind, where exponentiation is common, and the convenience of an additional operator was worth the incremental complexity.
Java's designers, working twenty years after C's designers, did not face these same limits. Moreover, they were developing more for user programs, which tend to be larger programs and which rely on an established underlying system. Thus, Java's designers chose a more complex language level, placing a priority on the application programmer's convenience, rather than the compiler programmer's convenience.
