Chapter 3. C: Pointers

The pointer is a concept that sets C apart from Java: it lets you have a variable that holds the memory address of some data. The type name for such a variable is the type name of the data to which it points, followed by an asterisk (`*'); for example, an int* variable holds the memory address of an integer.

To get the memory address of a variable, you can use the ampersand (`&') operator: For example, the value of the expression &i is the memory address of i. Conversely, to access the memory referenced by a pointer, you can use the asterisk (`*') operator — this is called dereferencing the pointer. Consider the following example.

int i;
int *p;

i = 4;
p = &i;
*p = 5;
printf("%d\n", i);

In this fragment, we have declared two variables: i, which holds an integer, and p, which holds the memory address of an integer. We first initialize i with the value 4 and p with the value &i. Then we write *p = 5;, which alters the memory referenced by p (that is, i) to hold 5. Finally, we print the value of i, which is now 5.

This is a contrived example. A less contrived usage of pointers is when you want a function to change the value of a parameter. For example, say we want to write a function to swap two values. You might be tempted to write the following.

void swap(int i, int j) {  /* This won't work!! */
    int t;

    t = i;
    i = j;
    j = t;
}

It won't work, though, because C (like Java) passes all parameters by value: If I call swap(x, y), the values contained by x and y are copied into the i and j variables. The swap() function will swap the values contained by i and j, but this will have no effect on x and y. We can get around this by passing pointers instead.

void swap(int *ip, int *jp) {
    int t;

    t = *ip;
    *ip = *jp;
    *jp = t;
}

Now we would have to call swap(&x, &y) (not swap(x, y)). The following figure illustrates how it works; an explanation is below.

Figure 3.1: The swap() function in action (panels (a), (b), and (c)).

The value copied into ip will be the address of x, and the value copied into jp will be the address of y (Figure 3.1(a)). The line t = *ip; will copy the value referenced by ip (that is, x) into t. The next line will copy the value referenced by jp (that is, y) into the memory referenced by ip (that is, x) (Figure 3.1(b)). And the final line will copy the value of t (the original value of x) into the memory referenced by jp (that is, y) (Figure 3.1(c)). So the values contained by x and y will be swapped. [This is the only way to write such a function in C, where all parameters are passed by value. Some languages have a feature where you can designate a parameter to be an implicit pointer — it's called call by reference as opposed to the call by value used by C. Such a feature was added into C++; it was not retained by Java.]

Suppose the function said the following instead.

t = *ip;
ip = jp;     /* Before, this said: *ip = *jp; */
*jp = t;

This would still compile, but the second line would change only the pointer ip, so that ip and jp both point to y; the memory holding x is not touched. The third line then copies t (the original value of x) into y.

Thus, the actual value of x would not change with this attempt at implementation, and y would end up holding x's original value.
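The same technique is how a C function hands back more than one result: the caller passes the addresses of its own variables, and the function stores results through them. Here's a minimal sketch; the name divide() and its parameters are hypothetical, chosen only for illustration.

```c
/* Hypothetical helper: computes both the quotient and the remainder
 * of dividend / divisor, storing them into the variables whose
 * addresses the caller supplies. divisor must not be 0. */
void divide(int dividend, int divisor, int *quotp, int *remp) {
    *quotp = dividend / divisor;   /* store through the first pointer */
    *remp = dividend % divisor;    /* store through the second pointer */
}
```

Given int q, r;, the call divide(17, 5, &q, &r) leaves 3 in q and 2 in r. As with swap(), the ampersands are essential: the function needs to know where the caller's variables live.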

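In C, the null pointer is called NULL, a constant defined in stddef.h and several other standard headers (including stdio.h and stdlib.h). Its use is similar to null in Java: It indicates a pointer that points to nothing.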
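One common use of NULL is an optional pointer parameter: the caller passes NULL to say that it doesn't want a particular result. The sketch below assumes a hypothetical helper named storeIfWanted(); the essential point is testing the pointer against NULL before dereferencing it, since dereferencing a NULL pointer is an error.

```c
#include <stddef.h>   /* defines NULL */

/* Hypothetical helper: stores value through resultp, but only if the
 * caller supplied an address. Passing NULL means "I don't need this
 * result"; writing through a NULL pointer would be an error. */
void storeIfWanted(int *resultp, int value) {
    if (resultp != NULL) {
        *resultp = value;
    }
}
```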

3.1. The scanf() function

We've already seen the printf() function that allows you to output information to the screen.

printf("The value of i is %d.", i);

There is also a scanf function that allows you to read information from the user. Suppose, for example, that you wanted to read a number from the user. You can write the following.

printf("Type a number. ");
scanf("%d", &i);

The scanf() function, like the printf() function, takes a format string indicating what sort of data the function will read from the user. The parameters that follow are the memory addresses where the data read from the user should be placed. In this example, the format string "%d" indicates that the function should read an integer, written in decimal, from the user. The second parameter, &i, indicates that the value read should be placed into the i variable.

The important thing to remember about the scanf() function is that it wants memory addresses of variables, not the values of variables: Those ampersands are important. scanf() needs addresses for the same reason swap() does: since C passes parameters by value, a function given only the values of variables could never change the variables themselves, and scanf() must store the user's data in the calling function's variables.
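scanf() also returns a value worth checking: the number of items it successfully stored, or EOF if the input ended before anything could be read. To keep the sketch below testable without a keyboard, it uses sscanf() from stdio.h, which applies the same format rules to a string instead of the user's typing; the wrapper name parseInt() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: tries to read a decimal integer from text.
 * sscanf() returns how many items it stored (or EOF), so a result
 * of 1 means the "%d" conversion succeeded and *out was set.
 * Returns 1 on success and 0 otherwise. */
int parseInt(const char *text, int *out) {
    return sscanf(text, "%d", out) == 1;
}
```

With the keyboard version, the same check reads if(scanf("%d", &i) != 1), which catches a user who types letters instead of a number.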

3.2. Arrays

Arrays in C must be given a fixed length at the time they are declared.

int main() {
    int arr[50];
    int i;

    for(i = 0; i < 50; i++) arr[i] = i;
    return 0;
}

Once you create the array variable, you're stuck with its length. Also, a C array does not carry its length around with it, so there is nothing like Java's arr.length.

In most expressions, C treats the name of an array as a pointer to its first element, a pointer whose value cannot be changed. In fact, when you pass an array as a parameter, the only thing that really gets passed is the memory address of the first element of the array. So you can write something like the following.

void setToZero(int *arr, int n) {
    int i;
    for(i = 0; i < n; i++) arr[i] = 0;
}

int main() {
    int grades[50];

    setToZero(grades, 50);
    return 0;
}

In this program, the setToZero function takes a pointer to an integer as its first parameter. When we call it with setToZero(grades, 50), the address of the first number in grades is copied into the arr parameter variable. The bracket operator can also be applied to pointers as if they referenced the first item in an array, so the line arr[i] = 0; is legal. (Alternatively, you could write *(arr + i) = 0;. Adding the integer i to the pointer arr would compute the address where index i would be located if arr were an array, and the asterisk would dereference this address.)
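The sketch below, built around a hypothetical sumArray() function, uses the pointer-arithmetic form *(arr + i) described above. Because the function receives only a pointer, its caller must supply the length. In the function that declares the array, sizeof can compute it: sizeof(vals) is the size of the whole array in bytes, and dividing by sizeof(vals[0]) gives the element count. Inside sumArray(), by contrast, sizeof(arr) would give only the size of a pointer.

```c
/* Hypothetical helper: sums n integers starting at arr.
 * *(arr + i) refers to the same element as arr[i]. */
int sumArray(const int *arr, int n) {
    int i;
    int total = 0;

    for (i = 0; i < n; i++) total += *(arr + i);
    return total;
}
```

For example, with int vals[] = {3, 1, 4, 1, 5};, the call sumArray(vals, sizeof(vals) / sizeof(vals[0])) returns 14.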

3.3. Writing outside an array

Actually, that brings up an important point. In Java, each access to an array is checked, and if you access an array out of bounds, you see the friendly ArrayIndexOutOfBoundsException message. C is not nearly so nice. When you access beyond an array's bounds, it blindly performs the access, reading or overwriting whatever memory happens to lie there.

This can lead to peculiar behavior. For example, consider the following program.

#include <stdio.h>

int main() {
    int i;
    int vals[5];

    for(i = 0; i <= 5; i++) vals[i] = 0;  /* bug: vals[5] is out of bounds */
    printf("%d\n", i);
    return 0;
}

Some systems (including at least one version of Linux) would place i in memory just after the vals array; thus, when i reaches 5 and the computer executes vals[i] = 0, it in fact resets the memory corresponding to i to 0. As a result, the loop counter starts over, and the program goes through the loop again, and again, forever. The program never reaches the printf function call. Because the C standard leaves out-of-bounds accesses undefined, other systems might instead crash or appear to work correctly; the bug is there either way.

In more complicated programs, this can lead to very difficult bugs, where a variable's value changes mysteriously somewhere within hundreds of functions, and you as the programmer must determine where an array index was accessed out of bounds. This is the type of bug that takes a lot of time to uncover and repair.

That's why you should consider Java's ArrayIndexOutOfBoundsException message as friendly: Not only does it identify the cause of the problem, it even tells you exactly which line of the program was at fault. This saves you vast amounts of debugging time.
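Since C won't check indices for you, one option is to check them yourself wherever an index is computed or comes from outside the program. Here's a minimal sketch using a hypothetical checkedStore() function that refuses out-of-bounds indices rather than silently writing past the array.

```c
/* Hypothetical helper: stores value at index i of an array of length n,
 * but only if 0 <= i < n. Returns 1 on success, or 0 if the index is
 * out of bounds (in which case nothing is written). */
int checkedStore(int *arr, int n, int i, int value) {
    if (i < 0 || i >= n) return 0;
    arr[i] = value;
    return 1;
}
```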

Every once in a while, you'll see a C program crash with a message like Segmentation Fault or Bus Error. (It won't helpfully include any indication of what part of the program is at fault.) Such errors usually mean that the program attempted to access an invalid memory location. This may indicate an access at an invalid array index, but typically the index needs to be pretty far out of bounds for this to occur; more frequently, it indicates an attempt to dereference an uninitialized pointer or a NULL pointer.

3.4. Strings

C includes very minimal support for strings. Basically, a string in C is simply an array of characters. You could easily write the following in a C program.

char *str;
str = "hello";

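Here, we've declared str to be a pointer to a character. In the next line, we made it point to the first character of the array of characters hello. (The characters of such a string literal should be treated as read-only: a program that tries to modify them has undefined behavior.)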

Actually, there's also a hidden character to mark the end of the string. This marker is NUL, the ASCII character whose value is 0. [Although they are spelled similarly, the distinction between NUL and NULL is significant: NUL is a character value, while NULL is a pointer value.] So, actually, "hello" refers to an array of six characters, with the sixth character being '\0' — that is, NUL.
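The NUL terminator is how code finds the end of a string. As a minimal sketch, the hypothetical myStrlen() below counts characters until it reaches '\0'. Its result for "hello" is 5, while sizeof("hello") is 6, because the array also holds the NUL.

```c
/* Hypothetical helper: counts characters up to, but not including,
 * the NUL terminator, which is what the library's strlen() does. */
int myStrlen(const char *s) {
    int n = 0;

    while (s[n] != '\0') n++;
    return n;
}
```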

If you wanted to copy all the letters from the string src to another string dst, you could use the following for loop.

for(i = 0; src[i] != '\0'; i++) dst[i] = src[i];
dst[i] = '\0';

This copies all of the characters in src up to, but not including, the NUL character. Then it places a NUL character at the end of dst so that the copied string has the terminator also.

In practice, I'd never write such a for loop in a program. Instead, I'd use the library function strcpy(). The string.h header file contains prototypes for many of the standard library's functions for working with strings. Following are three. (The keyword const marks a parameter pointing to characters that the function only reads.)

char *strcpy(char *dst, const char *src)
Copies all the characters of src, including the terminating NUL, into dst, and returns dst. The array dst must be large enough to hold them all; strcpy() does not check.
size_t strlen(const char *src)
Returns the number of characters in src (not including the terminating NUL character). Its return type, size_t, is an unsigned integer type.
int strcmp(const char *a, const char *b)
Returns zero if a and b are identical, a negative number if a comes before b in lexicographic order, and a positive number if a comes after b. (Lexicographic order refers to the ordering based on ASCII codes. For example, Abc comes lexicographically after ABC, since the first characters match but the second characters do not, and the ASCII value of 'B' (66) is less than the ASCII value for 'b' (98).)
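To make the ordering rule concrete, here's a sketch of how a strcmp()-style comparison might be written by hand; myStrcmp() is a hypothetical name, and a real program should just call strcmp(). It compares the first differing characters as unsigned char values, which is how the C standard defines strcmp()'s comparison. Only the sign of the result is meaningful.

```c
/* Hypothetical re-implementation of strcmp()'s ordering rule: advance
 * through both strings while they match and a has not ended, then
 * compare the first differing characters as unsigned char values. */
int myStrcmp(const char *a, const char *b) {
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char) *a - (unsigned char) *b;
}
```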

C has no support for strings of indefinite length. You can move the NUL character earlier in the array to make a string shorter, but you can't move it past the end of the array, so a string can never grow longer than the array holding it.
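Because strcpy() does no checking, a common precaution is to compare the source's length against the destination array's capacity first. This sketch assumes a hypothetical helper named safeCopy() that takes the capacity (in characters) as an extra size_t parameter.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst only if dst, an array of
 * cap characters, has room for all of src's characters plus the NUL.
 * Returns 1 if it copied, 0 if src would not fit (dst is untouched). */
int safeCopy(char *dst, size_t cap, const char *src) {
    if (strlen(src) + 1 > cap) return 0;   /* would run past the array */
    strcpy(dst, src);
    return 1;
}
```

For example, with char small[4];, the call safeCopy(small, sizeof small, "hello") refuses (it would need 6 characters), while safeCopy(small, sizeof small, "dog") fits exactly.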

3.5. Example: Tokenizing a string

Figure 3.2 below shows a useful function that we'll explore as an illustration of many of the concepts we've covered so far. It defines a function that takes three parameters: a string referenced by buf, an array of pointers to strings referenced by argv, and an integer max_args. The function is to split the string buf into separate words, placing pointers to successive words into argv and returning the number of words found. The max_args parameter indicates how long the array is.

Figure 3.2: Splitting a string.

#include <ctype.h>

/* splitLine
 *  Breaks a string into a sequence of words. The pointer to each
 *  successive word is placed into an array. The function
 *  returns the number of words found.
 *
 * Parameters:
 *  buf - the string to be broken up into words
 *  argv - the array where pointers to the separate words should go.
 *  max_args - the maximum number of pointers that the array can hold.
 *
 * Returns:
 *  the number of words found in the string.
 */

int splitLine(char *buf, char **argv, int max_args) {
    int arg;

    /* isspace() expects an unsigned char value (or EOF), hence the casts */
    while(isspace((unsigned char) *buf)) buf++; /* skip over initial spaces */
    for(arg = 0; arg < max_args && *buf != '\0'; arg++) {
        argv[arg] = buf;
        while(*buf != '\0' && !isspace((unsigned char) *buf)) {
            buf++;         /* skip past letters in word */
        }
        if(*buf != '\0') { /* if we're not at sentence's end, */
            *buf = '\0';   /*   mark word's end and continue */
            buf++;
        }
        while(isspace((unsigned char) *buf)) buf++; /* skip over extra spaces */
    }
    return arg;
}

For example, suppose we wanted to use this function to split the sentence "The dog is agog." into words. We'd place this sentence into an array of characters and pass this string as buf into the function. We'd also create an array of string pointers to pass as argv, with max_args being the length of this array.

The function's job is to fill the entries of argv with pointers to the individual words.

In this case, the function should return 4, since there are four words in the sentence.

The function accomplishes this by replacing spaces in the sentence with NUL characters and pointing the array entries referenced by argv into the sentence's array.

It uses the isspace() function to identify whitespace characters (spaces, tabs, newlines, and the like); this function's prototype is in the ctype.h header file, included on the first line of Figure 3.2.
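Note that splitLine() writes NUL characters into buf, so buf must be a modifiable array of characters, not a string literal. If you only need the number of words, a read-only variant is possible; the sketch below, with a hypothetical function named countWords(), follows the same skip-spaces, skip-word pattern without changing the string.

```c
#include <ctype.h>

/* Hypothetical companion to splitLine(): counts the words in buf using
 * the same pattern, but only reads the string, so buf may even be a
 * string literal. */
int countWords(const char *buf) {
    int count = 0;

    while (*buf != '\0') {
        while (isspace((unsigned char) *buf)) buf++;  /* skip spaces */
        if (*buf == '\0') break;                      /* only trailing spaces were left */
        count++;                                      /* found the start of a word */
        while (*buf != '\0' && !isspace((unsigned char) *buf)) {
            buf++;                                    /* skip past the word */
        }
    }
    return count;
}
```

For the example sentence, countWords("The dog is agog.") returns 4, matching what splitLine() reports.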