CSci 230: Computing Systems Organization

Assignment 13: Shell pipes

Due: 5:00pm, Thursday, December 5. Value: 50 pts.

In this assignment, your job is to complete a shell that executes user-provided Linux commands involving redirection and pipes. Already provided to you is a program that breaks a command into its component pieces; your primary job will be to start the child processes with their file descriptors configured correctly.

You will want to download the following files for working on this assignment. The only file you should modify for this assignment is exec.c.

exec.c is the file containing the function that you should complete for this assignment.
exec.h is the corresponding header file.
main.c contains the main function. This function contains a loop that reads a line from the user, converts it into a struct command as described below, and then passes this structure into your function in exec.c.

Your shell should respond appropriately in common error situations, such as an invalid executable (like sl), redirecting from a nonexistent file, or writing to an invalid file (like dummy/directory).
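As one illustration of handling the "invalid output file" case, here is a minimal sketch of opening a redirection target and reporting failure. The helper name open_output_file, the open flags, and the 0644 permission bits are assumptions made for this example; the assignment does not prescribe them.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not part of exec.h): opens path for writing,
   creating or truncating it. Returns the file descriptor, or -1 after
   printing a message describing why the open failed. */
int open_output_file(const char *path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror(path);   /* prints the path followed by the system's error text */
    }
    return fd;
}
```

A caller would skip launching the command when the result is -1, and would close the descriptor once it is no longer needed.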

Your program should have no memory leaks and it should leave no open file descriptors. You can check for these using a profiler such as valgrind. To use valgrind to check for file descriptors, use its track-fds option:

linux$ valgrind --track-fds=yes ./a.out

The command structure

The main function constructs a struct command that represents the pieces of a command typed by the user. This structure is then passed as a parameter into the exec function that you will write. Understanding this structure is important, therefore, to completing this assignment.

The definition of this structure, found in exec.h, is as follows.

struct command {
    char *src_file;   /* pointer to input file, or NULL if keyboard */
    char *dst_file;   /* pointer to output file, or NULL if display */
    int   num_cmds;   /* number of commands in user input */
    char ***cmd_args; /* array of pointers to arrays of commands' args */
};

A picture may be more helpful, though. The below diagram (using × for NULL pointers) illustrates what the struct command parameter would hold if the user types the following:

mysh% sort < dorms | grep n | grep e
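The same layout can be written out in C. The sketch below hand-builds a struct command for the example line. It assumes that each command's argument array ends with a NULL pointer (which execvp requires) and that src_file holds the bare file name; the helper example_command is hypothetical and exists only for illustration.

```c
#include <stddef.h>

/* Repeated from exec.h so that this sketch compiles on its own. */
struct command {
    char *src_file;
    char *dst_file;
    int   num_cmds;
    char ***cmd_args;
};

/* Hypothetical helper: builds by hand what main.c is assumed to pass
   to exec for "sort < dorms | grep n | grep e". */
struct command example_command(void) {
    static char *sort_args[]  = { "sort", NULL };
    static char *grep1_args[] = { "grep", "n", NULL };
    static char *grep2_args[] = { "grep", "e", NULL };
    static char **all_args[]  = { sort_args, grep1_args, grep2_args };

    struct command c;
    c.src_file = "dorms";   /* input redirected from the file dorms */
    c.dst_file = NULL;      /* no output redirection: use the display */
    c.num_cmds = 3;
    c.cmd_args = all_args;  /* cmd_args[i][0] is the name of program i */
    return c;
}
```

With this value, cmd_args[1][0] is "grep" and cmd_args[1][1] is "n".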

Suggested approach

I suggest that your implementation of exec use a multi-step process.

  1. First create two arrays of ints, and populate them as follows: For command i, entry i of the first array would be the file descriptor that the command should use for its standard input, and entry i of the second array would be the file descriptor that the command should use for its standard output.

  2. For each command, fork off a child process. As you fork off children, the parent should save the child processes' process IDs in an array. Each child process should go through these steps:

    1. Set up file descriptor 0 as standard input and file descriptor 1 as standard output.
    2. Close all file descriptors created in the first step above. This is important because many programs (like grep) depend on their standard input being closed before they themselves exit; but no process will see a file descriptor as being closed until all processes having that same file descriptor close it.
    3. Invoke execvp so that the child executes its assigned program. For child i, you can get the name of the program from cmd_args[i][0], and you can get the array of command-line parameters from cmd_args[i].
  3. Close all of the file descriptors created in the first step. This is important because file descriptors persist until the process explicitly closes them (a child process closing them does not count). If you leave any open, then eventually the parent process will run into the limit on open file descriptors set by the operating system. The shell would no longer be able to execute any commands.

  4. Execute waitpid on each of the child process IDs. (It's important that you do this after forking off all the children: Suppose you instead waited until the first command finished before starting the second command. If the first command generates a lot of output that is supposed to be piped into the second command, this output will eventually fill up the pipe's buffer, and the OS will not allow the first process to continue until the pipe's buffer is consumed. But nothing will consume it, since the second command has not started; the first process will never finish, and the shell, which is waiting for it, will effectively lock up.)
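To make the ordering of these steps concrete, here is a minimal sketch for the special case of exactly two commands joined by one pipe, with no file redirection. It is not the full exec function: the helper name run_two_stage, the exit codes 126 and 127 used by the children, and the choice to return the second command's exit status are all assumptions made for this example.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs "left | right". Returns right's exit status,
   or -1 if the pipe or a fork could not be created. */
int run_two_stage(char **left, char **right) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    /* Step 1: descriptor each command should use for input and output. */
    int in_fd[2]  = { 0, fds[0] };
    int out_fd[2] = { fds[1], 1 };
    char **argvs[2] = { left, right };
    pid_t pids[2];
    int started = 0;

    /* Step 2: fork every child before waiting on any of them. */
    for (int i = 0; i < 2; i++) {
        pids[i] = fork();
        if (pids[i] < 0) break;
        if (pids[i] == 0) {
            if (in_fd[i] != 0)  { close(0); if (dup(in_fd[i]) != 0)  _exit(126); }
            if (out_fd[i] != 1) { close(1); if (dup(out_fd[i]) != 1) _exit(126); }
            close(fds[0]);   /* the child keeps only descriptors 0 and 1 */
            close(fds[1]);
            execvp(argvs[i][0], argvs[i]);
            _exit(127);      /* reached only if execvp failed */
        }
        started++;
    }

    /* Step 3: the parent closes its own copies of the pipe ends. */
    close(fds[0]);
    close(fds[1]);

    /* Step 4: wait for each child that was started. */
    int status = 0;
    for (int i = 0; i < started; i++) waitpid(pids[i], &status, 0);
    if (started < 2) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the parent skipped step 3, the second command would never see end-of-file on its input, because the parent would still hold the pipe's write end open.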

Useful system calls

There are two additional system calls that we did not discuss much in class, but which are important to completing this assignment.

int dup(int k)

Allocates a new file descriptor, and duplicates all information about file descriptor k for this new file descriptor. The return value is the file descriptor allocated.

When the OS allocates a file descriptor (whether by open or by dup), it always chooses the least (nonnegative) integer that is not currently allocated. Thus, to use a file descriptor k as standard input, where k isn't 0, you can first use close(0) to deallocate descriptor 0 and then use dup(k) to allocate a duplicate of k. Since 0 will be the least unallocated file descriptor, dup will return 0, and descriptor 0 will end up being synonymous with k.
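The close-then-dup idiom might look like the following sketch, which makes a file serve as standard input. The helper name redirect_stdin_from is hypothetical, and the code assumes that descriptor 0 is open when the function is called and that no other thread allocates descriptors in between.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: makes descriptor 0 refer to the file at path.
   Returns 0 on success and -1 on failure. */
int redirect_stdin_from(const char *path) {
    int k = open(path, O_RDONLY);
    if (k < 0) return -1;        /* for example, the file does not exist */
    if (k != 0) {
        close(0);                /* 0 is now the least unallocated descriptor */
        if (dup(k) != 0) {       /* so dup is expected to return 0 */
            close(k);
            return -1;
        }
        close(k);                /* k is no longer needed; 0 is its duplicate */
    }
    return 0;
}
```

After a successful call, read(0, ...) reads from the file rather than from the keyboard.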

int pipe(int fds[2])

Allocates a pair of linked file descriptors, and places the input descriptor into fds[0] and the output descriptor into fds[1]. These descriptors are linked in this way: If a process writes some information into fds[1], and then a process later reads out of fds[0], the information read will be identical to what was written earlier into fds[1]. The below code illustrates this.

pipe(fds);
write(fds[1], "hello", 5); // Sends "hello" into the pipe.
read(fds[0], data, 5);     // Receives into the array "data" - it now holds "hello"

The usage of pipe's parameter is a bit unusual: The array is being passed in solely so that the call can return two different values (since the real return value can only be one integer). It doesn't matter what two integers are in the fds array before pipe is called; they will be ignored, and pipe will simply replace them with the two file descriptors it allocated.

The value returned by pipe is 0 upon success, and −1 in case of error.
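A slightly fuller sketch, checking pipe's return value and showing how a reader detects the end of the data, could look like this. The helper name pipe_round_trip is hypothetical, and the code assumes the message is short enough to fit in the pipe's buffer (otherwise the write would block, since nothing is reading yet).

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: sends msg through a new pipe and reads it back
   into out, which holds up to cap bytes. Returns the number of bytes
   read, or -1 on error. */
long pipe_round_trip(const char *msg, char *out, size_t cap) {
    int fds[2];                       /* initial contents are ignored */
    if (pipe(fds) == -1) return -1;

    size_t len = strlen(msg);
    if (write(fds[1], msg, len) != (ssize_t) len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);   /* no writer remains, so the reader can reach end-of-file */

    long total = 0;
    ssize_t n;
    /* read returns 0 once the data is consumed and every write end is closed. */
    while ((size_t) total < cap
           && (n = read(fds[0], out + total, cap - total)) > 0) {
        total += n;
    }
    close(fds[0]);
    return total;
}
```

Had fds[1] been left open, the final read would wait forever for more data instead of returning 0. This is the same reason each process in a pipeline must close the pipe descriptors it does not use.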

Of course, more information about these system calls can be obtained through the man pages: man 2 pipe.

Simple shell

In class I showed you a very simple shell. Here it is again, for reference purposes.

#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define CMD_LEN 120

int main() {
    char cmd[CMD_LEN];
    char *cmd_args[2];

    while (1) {
        /* Read command from user. */
        write(1, "% ", 2);
        int n = read(0, cmd, CMD_LEN);
        if (n <= 0) break;  /* EOF reached (or read failed); exit program */
        cmd[n - 1] = '\0';  /* replace '\n' with '\0' */

        /* Fork off child to execute command. */
        int child_pid = fork();
        if (child_pid == 0) {
            cmd_args[0] = cmd;
            cmd_args[1] = NULL;
            execvp(cmd, cmd_args);
            /* If execvp returns, the command is bad. */
            write(1, "Command not found\n", 18);
            exit(-1);
        }

        /* Wait for child to exit before continuing. */
        waitpid(child_pid, &n, 0);
    }
    return 0;
}