CSci 230: Computing Systems Organization
Home Syllabus Assignments Tests

Assignment 2: Pooling strings

Due: 5:00pm, Thursday, February 11. Value: 30 pts. Submit your solution (consisting of pool.c only) using the Sauron Submission System. [instructions]

In Assignment 1, you probably found yourself allocating lots of memory for strings. All these calls to malloc waste quite a bit of memory: Each time you ask malloc to reserve n bytes of memory, in fact it will round n up to a multiple of four, add another four, and then allocate than many bytes rather than n itself. This can add up to seven extra bytes for each memory allocation. If you have lots of short allocations (as with the dictionary words), all that can add up. Overall, the dictionary ends up using over 50% more than the base need (i.e., the sum of the sizes of the dictionary words).

In this assignment you'll develop a string pool. The idea is that you'll allocate just one large chunk of memory using malloc, and then you can allocate strings more efficiently out of that. One deficiency relative to malloc is that your pool won't allow individual strings to be deallocated: One can only deallocate the entire pool all at once.

The starting code is substantially similar to the code from Assignment 1. However, the distributed dict.c code has been integrated into main.c, and of course it has been modified to use your string pool rather than dupstr from before.

hash.c provides a hash table data structure where strings are the keys.
hash.h declares the data structure and function prototypes for hash.c.
pool.c provides three functions for managing a string pool. This is the file you will modify for this assignment.
pool.h declares the function prototypes for pool.c.
main.c defines the main function, which reads the dictionary file and prompts the user repeatedly for query words.

Note that you should certainly use valgrind to check for memory leaks. [instructions]

Algorithm

The idea behind the algorithm you implement is illustrated roughly by the below diagram.

In calling pool_create, we allocate a large fixed-length character array. Then with each subsequent call to pool_alloc, we copy the string just past the last-allocated string in this array, returning a pointer to the first. Note that the amount of memory taken by this new string is simply the string's length plus 1 for the string termination character. (We'll see that there is slightly more memory taken for tracking the status of the character array. But if you choose a large character array (I used 1000 bytes), this extra memory will be a tiny fraction of the overall memory used by the pool.)

In practice, more strings could be allocated than could fit into the initial size of the character array. When one array becomes full, you can simply allocate another large character array and start using that. However, eventually there will be a call to pool_free, which needs to free all the allocated arrays. To be able to find all these allocated arrays, I recommend keeping a linked list of them. In particular, I recommend using a struct which contains your large fixed-length character array, a counter tracking how many entries of that array have been allocated so far, and a pointer to the next struct.