CSCI 151 Lab 6: Skip lists


This lab will:




If we want to maintain a sorted list of items, from what we have seen so far we have two options:

A skip list is an intriguing structure which combines the benefits of both approaches, and allows us to keep a sorted list such that all operations (lookup, insert, remove) take \(O(\lg n)\) time.

Fundamentally, a skip list is like a normal (singly) linked list, but with extra links added that “skip” over parts of the list. These extra links mean we can avoid having to iterate through every single node when looking for something in the list: we can start by taking very big hops in the list until the next big hop would take us beyond what we are looking for; then we take slightly smaller hops; and so on.

For example, the image above illustrates a skip list containing the numbers \(1\) through \(10\). Suppose we are looking for \(8\). We start at the node marked “head”, at the topmost level, i.e. level 3 (the bottom level is level \(0\)). Following the topmost link takes us to the node containing \(1\), which is smaller than \(8\), so we continue searching. The next link on the top level would take us beyond the end of the list (i.e. it is null, shown as NIL in the above picture), so we don’t follow it. Instead, we stay on the node containing \(1\) and move down one level, to level 2. Now, we follow links on level 2 as long as the items we encounter are less than \(8\): the first link takes us to the node containing \(4\); another step takes us to \(6\); then the next step would take us past the end of the list, so we stop and move down to level \(1\). On level \(1\), the next step would take us to \(9\), which is too big. So instead we move down again to level \(0\), and then follow links to \(7\) and finally to \(8\). The point is to notice how we were able to find \(8\) without ever having to look at the nodes containing \(2\), \(3\), or \(5\).

How do we determine how “tall” each node should be? The answer is: by flipping a coin! Each time we create a node, we randomly decide how tall it should be by flipping a coin and seeing how many times in a row the coin comes up heads. We then add one level for each coin toss that came up heads. Every node has height at least \(1\); and then it has a 1/2 chance of having height at least \(2\); a 1/4 chance of having height at least \(3\); a 1/8 chance of height at least \(4\); and so on. So on average, we expect that at each level there will be half as many nodes as the previous level. Conversely, on the very top level, we expect to find only one or two nodes, so each step will cover about half the list; on the next level down, each step between nodes will cover about 1/4 the list; and so on. Thus, we expect searching for a given node to take \(O(\lg n)\) steps on average. As we will see, we also expect adding a new node to take \(O(\lg n)\) steps on average: we can find the right location for it in \(O(\lg n)\), just as with a sorted array, but then we can quickly add it just by shifting pointers around.

Step 1: SkipNode

The first step is to make a generic SkipNode<E> class to represent single nodes in our skip list. Each node should contain two fields:

Add the following methods to your SkipNode class:

Each of these methods should only be a few lines of code at most. If they are longer than that, you are doing something wrong. Don’t forget to add appropriate comments to your methods!

Step 2: Locating

We are now ready to start implementing skip lists. I have provided some starter code in; you will be filling in some of the important methods.

To cut down on the number of special cases we have to deal with, note that a skip list will always start with a “dummy” or “sentinel” SkipNode which contains no data. (This dummy node has already been created for you in the SkipList constructor.) This way, front itself never needs to point to a different node; we can just change the references in the dummy node just as we would with any other node. In addition, we can ensure that the dummy node is always as tall as the tallest node in the remainder of the list, so we can start at the highest level just by looking at the height of the initial dummy node.

The first step is to implement the locate method, which performs the important task of looking for a given data item. It should work very similarly to the process described in the introduction: it starts following links in the top level, and keeps moving down level by level while staying to the left of the node being searched for. However, it returns more than just return the data itself: in particular, it returns a stack of references to nodes encountered during its search. On top of the stack is the last node encountered at level 0; the next item in the stack is the last node encountered at level 1 (i.e. the last node we were on before deciding to move down to level 0); the next item is the last node encountered at level 2; and so on.

We can use this locate method and its stack of previous nodes to implement many other methods—in fact, locate is used to implement all the other methods such as add, lookup, and remove.

Here is how locate should work:

You won’t really be able to test your implementation yet—keep going to the next step!

Step 3: Adding

Step 3a: Implementation

In order to add a new item to the skip list, we begin by using locate to find the place where it should go.

Next, we decide how tall to make the new node using coin flips. To simulate a “coin flip” you can call randGen.nextInt(2) which will randomly return either 0 or 1.

Here is how insertSkipListNode should work:

Finally, create an add method which takes an item of type E as a parameter. It should first call locate, and then call insertSkipListNode with the returned stack of nodes. Don’t forget to increment the size too!

Step 3b: testing

You should finally be able to test your implementation! Run the unit tests in SkipListTest. It should pass testAdd() and testHeight(). Note that testHeight() may require a few seconds to complete.

You can also use the provided toString method to visualize a SkipList. Create a main method in which you create a new SkipList, add some things, and then print out the SkipList. It gets displayed “sideways”, with the lowest layer at the left and the top layer to the right. For example, here is what the result of adding ten random integers between 1 and 100 might look like:

 D : 16 16 16 16 16 16
16 : 20 39 39 42 42 42
20 : 39  |  |  |  |  |
39 : 42 42 42  |  |  |
42 : 43 43 49  /  /  /
43 : 49 49  |  |  |  |
49 : 55  /  /  |  |  |
55 : 75  |  |  |  |  |
75 : 76  |  |  |  |  |
76 : 84  |  |  |  |  |
84 :  /  |  |  |  |  |

Each row corresponds to a node in the SkipList, with the data stored in the node listed at the left. D represents the “dummy” node at the front of the list, which does not store any data. To the right of the colon you can see how tall each node is; each number shows the data contained in the node the link at that level points to. So, for example, from the row

16 : 20 39 39 42 42 42

we can see that this is a node containing the data value 16; at level 0 it points to 20 (the very next node); at levels 1 and 2 it points to the node containing 39; at the rest of the levels it points to the node containing 42. A frontslash (/) represents null. For example,

49 : 55  /  /  |  |  |

shows a node containing 49 with height 3. On level 0, it references the following node containing 55, but at levels 1 and 2 it contains null references. Indeed, you can see there are no nodes of this height in the rest of the list. This could change if any nodes are inserted later.

Step 4: Lookup and Remove

Now add lookup and remove methods.

Step 5: Indexing (extra credit challenge)

So far, although we can find a given element in \(O(\lg n)\) time, there is no way to quickly find the element at a given index (e.g. “find the 200th element in the list”). We could simply follow pointers on level \(0\) one by one and count, but this takes \(O(n)\). The problem with following pointers on higher levels, however, is that we do not know how many nodes we have skipped.

The solution is to store, along with each reference, a count of its “length”, that is, how many nodes it advances through the list. For example, every reference on level \(0\) has length \(1\); in the example below, the references on level \(1\) have lengths \(1\), \(2\), \(1\), \(2\), \(3\) respectively; and so on.

We can then find the element at a particular index using a similar algorithm to locate: as we follow links, we can keep a counter that tells us how far into the list we are. To decide whether to follow a reference, add its length to the current counter, and see whether that would take us farther than we want to go. If not, follow it and repeat; if so, move down a level.

Add length information to the SkipNode references, and use them to implement a method public E get(int index). Note, however, that you will have to modify insertSkipListNode and remove to make sure that they keep the length information properly updated; this is the trickiest part!

What to Hand In

You should turn in