CSCI 151 Lab 6: Skip lists

Overview

This lab will:

Teach you about skip lists, a very cool data structure that combines the benefits of sorted arrays and linked lists;
Give you experience working with linked structures, extensible arrays, and stacks.

Materials

151-skiplist.zip

Setup

Download the skeleton project, 151-skiplist.zip, and import it into Eclipse as usual (extract/unzip the code where you want it, then in Eclipse create a new project, uncheck “use default location” and select the location of the extracted 151-skiplist folder).
If necessary, add the JUnit library to the project build path (right-click on the project folder > Build path > Add Libraries > JUnit).

Introduction

If we want to maintain a sorted list of items, from what we have seen so far we have two options:

Keep a sorted array. Pro: we can use binary search, so finding things takes \(O(\lg n)\) time. Con: in order to insert a new item we have to shift everything to make room, which takes \(O(n)\).
Keep a sorted linked list. Con: we have to use linear search, which takes \(O(n)\). Pro: once we have found the correct place, inserting can be done in \(O(1)\) time just by moving references.

A skip list is an intriguing structure which combines the benefits of both approaches, and allows us to keep a sorted list such that all operations (lookup, insert, remove) take \(O(\lg n)\) time.

Fundamentally, a skip list is like a normal (singly) linked list, but with extra links added that “skip” over parts of the list. These extra links mean we can avoid having to iterate through every single node when looking for something in the list: we can start by taking very big hops in the list until the next big hop would take us beyond what we are looking for; then we take slightly smaller hops; and so on.

For example, the image above illustrates a skip list containing the numbers \(1\) through \(10\). Suppose we are looking for \(8\). We start at the node marked “head”, at the topmost level, i.e. level 3 (the bottom level is level \(0\)). Following the topmost link takes us to the node containing \(1\), which is smaller than \(8\), so we continue searching. The next link on the top level would take us beyond the end of the list (i.e. it is null, shown as NIL in the above picture), so we don’t follow it. Instead, we stay on the node containing \(1\) and move down one level, to level 2. Now, we follow links on level 2 as long as the items we encounter are less than \(8\): the first link takes us to the node containing \(4\); another step takes us to \(6\); then the next step would take us past the end of the list, so we stop and move down to level \(1\). On level \(1\), the next step would take us to \(9\), which is too big. So instead we move down again to level \(0\), and then follow links to \(7\) and finally to \(8\). The point is to notice how we were able to find \(8\) without ever having to look at the nodes containing \(2\), \(3\), or \(5\).

How do we determine how “tall” each node should be? The answer is: by flipping a coin! Each time we create a node, we randomly decide how tall it should be by flipping a coin and seeing how many times in a row the coin comes up heads. We then add one level for each coin toss that came up heads. Every node has height at least \(1\); and then it has a 1/2 chance of having height at least \(2\); a 1/4 chance of having height at least \(3\); a 1/8 chance of height at least \(4\); and so on. So on average, we expect that at each level there will be half as many nodes as the previous level. Conversely, on the very top level, we expect to find only one or two nodes, so each step will cover about half the list; on the next level down, each step between nodes will cover about 1/4 the list; and so on. Thus, we expect searching for a given node to take \(O(\lg n)\) steps on average. As we will see, we also expect adding a new node to take \(O(\lg n)\) steps on average: we can find the right location for it in \(O(\lg n)\), just as with a sorted array, but then we can quickly add it just by shifting pointers around.

Step 1: SkipNode

The first step is to make a generic SkipNode<E> class to represent single nodes in our skip list. Each node should contain two fields:

A generic data item of type E;
An ArrayList of references to other SkipNodes. Index i of the ArrayList will contain the link to the next node on level i.

Add the following methods to your SkipNode class:

A constructor SkipNode(E data) which makes a skip node containing the given data, and creates an empty ArrayList of references.
getHeight: returns the height of the node, defined simply as the length of the ArrayList.
getData: returns the data item stored in the node.
getNext: takes a level as a parameter and returns a reference to the next node on the given level, or null if the node is not tall enough to contain something at the given level.
addNext: takes a reference to a SkipNode as a parameter, and appends it to the ArrayList, i.e. the node becomes one level taller.
setNext: takes two parameters, a level and a reference to a SkipNode, and sets the reference at that level to the given reference. Does nothing if the node is not tall enough (e.g. if a node has height 3 then calling setNext(5, ...) should do nothing, because the node does not contain a reference at level 5).

Each of these methods should only be a few lines of code at most. If they are longer than that, you are doing something wrong. Don’t forget to add appropriate comments to your methods!

Step 2: Locating

We are now ready to start implementing skip lists. I have provided some starter code in SkipList.java; you will be filling in some of the important methods.

To cut down on the number of special cases we have to deal with, note that a skip list will always start with a “dummy” or “sentinel” SkipNode which contains no data. (This dummy node has already been created for you in the SkipList constructor.) This way, front itself never needs to point to a different node; we can just change the references in the dummy node just as we would with any other node. In addition, we can ensure that the dummy node is always as tall as the tallest node in the remainder of the list, so we can start at the highest level just by looking at the height of the initial dummy node.

The first step is to implement the locate method, which performs the important task of looking for a given data item. It should work very similarly to the process described in the introduction: it starts following links in the top level, and keeps moving down level by level while staying to the left of the node being searched for. However, it returns more than just return the data itself: in particular, it returns a stack of references to nodes encountered during its search. On top of the stack is the last node encountered at level 0; the next item in the stack is the last node encountered at level 1 (i.e. the last node we were on before deciding to move down to level 0); the next item is the last node encountered at level 2; and so on.

We can use this locate method and its stack of previous nodes to implement many other methods—in fact, locate is used to implement all the other methods such as add, lookup, and remove.

Here is how locate should work:

Create a new, empty stack of SkipNodes (you can use the standard Java Stack class).
Keep track of our current SkipNode. Begin at front.
For each level \(l\), starting at the highest level and proceeding to level \(0\):
- Keep following links at level \(l\) until following the next link would either take us off the end of the list, or would lead to a node containing a data item which is greater than or equal to the item we are looking for. That is, if you see that the next node contains the item we are looking for, you should not follow the link! This is because we need the stack to contain all the nodes which are previous to the node we are searching for.
- Push the current SkipNode on the stack.

You won’t really be able to test your implementation yet—keep going to the next step!

Step 3: Adding

Step 3a: Implementation

In order to add a new item to the skip list, we begin by using locate to find the place where it should go.

Next, we decide how tall to make the new node using coin flips. To simulate a “coin flip” you can call randGen.nextInt(2) which will randomly return either 0 or 1.

Here is how insertSkipListNode should work:

Create a new SkipNode.
Pop the first SkipNode from the stack; call it prev. This will be the previous node on level 0. Insert the newly created SkipNode right after this previous level 0 node by:
- Adding a reference to the new node which points to whatever prev was pointing to on level 0.
- Now set prev to point to the new node on level 0 instead.
Now for the fun part: we have to decide how tall to make the newly created node by flipping a coin (at this point it has height \(1\)).
As long as a “coin flip” results in heads:
- Increment the current level (use a local variable to keep track).
- If the stack is empty, just add a null reference to the top of the new node.
- Otherwise, pop a SkipNode from the stack (call it prev), and insert the new node after prev on the current level. That is, the new node should point to whatever prev used to point to on the current level, and prev should now point to the new node.
Finally, we may need to grow the height of the dummy node at the front of the skip list to match the height of the node we just created. As long as the height of front is less than the height of the new node, keep increasing the height of front by adding references to the new node.

Finally, create an add method which takes an item of type E as a parameter. It should first call locate, and then call insertSkipListNode with the returned stack of nodes. Don’t forget to increment the size too!

Step 3b: testing

You should finally be able to test your implementation! Run the unit tests in SkipListTest. It should pass testAdd() and testHeight(). Note that testHeight() may require a few seconds to complete.

You can also use the provided toString method to visualize a SkipList. Create a main method in which you create a new SkipList, add some things, and then print out the SkipList. It gets displayed “sideways”, with the lowest layer at the left and the top layer to the right. For example, here is what the result of adding ten random integers between 1 and 100 might look like:

 D : 16 16 16 16 16 16
16 : 20 39 39 42 42 42
20 : 39  |  |  |  |  |
39 : 42 42 42  |  |  |
42 : 43 43 49  /  /  /
43 : 49 49  |  |  |  |
49 : 55  /  /  |  |  |
55 : 75  |  |  |  |  |
75 : 76  |  |  |  |  |
76 : 84  |  |  |  |  |
84 :  /  |  |  |  |  |

Each row corresponds to a node in the SkipList, with the data stored in the node listed at the left. D represents the “dummy” node at the front of the list, which does not store any data. To the right of the colon you can see how tall each node is; each number shows the data contained in the node the link at that level points to. So, for example, from the row

16 : 20 39 39 42 42 42

we can see that this is a node containing the data value 16; at level 0 it points to 20 (the very next node); at levels 1 and 2 it points to the node containing 39; at the rest of the levels it points to the node containing 42. A frontslash (/) represents null. For example,

49 : 55  /  /  |  |  |

shows a node containing 49 with height 3. On level 0, it references the following node containing 55, but at levels 1 and 2 it contains null references. Indeed, you can see there are no nodes of this height in the rest of the list. This could change if any nodes are inserted later.

Step 4: Lookup and Remove

Now add lookup and remove methods.

public E lookup(E toFind): this should work by calling locate, and then checking whether the node on top of the returned stack points on level 0 to a node containing a data item that .equals(toFind). If so, return that data item. Otherwise, return null.

(Remember that this is useful because toFind and the data item might not be exactly identical, even though they are .equal—for example, we could implement a dictionary using a SkipList of Associations, which are .equal as long as their keys are.)
public E remove(E toRemove): this method should call locate, and then check whether the node on top of the returned stack points on level 0 to a node containing the thing we want to remove. If not, do nothing. If it does, then we want to remove that next node. You will need to loop over its layers and set the previous node in each layer (which are on the stack) to point to the node beyond it. Don’t forget to decrement the size.

Step 5: Indexing (extra credit challenge)

So far, although we can find a given element in \(O(\lg n)\) time, there is no way to quickly find the element at a given index (e.g. “find the 200th element in the list”). We could simply follow pointers on level \(0\) one by one and count, but this takes \(O(n)\). The problem with following pointers on higher levels, however, is that we do not know how many nodes we have skipped.

The solution is to store, along with each reference, a count of its “length”, that is, how many nodes it advances through the list. For example, every reference on level \(0\) has length \(1\); in the example below, the references on level \(1\) have lengths \(1\), \(2\), \(1\), \(2\), \(3\) respectively; and so on.

We can then find the element at a particular index using a similar algorithm to locate: as we follow links, we can keep a counter that tells us how far into the list we are. To decide whether to follow a reference, add its length to the current counter, and see whether that would take us farther than we want to go. If not, follow it and repeat; if so, move down a level.

Add length information to the SkipNode references, and use them to implement a method public E get(int index). Note, however, that you will have to modify insertSkipListNode and remove to make sure that they keep the length information properly updated; this is the trickiest part!

What to Hand In

You should turn in

SkipNode.java
SkipList.java
SkipListMain.java

Grading

To get a D, complete Step 1 (SkipNode).
To get a C, complete Step 2 (locating).
To get a B, complete Step 3 (adding).
To get an A, complete Step 4 (lookup and remove).
To get a 100, use good style and documentation throughout.