CSCI 151 Lab 6: Skip lists
Overview
This lab will:
- Teach you about skip lists, a very cool data structure that combines the benefits of sorted arrays and linked lists;
- Give you experience working with linked structures, extensible arrays, and stacks.
Materials
Setup
Download the skeleton project, 151-skiplist.zip, and import it into Eclipse as usual (extract/unzip the code where you want it, then in Eclipse create a new project, uncheck “use default location” and select the location of the extracted 151-skiplist folder).
If necessary, add the
JUnit
library to the project build path (right-click on the project folder > Build path > Add Libraries > JUnit).
Introduction
If we want to maintain a sorted list of items, from what we have seen so far we have two options:
- Keep a sorted array. Pro: we can use binary search, so finding things takes \(O(\lg n)\) time. Con: in order to insert a new item we have to shift everything to make room, which takes \(O(n)\).
- Keep a sorted linked list. Con: we have to use linear search, which takes \(O(n)\). Pro: once we have found the correct place, inserting can be done in \(O(1)\) time just by moving references.
A skip list is an intriguing structure which combines the benefits of both approaches, and allows us to keep a sorted list such that all operations (lookup, insert, remove) take \(O(\lg n)\) time.
Fundamentally, a skip list is like a normal (singly) linked list, but with extra links added that “skip” over parts of the list. These extra links mean we can avoid having to iterate through every single node when looking for something in the list: we can start by taking very big hops in the list until the next big hop would take us beyond what we are looking for; then we take slightly smaller hops; and so on.
For example, the image above illustrates a skip list containing the numbers \(1\) through \(10\). Suppose we are looking for \(8\). We start at the node marked “head”, at the topmost level, i.e. level 3 (the bottom level is level \(0\)). Following the topmost link takes us to the node containing \(1\), which is smaller than \(8\), so we continue searching. The next link on the top level would take us beyond the end of the list (i.e. it is null, shown as NIL in the above picture), so we don’t follow it. Instead, we stay on the node containing \(1\) and move down one level, to level 2. Now, we follow links on level 2 as long as the items we encounter are less than \(8\): the first link takes us to the node containing \(4\); another step takes us to \(6\); then the next step would take us past the end of the list, so we stop and move down to level \(1\). On level \(1\), the next step would take us to \(9\), which is too big. So instead we move down again to level \(0\), and then follow links to \(7\) and finally to \(8\). The point is to notice how we were able to find \(8\) without ever having to look at the nodes containing \(2\), \(3\), or \(5\).
How do we determine how “tall” each node should be? The answer is: by flipping a coin! Each time we create a node, we randomly decide how tall it should be by flipping a coin and seeing how many times in a row the coin comes up heads. We then add one level for each coin toss that came up heads. Every node has height at least \(1\); and then it has a 1/2 chance of having height at least \(2\); a 1/4 chance of having height at least \(3\); a 1/8 chance of height at least \(4\); and so on. So on average, we expect that at each level there will be half as many nodes as the previous level. Conversely, on the very top level, we expect to find only one or two nodes, so each step will cover about half the list; on the next level down, each step between nodes will cover about 1/4 the list; and so on. Thus, we expect searching for a given node to take \(O(\lg n)\) steps on average. As we will see, we also expect adding a new node to take \(O(\lg n)\) steps on average: we can find the right location for it in \(O(\lg n)\), just as with a sorted array, but then we can quickly add it just by shifting pointers around.
Step 1: SkipNode
The first step is to make a generic SkipNode<E>
class to represent single nodes in our skip list. Each node should contain two fields:
- A generic data item of type
E
; - An
ArrayList
of references to otherSkipNode
s. Indexi
of theArrayList
will contain the link to the next node on leveli
.
Add the following methods to your SkipNode
class:
- A constructor
SkipNode(E data)
which makes a skip node containing the given data, and creates an emptyArrayList
of references. getHeight
: returns the height of the node, defined simply as the length of theArrayList
.getData
: returns thedata
item stored in the node.getNext
: takes a level as a parameter and returns a reference to the next node on the given level, ornull
if the node is not tall enough to contain something at the given level.addNext
: takes a reference to aSkipNode
as a parameter, and appends it to theArrayList
, i.e. the node becomes one level taller.setNext
: takes two parameters, a level and a reference to aSkipNode
, and sets the reference at that level to the given reference. Does nothing if the node is not tall enough (e.g. if a node has height 3 then callingsetNext(5, ...)
should do nothing, because the node does not contain a reference at level 5).
Each of these methods should only be a few lines of code at most. If they are longer than that, you are doing something wrong. Don’t forget to add appropriate comments to your methods!
Step 2: Locating
We are now ready to start implementing skip lists. I have provided some starter code in SkipList.java
; you will be filling in some of the important methods.
To cut down on the number of special cases we have to deal with, note that a skip list will always start with a “dummy” or “sentinel” SkipNode
which contains no data. (This dummy node has already been created for you in the SkipList
constructor.) This way, front
itself never needs to point to a different node; we can just change the references in the dummy node just as we would with any other node. In addition, we can ensure that the dummy node is always as tall as the tallest node in the remainder of the list, so we can start at the highest level just by looking at the height of the initial dummy node.
The first step is to implement the locate
method, which performs the important task of looking for a given data item. It should work very similarly to the process described in the introduction: it starts following links in the top level, and keeps moving down level by level while staying to the left of the node being searched for. However, it returns more than just return the data itself: in particular, it returns a stack of references to nodes encountered during its search. On top of the stack is the last node encountered at level 0; the next item in the stack is the last node encountered at level 1 (i.e. the last node we were on before deciding to move down to level 0); the next item is the last node encountered at level 2; and so on.
We can use this locate
method and its stack of previous nodes to implement many other methods—in fact, locate
is used to implement all the other methods such as add
, lookup
, and remove
.
Here is how locate
should work:
- Create a new, empty stack of
SkipNode
s (you can use the standard JavaStack
class). - Keep track of our current
SkipNode
. Begin atfront
. - For each level \(l\), starting at the highest level and proceeding to level \(0\):
- Keep following links at level \(l\) until following the next link would either take us off the end of the list, or would lead to a node containing a
data
item which is greater than or equal to the item we are looking for. That is, if you see that the next node contains the item we are looking for, you should not follow the link! This is because we need the stack to contain all the nodes which are previous to the node we are searching for. - Push the current
SkipNode
on the stack.
- Keep following links at level \(l\) until following the next link would either take us off the end of the list, or would lead to a node containing a
You won’t really be able to test your implementation yet—keep going to the next step!
Step 3: Adding
Step 3a: Implementation
In order to add a new item to the skip list, we begin by using locate
to find the place where it should go.
Next, we decide how tall to make the new node using coin flips. To simulate a “coin flip” you can call randGen.nextInt(2)
which will randomly return either 0
or 1
.
Here is how insertSkipListNode
should work:
- Create a new
SkipNode
. - Pop the first
SkipNode
from the stack; call itprev
. This will be the previous node on level 0. Insert the newly createdSkipNode
right after this previous level 0 node by:- Adding a reference to the new node which points to whatever
prev
was pointing to on level 0. - Now set
prev
to point to the new node on level 0 instead.
- Adding a reference to the new node which points to whatever
- Now for the fun part: we have to decide how tall to make the newly created node by flipping a coin (at this point it has height \(1\)).
- As long as a “coin flip” results in heads:
- Increment the current level (use a local variable to keep track).
- If the stack is empty, just add a
null
reference to the top of the new node. - Otherwise, pop a
SkipNode
from the stack (call itprev
), and insert the new node afterprev
on the current level. That is, the new node should point to whateverprev
used to point to on the current level, andprev
should now point to the new node.
- Finally, we may need to grow the height of the dummy node at the front of the skip list to match the height of the node we just created. As long as the height of
front
is less than the height of the new node, keep increasing the height offront
by adding references to the new node.
Finally, create an add
method which takes an item of type E
as a parameter. It should first call locate
, and then call insertSkipListNode
with the returned stack of nodes. Don’t forget to increment the size
too!
Step 3b: testing
You should finally be able to test your implementation! Run the unit tests in SkipListTest
. It should pass testAdd()
and testHeight()
. Note that testHeight()
may require a few seconds to complete.
You can also use the provided toString
method to visualize a SkipList
. Create a main
method in which you create a new SkipList
, add
some things, and then print out the SkipList
. It gets displayed “sideways”, with the lowest layer at the left and the top layer to the right. For example, here is what the result of adding ten random integers between 1 and 100 might look like:
D : 16 16 16 16 16 16
16 : 20 39 39 42 42 42
20 : 39 | | | | |
39 : 42 42 42 | | |
42 : 43 43 49 / / /
43 : 49 49 | | | |
49 : 55 / / | | |
55 : 75 | | | | |
75 : 76 | | | | |
76 : 84 | | | | |
84 : / | | | | |
Each row corresponds to a node in the SkipList
, with the data stored in the node listed at the left. D
represents the “dummy” node at the front of the list, which does not store any data. To the right of the colon you can see how tall each node is; each number shows the data contained in the node the link at that level points to. So, for example, from the row
16 : 20 39 39 42 42 42
we can see that this is a node containing the data value 16
; at level 0 it points to 20
(the very next node); at levels 1 and 2 it points to the node containing 39
; at the rest of the levels it points to the node containing 42
. A frontslash (/
) represents null
. For example,
49 : 55 / / | | |
shows a node containing 49
with height 3. On level 0, it references the following node containing 55
, but at levels 1 and 2 it contains null
references. Indeed, you can see there are no nodes of this height in the rest of the list. This could change if any nodes are inserted later.
Step 4: Lookup and Remove
Now add lookup
and remove
methods.
public E lookup(E toFind)
: this should work by callinglocate
, and then checking whether the node on top of the returned stack points on level 0 to a node containing a data item that.equals(toFind)
. If so, return that data item. Otherwise, returnnull
.(Remember that this is useful because
toFind
and the data item might not be exactly identical, even though they are.equal
—for example, we could implement a dictionary using aSkipList
ofAssociation
s, which are.equal
as long as their keys are.)public E remove(E toRemove)
: this method should calllocate
, and then check whether the node on top of the returned stack points on level 0 to a node containing the thing we want to remove. If not, do nothing. If it does, then we want to remove that next node. You will need to loop over its layers and set the previous node in each layer (which are on the stack) to point to the node beyond it. Don’t forget to decrement thesize
.
Step 5: Indexing (extra credit challenge)
So far, although we can find a given element in \(O(\lg n)\) time, there is no way to quickly find the element at a given index (e.g. “find the 200th element in the list”). We could simply follow pointers on level \(0\) one by one and count, but this takes \(O(n)\). The problem with following pointers on higher levels, however, is that we do not know how many nodes we have skipped.
The solution is to store, along with each reference, a count of its “length”, that is, how many nodes it advances through the list. For example, every reference on level \(0\) has length \(1\); in the example below, the references on level \(1\) have lengths \(1\), \(2\), \(1\), \(2\), \(3\) respectively; and so on.
We can then find the element at a particular index using a similar algorithm to locate
: as we follow links, we can keep a counter that tells us how far into the list we are. To decide whether to follow a reference, add its length to the current counter, and see whether that would take us farther than we want to go. If not, follow it and repeat; if so, move down a level.
Add length information to the SkipNode
references, and use them to implement a method public E get(int index)
. Note, however, that you will have to modify insertSkipListNode
and remove
to make sure that they keep the length information properly updated; this is the trickiest part!
What to Hand In
You should turn in
SkipNode.java
SkipList.java
SkipListMain.java
Grading
- To get a D, complete Step 1 (
SkipNode
). - To get a C, complete Step 2 (locating).
- To get a B, complete Step 3 (adding).
- To get an A, complete Step 4 (lookup and remove).
- To get a 100, use good style and documentation throughout.