% -*- mode: LaTeX; compile-command: "mk" -*-
\documentclass{tufte-handout}
%include polycode.fmt
\arrayhs
\usepackage{../../hshw}
\title{\thecourse\ problem set 4}
\date{\small Revision 1: compiled \today\ at \currenttime}
\author{Due Tuesday, February 16}
\begin{document}
% The %if false tells lhs2TeX not to include this in the PDF output
%if false
\begin{code}
{-# LANGUAGE GADTs #-}
\end{code}
%endif
\maketitle
On this problem set, when you are asked to \emph{prove} something, you
should give a formal-style proof using a structured proof format and
equational reasoning. On the other hand, if you are asked to
\emph{show} or \emph{justify} something, an informal (but still
convincing) argument will suffice.
The exercises may refer to the following standard definitions:
\begin{spec}
length :: [a] -> Integer
length [] = 0
length (_:xs) = 1 + length xs
map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = f x : map f xs
(++) :: [a] -> [a] -> [a]
[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
(.) :: (b -> c) -> (a -> b) -> a -> c
(g . f) x = g (f x)
\end{spec}
\section{List induction}
\exercise You must complete \textbf{at least three} out of the
following seven proofs involving list induction. You need not do
more than three, although you may do more if you feel that the
practice is useful.
\begin{enumerate}[label=(\alph*)]
\item Prove that |map id = id|. \marginnote{To prove that two
functions are equal, it suffices to show they have the same result
on all inputs. That is, to prove |f = g|, it suffices to prove
that for all |x| of the appropriate type, |f x = g x|.}
\item Prove that for all functions |f| and |g| of appropriate type,
\[ |map (f . g) = map f . map g|. \]
\item Give a function |f| such that \[ |length = foldr f 0|, \] and
prove it.
\item Prove that for all lists |xs| and |ys|, \[ |length (xs ++ ys) =
length xs + length ys|. \]
\item Give an alternate definition of |(++)| via |foldr|, and prove
that your implementation gives the same results as the standard
definition.
\item Prove that |(++)| is associative. That is, prove that for all
lists |xs|, |ys|, and |zs|, \[ |(xs ++ ys) ++ zs = xs ++ (ys ++ zs)|. \]
\item State and prove a theorem relating |map| and |(++)|.
\end{enumerate}
\section{Difference lists and tree induction}
\noindent Recall the |isBST| function from Problem Set 2, to test
whether a binary tree is a valid binary search tree. One way to
implement it is by checking whether an inorder traversal of the tree
is sorted, as follows:\marginnote[2.5in]{Optional exercise for the
curious: why doesn't |isSorted| crash when given the empty list as
input?}
\begin{code}
data Tree a where
Empty :: Tree a
Node :: a -> Tree a -> Tree a -> Tree a
foldTree :: b -> (a -> b -> b -> b) -> Tree a -> b
foldTree e n Empty = e
foldTree e n (Node a l r) = n a (foldTree e n l) (foldTree e n r)
inorder :: Tree a -> [a]
inorder = foldTree [] (\a l r -> l ++ [a] ++ r)
isSorted :: [Integer] -> Bool
isSorted xs = and (zipWith (<) xs (tail xs))
isBST :: Tree Integer -> Bool
isBST = isSorted . inorder
\end{code}
However, as noted before, |inorder| takes $O(n^2)$ time in the worst
case. The remainder of this problem set first walks you through the
process of understanding why |inorder| can take $O(n^2)$ time, and
then through the development of an implementation with worst-case
$O(n)$ running time.
\label{ex:dlist-first}
\exercise
For each of the following functions, state its type, and explain what
it does.
\begin{enumerate}[label=(\alph*)]
\item |foldr (++) []|
\item |foldl (++) []|
\end{enumerate}
\label{ex:concat-LR}
\exercise \ec \marginnote{\emph{Hint}: \pref{ex:concat-LR} is
nontrivial. You will probably need to prove a lemma about |foldl|.
Ask me if you want some hints on what lemma to prove. Also, you are
welcome to assume without proof that |(++)| is associative, whether
or not you chose to do Exercise 1(f).} Prove that \[ |foldr (++) []
= foldl (++) []|. \]
\hrule \bigskip
Although they have the same \emph{result}, it turns out that
these two folds do \emph{not} have the same \emph{time complexity}!
Analyzing the time complexity of lazy programs turns out to be
somewhat tricky in general\footnote{See \citet{okasaki1999purely} for
a beautiful and comprehensive treatment; come by my office if you
want to look at it.}. For the purposes of this assignment, however,
we can get away with just assuming that the functions are strict (that
is, that function arguments are fully evaluated before the function
body), which simplifies the analysis.
To analyze a function, we first assume that its arguments are
completely evaluated, and then we count the \emph{number of reduction
steps} needed to reach a result which is itself fully evaluated. A
reduction step is simply defined as replacing the left-hand side of a
function definition with its right-hand side.\footnote{In general, one
must be careful when variables on the left-hand side of a function
definition are used multiple times on the right-hand side; the
values of such variables will be shared and only evaluated
once. GHC's runtime system actually maintains a \emph{graph} of
expressions rather than just doing simple textual substitution.
However, on this assignment the issue will not come up.} For
example, if evaluating the function |f| takes constant time, |map f
xs| takes time linear in the length of |xs|---it has to do one
reduction step for each cons in the list (plus one more for the empty
list).
\label{ex:foldLR-complexity}
\exercise
\begin{enumerate}[label=(\alph*)]
\item Show that |xs ++ ys| has time complexity $O(|length xs|)$, \ie
fully evaluating |xs ++ ys| requires a number of reduction steps
linear in the length of |xs|. Note in particular that the time
complexity is independent of the length of |ys|.
\item Let |xss :: [[T]]| be a list of lists (of some type |T| which
does not really matter). Assume, for simplicity, that all the
elements of |xss| are lists of length $1$. That is,
%
\[ |xss = [[x], [y], [z], ...]| \] Show that |foldr (++) [] xss| has
time complexity $O(|length xss|)$.
\item What is the time complexity of |foldl (++) [] xss|? Justify
your answer. \marginnote[-0.2in]{\emph{Hint}: it is not the same as for
|foldr|!}
\end{enumerate}
\hrule \bigskip
Now consider the type of \emph{binary parenthesizations},
\begin{marginfigure}
\begin{center}
\begin{diagram}[width=125]
import Data.Maybe (fromJust)
import Diagrams.TwoD.Layout.Tree
import Diagrams.Prelude hiding ((<&>))
(<&>) :: BTree (Maybe Int) -> BTree (Maybe Int) -> BTree (Maybe Int)
t1 <&> t2 = BNode Nothing t1 t2
l :: Int -> BTree (Maybe Int)
l i = leaf (Just i)
t :: BTree (Maybe Int)
t = ((l 2 <&> l 5) <&> ((l 1 <&> (l 2 <&> l 3)) <&> l 6))
drawT :: BTree (Maybe Int) -> Diagram B
drawT = renderTree drawNd (~~) . fromJust . symmLayoutBin' (with & slHSep .~ 3 & slVSep .~ 2)
drawNd Nothing = mempty
drawNd (Just i) = text (show i) <> circle 1 # fc lightblue
dia = drawT t
# frame 0.5
# lwO 0.7
\end{diagram}
\end{center}
\caption{An example value of type |Parens Integer|} \label{fig:parens}
\end{marginfigure}
\begin{spec}
data Parens a where
Leaf :: a -> Parens a
Bin :: Parens a -> Parens a -> Parens a
\end{spec}
A value of type |Parens a| can be thought of as a binary tree with
data at the leaves, as illustrated in \pref{fig:parens}. It can also
be thought of as a fully parenthesized sequence of |a| values, where
each pair of parentheses contains two expressions (either a value of
type |a| or another parenthesized expression). For for example, the
tree in \pref{fig:parens} corresponds to the parenthesized expression
$((25)((1(23))6))$.
\exercise Write a fold for |Parens|, and use it to implement a
function \[ |flatten :: Parens a -> [a]| \] which ``removes the
parentheses'' by concatenating all the |a| values together.
\marginnote[-0.5in]{Note that in various guises, operations such as
this are rather common---for example, consider traversing a syntax
tree representing a program and collecting a list of errors or
warnings.}
\exercise State the induction principle for |Parens|.
\exercise What is the worst-case big-O time complexity of |flatten|,
where $n$ is the number of values in the input
tree?\marginnote{\emph{Hint}: consider \pref{ex:foldLR-complexity}.}
\bigskip
\hrule \bigskip
\noindent Now consider the type
\begin{spec}
type DList a = [a] -> [a]
\end{spec}
of \term{difference lists}\footnote{This terminology originates in
Prolog.}. A value of type |DList a| is thus a function which
transforms one list of type |[a]| into another. The idea is to use
|DList a| as a (slightly\footnote{OK, very.} strange) way to encode a
list. In particular, we will encode the list |xs :: [a]| as the
\emph{function} which prepends |xs|, that is, the function \[ |\ys ->
xs ++ ys| \] which takes any list |ys| and yields |xs ++ ys| as the
result.
\exercise Implement functions \[ |toDList :: [a] -> DList a| \] and \[
|fromDList :: DList a -> [a]|. \] |toDList| should send a list |xs| to
its encoding as described above. |fromDList| should send an encoding
of a list back to the original list.
\exercise If you have defined |toDList| and |fromDList| correctly,
exactly one of the following two statements should be true. Prove the
one that is true, and give a counterexample for the other.
\begin{itemize}
\item |toDList . fromDList = id|
\item |fromDList . toDList = id|
\end{itemize}
\exercise Define a function \[ |append :: DList a -> DList a -> DList
a| \] such that \[ |toDList (xs ++ ys) = toDList xs `append` toDList
ys|. \] Prove it.
\exercise Define |emptyDList :: DList a| (it should be an identity for
|append|).
\exercise \marginnote{\emph{Hint}: you will probably not get very far
on this problem without trying some examples to develop your
intuition. Just try both functions on a short example list, and
write out the step-by-step reductions.} What is the time complexity
of \[ |fromDList . foldr append emptyDList . map toDList|? \] What
about \[ |fromDList . foldl append emptyDList . map toDList|? \]
Briefly justify your answers.
\label{ex:flatten-prime}
\exercise Now, let's bring everything together. Using the functions
you have defined previously, define a variant function \[ |flatten' ::
Parens a -> [a]| \] which works by converting the |a| values in the
tree to |DList a| values (representing singleton lists), combining
them with |append|, and then, at the end, converting the resulting
|DList a| back to |[a]|.
\label{ex:dlist-last}
\exercise Using the induction principle for |Parens|, prove that
|flatten = flatten'|. What is the worst-case big-O time complexity of
|flatten'|?
\bibliographystyle{plainnat}
\bibliography{Induction}
\end{document}