% -*- mode: LaTeX; compile-command: "mk" -*- \documentclass{tufte-handout} %include polycode.fmt \arrayhs \usepackage{../../hshw} \title{\thecourse\ problem set 4} \date{\small Revision 1: compiled \today\ at \currenttime} \author{Due Tuesday, February 16} \begin{document} % The %if false tells lhs2TeX not to include this in the PDF output %if false \begin{code} {-# LANGUAGE GADTs #-} \end{code} %endif \maketitle On this problem set, when you are asked to \emph{prove} something, you should give a formal-style proof using a structured proof format and equational reasoning. On the other hand, if you are asked to \emph{show} or \emph{justify} something, an informal (but still convincing) argument will suffice. The exercises may refer to the following standard definitions: \begin{spec} length :: [a] -> Integer length [] = 0 length (_:xs) = 1 + length xs map :: (a -> b) -> [a] -> [b] map _ [] = [] map f (x:xs) = f x : map f xs (++) :: [a] -> [a] -> [a] [] ++ ys = ys (x:xs) ++ ys = x : (xs ++ ys) (.) :: (b -> c) -> (a -> b) -> a -> c (g . f) x = g (f x) \end{spec} \section{List induction} \exercise You must complete \textbf{at least three} out of the following seven proofs involving list induction. You need not do more than three, although you may do more if you feel that the practice is useful. \begin{enumerate}[label=(\alph*)] \item Prove that |map id = id|. \marginnote{To prove that two functions are equal, it suffices to show they have the same result on all inputs. That is, to prove |f = g|, it suffices to prove that for all |x| of the appropriate type, |f x = g x|.} \item Prove that for all functions |f| and |g| of appropriate type, $|map (f . g) = map f . map g|.$ \item Give a function |f| such that $|length = foldr f 0|,$ and prove it. \item Prove that for all lists |xs| and |ys|, $|length (xs ++ ys) = length xs + length ys|.$ \item Give an alternate definition of |(++)| via |foldr|, and prove that your implementation gives the same results as the standard definition. \item Prove that |(++)| is associative. That is, prove that for all lists |xs|, |ys|, and |zs|, $|(xs ++ ys) ++ zs = xs ++ (ys ++ zs)|.$ \item State and prove a theorem relating |map| and |(++)|. \end{enumerate} \section{Difference lists and tree induction} \noindent Recall the |isBST| function from Problem Set 2, to test whether a binary tree is a valid binary search tree. One way to implement it is by checking whether an inorder traversal of the tree is sorted, as follows:\marginnote[2.5in]{Optional exercise for the curious: why doesn't |isSorted| crash when given the empty list as input?} \begin{code} data Tree a where Empty :: Tree a Node :: a -> Tree a -> Tree a -> Tree a foldTree :: b -> (a -> b -> b -> b) -> Tree a -> b foldTree e n Empty = e foldTree e n (Node a l r) = n a (foldTree e n l) (foldTree e n r) inorder :: Tree a -> [a] inorder = foldTree [] (\a l r -> l ++ [a] ++ r) isSorted :: [Integer] -> Bool isSorted xs = and (zipWith (<) xs (tail xs)) isBST :: Tree Integer -> Bool isBST = isSorted . inorder \end{code} However, as noted before, |inorder| takes $O(n^2)$ time in the worst case. The remainder of this problem set first walks you through the process of understanding why |inorder| can take $O(n^2)$ time, and then through the development of an implementation with worst-case $O(n)$ running time. \label{ex:dlist-first} \exercise For each of the following functions, state its type, and explain what it does. \begin{enumerate}[label=(\alph*)] \item |foldr (++) []| \item |foldl (++) []| \end{enumerate} \label{ex:concat-LR} \exercise \ec \marginnote{\emph{Hint}: \pref{ex:concat-LR} is nontrivial. You will probably need to prove a lemma about |foldl|. Ask me if you want some hints on what lemma to prove. Also, you are welcome to assume without proof that |(++)| is associative, whether or not you chose to do Exercise 1(f).} Prove that $|foldr (++) [] = foldl (++) []|.$ \hrule \bigskip Although they have the same \emph{result}, it turns out that these two folds do \emph{not} have the same \emph{time complexity}! Analyzing the time complexity of lazy programs turns out to be somewhat tricky in general\footnote{See \citet{okasaki1999purely} for a beautiful and comprehensive treatment; come by my office if you want to look at it.}. For the purposes of this assignment, however, we can get away with just assuming that the functions are strict (that is, that function arguments are fully evaluated before the function body), which simplifies the analysis. To analyze a function, we first assume that its arguments are completely evaluated, and then we count the \emph{number of reduction steps} needed to reach a result which is itself fully evaluated. A reduction step is simply defined as replacing the left-hand side of a function definition with its right-hand side.\footnote{In general, one must be careful when variables on the left-hand side of a function definition are used multiple times on the right-hand side; the values of such variables will be shared and only evaluated once. GHC's runtime system actually maintains a \emph{graph} of expressions rather than just doing simple textual substitution. However, on this assignment the issue will not come up.} For example, if evaluating the function |f| takes constant time, |map f xs| takes time linear in the length of |xs|---it has to do one reduction step for each cons in the list (plus one more for the empty list). \label{ex:foldLR-complexity} \exercise \begin{enumerate}[label=(\alph*)] \item Show that |xs ++ ys| has time complexity $O(|length xs|)$, \ie fully evaluating |xs ++ ys| requires a number of reduction steps linear in the length of |xs|. Note in particular that the time complexity is independent of the length of |ys|. \item Let |xss :: [[T]]| be a list of lists (of some type |T| which does not really matter). Assume, for simplicity, that all the elements of |xss| are lists of length $1$. That is, % $|xss = [[x], [y], [z], ...]|$ Show that |foldr (++) [] xss| has time complexity $O(|length xss|)$. \item What is the time complexity of |foldl (++) [] xss|? Justify your answer. \marginnote[-0.2in]{\emph{Hint}: it is not the same as for |foldr|!} \end{enumerate} \hrule \bigskip Now consider the type of \emph{binary parenthesizations}, \begin{marginfigure} \begin{center} \begin{diagram}[width=125] import Data.Maybe (fromJust) import Diagrams.TwoD.Layout.Tree import Diagrams.Prelude hiding ((<&>)) (<&>) :: BTree (Maybe Int) -> BTree (Maybe Int) -> BTree (Maybe Int) t1 <&> t2 = BNode Nothing t1 t2 l :: Int -> BTree (Maybe Int) l i = leaf (Just i) t :: BTree (Maybe Int) t = ((l 2 <&> l 5) <&> ((l 1 <&> (l 2 <&> l 3)) <&> l 6)) drawT :: BTree (Maybe Int) -> Diagram B drawT = renderTree drawNd (~~) . fromJust . symmLayoutBin' (with & slHSep .~ 3 & slVSep .~ 2) drawNd Nothing = mempty drawNd (Just i) = text (show i) <> circle 1 # fc lightblue dia = drawT t # frame 0.5 # lwO 0.7 \end{diagram} \end{center} \caption{An example value of type |Parens Integer|} \label{fig:parens} \end{marginfigure} \begin{spec} data Parens a where Leaf :: a -> Parens a Bin :: Parens a -> Parens a -> Parens a \end{spec} A value of type |Parens a| can be thought of as a binary tree with data at the leaves, as illustrated in \pref{fig:parens}. It can also be thought of as a fully parenthesized sequence of |a| values, where each pair of parentheses contains two expressions (either a value of type |a| or another parenthesized expression). For for example, the tree in \pref{fig:parens} corresponds to the parenthesized expression $((25)((1(23))6))$. \exercise Write a fold for |Parens|, and use it to implement a function $|flatten :: Parens a -> [a]|$ which removes the parentheses'' by concatenating all the |a| values together. \marginnote[-0.5in]{Note that in various guises, operations such as this are rather common---for example, consider traversing a syntax tree representing a program and collecting a list of errors or warnings.} \exercise State the induction principle for |Parens|. \exercise What is the worst-case big-O time complexity of |flatten|, where $n$ is the number of values in the input tree?\marginnote{\emph{Hint}: consider \pref{ex:foldLR-complexity}.} \bigskip \hrule \bigskip \noindent Now consider the type \begin{spec} type DList a = [a] -> [a] \end{spec} of \term{difference lists}\footnote{This terminology originates in Prolog.}. A value of type |DList a| is thus a function which transforms one list of type |[a]| into another. The idea is to use |DList a| as a (slightly\footnote{OK, very.} strange) way to encode a list. In particular, we will encode the list |xs :: [a]| as the \emph{function} which prepends |xs|, that is, the function $|\ys -> xs ++ ys|$ which takes any list |ys| and yields |xs ++ ys| as the result. \exercise Implement functions $|toDList :: [a] -> DList a|$ and $|fromDList :: DList a -> [a]|.$ |toDList| should send a list |xs| to its encoding as described above. |fromDList| should send an encoding of a list back to the original list. \exercise If you have defined |toDList| and |fromDList| correctly, exactly one of the following two statements should be true. Prove the one that is true, and give a counterexample for the other. \begin{itemize} \item |toDList . fromDList = id| \item |fromDList . toDList = id| \end{itemize} \exercise Define a function $|append :: DList a -> DList a -> DList a|$ such that $|toDList (xs ++ ys) = toDList xs append toDList ys|.$ Prove it. \exercise Define |emptyDList :: DList a| (it should be an identity for |append|). \exercise \marginnote{\emph{Hint}: you will probably not get very far on this problem without trying some examples to develop your intuition. Just try both functions on a short example list, and write out the step-by-step reductions.} What is the time complexity of $|fromDList . foldr append emptyDList . map toDList|?$ What about $|fromDList . foldl append emptyDList . map toDList|?$ Briefly justify your answers. \label{ex:flatten-prime} \exercise Now, let's bring everything together. Using the functions you have defined previously, define a variant function $|flatten' :: Parens a -> [a]|$ which works by converting the |a| values in the tree to |DList a| values (representing singleton lists), combining them with |append|, and then, at the end, converting the resulting |DList a| back to |[a]|. \label{ex:dlist-last} \exercise Using the induction principle for |Parens|, prove that |flatten = flatten'|. What is the worst-case big-O time complexity of |flatten'|? \bibliographystyle{plainnat} \bibliography{Induction} \end{document}