Project 1: Arith compiler
We won’t spend much time in this course talking about compilers. But for this first project you will explore a very simple compiler for the Arith language.
Remember that you must complete this project independently. The project is due Thursday, September 15, at 1:15pm.
First, some extensions and imports we will need for the parser. If you get errors about modules not being found, run cabal install parsec
from the command line.
> {-# LANGUAGE GADTs #-}
>
> import Text.Parsec hiding (Error, many, (<|>))
> import Text.Parsec.Expr
> import Text.Parsec.Language (emptyDef)
> import Text.Parsec.String (Parser)
> import qualified Text.Parsec.Token as P
> import Control.Applicative
Here are the data types we used to represent Arith abstract syntax in class, along with a simple interpreter.
> data Op where
> Plus :: Op
> Minus :: Op
> Times :: Op
> deriving (Show, Eq)
>
> data Arith where
> Lit :: Integer -> Arith
> Bin :: Op -> Arith -> Arith -> Arith
> deriving (Show)
>
> interp :: Arith -> Integer
> interp (Lit n) = n
> interp (Bin op a1 a2) = interpOp op (interp a1) (interp a2)
>
> interpOp :: Op -> Integer -> Integer -> Integer
> interpOp Plus = (+)
> interpOp Minus = (-)
> interpOp Times = (*)
The abstract stack machine
Instead of compiling Arith programs to machine code, you will compile them to an abstract machine. An abstract machine is just like a real machine except for the fact that it is imaginary.
Our imaginary machine is quite simple. It keeps track of a list of instructions to execute, and a stack of integers. There are four instructions it knows how to execute:
PUSH n
: given an integern
, push it on top of the stack.ADD
: pop the top two integers off the stack, add them, and push the result back on top of the stack. The machine halts with an error if there are fewer than two integers on the stack.SUB
: pop the top two integers, subtract the topmost from the other, and push the result.MUL
: pop the top two integers, multiply them, and push the result.
- Make a data type called
Instruction
to represent the four stack machine instructions described above.
Our machine can also be in one of three states:
WORKING
: the machine has some next instructions to execute and a stack.DONE
: in this case there are no more instructions to execute. The machine remembers only the final stack.ERROR
: something has gone terribly, horribly wrong. In this state, the machine does not need to remember any instructions or stack.
Make a data type called
MachineState
to represent the possible states of the machine, as described above. Each different state should contain whatever information the machine needs to remember in that state.Write a function
step :: MachineState -> MachineState
which executes a single step of the machine. For example, in theWORKING
state it should execute the next instruction and return an appropriate next state for the machine.Write
execute :: [Instruction] -> MachineState
, which takes a program and runs the machine (starting with an empty stack) until the machine won’t run anymore (that is, it has reached aDONE
orERROR
state).Finally, write
run :: [Instruction] -> Maybe Integer
, which executes the program and then returnsNothing
if the machine halted with anERROR
or an empty stack, orJust
the top integer on the stack if the machine successfully finished and left at least one integer on the stack.
The compiler
Now that you have a working abstract machine, you can compile Arith expressions into equivalent programs that run on the abstract machine.
Write a function compile
which takes an Arith
and yields a list of Instruction
s.
Of course, your compiler should output not just any list of instructions! It should output a program which, when run on the abstract machine, successfully produces the same integer result as the Arith interpreter would. That is, for any a :: Arith
,
run (compile a) == Just (interp a)
Optional extensions
- Try extending Arith along with the parser, interpreter, and compiler with more operations (for example, modulus or exponentiation). To extend the parser you should only have to edit the definition of
table
, which has an explanatory comment.
Parser
This parser is provided for your convenience, to help you test your functions. Use the readArith
function to parse concrete Arith syntax into an AST.
> readArith :: String -> Arith
> readArith s = case parse parseArith "" s of
> Left err -> error (show err)
> Right a -> a
Pay no attention to the man behind the curtain…
> lexer :: P.TokenParser u
> lexer = P.makeTokenParser $
> emptyDef
> { P.opStart = oneOf "+-*"
> , P.opLetter = oneOf "+-*"
> , P.reservedOpNames = ["+", "-", "*"]
> }
>
> parens :: Parser a -> Parser a
> parens = P.parens lexer
>
> reservedOp :: String -> Parser ()
> reservedOp = P.reservedOp lexer
>
> integer :: Parser Integer
> integer = P.integer lexer
>
> whiteSpace :: Parser ()
> whiteSpace = P.whiteSpace lexer
>
> parseAtom :: Parser Arith
> parseAtom = Lit <$> integer <|> parens parseExpr
>
> parseExpr :: Parser Arith
> parseExpr = buildExpressionParser table parseAtom <?> "expression"
> where
> -- Each list of operators in the table has the same precedence, and
> -- the lists are ordered from highest precedence to lowest. So
> -- in this case '*' has the highest precedence, and then "+" and
> -- "-" have lower (but equal) precedence.
> table = [ [ binary "*" (Bin Times) AssocLeft ]
> , [ binary "+" (Bin Plus) AssocLeft
> , binary "-" (Bin Minus) AssocLeft
> ]
> ]
>
> binary name fun assoc = Infix (reservedOp name >> return fun) assoc
>
> parseArith :: Parser Arith
> parseArith = whiteSpace *> parseExpr <* eof