Project 1: Arith compiler
We won’t spend much time in this course talking about compilers. But for this first project you will explore a very simple compiler for the Arith language.
Remember that you must complete this project independently. The project is due Tuesday, October 2, at 9:45am.
First, some extensions and imports we will need for the parser. Don’t forget to download Parsing.hs
and put it in the same directory as ArithCompiler.lhs
.
Here are the data types we used to represent Arith abstract syntax in class, along with a simple interpreter.
data Op where
Plus :: Op
Minus :: Op
Times :: Op
deriving (Show, Eq)
data Arith where
Lit :: Integer -> Arith
Bin :: Op -> Arith -> Arith -> Arith
deriving (Show)
interp :: Arith -> Integer
interp (Lit n) = n
interp (Bin op a1 a2) = interpOp op (interp a1) (interp a2)
interpOp :: Op -> Integer -> Integer -> Integer
interpOp Plus = (+)
interpOp Minus = (-)
interpOp Times = (*)
The abstract stack machine
Instead of compiling Arith programs to machine code, you will compile them to an abstract machine. An abstract machine is just like a real machine except for the fact that it is imaginary.
Our imaginary machine is quite simple. It keeps track of a list of instructions to execute, and a stack of integers (recall that Haskell lists can also be used as stacks). There are four instructions it knows how to execute:
PUSH n
: given an integern
, push it on top of the stack.ADD
: pop the top two integers off the stack, add them, and push the result back on top of the stack. The machine halts with an error if there are fewer than two integers on the stack.SUB
: pop the top two integers, subtract the topmost from the other, and push the result.MUL
: pop the top two integers, multiply them, and push the result.
- Make a data type called
Instruction
to represent the four stack machine instructions described above.
Our machine can also be in one of three states. Each state may additionally store some information.
WORKING
: this state corresponds to normal operation of the machine. It should contain a list of remaining instructions to execute and a stack of integers.DONE
: this state means there are no more instructions to execute. It should contain only the final stack.ERROR
: something has gone terribly, horribly wrong. In this state, the machine does not need to remember any instructions or stack.
Make a data type called
MachineState
to represent the possible states of the machine, as described above. Each different state should contain whatever information the machine needs to remember in that state.Write a function
step :: MachineState -> MachineState
which executes a single step of the machine. For example, in theWORKING
state it should try executing the next instruction and return an appropriate next state for the machine.Write
execute :: [Instruction] -> MachineState
, which takes a program and runs the machine (starting with an empty stack) until the machine won’t run anymore (that is, it has reached aDONE
orERROR
state).Finally, write
run :: [Instruction] -> Maybe Integer
, which executes the program and then returnsNothing
if the machine halted with anERROR
or an empty stack, orJust
the top integer on the stack if the machine successfully finished and left at least one integer on the stack.
The compiler
Now that you have a working abstract machine, you can compile Arith expressions into equivalent programs that run on the abstract machine.
- Write a function
compile
which takes anArith
and yields a list ofInstruction
s.
Of course, your compiler should output not just any list of instructions! It should output a program which, when run on the abstract machine, successfully produces the same integer result as the Arith interpreter would. That is, for any a :: Arith
,
run (compile a) == Just (interp a)
To test the above it will be convenient to write a function which finally puts some of these things together:
- Write a function
exec :: String -> Maybe Integer
which takes aString
, parses it, compiles the resultingArith
, and thenrun
s the generated abstract machine instructions. It should returnNothing
if the givenString
does not parse.
You should now be able to test that if s
is any String
, then eval s == exec s
.
Parser
This parser is provided for your convenience, to help you test your functions. Use the readArith
function to parse concrete Arith syntax into an AST.
readArith :: String -> Arith
readArith s = case parse parseArith s of
Left err -> error (show err)
Right a -> a
Pay no attention to the man behind the curtain…
lexer :: TokenParser u
lexer = makeTokenParser emptyDef
parens :: Parser a -> Parser a
parens = getParens lexer
reservedOp :: String -> Parser ()
reservedOp = getReservedOp lexer
integer :: Parser Integer
integer = getInteger lexer
whiteSpace :: Parser ()
whiteSpace = getWhiteSpace lexer
parseAtom :: Parser Arith
parseAtom = Lit <$> integer <|> parens parseExpr
parseExpr :: Parser Arith
parseExpr = buildExpressionParser table parseAtom
where
-- Each list of operators in the table has the same precedence, and
-- the lists are ordered from highest precedence to lowest. So
-- in this case '*' has the highest precedence, and then "+" and
-- "-" have lower (but equal) precedence.
table = [ [ binary "*" (Bin Times) AssocLeft ]
, [ binary "+" (Bin Plus) AssocLeft
, binary "-" (Bin Minus) AssocLeft
]
]
binary name fun assoc = Infix (reservedOp name >> return fun) assoc
parseArith :: Parser Arith
parseArith = whiteSpace *> parseExpr <* eof
eval :: String -> Maybe Integer
eval s = case parse parseArith s of
Left _ -> Nothing
Right a -> Just (interp a)