Module 15: IMP

In this module we are going to explore a simple imperative language called IMP.

This module is due Tuesday, November 8, at 1:15pm.

The IMP language

The syntax of the language is as follows:

<prog> ::= <stmt> [ ';' <stmt> ]*

<stmt> ::=
  | <type> <var>
  | <var> ':=' <expr>
  | '{' <prog> '}'
  | 'if' <expr> 'then' <stmt> 'else' <stmt>
  | 'repeat' <expr> <stmt>
  | 'while' <expr> <stmt>
  | 'input' <var>
  | 'output' <expr>

<type> ::= 'int' | 'bool'

<expr> ::=
  | <int>
  | 'False' | 'True'
  | <var>
  | <uop> <expr>
  | <expr> <bop> <expr>

<uop> ::= '-' | '!'

<bop> ::= '+' | '-' | '*' | '/' | '&&' | '||' | '<' | '=='

Notice that the syntax is separated into expressions and statements. The difference is that

Most imperative languages have this distinction between expressions and statements (though some blur the line quite a bit).

Note that a <prog> consists of a sequence of statements, separated by (not ended by) semicolons. A compound statement can be created by surrounding a <prog> with curly braces.

Here is an example of a simple IMP program, which reads an integer from the user and then counts from 1 up to the integer, printing the values to the screen:

int max; int i;
input max;
i := 0;
while i < max {
  i := i + 1;
  output i
}

In order to focus on the parts that are interesting and different, this week I have provided you with some starter code. First, some imports we will need.

> {-# LANGUAGE GADTs        #-}
> 
> import           Parsing2
> 
> import qualified Data.Map           as M
> import           Text.Read          (readMaybe)
> import           System.Environment (getArgs)

We now define algebraic data types for the abstract syntax of IMP. Note that we have two separate types, one for statements and one for expressions.

> type Var = String
> 
> type Prog = [Stmt]
> 
> data Type where
>   TyInt  :: Type
>   TyBool :: Type
>   deriving (Show, Eq)
> 
> data Stmt where
>   Decl   :: Type -> Var -> Stmt           -- <type> <var>
>   Assign :: Var  -> Expr -> Stmt          -- <var> ':=' <expr>
>   Block  :: Prog -> Stmt                  -- '{' <prog> '}'
>   If     :: Expr -> Stmt -> Stmt -> Stmt  -- 'if' <expr> 'then' <stmt> 'else' <stmt>
>   Repeat :: Expr -> Stmt -> Stmt          -- 'repeat' <expr> <stmt>
>   While  :: Expr -> Stmt -> Stmt          -- 'while' <expr> <stmt>
>   Input  :: Var  -> Stmt                  -- 'input' <var>
>   Output :: Expr -> Stmt                  -- 'output' <expr>
>   deriving Show
> 
> data Expr where
>   EInt  :: Integer -> Expr                -- <int>
>   EBool :: Bool    -> Expr                -- 'False' | 'True'
>   EVar  :: Var -> Expr                    -- <var>
>   EUn   :: UOp -> Expr -> Expr            -- <uop> <expr>
>   EBin  :: BOp -> Expr -> Expr -> Expr    -- <expr> <bop> <expr>
>   deriving Show
> 
> data UOp = Neg | Not
>   deriving (Show, Eq)
> 
> data BOp = Add | Sub | Mul | Div | And | Or | Equals | Less
>   deriving (Show, Eq)

Parser

Now, a parser for IMP. You are welcome to skim through it, but there’s nothing really surprising going on.

> lexer :: TokenParser u
> lexer = makeTokenParser $
>   emptyDef
>   { reservedNames   = [ "True", "False", "if", "then", "else", "begin", "end"
>                         , "repeat", "while", "input", "output", "int", "bool" ]
>   , reservedOpNames = [ ":=", "==", "<", "+", "-", "*", "!", "&&", "||"  ]
>   }
> 
> parens :: Parser a -> Parser a
> parens = getParens lexer
> 
> reserved, reservedOp :: String -> Parser ()
> reserved   = getReserved lexer
> reservedOp = getReservedOp lexer
> 
> symbol :: String -> Parser String
> symbol = getSymbol lexer
> 
> ident :: Parser String
> ident = getIdentifier lexer
> 
> integer :: Parser Integer
> integer = getInteger lexer
> 
> whiteSpace :: Parser ()
> whiteSpace = getWhiteSpace lexer
> 
> parseAtom :: Parser Expr
> parseAtom
>   =   EInt        <$> integer
>   <|> EBool True  <$  reserved "True"
>   <|> EBool False <$  reserved "False"
>   <|> EVar        <$> ident
>   <|> parens parseExpr
> 
> parseExpr :: Parser Expr
> parseExpr = buildExpressionParser table parseAtom
>   where
>     table = [ [ unary  "!"  (EUn Not) ]
>             , [ unary  "-"  (EUn Neg) ]
>             , [ binary "*"  (EBin Mul)    AssocLeft
>               , binary "/"  (EBin Div)    AssocLeft ]
>             , [ binary "+"  (EBin Add)    AssocLeft
>               , binary "-"  (EBin Sub)    AssocLeft
>               ]
>             , [ binary "==" (EBin Equals) AssocNone
>               , binary "<"  (EBin Less)   AssocNone
>               ]
>             , [ binary "&&" (EBin And)    AssocRight ]
>             , [ binary "||" (EBin Or)     AssocRight ]
>             ]
>     unary  name fun       = Prefix (fun <$ reservedOp name)
>     binary name fun assoc = Infix  (fun <$ reservedOp name) assoc
> 
> parseProg :: Parser Prog
> parseProg = parseStmt `sepBy` (reservedOp ";")
> 
> parseStmt :: Parser Stmt
> parseStmt =
>       parseBlock
>   <|> If      <$> (reserved "if" *> parseExpr)
>               <*> (reserved "then" *> parseStmt)
>               <*> (reserved "else" *> parseStmt)
>   <|> Repeat  <$> (reserved "repeat" *> parseExpr) <*> parseBlock
>   <|> While   <$> (reserved "while" *> parseExpr)  <*> parseBlock
>   <|> Input   <$> (reserved "input" *> ident)
>   <|> Output  <$> (reserved "output" *> parseExpr)
>   <|> Assign  <$> ident <*> (reservedOp ":=" *> parseExpr)
>   <|> Decl    <$> parseType <*> ident
> 
> parseType :: Parser Type
> parseType = (TyInt <$ reserved "int") <|> (TyBool <$ reserved "bool")
> 
> parseBlock :: Parser Stmt
> parseBlock = Block  <$> (symbol "{" *> parseProg <* symbol "}")
> 
> impParser :: Parser Prog
> impParser = whiteSpace *> parseProg <* eof

Type checker

Next, a static type checker. First, some errors:

> data TypeError where
>   DuplicateVar :: Var -> TypeError
>   UnboundVar   :: Var  -> TypeError
>   Mismatch     :: Expr -> Type -> Type -> TypeError
>   InputBool    :: Var  -> TypeError
>   deriving Show

We can infer/check the type of expressions in exactly the same way as usual.

> type Ctx = M.Map Var Type
> 
> infer :: Ctx -> Expr -> Either TypeError Type
> infer _   (EInt _)        = Right TyInt
> infer _   (EBool _)       = Right TyBool
> infer ctx (EVar x)        =
>   case M.lookup x ctx of
>     Nothing -> Left $ UnboundVar x
>     Just ty -> Right ty
> infer ctx (EBin op e1 e2) = inferBin ctx op e1 e2
> infer ctx (EUn op e)      = inferUn ctx op e
> 
> inferBin :: Ctx -> BOp -> Expr -> Expr -> Either TypeError Type
> inferBin ctx op e1 e2 =
>   case binTy op of
>     (ty1, ty2, tyOut) ->
>       check ctx e1 ty1 *>
>       check ctx e2 ty2 *>
>       Right tyOut
> 
> binTy :: BOp -> (Type, Type, Type)
> binTy op
>   | op `elem` [Add, Sub, Mul, Div] = (TyInt, TyInt, TyInt)
>   | op `elem` [And, Or]            = (TyBool, TyBool, TyBool)
>   | op `elem` [Equals, Less]       = (TyInt, TyInt, TyBool)
>   | otherwise                      = error "Unhandled operator in binTy"
> 
> inferUn :: Ctx -> UOp -> Expr -> Either TypeError Type
> inferUn ctx op e =
>   case unTy op of
>     (tyIn, tyOut) ->
>       check ctx e tyIn *>
>       Right tyOut
> 
> unTy :: UOp -> (Type, Type)
> unTy Neg = (TyInt, TyInt)
> unTy Not = (TyBool, TyBool)
> 
> check :: Ctx -> Expr -> Type -> Either TypeError ()
> check ctx e ty =
>   infer ctx e >>= \ty' ->
>   case ty == ty' of
>     False -> Left $ Mismatch e ty ty'
>     True  -> Right ()

For checking programs, we just go through and check each statement. The interesting difference is that because statements can create variables which are in scope for the rest of the program, both checkProg and checkStmt take a context as an argument and return a new context as output.

> checkProg :: Ctx -> Prog -> Either TypeError Ctx
> checkProg ctx []     = undefined
> checkProg ctx (s:ss) = undefined

And now for checkStmt, which checks an individual statement. Fill in all the undefined places below!

> checkStmt :: Ctx -> Stmt -> Either TypeError Ctx
> checkStmt ctx (Decl ty x)  = undefined
> checkStmt ctx (Assign x e) = undefined

Now we come to checking blocks of the form { <prog> }. In one sense, this is easy, since we can just call checkProg. However, there is one subtle thing to think about. Here are two possible different implementations of this case, call them A and B:

(A) checkStmt ctx (Block ss)   = checkProg ctx ss >>= \ctx' -> Right ctx'

(B) checkStmt ctx (Block ss)   = checkProg ctx ss *> Right ctx
> checkStmt ctx (Block ss)   = undefined

Checking if, repeat, and while is straightforward. Note that we take a similar approach to contexts as we did for blocks above. Take a look at the implementations of if and repeat, then fill in the implementation for while.

> checkStmt ctx (If e s1 s2) =
>   check ctx e TyBool *>
>   checkStmt ctx s1 *>
>   checkStmt ctx s2 *>
>   Right ctx
> checkStmt ctx (Repeat e body) =
>   check ctx e TyInt *>
>   checkStmt ctx body *>
>   Right ctx
> checkStmt ctx (While e body)  =
>   undefined

Checking input and output statements is straightforward: we can only input and output variables with type int.

> checkStmt ctx (Input v)    =
>   case M.lookup v ctx of
>     Nothing    -> Left $ UnboundVar v
>     Just TyInt -> Right ctx
>     Just _     -> Left $ InputBool v
> checkStmt ctx (Output e)   =
>   check ctx e TyInt *> Right ctx

An IMPterpreter

Let’s define a Value to just be an Integer. As usual, if everything has type checked successfully, we can use Integer to represent both integers and booleans, without worrying about nonsensical operations.

> type Value = Integer

A “memory” is a mapping from variable names to values. In the past we have called this an “environment”, but we use the name “memory” now to emphasize the fact that it keeps track of the values of mutable variables, which can be changed by assignment statements.

> type Mem = M.Map Var Value

We interpret expressions as usual. There’s nothing very interesting to see here.

> interpExpr :: Mem -> Expr -> Value
> interpExpr _ (EInt i)       = i
> interpExpr _ (EBool b)      = fromBool b
> interpExpr m (EVar x)       =
>   case M.lookup x m of
>     Just v  -> v
>     Nothing -> error $ "Impossible! Uninitialized variable " ++ x
> interpExpr m (EBin b e1 e2) = interpBOp b (interpExpr m e1) (interpExpr m e2)
> interpExpr m (EUn  u e)     = interpUOp u (interpExpr m e )
> 
> interpUOp :: UOp -> Value -> Value
> interpUOp Neg v = -v
> interpUOp Not v = 1-v
> 
> interpBOp :: BOp -> Value -> Value -> Value
> interpBOp Add    = (+)
> interpBOp Sub    = (-)
> interpBOp Mul    = (*)
> interpBOp Div    = div
> interpBOp And    = (*)
> interpBOp Or     = \v1 v2 -> min 1 (v1 + v2)
> interpBOp Equals = \v1 v2 -> fromBool (v1 == v2)
> interpBOp Less   = \v1 v2 -> fromBool (v1 <  v2)
> 
> fromBool :: Bool -> Value
> fromBool False = 0
> fromBool True  = 1

OK, now we have dealt with expressions. But how do we interpret statements?

Remember that an interpreter turns syntax into semantics, that is, it takes an abstract syntax tree and produces its meaning. So the question to ask ourselves is: what is the meaning of a statement?

The meaning of a statement is an effect: a statement changes the “state of the world” in some way. We can model this as a function which takes the current state of the world and produces a new state. That is, an interpreter for statements will have the type Stmt -> (World -> World) for some appropriate type World.

> data World where
>   W  :: Mem       -- Current state of memory
>      -> [String]  -- Strings typed by the user, waiting to be read by 'input'
>      -> [String]  -- Strings produced by 'output' (newest first)
>      -> World
>   Error :: World  -- Something went wrong
>   deriving Show
> 
> -- An initial world state, given user input
> initWorld :: String -> World
> initWorld inp = W M.empty (words inp) []
> interpStmt :: Stmt -> World -> World
> interpStmt = undefined
> 
> interpRepeat :: Integer -> Stmt -> World -> World
> interpRepeat = undefined
> 
> interpProg :: Prog -> World -> World
> interpProg = undefined

Programming in IMP

At this point, you should be able to compile this module (ghc --make 15-IMP.lhs) and then run it on input files containing IMP programs.

You should feel free to write other example IMP programs to test your implementation as well.

Extending IMP

Feedback

Some extra definitions (feel free to ignore)

> formatWorld :: World -> String
> formatWorld (W m _ o) = unlines $
>      reverse o
>   ++ ["-----"]
>   ++ map formatVar (M.assocs m)
> formatWorld Error = "Error"
> 
> formatVar (x,v) = x ++ " -> " ++ show v
> 
> run :: String -> IO ()
> run fileName = do
>   s <- readFile fileName
>   case parse impParser s of
>     Left err -> print err
>     Right p  ->
>       case checkProg M.empty p of
>         Left tyErr -> print tyErr
>         Right _    -> do
>           inp <- getContents
>           let es = interpProg p (initWorld inp)
>           putStr $ formatWorld es
> 
> main :: IO ()
> main = do
>   args <- getArgs
>   case args of
>     []     -> putStrLn "Please provide a file name."
>     (fn:_) -> run fn