Module 10: Static types
This module is due at 1:15pm on Tuesday, October 11.
- Write your team names here:
In this module, we will continue our exploration of type systems.
Once again, we will build on the Arith language from previous weeks.
Choose any driver to start.
Begin by copying only your AST types and parser (and desugaring, if applicable) from module 09 below. Do not copy your interpreter, since we will not be extending it, but instead redoing it in a rather different style.
Static type checking
We are going to add a new type checking phase in between parsing and interpreting. (Note, if you did the optional extension in the previous module to add extra operators like &&
, >=
, and so on, read this footnote.1) The type checker should adhere to the Central Dogma of Typechecking:
If type checking succeeds, then there should be no runtime errors.
Or, more pithily,
Well-typed programs don’t go wrong.2
(We will add some caveats to this later.)
First, we will need a Haskell data type to represent types in Arith. Create a new data type called
Type
with two constructors to represent the two types in Arith, namely, integers and booleans.We will also need a data type to represent errors that can be generated during type checking. For now, it should have two constructors:
One to represent the occurrence of an undefined (or “unbound”) variable.
One to represent a type error. It should contain three fields: an
Arith
(representing the expression which failed to typecheck), and twoType
s (representing the type the expression was expected to have, along with its actual type).
Remember that when writing an interpreter we need a way to deal with variables: when we get to a variable, how should it be interpreted? We solved this problem by adding an environment which was a mapping from variables to values. We have a similar problem when type checking: when we get to a variable, what type should it have? The solution is also similar: we need a type context which is a mapping from variables to types. Define
type Ctx = M.Map String Type
.Now write the type checker itself! You should implement two main functions,
infer
andcheck
, which are described below. I have also included lots of other notes and hints to help you with the implementation; I suggest at least skimming through all the notes before starting to implementinfer
andcheck
.infer :: Ctx -> Arith -> Either TypeError Type
takes a typing context and anArith
expression, and infers the type of the expression (or generates an error if the expression has no type).check :: Ctx -> Arith -> Type -> Either TypeError ()
takes a typing context, anArith
expression, and a type, and checks that the expression has the given type. It throws an error if either the expression does not have a type, or if it has a type which is different than the given one. Note that it returns a value of type()
(unit), which carries no useful information: we only care whether there are any errors or not.
Some notes:
check
should be very easy to implement: justinfer
the type of the givenArith
expression, and then test whether its inferred type is equal to the given type.infer
can make use ofcheck
in situations where you know someArith
expression needs to have a certain type. For example, if you encounter an expressiona + b
, you shouldcheck
that botha
andb
have type Integer.Yes, you read that right:
check
andinfer
should be mutually recursive, that is, each calls the other.Remember that for parsers,
p1 *> p2
runs parserp1
and then throws away its input before runningp2
. In this context, ift1
andt2
are expressions that could throw a type checking error,t1 *> t2
will runt1
and make sure it does not generate an error before runningt2
. This is especially useful for callingcheck
(which does not return a useful result anyway). For example, you might have something like this to infer the type ofa + b
:infer ctx (Bin Add a b) = check ctx a TyInt *> -- make sure a has type Int check ctx b TyInt *> -- make sure b has type Int Right TyInt -- If no errors were generated, -- finally return Int as the type -- of the whole expression (a + b)
infer
will also need to make use of(>>=)
. For example, when inferring the type of anif
expression, you will need to infer the types of itsthen
- andelse
-parts:infer ctx (If test t1 t2) = ... other stuff ... infer ctx t1 >>= \ty1 -> ... some more stuff here that can use ty1 ...
The types of all the binary operators should be evident. Hint: You might find it useful to make a function
bopType :: Op -> (Type, Type, Type)
which returns the expected input types and the output type of a binary operator.The most interesting cases are how to typecheck
if
-expressions andlet
-expressions. Rather than tell you how they are done, I will let you think about them—in principle, you should be able to derive the correct implementation by thinking about how these expressions are interpreted (although there is a slight twist withif
). But feel free to ask me questions.
Finally, make a function
inferArith :: Arith -> Either TypeError Type
which simply runsinfer
starting with an empty typing context.
Interpreter
ROTATE ROLES and write the name of the new driver here:
After typechecking has succeeded, can we still get runtime errors? If so, which runtime errors can still happen? Which are no longer possible?
As you can see, the Central Dogma of Typechecking does not hold literally in our case! In general, we pick a subset of runtime errors that we care about, and design a type system which prevents those particular runtime errors, but might not prevent others.
Given a static type checker, we can do something very interesting with the interpreter. The whole idea of static type checking is that if a program successfully type checks, there will never be any type errors at runtime, so checking for them would be a complete waste of time.
One might think, however, that this runtime type checking is unavoidable: our programs can evaluate either to booleans or integers, so we need the Value
type to represent the two possibilities. Then we have to pattern-match on Value
s all the time to make sure we have the right kind of value (even though we know that one of the cases cannot happen).
However, we can use a trick: we can make the interpreter just return Integer
again, and encode boolean values as integers. For example, encode False
as 0, and True
as 1. When the interpreter is running, there is no way to tell whether a particular Integer
value is supposed to represent an actual integer or a boolean—we say that the type information has been erased. This means that in theory, the interpreter could perform nonsensical operations such as adding 3 + True
(resulting in 4
). However, the interpreter will only ever run programs which successfully typecheck; if the typechecker is implemented correctly, then cases like this can never actually happen! Such errors would have been caught as type errors. (Ideally, this is something we could formally prove about our system.) This is one reason static type checking can lead to better runtime performance: we don’t have to carry around type information and constantly check types of values at runtime, which can remove a lot of runtime overhead (as well as create better opportunities for optimization).
Begin by copying your interpreter from module 8 and pasting it here (not your dynamic typechecking interpreter from the previous module).
Remove any
InterpError
s which cannot happen anymore. Replace them in the interpreter with calls toerror
, with a message like"Bug! This case should be impossible! Fizzwoz cannot happen in the interpreter, since bar baz."
(of course, you should replace the nonsense with something appropriate).Now extend
interpArith
appropriately. Remember that in the case that you need a boolean value, you can just assume that you have anInteger
which is either0
or1
representingFalse
orTrue
.Finally, update the
eval
function. It should deal appropriately with parse errors, type errors, runtime errors, and with showing the output of the interpreter. For example, here is what my solution looks like (note that depending on whether you did the optional extension, yours may not be able to handle things like||
,<=
, and so on):λ> eval "let y = 2 in let x = (if (3 > 5) || (y*3 <= 10) then 5+y else 9*10) in x" 7 λ> eval "let x = 3 in ! (x < 8)" False λ> eval "let y = 2 in let x = (if (3 > 5) || (y*3 <= 10) then 5+y else 9*10) in z" Unbound variable z λ> eval "let y = False in let x = (if (3 > 5) || (y*3 <= 10) then 5+y else 9*10) in x" Type mismatch: Var "y" was used in a context that expected TInt, but it actually has type TBool. λ> eval "let y = 2 in let x = (if (3 > 5) || (y*3 <= 10) then 5+y else False && True) in x" The branches of the if expression If (Bin FOr (Bin FGt (ILit 3) (ILit 5)) (Bin FLeq (Bin FTimes (Var "y") (ILit 3)) (ILit 10))) (Bin FPlus (ILit 5) (Var "y")) (Bin FAnd (BLit False) (BLit True)) should have the same type, but they have types TInt and TBool respectively.
A few notes:
- To get really nice type error messages, one would have to include a pretty-printer. As you can see, I have not done so, but you are welcome to add one if you wish.
- As you can see, I included a special kind of type error for
if
-expressions; you can do this if you wish but it is optional. - Note that one cannot simply
show
the result of the interpreter, since this would sometimes e.g. show0
when it should beFalse
. To correctly show the output of the interpreter, one needs to take the inferred type of the expression into account.
Try evaluating
if False then 3 else True
. What happens? How is this different than the previous module?
Feedback
How long would you estimate that you spent working on this module?
Were any parts particularly confusing or difficult?
Were any parts particularly fun or interesting?
Record here any other questions, comments, or suggestions for improvement.
If you desugar the new operators directly inside the parser, you will necessarily be running the type checker over the simpler, desugared syntax, which means that any type errors will refer to desugared programs. This is confusing for the programmer, because they may get errors about programs they did not write. For example, if the programmer writes
True && 3
, it will first desugar toif True then 3 else False
, and they will then get a type error referring to thisif
-expression even though they never wrote anyif
. This is one good reason to have desugaring as a separate phase from parsing: first, the more complex language (the “surface language”) is parsed into an AST which can represent the complete syntax of the surface language; this is then typechecked, so any type errors refer to the actual program the programmer wrote; a desugaring function then translates the successfully typechecked AST into the simpler language (the “core language”); finally, the interpreter (or compiler, optimizer, etc.) can work directly with the simplified core language. Thankfully, in this case, writing a typechecker for the language extended with a bunch of extra operators is only slightly more work than writing a typechecker for the simpler language. The hard parts are things likeif
andlet
; adding more operators is relatively trivial.↩This phrase is due to Robin Milner, though in the original context he uses it with a precise technical meaning, not as a general guiding principle for type systems. See Milner, Robin. “A theory of type polymorphism in programming.” Journal of computer and system sciences 17.3 (1978): 348-375.↩