Module 10: Static types

This module is due at 1:15pm on Tuesday, October 11.

In this module, we will continue our exploration of type systems.

Once again, we will build on the Arith language from previous weeks.

Static type checking

We are going to add a new type checking phase in between parsing and interpreting. (Note: if you did the optional extension in the previous module to add extra operators like &&, >=, and so on, read this footnote.1) The type checker should adhere to the Central Dogma of Typechecking:

If type checking succeeds, then there should be no runtime errors.

Or, more pithily,

Well-typed programs don’t go wrong.2

(We will add some caveats to this later.)
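To make the idea concrete, here is a minimal sketch of what such a typechecking phase might look like for a tiny Arith-with-booleans language. The AST and type names used here (Arith, Lit, BLit, Add, If, Type, TInt, TBool) are illustrative assumptions; your own constructor names may differ.

```haskell
-- A toy AST for Arith extended with booleans (names are illustrative).
data Arith = Lit Integer | BLit Bool | Add Arith Arith | If Arith Arith Arith
  deriving Show

data Type = TInt | TBool
  deriving (Show, Eq)

-- infer returns Just the type of a well-typed expression, or Nothing on a type error.
infer :: Arith -> Maybe Type
infer (Lit _)     = Just TInt
infer (BLit _)    = Just TBool
infer (Add e1 e2) = do
  check e1 TInt          -- both operands of + must be integers
  check e2 TInt
  return TInt
infer (If c t e)  = do
  check c TBool          -- the condition must be a boolean...
  ty <- infer t
  check e ty             -- ...and both branches must have the same type
  return ty

-- check succeeds exactly when the expression has the expected type.
check :: Arith -> Type -> Maybe ()
check e ty = do
  ty' <- infer e
  if ty == ty' then Just () else Nothing
```

Running the interpreter only on programs for which infer succeeds is what licenses the Central Dogma: every runtime type mismatch corresponds to some case where infer would have returned Nothing.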

Interpreter

Given a static type checker, we can do something very interesting with the interpreter. The whole idea of static type checking is that if a program successfully type checks, there will never be any type errors at runtime, so checking for them would be a complete waste of time.

One might think, however, that this runtime type checking is unavoidable: our programs can evaluate either to booleans or integers, so we need the Value type to represent the two possibilities. Then we have to pattern-match on Values all the time to make sure we have the right kind of value (even though we know that one of the cases cannot happen).
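The tag-every-value approach looks something like the following sketch. The names (Arith, Value, VInt, VBool) are illustrative assumptions, not necessarily the names in your own code; note the "impossible" cases we are nevertheless forced to write.

```haskell
-- A toy AST (names are illustrative).
data Arith = Lit Integer | BLit Bool | Add Arith Arith | If Arith Arith Arith

-- Every runtime value carries a tag saying which kind it is.
data Value = VInt Integer | VBool Bool
  deriving (Show, Eq)

interp :: Arith -> Value
interp (Lit n)     = VInt n
interp (BLit b)    = VBool b
interp (Add e1 e2) = case (interp e1, interp e2) of
  (VInt m, VInt n) -> VInt (m + n)
  _                -> error "impossible: + applied to a boolean"   -- unreachable after typechecking
interp (If c t e)  = case interp c of
  VBool True  -> interp t
  VBool False -> interp e
  _           -> error "impossible: non-boolean condition"        -- unreachable after typechecking
```

Every pattern match here inspects a tag at runtime, even though, for typechecked programs, the error branches can never be taken.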

However, we can use a trick: we can make the interpreter just return Integer again, and encode boolean values as integers. For example, encode False as 0, and True as 1. When the interpreter is running, there is no way to tell whether a particular Integer value is supposed to represent an actual integer or a boolean—we say that the type information has been erased. This means that in theory, the interpreter could perform nonsensical operations such as adding 3 + True (resulting in 4). However, the interpreter will only ever run programs which successfully typecheck; if the typechecker is implemented correctly, then cases like this can never actually happen! Such errors would have been caught as type errors. (Ideally, this is something we could formally prove about our system.) This is one reason static type checking can lead to better runtime performance: we don’t have to carry around type information and constantly check types of values at runtime, which can remove a lot of runtime overhead (as well as create better opportunities for optimization).
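Here is a sketch of the erased interpreter, again under assumed AST names (Lit, BLit, Add, If). Booleans are encoded as Integers, so no tags or runtime checks remain.

```haskell
-- A toy AST (names are illustrative).
data Arith = Lit Integer | BLit Bool | Add Arith Arith | If Arith Arith Arith

-- Type-erased interpreter: everything is an Integer.
-- Booleans are encoded as False = 0, True = 1.
-- This is only safe to run on programs that have already typechecked!
interp :: Arith -> Integer
interp (Lit n)      = n
interp (BLit False) = 0
interp (BLit True)  = 1
interp (Add e1 e2)  = interp e1 + interp e2
interp (If c t e)   = if interp c == 1 then interp t else interp e
```

On an ill-typed term this interpreter happily computes nonsense, e.g. interp (Add (Lit 3) (BLit True)) evaluates to 4; the typechecker's job is to guarantee such terms never reach it.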

Feedback


  1. If you desugar the new operators directly inside the parser, the type checker will necessarily run over the simpler, desugared syntax, which means that any type errors will refer to desugared programs. This is confusing for the programmer, who may get errors about programs they did not write. For example, if the programmer writes True && 3, it will first desugar to if True then 3 else False, and they will then get a type error referring to this if-expression even though they never wrote any if. This is one good reason to have desugaring as a separate phase from parsing. First, the more complex language (the “surface language”) is parsed into an AST which can represent the complete syntax of the surface language. This AST is then typechecked, so any type errors refer to the actual program the programmer wrote. Next, a desugaring function translates the successfully typechecked AST into the simpler language (the “core language”). Finally, the interpreter (or compiler, optimizer, etc.) works directly with the simplified core language. Thankfully, in this case, writing a typechecker for the language extended with a bunch of extra operators is only slightly more work than writing one for the simpler language. The hard parts are things like if and let; adding more operators is relatively trivial.
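  A desugaring phase of this kind can be sketched as a function from a surface AST to a core AST. All names below (Surface, Core, SAnd, CIf, and so on) are illustrative assumptions; the key point is that (&&) exists only in the surface type, so the core language stays small.

```haskell
-- Surface language: has the sugar (here, just &&).  Names are illustrative.
data Surface = SLit Integer | SBool Bool | SAnd Surface Surface | SIf Surface Surface Surface

-- Core language: no &&; it has been translated away.
data Core = CLit Integer | CBool Bool | CIf Core Core Core
  deriving (Show, Eq)

-- Desugaring runs *after* typechecking, so its input is known to be well typed.
desugar :: Surface -> Core
desugar (SLit n)     = CLit n
desugar (SBool b)    = CBool b
desugar (SAnd e1 e2) = CIf (desugar e1) (desugar e2) (CBool False)  -- e1 && e2 ~> if e1 then e2 else False
desugar (SIf c t e)  = CIf (desugar c) (desugar t) (desugar e)
</code>
```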

  2. This phrase is due to Robin Milner, though in the original context he uses it with a precise technical meaning, not as a general guiding principle for type systems. See Milner, Robin. “A Theory of Type Polymorphism in Programming.” Journal of Computer and System Sciences 17.3 (1978): 348–375.