Module 04: Syntax and semantics


> {-# LANGUAGE GADTs #-}
> import Data.Char   -- so you can use the 'isDigit' function later

Some examples

Example 1

<mirror> ::= '.'
           | 'L' <mirror> 'R'

Example strings that match <mirror>:


Example strings that do not match <mirror>:


Example 2

<tree> ::= '#'
         | '(' <tree> ')'
         | '(' <tree> <tree> ')'

Example strings that match <tree>:


Example strings that do not match <tree>:


Example 3

<digit>   ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
<natural> ::= <digit> | <digit> <natural>
<integer> ::= <natural> | '-' <natural>

Example strings that match <integer>:


Example strings that do not match <integer>:


The things in single quotes are usually called terminals, and the things in brackets are nonterminals. These sorts of definitions are known as (context-free) grammars, written in Backus-Naur Form or Backus Normal Form (BNF), named for John Backus and Peter Naur.

Now, back to your regularly scheduled grammars…

Technically this sort of + notation was not included in the original form of BNF, but it is a common extension.

Mirror, mirror

> {-
>    <mirror> ::= '.'
>               | 'L' <mirror> 'R'
> -}
> data Mirror where
>   Middle :: Mirror
>   Layer  :: Mirror -> Mirror
>   deriving Show
> prettyMirror :: Mirror -> String
> prettyMirror Middle    = "."
> prettyMirror (Layer m) = "L" ++ prettyMirror m ++ "R"
> parseMirror :: String -> (Mirror, String)
> parseMirror ('.' : rest) = (Middle, rest)
> parseMirror ('L' : rest) =
>   case parseMirror rest of
>     (m, 'R' : rest') -> (Layer m, rest')

BIN (aka Your First Language)

Consider the following BNF grammar:

<bin> ::= '#'
        | '(' <bin> <bin> ')'

We can give a semantics (meaning) to abstract Bin trees as follows: a leaf has value 1; the value of a branch with left and right subtrees is the value of the left subtree plus twice the value of the right subtree.

We can think of this as a little programming language for computing integers; let’s call it BIN. (Of course, BIN is not a very useful language, but don’t let it hear you say that because you might hurt its feelings.) For example, "((##)(##))" is a program to compute the value 9. parseBin is a parser for the language, and interp is an interpreter.


The language EBIN (which stands, of course, for extended BIN) is just like BIN except it also allows single digits, which can stand in for an entire tree. In other words, a digit can go anywhere a # can go in a BIN program. For example, "((4#)(#2))" is a valid EBIN program. Note that "((##)(42))" is also a valid EBIN program, which does not contain the number 42 but rather the single digits 4 and 2.

The digit n stands for a full binary tree of height n. For example,

and so on. Thus, EBIN is not a completely new, separate language, but just adds some syntax sugar to BIN. That is, single digits are just a convenient shorthand for more cumbersome but equivalent expressions in BIN. We define EBIN in terms of BIN, and we can think of EBIN programs as abbreviated BIN programs.

As an (optional) fun aside, make a conjecture about the value of EBIN programs consisting of a single digit. That is, what can you say about evalEBin "0", evalEBin "1", evalEBin "2", and so on? Can you prove your conjecture?