Martin Sulzmann
We consider a simple language of arithmetic expressions consisting of Integer and Boolean expressions.
We wish to represent this language in terms of a Haskell data type, so that we can exploit pattern matching to write functions that operate on this data type.
Specifically, we wish to write an evaluator (interpreter) and a type checker.
To specify the evaluator and type checker we will make use of
operational semantics rules
type checking rules
We show how to directly implement these rules in Haskell where we will make use of
data types
type classes
and introduces an extension of algebraic data types known as generalized algebraic data types (GADTs).
We first need to specify the syntax of our language of arithmetic expressions. From theoretical computer science, we recall context-free grammars.
Some context-free grammar rules to describe arithmetic and boolean expressions. For brevity, we only consider natural numbers 1 and 0.
E -> 1
E -> 0
E -> True
E -> False
E -> E + E
E -> E * E
E -> E || E
E -> (E)
We refer to E
as a non-terminal symbol. Terminal symbols
are 1
, 0
, etc.
The grammar is ambiguous as for 1 * 0 + 1
we find the
following two derivations.
(1)
E
-> E * E
-> 1 * E
-> 1 * E + E
-> 1 * 0 + E
-> 1 * 0 + 1
(2)
E
-> E + E
-> E + 1
-> E * E + 1
-> E * 0 + 1
-> 1 * 0 + 1
For each derivation, we also give its parse tree.
(1)
E
/|\
E * E
| /|\
1 E + E
| |
0 1
(2)
E
/|\
E + E
/|\ |
E * E 1
| |
1 0
As we can see parse trees are different. Hence, the above grammar is ambiguous.
The common rule to remove the above ambiguity is to assume that
*
binds tighter than +
. This is important
because we need to ensure that the meaning of 1 * 0 + 1
is
unambiguous.
We assume the following source syntax for arithmetic expressions where we make use of EBNF instead of context-free grammar rules.
E = E `+` E | E `+` E | E `<` E | E `|` E | i | t | f
where i
refers to some integer constant, t
refers to the always true (boolean) constant, and t
refers
to the always false (boolean) constant
The are two occurrences of the symbol “|”.
At the level of EBNF, we use “|” to represent the various forms of
arithmetic expressions. At the level of arithmetic expressions, we use
|
to represent boolean disjunction.
Why not using ||
? Possible, but using |
makes it easier to write a parser.
We seek for a representation of arithmetic expressions. As data types are perfect to represent tree-like structures such as parse trees, this is an easy task in Haskell. Instead of parse trees, we choose a more compact representation commonly referred to as the abstract syntax tree (AST). In the AST, there is no need to represent parentheses as the precedence among expressions is guaranteed by the tree structure.
Here is a Haskell data type representing the AST for arithmetic expressions.
We define the following short-hands.
Here is the Haskell AST representation for
1 * 0 + 1
.
What about the following?
The AST ex1
represents 1 * (0 + 1)
whereas
ex
represents 1 * 0 + 1
. Recall that in the
source syntax we assume that *
binds tighter than
+
. Hence, 1 * 0 + 1
corresponds to
(1 * 0) + 1
.
Here’s another example.
We will assume that comparison can only be performed among. We make use of operational semantics rules to specify this assumption.
The implementation provided comes with a parser that checks if the
input string matches the source syntax and converts the successfully
matched input string into the AST represented by the Haskell data type
E
.
We specify the (dynamic) semantics in terms of operational semantics rules where we make use of the natural deduction style notation.
Conventions. We write i
to refer to integer values and
b
to refer boolean values.
(O1) (N i) => i
(O2) (B b) => b
e1 => i1
e2 => i2
i=i1+i2
(O3) ------------------
(Plus e1 e2) => i
e1 => i1
e2 => i2
i=i1*i2
(O4) ------------------
(Mult e1 e2) => i
e1 => i1
e2 => i2
i1 < i2
(O5) ------------------
(LessThan e1 e2) => True
e1 => i1
e2 => i2
i1 >= i2
(O6) ------------------
(LessThan e1 e2) => False
e1 => True
(O7) ------------------
(Or e1 e2) => True
e1 => False
e2 => True
(O8) ------------------
(Or e1 e2) => True
e1 => False
e2 => False
(O9) ------------------
(Or e1 e2) => False
Points to note.
Rules (O5) and (O6) state that the arguments of LessThan
must be integer values.
Consider the expression LessThan (B True) (N 0)
. There
is no rule to evaluate this expressions. As another example, consider
the expression Or (B True) (N 0)
.
In case of Or
, we apply short-circuit
evaluation. If the left operand evaluates to true we immediately
return true. See rule (O7). Otherwise, we might need to evaluate the
left operand as well. See rules (O8) and (O9).
From the semantic description we can (almost) straightforwardly derive a Haskell interpreter for arithmetic expressions.
We represent =>
via the Haskell function
The Maybe
data type captures success (some result)
and failure (nothing)
The Either
data type captures that there can be two
possible (successful) outcomes. Either some integer value or some
boolean value.
Here are the details.
eval :: E -> Maybe (Either Int Bool)
eval (N i) = Just (Left i) -- (O1)
eval (B b) = Just (Right b) -- (O2)
eval (Plus e1 e2) = -- (O3)
let r1 = eval e1
r2 = eval e2
in case (r1, r2) of
(Just (Left i1), Just (Left i2)) -> Just (Left (i1 + i2))
(_,_) -> Nothing
eval (Mult e1 e2) = -- (O4)
let r1 = eval e1
r2 = eval e2
in case (r1, r2) of
(Just (Left i1), Just (Left i2)) -> Just (Left (i1 * i2))
(_,_) -> Nothing
eval (LessThan e1 e2) = -- (O5-6)
let r1 = eval e1
r2 = eval e2
in case (r1, r2) of
(Just (Left i1), Just (Left i2)) -> if i1 < i2
then Just (Right True)
else Just (Right False)
(_,_) -> Nothing
eval (Or e1 e2) = -- (O7-9)
case (eval e1) of
Nothing -> Nothing
(Just Left{}) -> Nothing
(Just (Right True)) -> Just (Right True)
_ -> case (eval e2) of
Nothing -> Nothing
(Just Left{}) -> Nothing
(Just (Right True)) -> Just (Right True)
_ -> Just (Right False)
Some examples.
where we find that
Point to note.
Expression ex5
is ill-typed. However, evaluation
succeeds. Why? [Hint: Short-circuit evaluation].
Before evaluation it is common to “simplify” expressions. We consider here a very simple form where we apply the algebraic law
0 * x = 0
Here is an attempt to carry out the above simplification step.
simp :: E -> E
simp e@N{} = e
simp e@B{} = e
simp (Plus e1 e2) = Plus (simp e1) (simp e2)
simp (Mult (N 0) _) = N 0
simp (Mult e1 e2) = Mult (simp e1) (simp e2)
simp (LessThan e1 e2) = LessThan (simp e1) (simp e2)
simp (Or e1 e2) = Or (simp e1) (simp e2)
Here are some examples.
-- (0 * 1) * 1
s1 = Mult (Mult zero one) one
{-
eval s1 => Just (Left 0)
simp s1 => Mult (N 0) (N 1)
simp (simp s1) => N 0
eval (simp s1) => Just (Left 0)
-}
-- 0 * (0 * False)
s2 = Mult zero (Or zero false)
{-
eval s2 => Nothing
simp s2 => N 0
eval (simp s2) => Just (Left 0)
-}
We have to exhaustively apply simp until no further simplifications are possible. See example s1. We apply a fixpoint construction to exhaustively apply simplifications.
Evaluation of the original and simplified expression ought to yield the same result. This is not necessarily the case. See example s2. We will implement a type checker to rule out ill-typed expressions.
simp
Next, we consider the task of static type checking. By static we mean that after parsing but before (dynamic) evaluation we check that expressions are well-typed.
An arithmetic expression is well-typed if the operands of Integer addition/multiplication are only Integer expressions and the operands of Boolean disjunction are only Boolean expressions. Unlike the above evaluation, there is no need to actually run (execute) expressions to decide if an expression is well-typed. Hence, we say that type checking is done statically.
The type checking rules follow the structure of the evaluation rules. The crucial difference is that we abstract away from any concrete values and only distinguish between an ill-typed expression, an expression of Integer type and an expression of Boolean type.
Conventions.
We write E |- T
to denote that expression E has type T
where T can be either Int or Bool.
(T1) (N i) |- Int
(T2) (B b) |- Bool
e1 |- Int
e2 |- Int
(T3) ------------------
(Plus e1 e2) |- Int
e1 |- Int
e2 |- Int
(T4) ------------------
(Mult e1 e2) |- Int
e1 |- Int
e2 |- Int
(T5) ------------------
(LessThan e1 e2) |- Bool
e1 |- Bool
e2 |- Bool
(T6) ------------------
(Or e1 e2) |- Bool
typecheck :: E -> Maybe Type
typecheck (N _) = Just TInt
typecheck (B _) = Just TBool
typecheck (Plus e1 e2) =
case (typecheck e1, typecheck e2) of
(Just TInt, Just TInt) -> Just TInt
(_, _) -> Nothing
typecheck (Mult e1 e2) =
case (typecheck e1, typecheck e2) of
(Just TInt, Just TInt) -> Just TInt
(_, _) -> Nothing
typecheck (LessThan e1 e2) =
case (typecheck e1, typecheck e2) of
(Just TInt, Just TInt) -> Just TBool
(_, _) -> Nothing
typecheck (Or e1 e2) =
case (typecheck e1, typecheck e2) of
(Just TBool, Just TBool) -> Just TBool
(_, _) -> Nothing
For example, we find the following
Our current evaluator has to perform some amount of type checking at run-time. For example, consider the following case.
eval (Mult e1 e2) = -- (O4)
let r1 = eval e1
r2 = eval e2
in case (r1, r2) of
(Just (Left i1), Just (Left i2)) -> Just (Left (i1 * i2))
(_,_) -> Nothing
We evaluate the left and right operands and via the pattern match “Just (Left …)” we effectively check that operands yield an Integer value.
For well-typed expressions, it is possible to optimize our evaluator. There is no need to cover cases that involve “Just” and “Nothing”.
evalT :: E -> Either Int Bool
evalT (N i) = Left i
evalT (B b) = Right b
evalT (Plus e1 e2) =
case (evalT e1, evalT e2) of
(Left i1, Left i2) -> Left (i1 + i2)
evalT (Mult e1 e2) =
case (evalT e1, evalT e2) of
(Left i1, Left i2) -> Left (i1 * i2)
evalT (LessThan e1 e2) =
case (evalT e1, evalT e2) of
(Left i1, Left i2) -> Right (i1 < i2)
evalT (Or e1 e2) =
case (evalT e1) of
Right True -> Right True
_ -> evalT e2
Of course, we shall apply evalT
only on well-typed
expressions. Hence, before calling evalT
we need to call
typecheck
first.
In Haskell, we can enforce well-typing of expressions when building ASTs. This is possible with the help of generalized algebraic data types (GADTs).
Here’s a GADT that enforces well-typing of arithmetic AST expressions.
data EE a where
N_EE :: Int -> EE Int
B_EE :: Bool -> EE Bool
Plus_EE :: EE Int -> EE Int -> EE Int
Mult_EE :: EE Int -> EE Int -> EE Int
LessThan_EE :: EE Int -> EE Int -> EE Bool
Or_EE :: EE Bool -> EE Bool -> EE Bool
For example, via constructor Plus_EE
we build Integer
expressions composed of addition. The arguments of the constructor
demand that left and right operands must be well-typed. The result is a
well-typed Integer expressions. A similar condition is enforced via
constructor Or_EE
for Boolean expressions.
As we can see, GADTs are a form of a parametric data type where the parameter instance may vary for each constructor. Pattern matching proceeds as before.
Based on GADTs, evaluation of well-typed (by construction) expressions is defined as follows.
evalEE :: EE a -> a
evalEE (N_EE i) = i
evalEE (B_EE b) = b
evalEE (Plus_EE e1 e2) = evalEE e1 + evalEE e2
evalEE (Mult_EE e1 e2) = evalEE e1 * evalEE e2
evalEE (LessThan_EE e1 e2)
| evalEE e1 < evalEE e2 = True
| otherwise = False
evalEE (Or_EE e1 e2) = evalEE e1 || evalEE e2
The type
states that any well-typed expression of type a
yields a
value of type a
. Our GADT covers Integer and Boolean
expressions. Hence, this statement applies to Int
and
Bool
.
Type checking and operational semantics rules to specify the static and dynamic semantics of programming languages
Haskell to implement such rules for the case of arithmetic expressions
Difference between static and dynamic type checking
Simple (compiler) optimizations such as algebraic simplifications