Martin Sulzmann
We consider a simple language of arithmetic expressions consisting of Integer and Boolean expressions.
We wish to represent this language in terms of a Haskell data type, so that we can exploit pattern matching to write functions that operate on this data type.
Specifically, we wish to write an evaluator (interpreter) and a type checker.
We will make use of
data types
type classes
and introduces an extension of algebraic data types known as generalized algebraic data types (GADTs).
We first need to specify the syntax of our language of arithmetic expressions. From theoretical computer science, we recall context-free grammars.
Some context-free grammar rules to describe arithmetic and boolean expressions. For brevity, we only consider natural numbers 1 and 0.
E -> 1
E -> 0
E -> True
E -> False
E -> E + E
E -> E * E
E -> E || E
E -> (E)
We refer to E as a non-terminal symbol. Terminal symbols
are 1, 0, etc.
The grammar is ambiguous as for 1 * 0 + 1 we find the
following two derivations.
(1)
E
-> E * E
-> 1 * E
-> 1 * E + E
-> 1 * 0 + E
-> 1 * 0 + 1
(2)
E
-> E + E
-> E + 1
-> E * E + 1
-> E * 0 + 1
-> 1 * 0 + 1
For each derivation, we also give its parse tree.
(1)
E
/|\
E * E
| /|\
1 E + E
| |
0 1
(2)
E
/|\
E + E
/|\ |
E * E 1
| |
1 0
As we can see parse trees are different. Hence, the above grammar is ambiguous.
The common rule to remove the above ambiguity is to assume that
* binds tigher than +. This is important
because we need to ensure that the meaning of 1 * 0 + 1 is
unambiguous.
We seek for a representation of arithemtic expressions. As data types are perfect to represent tree-like structures such as parse trees, this is an easy task in Haskell. Instead of parse trees, we choose a more compact representation commonly referred to as the abstract syntax tree (AST). In the AST, there is no need to represent parantheses as the precedence among expressions is guaranteed by the tree structure.
Here is a Haskell data type representing the AST for arithmetic expressions.
Here is the Haskell AST representation for 1 * 0 + 1
What about the following?
The AST ex1 represents 1 * (0 + 1) whereas
ex represents 1 * 0 + 1. Recall that in the
source syntax we assume that * binds tighter than
+. Hence, 1 * 0 + 1 corresponds to
(1 * 0) + 1.
We consider pretty printing of ASTs. Each AST shall be displayed in terms of the original source language which is a string.
We provide for a simple pretty printer via the following instance of
type class Show.
instance Show E where
show One = "1"
show Zero = "0"
show ETrue = "true"
show EFalse = "false"
show (Plus e1 e2) = "(" ++ show e1 ++ " + " ++ show e2 ++ ")"
show (Mult e1 e2) = "(" ++ show e1 ++ " * " ++ show e2 ++ ")"
show (Or e1 e2) = "(" ++ show e1 ++ " || " ++ show e2 ++ ")"We find that
Pretty-printing could be improved by removing as many parantheses as
possible. We cannot remove all parantheses as otherwise the meaning of
the expression changes in some unexpected way. Consider ex1
where the innermost parantheses are necessary.
We implement an evaluator (interpreter) for arithmetic expressions. There are a few things we need to take care of.
We assume that Integer and Boolean expressions cannot be mixed. For example, when trying to evaluate
we get stuck. That means, evaluation may return a result or nothing.
This can be represented via the Maybe data type.
The result of evaluation may either be an Integer or Boolean value.
This can be represented via the Either data type.
The evaluator traverses over the structure of the AST. We evaluate operands (from left to right) and then combine the result. This can elegantly expressed via pattern matching.
eval :: E -> Maybe (Either Int Bool)
eval One = Just (Left 1)
eval Zero = Just (Left 0)
eval ETrue = Just (Right True)
eval EFalse = Just (Right False)
eval (Plus e1 e2) =
let r1 = eval e1
r2 = eval e2
in case (r1, r2) of
(Just (Left i1), Just (Left i2)) -> Just (Left (i1 + i2))
(_,_) -> Nothing
eval (Mult e1 e2) =
let r1 = eval e1
r2 = eval e2
in case (r1, r2) of
(Just (Left i1), Just (Left i2)) -> Just (Left (i1 * i2))
(_,_) -> Nothing
eval (Or e1 e2) =
case (eval e1) of
Nothing -> Nothing
(Just Left{}) -> Nothing
(Just (Right True)) -> Just (Right True)
_ -> case (eval e2) of
Nothing -> Nothing
(Just Left{}) -> Nothing
(Just (Right True)) -> Just (Right True)
_ -> Just (Right False)In case of Or, we apply short-circuit
evaluation. If the left operand evaluates to true we immediately
return true. Otherwise, we might need to evaluate the left operand as
well.
Take note. We first check for Nothing and then
(Just Left{}).
Nothing indicates evaluation got stuck.
(Just Left{}) indicates operand e1 evaluates to an
Integer expression and therefore we’re stuck as well.
Consider
-- 1 + true
ex2 = Plus One ETrue
-- false || true
ex3 = Or EFalse ETrue
-- false || 1
ex4 = Or EFalse One
-- true || 1
ex5 = Or ETrue Onewhere we find that
Next, we consider the task of type checking. The purpose of type checking is to ensure that operands and operations are compatible.
An arithmetic expression is well-typed if the operands of Integer addition/multiplication are only Integer expressions and the operands of Boolean disjunction are only Boolean expressions. Unlike the above evaluation, there is no need to actually run (execute) expressions to decide if an expression is well-typed. Hence, we say that type checking is done statically.
The type checking rules follow the structure of the evaluation rules. The crucial difference is that we abstract away from any concrete values and only distinguish between an ill-typed expression, an expression of Integer type and an expression of Boolean type.
data Type = TInt | TBool deriving Show
typecheck :: E -> Maybe Type
typecheck Zero = Just TInt
typecheck One = Just TInt
typecheck ETrue = Just TBool
typecheck EFalse = Just TBool
typecheck (Plus e1 e2) =
case (typecheck e1, typecheck e2) of
(Just TInt, Just TInt) -> Just TInt
(_, _) -> Nothing
typecheck (Mult e1 e2) =
case (typecheck e1, typecheck e2) of
(Just TInt, Just TInt) -> Just TInt
(_, _) -> Nothing
typecheck (Or e1 e2) =
case (typecheck e1, typecheck e2) of
(Just TBool, Just TBool) -> Just TBool
(_, _) -> NothingFor the above examples, we find that
typecheck ex2 ==> Nothing
typecheck ex3 ==> Just TBool
typecheck ex4 ==> Nothing
typecheck ex5 ==> NothingNotice that evaluation of ex4 succeeds but type checking
reports that ex4 is ill-typed.
Our current evaluator has to perform some amount of type checking at run-time. For example, consider the following case.
eval (Mult e1 e2) =
let r1 = eval e1
r2 = eval e2
in case (r1, r2) of
(Just (Left i1), Just (Left i2)) -> Just (Left (i1 * i2))
(_,_) -> NothingWe evaluate the left and right operands and via the pattern match “Just (Left …)” we effectively check that operands yield an Integer value.
For well-typed expressions, it is possible to optimize our evaluator. There is no need to cover cases that involve “Just” and “Nothing”.
evalT :: E -> Either Int Bool
evalT One = Left 1
evalT Zero = Left 0
evalT ETrue = Right True
evalT EFalse = Right False
evalT (Plus e1 e2) =
case (evalT e1, evalT e2) of
(Left i1, Left i2) -> Left (i1 + i2)
evalT (Mult e1 e2) =
case (evalT e1, evalT e2) of
(Left i1, Left i2) -> Left (i1 * i2)
evalT (Or e1 e2) =
case (evalT e1) of
Right True -> Right True
_ -> evalT e2Of course, we shall apply evalT only on well-typed
expressions. Hence, before calling evalT we need to call
typecheck first.
We consider the task of simplifying expressions based on some algebraic laws. For example, we wish to enforce that law that “0 * x = 0”. This law can be enforced statically (and therefore might improve the run-time performance when evaluationg expressions).
Consider
simp :: E -> E
simp One = One
simp Zero = Zero
simp ETrue = ETrue
simp EFalse = EFalse
simp (Plus e1 e2) = Plus (simp e1) (simp e2)
simp (Mult Zero _) = Zero
simp (Mult e1 e2) = Mult (simp e1) (simp e2)
simp (Or e1 e2) = Or e1 e2What do we observe when simplifying the following expressions?
There are two issues.
The first issue is that s2 is ill-typed but the
expression resulting from simp s2 is well-typed. The second
issue is that simplification is not exhaustive. For example,
simp s1 yields Mult Zero One which could be
further simplified.
To cope with the first issue, we simply need to check if expressions are well-typed before simplification. To cope with the second issue, we repeatidly apply simplifications until no further simplifications are possible.
For example, simpFix s1 yields Zero.
In Haskell, we can enforce well-typing of expressions when building ASTs. This is possible with the help of generalized algebraic data types (GADTs).
Here’s a GADT that enforces well-typing of arithmetic AST expressions.
data Exp a where
One_Exp :: Exp Int
Zero_Exp :: Exp Int
ETrue_Exp :: Exp Bool
EFalse_Exp :: Exp Bool
Plus_Exp :: Exp Int -> Exp Int -> Exp Int
Mult_Exp :: Exp Int -> Exp Int -> Exp Int
Or_Exp :: Exp Bool -> Exp Bool -> Exp BoolFor example, via constructor Plus_Exp we build Integer
expressions composed of addition. The arguments of the constructor
demand that left and right operands must be well-typed. The result is a
well-typed Integer expressions. A similar condition is enforced via
constructor Or_Exp for Boolean expressions.
As we can see, GADTs are a form of a parametric data type where the parameter instance may vary for each constructor. Pattern matching proceeds as before.
Based on GADTs, evaluation of well-typed (by construction) expressions is defined as follows.
evalExp :: Exp a -> a
evalExp One_Exp = 1
evalExp Zero_Exp = 1
evalExp ETrue_Exp = True
evalExp EFalse_Exp = False
evalExp (Plus_Exp e1 e2) = evalExp e1 + evalExp e2
evalExp (Mult_Exp e1 e2) = evalExp e1 * evalExp e2
evalExp (Or_Exp e1 e2) = evalExp e1 || evalExp e2The type
states that any well-typed expression of type a yields a
value of type a. Our GADT covers Integer and Boolean
expressons. Hence, this statement applies to Int and
Bool.
{-# LANGUAGE GADTs #-}
-- Here's the AST for our example.
data E = One | Zero | ETrue | EFalse
| Plus E E | Mult E E
| Or E E deriving Eq
-- 1 * 0 + 1
ex = Plus (Mult One Zero) One
-- The AST for 1 * (0 + 1) equals
ex1 = Mult One (Plus Zero One)
-- Point to note. Parantheses are not required in the AST representation,
-- as they are captured by the AST structure.
-- Further examples.
-- 1 + true
ex2 = Plus One ETrue
-- false || true
ex3 = Or EFalse ETrue
-- false || 1
ex4 = Or EFalse One
-- true || 1
ex5 = Or ETrue One
-- Pretty printing
instance Show E where
show One = "1"
show Zero = "0"
show ETrue = "true"
show EFalse = "false"
show (Plus e1 e2) = "(" ++ show e1 ++ " + " ++ show e2 ++ ")"
show (Mult e1 e2) = "(" ++ show e1 ++ " * " ++ show e2 ++ ")"
show (Or e1 e2) = "(" ++ show e1 ++ " || " ++ show e2 ++ ")"
-- Task 1: Let's write an evaluater for our litte language.
eval :: E -> Maybe (Either Int Bool)
eval One = Just (Left 1)
eval Zero = Just (Left 0)
eval ETrue = Just (Right True)
eval EFalse = Just (Right False)
eval (Plus e1 e2) =
let r1 = eval e1
r2 = eval e2
in case (r1, r2) of
(Just (Left i1), Just (Left i2)) -> Just (Left (i1 + i2))
(_,_) -> Nothing
eval (Mult e1 e2) =
let r1 = eval e1
r2 = eval e2
in case (r1, r2) of
(Just (Left i1), Just (Left i2)) -> Just (Left (i1 * i2))
(_,_) -> Nothing
eval (Or e1 e2) =
case (eval e1) of
Nothing -> Nothing
(Just Left{}) -> Nothing
(Just (Right True)) -> Just (Right True)
_ -> case (eval e2) of
Nothing -> Nothing
(Just Left{}) -> Nothing
(Just (Right True)) -> Just (Right True)
_ -> Just (Right False)
-- We apply short-circuit evaluation!
-- The outer don't care pattern "_" covers the case "Just (Right False)".
-- The same holds for the inner don't care pattern
-- NOTE: We need to check for "Nothing" and "(Just Left{})" before.
-- Nothing indicates evaluation got stuck.
-- (Just Left{}) says the operand e1 evaluates to an Integer expression and
-- therefore we're stuck.
-- Task 2: Let's write a type checker.
data Type = TInt | TBool deriving Show
typecheck :: E -> Maybe Type
typecheck Zero = Just TInt
typecheck One = Just TInt
typecheck ETrue = Just TBool
typecheck EFalse = Just TBool
typecheck (Plus e1 e2) =
case (typecheck e1, typecheck e2) of
(Just TInt, Just TInt) -> Just TInt
(_, _) -> Nothing
typecheck (Mult e1 e2) =
case (typecheck e1, typecheck e2) of
(Just TInt, Just TInt) -> Just TInt
(_, _) -> Nothing
typecheck (Or e1 e2) =
case (typecheck e1, typecheck e2) of
(Just TBool, Just TBool) -> Just TBool
(_, _) -> Nothing
-- We can now write an evaluator for well-typed expressions.
-- (1) There's no need to check for ill-typed expressions
-- at run-time anymore!
-- (2) The below patterns do not cover all cases.
-- (3) This is not an issue, assuming we call evalT
-- with a well-typed expression only.
evalT :: E -> Either Int Bool
evalT One = Left 1
evalT Zero = Left 0
evalT ETrue = Right True
evalT EFalse = Right False
evalT (Plus e1 e2) =
case (evalT e1, evalT e2) of
(Left i1, Left i2) -> Left (i1 + i2)
evalT (Mult e1 e2) =
case (evalT e1, evalT e2) of
(Left i1, Left i2) -> Left (i1 * i2)
evalT (Or e1 e2) =
case (evalT e1) of
Right True -> Right True
_ -> evalT e2
-- With generalized algebraic data types (GADTs),
-- we can encode the typing rules in the construction of values.
-- So by construction, we can only ever generate well-typed expressions!
data Exp a where
One_Exp :: Exp Int
Zero_Exp :: Exp Int
ETrue_Exp :: Exp Bool
EFalse_Exp :: Exp Bool
Plus_Exp :: Exp Int -> Exp Int -> Exp Int
Mult_Exp :: Exp Int -> Exp Int -> Exp Int
Or_Exp :: Exp Bool -> Exp Bool -> Exp Bool
test1 :: Exp Int
test1 = Plus_Exp (Mult_Exp One_Exp Zero_Exp) One_Exp
-- will not type check
-- test2 = Or_Exp One_Exp ETrue_Exp
-- We can write an evaluator for the GADT.
-- Look at the type of the evaluator!
-- Unlike evalT, evalExp accepts only well-typed expressions.
evalExp :: Exp a -> a
evalExp One_Exp = 1
evalExp Zero_Exp = 1
evalExp ETrue_Exp = True
evalExp EFalse_Exp = False
evalExp (Plus_Exp e1 e2) = evalExp e1 + evalExp e2
evalExp (Mult_Exp e1 e2) = evalExp e1 * evalExp e2
evalExp (Or_Exp e1 e2) = evalExp e1 || evalExp e2
-- Task 3: Let's write a simplifyer for expressions where
-- we apply the law that 0 * x = 0
-- We use E.
simp :: E -> E
simp One = One
simp Zero = Zero
simp ETrue = ETrue
simp EFalse = EFalse
simp (Plus e1 e2) = Plus (simp e1) (simp e2)
simp (Mult Zero _) = Zero
simp (Mult e1 e2) = Mult (simp e1) (simp e2)
simp (Or e1 e2) = Or e1 e2
-- Consider the following two examples.
-- What do you observe before/after simplification?
-- (0 * 1) * 1
s1 = Mult (Mult Zero One) One
-- 0 * (0 * False)
s2 = Mult Zero (Or Zero EFalse)
-- Assumes we have defined equality among ASTs.
simpFix :: E -> E
simpFix e = let e2 = simp e
in if e2 == e then e
else simpFix e2