Haskell to implement other languages

Martin Sulzmann

Overview

We consider a simple language of arithmetic expressions consisting of Integer and Boolean expressions.

We wish to represent this language in terms of a Haskell data type, so that we can exploit pattern matching to write functions that operate on this data type.

Specifically, we wish to write an evaluator (interpreter) and a type checker.

We will make use of

and introduces an extension of algebraic data types known as generalized algebraic data types (GADTs).

Syntax

We first need to specify the syntax of our language of arithmetic expressions. From theoretical computer science, we recall context-free grammars.

Some context-free grammar rules to describe arithmetic and boolean expressions. For brevity, we only consider natural numbers 1 and 0.

E -> 1
E -> 0
E -> True
E -> False
E -> E + E
E -> E * E
E -> E || E
E -> (E)

We refer to E as a non-terminal symbol. Terminal symbols are 1, 0, etc.

The grammar is ambiguous as for 1 * 0 + 1 we find the following two derivations.

(1)

   E
-> E * E
-> 1 * E
-> 1 * E + E
-> 1 * 0 + E
-> 1 * 0 + 1

(2)

   E
-> E + E
-> E + 1
-> E * E + 1
-> E * 0 + 1
-> 1 * 0 + 1

For each derivation, we also give its parse tree.

(1)

             E
            /|\
           E * E
           |  /|\
           1 E + E
             |   |
             0   1

(2)

             E
            /|\
           E + E
          /|\  |
         E * E 1
         |   |
         1   0

As we can see parse trees are different. Hence, the above grammar is ambiguous.

The common rule to remove the above ambiguity is to assume that * binds tigher than +. This is important because we need to ensure that the meaning of 1 * 0 + 1 is unambiguous.

Haskell data type for expressions (syntax)

We seek for a representation of arithemtic expressions. As data types are perfect to represent tree-like structures such as parse trees, this is an easy task in Haskell. Instead of parse trees, we choose a more compact representation commonly referred to as the abstract syntax tree (AST). In the AST, there is no need to represent parantheses as the precedence among expressions is guaranteed by the tree structure.

Here is a Haskell data type representing the AST for arithmetic expressions.

data E = One | Zero | ETrue | EFalse
       | Plus E E | Mult E E
       | Or E E

Here is the Haskell AST representation for 1 * 0 + 1

ex = Plus (Mult One Zero) One

What about the following?

ex1 = Mult Zero (Plus Zero One)

The AST ex1 represents 1 * (0 + 1) whereas ex represents 1 * 0 + 1. Recall that in the source syntax we assume that * binds tighter than +. Hence, 1 * 0 + 1 corresponds to (1 * 0) + 1.

Pretty printing

We consider pretty printing of ASTs. Each AST shall be displayed in terms of the original source language which is a string.

We provide for a simple pretty printer via the following instance of type class Show.

instance Show E where
    show One = "1"
    show Zero = "0"
    show ETrue = "true"
    show EFalse = "false"
    show (Plus e1 e2) = "(" ++ show e1 ++ " + " ++ show e2 ++ ")"
    show (Mult e1 e2) = "(" ++ show e1 ++ " * " ++ show e2 ++ ")"
    show (Or e1 e2) = "(" ++ show e1 ++ " || " ++ show e2 ++ ")"

We find that

show ex   ==>   "((1 * 0) + 1)"

show ex1  ==>   "(1 * (0 + 1))"

Pretty-printing could be improved by removing as many parantheses as possible. We cannot remove all parantheses as otherwise the meaning of the expression changes in some unexpected way. Consider ex1 where the innermost parantheses are necessary.

Evaluator (interpreter)

We implement an evaluator (interpreter) for arithmetic expressions. There are a few things we need to take care of.

We assume that Integer and Boolean expressions cannot be mixed. For example, when trying to evaluate

-- 1 + true
ex2 = Plus One ETrue

we get stuck. That means, evaluation may return a result or nothing. This can be represented via the Maybe data type.

The result of evaluation may either be an Integer or Boolean value. This can be represented via the Either data type.

The evaluator traverses over the structure of the AST. We evaluate operands (from left to right) and then combine the result. This can elegantly expressed via pattern matching.

eval :: E -> Maybe (Either Int Bool)
eval One = Just (Left 1)
eval Zero = Just (Left 0)
eval ETrue = Just (Right True)
eval EFalse = Just (Right False)
eval (Plus e1 e2) =
     let r1 = eval e1
         r2 = eval e2
     in case (r1, r2) of
         (Just (Left i1), Just (Left i2)) -> Just (Left (i1 + i2))
         (_,_)  -> Nothing
eval (Mult e1 e2) =
     let r1 = eval e1
         r2 = eval e2
     in case (r1, r2) of
         (Just (Left i1), Just (Left i2)) -> Just (Left (i1 * i2))
         (_,_)  -> Nothing
eval (Or e1 e2) =
     case (eval e1) of
        Nothing -> Nothing
        (Just Left{}) -> Nothing
        (Just (Right True)) -> Just (Right True)
        _                   -> case (eval e2) of
                                 Nothing -> Nothing
                                 (Just Left{}) -> Nothing
                                 (Just (Right True)) -> Just (Right True)
                                 _                   -> Just (Right False)

In case of Or, we apply short-circuit evaluation. If the left operand evaluates to true we immediately return true. Otherwise, we might need to evaluate the left operand as well.

Take note. We first check for Nothing and then (Just Left{}).

Consider

-- 1 + true
ex2 = Plus One ETrue

-- false || true
ex3 = Or EFalse ETrue

-- false || 1
ex4 = Or EFalse One

-- true || 1
ex5 = Or ETrue One

where we find that

eval ex2      ==>   Nothing

eval ex3      ==>   Just (Right True)

eval ex4      ==>   Nothing

eval ex5      ==>   Just (Right True)

Static type-checker

Next, we consider the task of type checking. The purpose of type checking is to ensure that operands and operations are compatible.

An arithmetic expression is well-typed if the operands of Integer addition/multiplication are only Integer expressions and the operands of Boolean disjunction are only Boolean expressions. Unlike the above evaluation, there is no need to actually run (execute) expressions to decide if an expression is well-typed. Hence, we say that type checking is done statically.

The type checking rules follow the structure of the evaluation rules. The crucial difference is that we abstract away from any concrete values and only distinguish between an ill-typed expression, an expression of Integer type and an expression of Boolean type.

data Type = TInt | TBool deriving Show

typecheck :: E -> Maybe Type
typecheck Zero = Just TInt
typecheck One = Just TInt
typecheck ETrue = Just TBool
typecheck EFalse = Just TBool
typecheck (Plus e1 e2) =
     case (typecheck e1, typecheck e2) of
       (Just TInt, Just TInt) -> Just TInt
       (_, _) -> Nothing
typecheck (Mult e1 e2) =
     case (typecheck e1, typecheck e2) of
       (Just TInt, Just TInt) -> Just TInt
       (_, _) -> Nothing
typecheck (Or e1 e2) =
     case (typecheck e1, typecheck e2) of
       (Just TBool, Just TBool) -> Just TBool
       (_, _) -> Nothing

For the above examples, we find that

typecheck ex2        ==>     Nothing

typecheck ex3        ==>     Just TBool

typecheck ex4        ==>     Nothing

typecheck ex5        ==>     Nothing

Notice that evaluation of ex4 succeeds but type checking reports that ex4 is ill-typed.

Evaluation of well-typed expressions

Our current evaluator has to perform some amount of type checking at run-time. For example, consider the following case.

eval (Mult e1 e2) =
     let r1 = eval e1
         r2 = eval e2
     in case (r1, r2) of
         (Just (Left i1), Just (Left i2)) -> Just (Left (i1 * i2))
         (_,_)  -> Nothing

We evaluate the left and right operands and via the pattern match “Just (Left …)” we effectively check that operands yield an Integer value.

For well-typed expressions, it is possible to optimize our evaluator. There is no need to cover cases that involve “Just” and “Nothing”.

evalT :: E -> Either Int Bool
evalT One = Left 1
evalT Zero = Left 0
evalT ETrue = Right True
evalT EFalse = Right False
evalT (Plus e1 e2) =
    case (evalT e1, evalT e2) of
       (Left i1, Left i2) -> Left (i1 + i2)
evalT (Mult e1 e2) =
    case (evalT e1, evalT e2) of
       (Left i1, Left i2) -> Left (i1 * i2)
evalT (Or e1 e2) =
     case (evalT e1) of
       Right True -> Right True
       _          -> evalT e2

Of course, we shall apply evalT only on well-typed expressions. Hence, before calling evalT we need to call typecheck first.

Simplification

We consider the task of simplifying expressions based on some algebraic laws. For example, we wish to enforce that law that “0 * x = 0”. This law can be enforced statically (and therefore might improve the run-time performance when evaluationg expressions).

Consider

simp :: E -> E
simp One = One
simp Zero = Zero
simp ETrue = ETrue
simp EFalse = EFalse
simp (Plus e1 e2) = Plus (simp e1) (simp e2)
simp (Mult Zero _) = Zero
simp (Mult e1 e2) = Mult (simp e1) (simp e2)
simp (Or e1 e2) = Or e1 e2

What do we observe when simplifying the following expressions?

-- (0 * 1) * 1
s1 = Mult (Mult Zero One) One

-- 0 * (0 * False)
s2 = Mult Zero (Or Zero EFalse)

There are two issues.

The first issue is that s2 is ill-typed but the expression resulting from simp s2 is well-typed. The second issue is that simplification is not exhaustive. For example, simp s1 yields Mult Zero One which could be further simplified.

To cope with the first issue, we simply need to check if expressions are well-typed before simplification. To cope with the second issue, we repeatidly apply simplifications until no further simplifications are possible.

simpFix :: E -> E
simpFix e = let e2 = simp e
            in if e2 == e then e
               else simpFix e2

For example, simpFix s1 yields Zero.

Enforcing well-typing by construction via GADTs

In Haskell, we can enforce well-typing of expressions when building ASTs. This is possible with the help of generalized algebraic data types (GADTs).

Here’s a GADT that enforces well-typing of arithmetic AST expressions.

data Exp a where
   One_Exp :: Exp Int
   Zero_Exp :: Exp Int
   ETrue_Exp :: Exp Bool
   EFalse_Exp :: Exp Bool
   Plus_Exp :: Exp Int -> Exp Int -> Exp Int
   Mult_Exp :: Exp Int -> Exp Int -> Exp Int
   Or_Exp :: Exp Bool -> Exp Bool -> Exp Bool

For example, via constructor Plus_Exp we build Integer expressions composed of addition. The arguments of the constructor demand that left and right operands must be well-typed. The result is a well-typed Integer expressions. A similar condition is enforced via constructor Or_Exp for Boolean expressions.

As we can see, GADTs are a form of a parametric data type where the parameter instance may vary for each constructor. Pattern matching proceeds as before.

Based on GADTs, evaluation of well-typed (by construction) expressions is defined as follows.

evalExp :: Exp a -> a
evalExp One_Exp = 1
evalExp Zero_Exp = 1
evalExp ETrue_Exp = True
evalExp EFalse_Exp = False
evalExp (Plus_Exp e1 e2) = evalExp e1 + evalExp e2
evalExp (Mult_Exp e1 e2) = evalExp e1 * evalExp e2
evalExp (Or_Exp e1 e2) = evalExp e1 || evalExp e2

The type

evalExp :: Exp a -> a

states that any well-typed expression of type a yields a value of type a. Our GADT covers Integer and Boolean expressons. Hence, this statement applies to Int and Bool.

Complete source code


{-# LANGUAGE GADTs #-}


-- Here's the AST for our example.


data E = One | Zero | ETrue | EFalse
       | Plus E E | Mult E E
       | Or E E deriving Eq


-- 1 * 0 + 1
ex = Plus (Mult One Zero) One

-- The AST for 1 * (0 + 1) equals
ex1 = Mult One (Plus Zero One)

-- Point to note. Parantheses are not required in the AST representation,
-- as they are captured by the AST structure.

-- Further examples.

-- 1 + true
ex2 = Plus One ETrue

-- false || true
ex3 = Or EFalse ETrue

-- false || 1
ex4 = Or EFalse One

-- true || 1
ex5 = Or ETrue One


-- Pretty printing

instance Show E where
    show One = "1"
    show Zero = "0"
    show ETrue = "true"
    show EFalse = "false"
    show (Plus e1 e2) = "(" ++ show e1 ++ " + " ++ show e2 ++ ")"
    show (Mult e1 e2) = "(" ++ show e1 ++ " * " ++ show e2 ++ ")"
    show (Or e1 e2) = "(" ++ show e1 ++ " || " ++ show e2 ++ ")"




-- Task 1: Let's write an evaluater for our litte language.

eval :: E -> Maybe (Either Int Bool)
eval One = Just (Left 1)
eval Zero = Just (Left 0)
eval ETrue = Just (Right True)
eval EFalse = Just (Right False)
eval (Plus e1 e2) =
     let r1 = eval e1
         r2 = eval e2
     in case (r1, r2) of
         (Just (Left i1), Just (Left i2)) -> Just (Left (i1 + i2))
         (_,_)  -> Nothing
eval (Mult e1 e2) =
     let r1 = eval e1
         r2 = eval e2
     in case (r1, r2) of
         (Just (Left i1), Just (Left i2)) -> Just (Left (i1 * i2))
         (_,_)  -> Nothing
eval (Or e1 e2) =
     case (eval e1) of
        Nothing -> Nothing
        (Just Left{}) -> Nothing
        (Just (Right True)) -> Just (Right True)
        _                   -> case (eval e2) of
                                 Nothing -> Nothing
                                 (Just Left{}) -> Nothing
                                 (Just (Right True)) -> Just (Right True)
                                 _                   -> Just (Right False)
       -- We apply short-circuit evaluation!
       -- The outer don't care pattern "_" covers the case "Just (Right False)".
       -- The same holds for the inner don't care pattern
       -- NOTE: We need to check for "Nothing" and "(Just Left{})" before.
       --  Nothing indicates evaluation got stuck.
       --  (Just Left{}) says the operand e1 evaluates to an Integer expression and
       --  therefore we're stuck.

-- Task 2: Let's write a type checker.

data Type = TInt | TBool deriving Show

typecheck :: E -> Maybe Type
typecheck Zero = Just TInt
typecheck One = Just TInt
typecheck ETrue = Just TBool
typecheck EFalse = Just TBool
typecheck (Plus e1 e2) =
     case (typecheck e1, typecheck e2) of
       (Just TInt, Just TInt) -> Just TInt
       (_, _) -> Nothing
typecheck (Mult e1 e2) =
     case (typecheck e1, typecheck e2) of
       (Just TInt, Just TInt) -> Just TInt
       (_, _) -> Nothing
typecheck (Or e1 e2) =
     case (typecheck e1, typecheck e2) of
       (Just TBool, Just TBool) -> Just TBool
       (_, _) -> Nothing

-- We can now write an evaluator for well-typed expressions.
-- (1) There's no need to check for ill-typed expressions
--    at run-time anymore!
-- (2) The below patterns do not cover all cases.
-- (3) This is not an issue, assuming we call evalT
-- with a well-typed expression only.

evalT :: E -> Either Int Bool
evalT One = Left 1
evalT Zero = Left 0
evalT ETrue = Right True
evalT EFalse = Right False
evalT (Plus e1 e2) =
    case (evalT e1, evalT e2) of
       (Left i1, Left i2) -> Left (i1 + i2)
evalT (Mult e1 e2) =
    case (evalT e1, evalT e2) of
       (Left i1, Left i2) -> Left (i1 * i2)
evalT (Or e1 e2) =
     case (evalT e1) of
       Right True -> Right True
       _          -> evalT e2



-- With generalized algebraic data types (GADTs),
-- we can encode the typing rules in the construction of values.
-- So by construction, we can only ever generate well-typed expressions!


data Exp a where
   One_Exp :: Exp Int
   Zero_Exp :: Exp Int
   ETrue_Exp :: Exp Bool
   EFalse_Exp :: Exp Bool
   Plus_Exp :: Exp Int -> Exp Int -> Exp Int
   Mult_Exp :: Exp Int -> Exp Int -> Exp Int
   Or_Exp :: Exp Bool -> Exp Bool -> Exp Bool


test1 :: Exp Int
test1 = Plus_Exp (Mult_Exp One_Exp Zero_Exp) One_Exp

-- will not type check
-- test2 = Or_Exp One_Exp ETrue_Exp

-- We can write an evaluator for the GADT.
-- Look at the type of the evaluator!
-- Unlike evalT, evalExp accepts only well-typed expressions.

evalExp :: Exp a -> a
evalExp One_Exp = 1
evalExp Zero_Exp = 1
evalExp ETrue_Exp = True
evalExp EFalse_Exp = False
evalExp (Plus_Exp e1 e2) = evalExp e1 + evalExp e2
evalExp (Mult_Exp e1 e2) = evalExp e1 * evalExp e2
evalExp (Or_Exp e1 e2) = evalExp e1 || evalExp e2

-- Task 3: Let's write a simplifyer for expressions where
-- we apply the law that 0 * x = 0
-- We use E.

simp :: E -> E
simp One = One
simp Zero = Zero
simp ETrue = ETrue
simp EFalse = EFalse
simp (Plus e1 e2) = Plus (simp e1) (simp e2)
simp (Mult Zero _) = Zero
simp (Mult e1 e2) = Mult (simp e1) (simp e2)
simp (Or e1 e2) = Or e1 e2

-- Consider the following two examples.
-- What do you observe before/after simplification?

-- (0 * 1) * 1
s1 = Mult (Mult Zero One) One

-- 0 * (0 * False)
s2 = Mult Zero (Or Zero EFalse)

-- Assumes we have defined equality among ASTs.
simpFix :: E -> E
simpFix e = let e2 = simp e
            in if e2 == e then e
               else simpFix e2