Haskell to implement other languages

Martin Sulzmann

Overview

We consider a simple language of arithmetic expressions consisting of Integer and Boolean expressions.

We wish to represent this language in terms of a Haskell data type, so that we can exploit pattern matching to write functions that operate on this data type.

Specifically, we wish to write an evaluator (interpreter) and a type checker.

To specify the evaluator and type checker we will make use of

We show how to directly implement these rules in Haskell where we will make use of

and introduces an extension of algebraic data types known as generalized algebraic data types (GADTs).

Syntax

We first need to specify the syntax of our language of arithmetic expressions. From theoretical computer science, we recall context-free grammars.

Some context-free grammar rules to describe arithmetic and boolean expressions. For brevity, we only consider natural numbers 1 and 0.

E -> 1
E -> 0
E -> True
E -> False
E -> E + E
E -> E * E
E -> E || E
E -> (E)

We refer to E as a non-terminal symbol. Terminal symbols are 1, 0, etc.

The grammar is ambiguous as for 1 * 0 + 1 we find the following two derivations.

(1)

   E
-> E * E
-> 1 * E
-> 1 * E + E
-> 1 * 0 + E
-> 1 * 0 + 1

(2)

   E
-> E + E
-> E + 1
-> E * E + 1
-> E * 0 + 1
-> 1 * 0 + 1

For each derivation, we also give its parse tree.

(1)

             E
            /|\
           E * E
           |  /|\
           1 E + E
             |   |
             0   1

(2)

             E
            /|\
           E + E
          /|\  |
         E * E 1
         |   |
         1   0

As we can see parse trees are different. Hence, the above grammar is ambiguous.

The common rule to remove the above ambiguity is to assume that * binds tighter than +. This is important because we need to ensure that the meaning of 1 * 0 + 1 is unambiguous.

Haskell data type for expressions (syntax)

Source syntax

We assume the following source syntax for arithmetic expressions where we make use of EBNF instead of context-free grammar rules.

E = E `+` E | E `+` E | E `<` E | E `|` E | i | t | f

where i refers to some integer constant, t refers to the always true (boolean) constant, and t refers to the always false (boolean) constant

The are two occurrences of the symbol “|”.

At the level of EBNF, we use “|” to represent the various forms of arithmetic expressions. At the level of arithmetic expressions, we use | to represent boolean disjunction.

Why not using ||? Possible, but using | makes it easier to write a parser.

Abstract syntax

We seek for a representation of arithmetic expressions. As data types are perfect to represent tree-like structures such as parse trees, this is an easy task in Haskell. Instead of parse trees, we choose a more compact representation commonly referred to as the abstract syntax tree (AST). In the AST, there is no need to represent parentheses as the precedence among expressions is guaranteed by the tree structure.

Here is a Haskell data type representing the AST for arithmetic expressions.

data E = N Int | Plus E E | Mult E E
       | B Bool | LessThan E E | Or E E deriving (Show, Eq)

We define the following short-hands.

zero = N 0
one = N 1
true = B True
false = B False

Here is the Haskell AST representation for 1 * 0 + 1.

ex = Plus (Mult one zero) one

What about the following?

ex1 = Mult zero (Plus zero one)

The AST ex1 represents 1 * (0 + 1) whereas ex represents 1 * 0 + 1. Recall that in the source syntax we assume that * binds tighter than +. Hence, 1 * 0 + 1 corresponds to (1 * 0) + 1.

Here’s another example.

ex2 = Or (LessThan zero one) false

We will assume that comparison can only be performed among. We make use of operational semantics rules to specify this assumption.

Parsing

The implementation provided comes with a parser that checks if the input string matches the source syntax and converts the successfully matched input string into the AST represented by the Haskell data type E.

Dynamic semantics of arithmetic expressions

We specify the (dynamic) semantics in terms of operational semantics rules where we make use of the natural deduction style notation.

Conventions. We write i to refer to integer values and b to refer boolean values.

(O1)  (N i) => i

(O2)  (B b) => b

      e1 => i1
      e2 => i2
      i=i1+i2
(O3)  ------------------
      (Plus e1 e2) => i

      e1 => i1
      e2 => i2
      i=i1*i2
(O4)  ------------------
      (Mult e1 e2) => i

      e1 => i1
      e2 => i2
      i1 < i2
(O5)  ------------------
      (LessThan e1 e2) => True

      e1 => i1
      e2 => i2
      i1 >= i2
(O6)  ------------------
      (LessThan e1 e2) => False

      e1 => True
(O7)  ------------------
      (Or e1 e2) => True

      e1 => False
      e2 => True
(O8)  ------------------
      (Or e1 e2) => True

      e1 => False
      e2 => False
(O9)  ------------------
      (Or e1 e2) => False

Points to note.

Comparison only among integer expressions

Rules (O5) and (O6) state that the arguments of LessThan must be integer values.

Evaluation might get stuck

Consider the expression LessThan (B True) (N 0). There is no rule to evaluate this expressions. As another example, consider the expression Or (B True) (N 0).

Short-circuit evaluation for boolean expressions

In case of Or, we apply short-circuit evaluation. If the left operand evaluates to true we immediately return true. See rule (O7). Otherwise, we might need to evaluate the left operand as well. See rules (O8) and (O9).

Haskell interpreter (aka evaluator)

From the semantic description we can (almost) straightforwardly derive a Haskell interpreter for arithmetic expressions.

We represent => via the Haskell function

eval :: E -> Maybe (Either Int Bool)

Here are the details.

eval :: E -> Maybe (Either Int Bool)
eval (N i) = Just (Left i)                                               -- (O1)
eval (B b) = Just (Right b)                                              -- (O2)
eval (Plus e1 e2) =                                                      -- (O3)
     let r1 = eval e1
         r2 = eval e2
     in case (r1, r2) of
         (Just (Left i1), Just (Left i2)) -> Just (Left (i1 + i2))
         (_,_)  -> Nothing
eval (Mult e1 e2) =                                                      -- (O4)
     let r1 = eval e1
         r2 = eval e2
     in case (r1, r2) of
         (Just (Left i1), Just (Left i2)) -> Just (Left (i1 * i2))
         (_,_)  -> Nothing
eval (LessThan e1 e2) =                                                  -- (O5-6)
     let r1 = eval e1
         r2 = eval e2
     in case (r1, r2) of
         (Just (Left i1), Just (Left i2)) -> if i1 < i2
                                             then Just (Right True)
                                             else Just (Right False)
         (_,_)  -> Nothing
eval (Or e1 e2) =                                                        -- (O7-9)
     case (eval e1) of
        Nothing -> Nothing
        (Just Left{}) -> Nothing
        (Just (Right True)) -> Just (Right True)
        _                   -> case (eval e2) of
                                 Nothing -> Nothing
                                 (Just Left{}) -> Nothing
                                 (Just (Right True)) -> Just (Right True)
                                 _                   -> Just (Right False)

Some examples.

-- f | t
ex3 = Or false true

-- f | 1
ex4 = Or false one

-- t | 1
ex5 = Or true one

where we find that

eval ex3  ==>  Just (Right True)

eval ex4  ==>  Nothing

eval ex5  ==>  Just (Right True)

Point to note.

Expression ex5 is ill-typed. However, evaluation succeeds. Why? [Hint: Short-circuit evaluation].

Simplifyer

Before evaluation it is common to “simplify” expressions. We consider here a very simple form where we apply the algebraic law

0 * x = 0

Here is an attempt to carry out the above simplification step.

simp :: E -> E
simp e@N{} = e
simp e@B{} = e
simp (Plus e1 e2) = Plus (simp e1) (simp e2)
simp (Mult (N 0) _) = N 0
simp (Mult e1 e2) = Mult (simp e1) (simp e2)
simp (LessThan e1 e2) = LessThan (simp e1) (simp e2)
simp (Or e1 e2) = Or (simp e1) (simp e2)

Here are some examples.

-- (0 * 1) * 1
s1 = Mult (Mult zero one) one

{-

 eval s1 => Just (Left 0)

 simp s1 => Mult (N 0) (N 1)

 simp (simp s1) => N 0

 eval (simp s1) => Just (Left 0)

-}


-- 0 * (0 * False)
s2 = Mult zero (Or zero false)

{-

 eval s2 => Nothing

 simp s2 => N 0

 eval (simp s2) => Just (Left 0)

-}

Observations

We have to exhaustively apply simp until no further simplifications are possible. See example s1. We apply a fixpoint construction to exhaustively apply simplifications.

Evaluation of the original and simplified expression ought to yield the same result. This is not necessarily the case. See example s2. We will implement a type checker to rule out ill-typed expressions.

Exhaustive application of simp

simpFix :: E -> E
simpFix e = let e2 = simp e
            in if e2 == e then e
               else simpFix e2

Static type-checker

Next, we consider the task of static type checking. By static we mean that after parsing but before (dynamic) evaluation we check that expressions are well-typed.

An arithmetic expression is well-typed if the operands of Integer addition/multiplication are only Integer expressions and the operands of Boolean disjunction are only Boolean expressions. Unlike the above evaluation, there is no need to actually run (execute) expressions to decide if an expression is well-typed. Hence, we say that type checking is done statically.

The type checking rules follow the structure of the evaluation rules. The crucial difference is that we abstract away from any concrete values and only distinguish between an ill-typed expression, an expression of Integer type and an expression of Boolean type.

Type checking rules

Conventions.

We write E |- T to denote that expression E has type T where T can be either Int or Bool.

(T1)  (N i) |- Int

(T2)  (B b) |- Bool

      e1 |- Int
      e2 |- Int
(T3)  ------------------
      (Plus e1 e2) |- Int

      e1 |- Int
      e2 |- Int
(T4)  ------------------
      (Mult e1 e2) |- Int

      e1 |- Int
      e2 |- Int
(T5)  ------------------
      (LessThan e1 e2) |- Bool

      e1 |- Bool
      e2 |- Bool
(T6)  ------------------
      (Or e1 e2) |- Bool

Type checker

typecheck :: E -> Maybe Type
typecheck (N _) = Just TInt
typecheck (B _) = Just TBool
typecheck (Plus e1 e2) =
     case (typecheck e1, typecheck e2) of
       (Just TInt, Just TInt) -> Just TInt
       (_, _) -> Nothing
typecheck (Mult e1 e2) =
     case (typecheck e1, typecheck e2) of
       (Just TInt, Just TInt) -> Just TInt
       (_, _) -> Nothing
typecheck (LessThan e1 e2) =
     case (typecheck e1, typecheck e2) of
       (Just TInt, Just TInt) -> Just TBool
       (_, _) -> Nothing
typecheck (Or e1 e2) =
     case (typecheck e1, typecheck e2) of
       (Just TBool, Just TBool) -> Just TBool
       (_, _) -> Nothing

For example, we find the following

typecheck s1 => Just TInt

typecheck s2 => Nothing

Evaluation of well-typed expressions

Our current evaluator has to perform some amount of type checking at run-time. For example, consider the following case.

eval (Mult e1 e2) =                                                      -- (O4)
     let r1 = eval e1
         r2 = eval e2
     in case (r1, r2) of
         (Just (Left i1), Just (Left i2)) -> Just (Left (i1 * i2))
         (_,_)  -> Nothing

We evaluate the left and right operands and via the pattern match “Just (Left …)” we effectively check that operands yield an Integer value.

For well-typed expressions, it is possible to optimize our evaluator. There is no need to cover cases that involve “Just” and “Nothing”.

evalT :: E -> Either Int Bool
evalT (N i) = Left i
evalT (B b) = Right b
evalT (Plus e1 e2) =
    case (evalT e1, evalT e2) of
       (Left i1, Left i2) -> Left (i1 + i2)
evalT (Mult e1 e2) =
    case (evalT e1, evalT e2) of
       (Left i1, Left i2) -> Left (i1 * i2)
evalT (LessThan e1 e2) =
    case (evalT e1, evalT e2) of
       (Left i1, Left i2) -> Right (i1 < i2)
evalT (Or e1 e2) =
     case (evalT e1) of
       Right True -> Right True
       _          -> evalT e2

Of course, we shall apply evalT only on well-typed expressions. Hence, before calling evalT we need to call typecheck first.

Enforcing well-typing by construction via GADTs

In Haskell, we can enforce well-typing of expressions when building ASTs. This is possible with the help of generalized algebraic data types (GADTs).

Here’s a GADT that enforces well-typing of arithmetic AST expressions.

data EE a where
  N_EE :: Int -> EE Int
  B_EE :: Bool -> EE Bool
  Plus_EE :: EE Int -> EE Int -> EE Int
  Mult_EE :: EE Int -> EE Int -> EE Int
  LessThan_EE :: EE Int -> EE Int -> EE Bool
  Or_EE :: EE Bool -> EE Bool -> EE Bool

For example, via constructor Plus_EE we build Integer expressions composed of addition. The arguments of the constructor demand that left and right operands must be well-typed. The result is a well-typed Integer expressions. A similar condition is enforced via constructor Or_EE for Boolean expressions.

As we can see, GADTs are a form of a parametric data type where the parameter instance may vary for each constructor. Pattern matching proceeds as before.

Based on GADTs, evaluation of well-typed (by construction) expressions is defined as follows.

evalEE :: EE a -> a
evalEE (N_EE i) = i
evalEE (B_EE b) = b
evalEE (Plus_EE e1 e2) = evalEE e1 + evalEE e2
evalEE (Mult_EE e1 e2) = evalEE e1 * evalEE e2
evalEE (LessThan_EE e1 e2)
  | evalEE e1 < evalEE e2 = True
  | otherwise             = False
evalEE (Or_EE e1 e2) = evalEE e1 || evalEE e2

The type

evalEE :: EE a -> a

states that any well-typed expression of type a yields a value of type a. Our GADT covers Integer and Boolean expressions. Hence, this statement applies to Int and Bool.

Summary