Martin Sulzmann
Today, we are creating our own data types:
Above are examples of algebraic data types Algebraic data
types are build by composition via other types. In Haskell, the user can
define its own set of (algebraic) data types via the data
keyword.
Specify functions via pattern matching. This is very similar to why we have seen pattern matching over built-in lists and tuples.
Define overloaded functions which operate on data types. Overloading in Haskell is achieved via type classes.
We consider three extended examples (regular expressions, geometric objects and arithmetic expressions)
Example geometric objects and arithmetic expressions make use of two extensions of algebraic data types:
Existential data types (EADTs)
Generalized algebraic data types (GADTs)
GADTs subsume EADTs.
data
keyword introduces a new data typeMo
,
…printWD :: WeekDay -> String
printWD Mo = "Monday"
printWD Tue = "Tuesday"
printWD Wed = "Wednesday"
printWD Thur = "Thursday"
printWD Friday = "Friday"
printWD Sat = "Satursday"
printWD Sun = "Sunday"
eqWD:: WeekDay -> WeekDay -> Bool
eqWD Mo Mo = True
eqWD Tue Tue = True
eqWD Wed Wed = True
eqWD Thur Thur = True
eqWD Friday Friday = True
eqWD Sat Sat = True
eqWD Sun Sun = True
eqWD _ _ = False
OK
takes as an argument an
Int
Fail
takes as an argument a
String
Result
we must provide the
appropriate argumentshd :: List a -> a
hd (Cons x _) = x
tl :: List a -> List a
tl (Cons _ xs) = xs
mapL :: (a -> b) -> List a -> List b
mapL f Nil = Nil
mapL f (Cons x xs) = Cons (f x) (mapL f xs)
filterL p Nil = Nil
filterL p (Cons x xs)
| p x = Cons x (filterL p xs)
| otherwise = filterL p xs
coerce :: List a -> [a]
coerce Nil = []
coerce (Cons x xs) = (:) x (coerce xs)
coerceF :: [a] -> List a
coerceF [] = Nil
coerceF (x:xs) = Cons x (coerceF xs)
Define map for List
in terms of built-in map and
lists.
data List a = Nil | Cons a (List a)
hd :: List a -> a
hd (Cons x _) = x
tl :: List a -> List a
tl (Cons _ xs) = xs
mapL :: (a -> b) -> List a -> List b
mapL f Nil = Nil
mapL f (Cons x xs) = Cons (f x) (mapL f xs)
filterL p Nil = Nil
filterL p (Cons x xs)
| p x = Cons x (filterL p xs)
| otherwise = filterL p xs
sumL :: Num a => List a -> a
sumL Nil = 0
sumL (Cons x xs) = x + sumL xs
sumL2 :: Num a => List a -> a
sumL2 ys =
case ys of
Nil -> 0
Cons x xs -> x + sumL xs
coerce :: List a -> [a]
coerce Nil = []
coerce (Cons x xs) = (:) x (coerce xs)
coerceF :: [a] -> List a
coerceF [] = Nil
coerceF (x:xs) = Cons x (coerceF xs)
mapL2 :: (a1 -> a2) -> List a1 -> List a2
mapL2 f = coerceF . (map f) . coerce
type
versus data
versus
newtype
Recall the database example.
Type synonyms are introduced via type
and are expanded
before type checking programs.
We can define “new” types via newtype
.
The type Student2
is isomorphic to
(String, Int, [Int])
.
However, we cannot mix types Student2
and
(String, Int, [Int])
.
We can recast the above as follows.
Generally, we use data
for types that have several cases
(see List
, …).
Define a member
function to test if x
ist
an element of a list
member x Nil = False
member x (Cons y ys)
| x == y = True
| otherwise = member x ys
testMember = member Mo (Cons Tue (Cons Mo Null))
==
member Mo (Cons Tue (Cons Mo Nil))
?==
is not defined for
WeekDay
Possible solutions
memberWD x Nil = False
memberWD x (Cons y ys)
| eqWD x y = True
| otherwise = memberWD x ys
eqWD:: WeekDay -> WeekDay -> Bool
eqWD Mo Mo = True
eqWD Tue Tue = True
eqWD Wed Wed = True
eqWD Thur Thur = True
eqWD Friday Friday = True
eqWD Sat Sat = True
eqWD Sun Sun = True
eqWD _ _ = False
member2 eq x Nil = False
member2 eq x (Cons y ys)
| eq x y = True
| otherwise = member2 eq x ys
testMember2 = member2 eqWD Mo (Cons Tue (Cons Mo Nil))
==
?Type classes provide for an expressive form of (method) overloading.
A type classes represents a set of methods that share the same (overloaded) type
Methods can be funtion names as well as operators
Eq
classPart of the Haskell Prelude (i.e. already defined).
Recall
What’s the type?
*Main> :t member
member :: Eq a => a -> List a -> Bool
member
works for any type a
provided we supply an instance of (type) class Eq
for
this type a
Eq a
is a type class constraint
We can overload ==
for a specific type t
by
providing for an instance declaration of the Eq
type class for t
. Below we consider several sample instance
declarations.
We provide for an instance of Eq
for the type
Weekday
.
The following works now
What about the following?
Requires an instance of Eq
for
List Weekday
.
Here we go!
instance Eq a => Eq (List a) where
Nil == Nil = True
(Cons x xs) == (Cons y ys) = x == y && xs == ys
-- x == y requires Eq a
-- xs == ys requires Eq (List a)
_ == _ = False
We can provide an instance of Eq
for lists assuming we
provide an instance for the element types.
data WeekDay = Mo
| Tue
| Wed
| Thur
| Friday
| Sat
| Sun
printWD :: WeekDay -> String
printWD Mo = "Monday"
printWD Tue = "Tuesday"
printWD Wed = "Wednesday"
printWD Thur = "Thursday"
printWD Friday = "Friday"
printWD Sat = "Satursday"
printWD Sun = "Sunday"
eqWD:: WeekDay -> WeekDay -> Bool
eqWD Mo Mo = True
eqWD Tue Tue = True
eqWD Wed Wed = True
eqWD Thur Thur = True
eqWD Friday Friday = True
eqWD Sat Sat = True
eqWD Sun Sun = True
eqWD _ _ = False
memberWD x Nil = False
memberWD x (Cons y ys)
| eqWD x y = True
| otherwise = memberWD x ys
instance Eq WeekDay where
(==) = eqWD
data List a = Nil | Cons a (List a)
member x Nil = False
member x (Cons y ys)
| x == y = True
| otherwise = member x ys
testMember = member Mo (Cons Tue (Cons Mo Nil))
member2 eq x Nil = False
member2 eq x (Cons y ys)
| eq x y = True
| otherwise = member2 eq x ys
testMember2 = member2 eqWD Mo (Cons Tue (Cons Mo Nil))
instance Eq a => Eq (List a) where
Nil == Nil = True
(Cons x xs) == (Cons y ys) = x == y && xs == ys
-- x == y requires Eq a
-- xs == ys requires Eq (List a)
_ == _ = False
testMember3 = member (Cons Mo Nil) (Cons (Cons Tue Nil) Nil)
Show
type classinstance Show WeekDay where
show Mo = "Monday"
show Tue = "Tuesday"
show Wed = "Wednesday"
show Thur = "Thursday"
show Friday = "Friday"
show Sat = "Satursday"
show Sun = "Sunday"
ghci
always “shows” the result
*Main> Mo
Monday
data WeekDay = Mo
| Tue
| Wed
| Thur
| Friday
| Sat
| Sun
instance Show WeekDay where
show Mo = "Monday"
show Tue = "Tuesday"
show Wed = "Wednesday"
show Thur = "Thursday"
show Friday = "Friday"
show Sat = "Satursday"
show Sun = "Sunday"
data List a = Nil | Cons a (List a)
instance Show a => Show (List a) where
show Nil = ""
show (Cons x xs) = "(Cons " ++ show x ++ " " ++ show xs ++ ")"
l1 = Cons Mo (Cons Wed Nil)
run = putStrLn (show l1)
-- Ord class provided by the Prelude
class Eq a => Ord a where
(<) :: a -> a -> Bool
(<=) :: a -> a -> Bool
For every instance of Ord
there must be an instance of
Eq
as well.
It suffices to define (<=)
as by default
x < y iff x <= y && !(x == y)
Example.
data WeekDay = Mo
| Tue
| Wed
| Thur
| Friday
| Sat
| Sun
wdToInt :: WeekDay -> Int
wdToInt Mo = 0
wdToInt Tue = 1
wdToInt Wed = 2
wdToInt Thur = 3
wdToInt Friday = 4
wdToInt Sat = 5
wdToInt Sun = 6
instance Ord WeekDay where
(<=) w1 w2 = (wdToInt w1) <= (wdToInt w2)
instance Eq WeekDay where
(==) w1 w2 = (wdToInt w1) == (wdToInt w2)
Can be derived automatically
Works even for parametric data types
Type classes translate to dictionaries.
A dictionary holds the set of methods as described by the type class.
class Eq a where
(==) :: a -> a -> Bool
member x Nil = False
member x (Cons y ys)
| x == y = True
| otherwise = member x ys
translates to
data DictEq a = DictEqVal (a->a->Bool)
unDict (DictEqVal f) = f
memberDict :: DictEq a -> a -> List a -> Bool
memberDict d x Nil = False
memberDict (DictEqVal eq) x (Cons y ys)
| eq x y = True
| otherwise = memberDict (DictEqVal eq) x ys
-- instance Eq Bool
dBool = DictEqVal (\x -> \y -> case (x,y) of
(True,True) -> True
(False,False) -> True
_ -> False)
-- instance Eq t => Eq (List t)
dictEqList :: DictEq t -> DictEq (List t)
dictEqList (DictEqVal eq) =
DictEqVal (\x -> \y ->
case (x,y) of
(Nil, Nil) -> True
(Cons x xs, Cons y ys) ->
(eq x y) &&
(unDict (dictEqList (DictEqVal eq))) xs ys
_ -> False)
exMem = member True (Cons False Nil)
exMemDict= memberDict dBool True (Cons False Nil)
exMem2 = member (Cons True Nil) (Cons (Cons True Nil) Nil)
exMemDict2 = memberDict (dictEqList dBool) (Cons True Nil) (Cons (Cons True Nil) Nil)
More details on the dictionary-passing translation can be found in the paper Type classes in Haskell.
In EBNF Syntax:
r,s ::= x | y | z | ... Symbols aka letters taken from a finite alphabet
| epsilon Empty word
| phi Empty language
| r + r Alternatives
| r . r Sequence aka concatenation
| r* Kleene star
u,v,w ::= epsilon empty word
| w . w concatenation
Sigma denotes the finite set of alphabet symbols.
Regular expressions are popular formalism to specify (infinitely many) patterns of input.
For example,
(open . (read + write)* . close)*
specifies the valid access patterns of a resource usage policy.
We assume that open
, read
,
write
, close
are the primitive events
(symbols) which will be recorded during a program run.
data R = Phi
| Epsilon
| Sym Char
| Alt R R
| Conc R R
| Star R deriving Eq
instance Show R where
show Phi = "PHI"
show Epsilon = ""
show (Sym x) = [x]
show (Alt r s) = "(" ++ show r ++ "+" ++ show s ++ ")"
show (Conc r s) = "(" ++ show r ++ "." ++ show s ++ ")"
show (Star s) = "(" ++ show s ++ ")" ++ "*"
We we wish to apply the following laws:
epsilon . r = r
r + r = r
simp :: R -> R
simp r = let s = simp2 r
in if s == r then r
else simp s
where
simp2 (Conc Epsilon r) = simp2 r
simp2 (Conc r s) = Conc (simp2 r) (simp2 s)
simp2 (Alt r s)
| r == s = simp2 r
| otherwise = Alt (simp2 r) (simp2 s)
simp2 (Star r) = Star $ simp2 r
simp2 r = r
We exhaustively apply the two laws
We use a fixpoint construction because application of one law may enable the other law
L(r)
denotes the set of words represented by the regular
expression r.
Standard (denotational) formulation of L(r):
L(x) = { x }
L(epsilon) = { epsilon }
L(phi) = { }
L(r + s) = { w | w in L(r) or w in L(s) }
L(r . s) = { v . w | v in L(r) and w in L(s) }
L(r*) = { epsilon } cap { w_1 . ... . w_n | w_1, ..., w_n in L(r) and n >=1 }
The classical approach is to turn the regular expression into an automata (for example via the Thompson NFA construction). Here, we consider an alternative method based on Brzozowski https://en.wikipedia.org/wiki/Brzozowski_derivative. He introduced a symbolic method to construct an automata from a regular expression based on the concept of derivatives.
Given some expression r and a symbol x, we obtain the derivative of r w.r.t. x, written d(r,x), by taking way the leading symbol x from r.
In semantic terms, d(r,x) can be described as follows:
L(d(r,x)) = x \ L(r)
x L(r) denotes the left quotient, i.e. the language
{ w | x . w in L(r)}
.
.
to denote concatenation. In some exposition
this is left silent, i.e. x w
.Hence, the derivative d(r,x)
denotes the set of all
words from L(R) where the leading symbol x has been removed.
Thus, it is easy to solve the word problem. Let w
be a
word consisting of symbols x1 . x2 .... xn-1 . xn
.
Compute
d(r,x1) = r1
d(r1,x2) = r2
...
d(rn-1,xn) = rn
That is, we repeatidely build the derivative of r w.r.t symbols xi.
Check if the final expression rn
is nullable. An
expression s is nullable if epsilon in L(s).
We write n(r)
to denote the nullability test which
yields a Boolean value (true/false).
n(x) where xi is a symbol never holds.
n(epsilon) always holds.
n(phi) never holds.
n(r + s) holds if n(r) holds or n(s) holds.
n(r . s) holds if n(r) holds and n(s) holds.
n(r*) always holds.
A similar approach (definition by structural recursion) works for the derivative operation.
We write d(r,x)
to denote the derivative obtained by
extracting the leading symbol x
from expression
r
. For each derivative, we wish that the following holds:
L(d(r,x)) = x \ L(r)
.
As in case of the nullability test, the derivative operation is defined by observing the structure of regular expression patterns. Instead of a Boolean value, we yield another expression (the derivative).
d(x,y) = either epsilon if x == y or phi otherwise
d(epsilon,y) = phi
d(phi,y) = phi
d(r + s, y) = d(r,y) + d(s,y)
d(r . s, y) = if n(r)
then d(r,y) . s + d(s,y)
else d(r,y) . s
d(r*, y) = d(r,y) . r*
Let’s consider some examples to understand the workings of the derivative and nullability function.
We write r -x-> d(r,x)
to denote application of the
derivative on some expression r
with respect to symbol
x
.
x*
-x-> d(x*,x)
= d(x,x) . x*
= epsilon . x*
-x-> epsilon . x*
= d(epsilon,x) . x* + d(x*,x) -- n(epsilon) yields true
= phi . x* + epsilon . x*
So repeated applicaton of the derivative on
x*$ for input string "x.x" yields
phi . x* + epsilon . x*`.
Let’s carry out the nullability function on the final expression.
n(phi . x* + epsilon . x*)
= n(phi .x*) or n(epsilon . x*)
= (n(phi) and n(x*)) or (n(epsilon) and n(x*))
= (false and true) or (true and true)
= false or true
= true
The final expression phi . x* + epsilon . x*
is
nullable. Hence, we can argue that expression x*
matches
the input word “x.x”.
nullable :: R -> Bool
nullable Phi = False
nullable Epsilon = True
nullable Sym{} = False
-- spelled out as the following.
-- nullable (Sym _) = False
nullable (Alt r s) = nullable r || nullable s
nullable (Conc r s) = nullable r && nullable s
nullable Star{} = True
deriv :: Char -> R -> R
deriv _ Epsilon = Phi
deriv _ Phi = Phi
deriv x (Sym y)
| x == y = Epsilon
| otherwise = Phi
deriv x (Alt r s) = Alt (deriv x r) (deriv x s)
deriv x (Conc r s)
| nullable r = Alt (Conc (deriv x r) s) (deriv x s)
| otherwise = Conc (deriv x r) s
deriv x (Star r) = Conc (deriv x r) (Star r)
membership :: String -> R -> Bool
membership xs r =
go r xs
where
go r [] = nullable r
go r (x:xs) = go (deriv x r) xs
Pretty much direct implementation of the theory.
{-
EBNF Syntax of regular expressions and words
R ::= 'x' | 'y' | ... alphabet symbols
| epsilon represents the empty string
| phi represents the empty language
| R + R alternatives
| R . R concatenation
| R* Kleene star
| (R) Grouping via parentheses
W ::= epsilon | 'x' | 'y' | ...
| W W
-}
-- Data type to represent regular expressions.
data R = Phi
| Epsilon
| Sym Char
| Alt R R
| Conc R R
| Star R deriving Eq -- automatically derive equality among regular expressions
{-
-- The following instance can be derived automatically.
instance Eq R where
(==) Phi Phi = True
(==) Epsilon Epsilon = True
(==) (Sym x) (Sym y) = x == y
(==) (Alt r1 r2) (Alt s1 s2) = r1 == s1 && r2 == s2
(==) (Conc r1 r2) (Conc s1 s2) = r1 == s1 && r2 == s2
(==) (Star r) (Star s) = r == s
-}
-- Show instance for regular expressions
instance Show R where
show Phi = "PHI"
show Epsilon = ""
show (Sym x) = [x]
show (Alt r s) = "(" ++ show r ++ "+" ++ show s ++ ")"
show (Conc r s) = "(" ++ show r ++ "." ++ show s ++ ")"
show (Star s) = "(" ++ show s ++ ")" ++ "*"
-- some regular expressions plus short-hands
x = Sym 'x'
y = Sym 'y'
r1 = Star x
r2 = Star $ Alt x y
r3 = Conc y r2
r4 = Alt (Conc Epsilon r1) r1
-- Epsilon or r.
opt :: R -> R
opt r = Alt r Epsilon
-- One or more iterations.
kleenePlus :: R -> R
kleenePlus r = Conc r (Star r)
-----------------------------------------------------------------------
-- Writing functions that operate on data types via pattern matching.
-- Yield True if epsilon is part of the language, otherwise, we find False.
nullable :: R -> Bool
nullable Phi = False
nullable Epsilon = True
nullable Sym{} = False
-- spelled out as the following.
-- nullable (Sym _) = False
nullable (Alt r s) = nullable r || nullable s
nullable (Conc r s) = nullable r && nullable s
nullable Star{} = True
-- Simplifying expressions:
-- epsilon . R = R
-- R + R = R
--
-- We build a "fixpoint"!
-- We apply helper simp2 until no further simplifications can be applied.
-- Why is this necessary? Hint. Consider example r4.
simp :: R -> R
simp r = let s = simp2 r
in if s == r then r
else simp s
where
simp2 (Conc Epsilon r) = simp2 r
simp2 (Conc r s) = Conc (simp2 r) (simp2 s)
simp2 (Alt r s)
| r == s = simp2 r
| otherwise = Alt (simp2 r) (simp2 s)
simp2 (Star r) = Star $ simp2 r
simp2 r = r
------------------
-- https://en.wikipedia.org/wiki/Brzozowski_derivative
deriv :: Char -> R -> R
deriv _ Epsilon = Phi
deriv _ Phi = Phi
deriv x (Sym y)
| x == y = Epsilon
| otherwise = Phi
deriv x (Alt r s) = Alt (deriv x r) (deriv x s)
deriv x (Conc r s)
| nullable r = Alt (Conc (deriv x r) s) (deriv x s)
| otherwise = Conc (deriv x r) s
deriv x (Star r) = Conc (deriv x r) (Star r)
membership :: String -> R -> Bool
membership xs r =
go r xs
where
go r [] = nullable r
go r (x:xs) = go (deriv x r) xs
xs = Star (Sym 'x')
test0 = membership "xxxx" xs
aOrBs = Star (Alt (Sym 'a') (Sym 'b'))
test1 = membership "abba" aOrBs
test2 = membership "abbca" aOrBs
This example makes use of
data types
type classes (with super classes)
and introduces an extension of algebraic data types known as existential data types.
We consider geometric objects such as squares and rectangles and how to represent these objects and some associated operations with the help of data types and type classes.
Here are some straightforward data type representations of squares and rectangles.
The definition of Rectangle
makes use of a data type
feature where we can use labels to refer to individual components. The
advantage is that we can use labels to unambiguously distinguish among
components that are of the same type.
For each geometric object we wish to compute its area and provide for a scaling function. We represent this functionality via a type class.
The above class declaration makes use of super classes. Every
instance of GeoShape
must also provide for an instance of
Show
.
Here are the instances.
instance GeoShape Square where
area (Square x) = x * x
scale (Square x) s = Square (x * s)
instance GeoShape Rectangle where
area r = len r * width r
scale r s = Rec { len = len r * s, width = width r * s }
instance Show Square where
show (Square x) = "square(" ++ show x ++ ")"
instance Show Rectangle where
show (Rec {len = l, width = w }) = "rectangle(" ++ show l ++ "," ++ show w ++ ")"
Some examples that make use of the above functionality.
r = Rec { width = 2, len = 3 }
s = Square 4
r2 = scale r 3
ex1 = do
let a = area r + area s + area r2
putStrLn (unlines ["Sum of the area of",
show r,
show s,
show r2,
"equals",
show a])
It would be convenient if we could store geometric objects in a list for further processing.
The above will not type check! Lists in Haskell are homogenous. That is, all elements must be of the same type.
What we can do is to define a new data type that represents an abstract representation of a geometric object.
The above data type declaration makes use of existential data
types. Why existential? There seems to be a universal (for all)
quantor in the above definition? Yes, we universally quantify over
a
but this type is not part of the return type. Hence, the
following logical equivalence
forall a. (I implies J) = (exists a. I) implies J
where J does not refer to a
which justifies the name existential data type.
In more detail, the constructor MkShape
has the
following type.
The argument of the constructor is any value of type a
that is an instance of the GeoShape
type class.
For example, we find that
By embedding square and rectangle into the new type
Shape
, we can store both in a list.
What can do with elements of this list? We can apply all the
operations that are available for geometric objects by lifting
them to Shape
.
instance Show Shape where
show (MkShape x) = show x
instance GeoShape Shape where
area (MkShape x) = area x
scale (MkShape x) s = MkShape (scale x s)
Here is an example.
ex2 = putStrLn (unlines (["Sum of the area of"] ++
map show shapes ++
[show (sum (map area shapes))]))
{-# LANGUAGE ExistentialQuantification #-}
data Square = Square Int
data Rectangle = Rec { len :: Int, width :: Int }
class Show a => GeoShape a where
area :: a -> Int
scale :: a -> Int -> a
instance GeoShape Square where
area (Square x) = x * x
scale (Square x) s = Square (x * s)
instance GeoShape Rectangle where
area r = len r * width r
scale r s = Rec { len = len r * s, width = width r * s }
instance Show Square where
show (Square x) = "square(" ++ show x ++ ")"
instance Show Rectangle where
show (Rec {len = l, width = w }) = "rectangle(" ++ show l ++ "," ++ show w ++ ")"
r = Rec { width = 2, len = 3 }
s = Square 4
r2 = scale r 3
ex1 = do
let a = area r + area s + area r2
putStrLn (unlines ["Sum of the area of",
show r,
show s,
show r2,
"equals",
show a])
-- Does not type check
-- shapes = [r, s, r2]
-- In Haskell, constructors and data types do NOT share the same name space.
-- Therefore, it would be ligit to use the constructor name Shape instead of MkShape.
data Shape = forall a. GeoShape a => MkShape a
instance Show Shape where
show (MkShape x) = show x
instance GeoShape Shape where
area (MkShape x) = area x
scale (MkShape x) s = MkShape (scale x s)
s1 :: Shape
s1 = MkShape r
s2 :: Shape
s2 = MkShape s
shapes = [s1, s2, MkShape r2]
ex2 = putStrLn (unlines (["Sum of the area of"] ++
map show shapes ++
[show (sum (map area shapes))]))
This example makes use of
data types
type classes
and introduces an extension of algebraic data types known as generalized algebraic data types (GADTs).
We consider a simple language of arithmetic expressions consisting of Integer and Boolean expressions.
We wish to represent this language in terms of a Haskell data type, so that we can exploit pattern matching to write functions that operate on this data type.
Specifically, we wish to write an evaluator (interpreter) and a type checker.
We first need to specify the syntax of our language of arithmetic expressions. From theoretical computer science, we recall context-free grammars.
Some context-free grammar rules to describe arithmetic and boolean expressions. For brevity, we only consider natural numbers 1 and 0.
E -> 1
E -> 0
E -> True
E -> False
E -> E + E
E -> E * E
E -> E || E
E -> (E)
We refer to E
as a non-terminal symbol. Terminal symbols
are 1
, 0
, etc.
The grammar is ambiguous as for 1 * 0 + 1
we find the
following two derivations.
(1)
E
-> E * E
-> 1 * E
-> 1 * E + E
-> 1 * 0 + E
-> 1 * 0 + 1
(2)
E
-> E + E
-> E + 1
-> E * E + 1
-> E * 0 + 1
-> 1 * 0 + 1
For each derivation, we also give its parse tree.
(1)
E
/|\
E * E
| /|\
1 E + E
| |
0 1
(2)
E
/|\
E + E
/|\ |
E * E 1
| |
1 0
As we can see parse trees are different. Hence, the above grammar is ambiguous.
The common rule to remove the above ambiguity is to assume that
*
binds tigher than +
. This is important
because we need to ensure that the meaning of 1 * 0 + 1
is
unambiguous.
We seek for a representation of arithemtic expressions. As data types are perfect to represent tree-like structures such as parse trees, this is an easy task in Haskell. Instead of parse trees, we choose a more compact representation commonly referred to as the abstract syntax tree (AST). In the AST, there is no need to represent parantheses as the precedence among expressions is guaranteed by the tree structure.
Here is a Haskell data type representing the AST for arithmetic expressions.
Here is the Haskell AST representation for 1 * 0 + 1
What about the following?
The AST ex1
represents 1 * (0 + 1)
whereas
ex
represents 1 * 0 + 1
. Recall that in the
source syntax we assume that *
binds tighter than
+
. Hence, 1 * 0 + 1
corresponds to
(1 * 0) + 1
.
We consider pretty printing of ASTs. Each AST shall be displayed in terms of the original source language which is a string.
We provide for a simple pretty printer via the following instance of
type class Show
.
instance Show E where
show One = "1"
show Zero = "0"
show ETrue = "true"
show EFalse = "false"
show (Plus e1 e2) = "(" ++ show e1 ++ " + " ++ show e2 ++ ")"
show (Mult e1 e2) = "(" ++ show e1 ++ " * " ++ show e2 ++ ")"
show (Or e1 e2) = "(" ++ show e1 ++ " || " ++ show e2 ++ ")"
We find that
Pretty-printing could be improved by removing as many parantheses as
possible. We cannot remove all parantheses as otherwise the meaning of
the expression changes in some unexpected way. Consider ex1
where the innermost parantheses are necessary.
We implement an evaluator (interpreter) for arithmetic expressions. There are a few things we need to take care of.
We assume that Integer and Boolean expressions cannot be mixed. For example, when trying to evaluate
we get stuck. That means, evaluation may return a result or nothing.
This can be represented via the Maybe
data type.
The result of evaluation may either be an Integer or Boolean value.
This can be represented via the Either
data type.
The evaluator traverses over the structure of the AST. We evaluate operands (from left to right) and then combine the result. This can elegantly expressed via pattern matching.
eval :: E -> Maybe (Either Int Bool)
eval One = Just (Left 1)
eval Zero = Just (Left 0)
eval ETrue = Just (Right True)
eval EFalse = Just (Right False)
eval (Plus e1 e2) =
let r1 = eval e1
r2 = eval e2
in case (r1, r2) of
(Just (Left i1), Just (Left i2)) -> Just (Left (i1 + i2))
(_,_) -> Nothing
eval (Mult e1 e2) =
let r1 = eval e1
r2 = eval e2
in case (r1, r2) of
(Just (Left i1), Just (Left i2)) -> Just (Left (i1 * i2))
(_,_) -> Nothing
eval (Or e1 e2) =
case (eval e1) of
Nothing -> Nothing
(Just Left{}) -> Nothing
(Just (Right True)) -> Just (Right True)
_ -> case (eval e2) of
Nothing -> Nothing
(Just Left{}) -> Nothing
(Just (Right True)) -> Just (Right True)
_ -> Just (Right False)
In case of Or
, we apply short-circuit
evaluation. If the left operand evaluates to true we immediately
return true. Otherwise, we might need to evaluate the left operand as
well.
Take note. We first check for Nothing
and then
(Just Left{})
.
Nothing
indicates evaluation got stuck.
(Just Left{})
indicates operand e1 evaluates to an
Integer expression and therefore we’re stuck as well.
Consider
-- 1 + true
ex2 = Plus One ETrue
-- false || true
ex3 = Or EFalse ETrue
-- false || 1
ex4 = Or EFalse One
-- true || 1
ex5 = Or ETrue One
where we find that
eval ex2 ==> Nothing
eval ex3 ==> Just (Right True)
eval ex4 ==> Just (Right False)
eval ex5 ==> Just (Right True)
Next, we consider the task of type checking. The purpose of type checking is to ensure that operands and operations are compatible.
An arithmetic expression is well-typed if the operands of Integer addition/multiplication are only Integer expressions and the operands of Boolean disjunction are only Boolean expressions. Unlike the above evaluation, there is no need to actually run (execute) expressions to decide if an expression is well-typed. Hence, we say that type checking is done statically.
The type checking rules follow the structure of the evaluation rules. The crucial difference is that we abstract away from any concrete values and only distinguish between an ill-typed expression, an expression of Integer type and an expression of Boolean type.
data Type = TInt | TBool deriving Show
typecheck :: E -> Maybe Type
typecheck Zero = Just TInt
typecheck One = Just TInt
typecheck ETrue = Just TBool
typecheck EFalse = Just TBool
typecheck (Plus e1 e2) =
case (typecheck e1, typecheck e2) of
(Just TInt, Just TInt) -> Just TInt
(_, _) -> Nothing
typecheck (Mult e1 e2) =
case (typecheck e1, typecheck e2) of
(Just TInt, Just TInt) -> Just TInt
(_, _) -> Nothing
typecheck (Or e1 e2) =
case (typecheck e1, typecheck e2) of
(Just TBool, Just TBool) -> Just TBool
(_, _) -> Nothing
For the above examples, we find that
typecheck ex2 ==> Nothing
typecheck ex3 ==> Just TBool
typecheck ex4 ==> Nothing
typecheck ex5 ==> Nothing
Notice that evaluation of ex4
succeeds but type checking
reports that ex4
is ill-typed.
Our current evaluator has to perform some amount of type checking at run-time. For example, consider the following case.
eval (Mult e1 e2) =
let r1 = eval e1
r2 = eval e2
in case (r1, r2) of
(Just (Left i1), Just (Left i2)) -> Just (Left (i1 * i2))
(_,_) -> Nothing
We evaluate the left and right operands and via the pattern match “Just (Left …)” we effectively check that operands yield an Integer value.
For well-typed expressions, it is possible to optimize our evaluator. There is no need to cover cases that involve “Just” and “Nothing”.
evalT :: E -> Either Int Bool
evalT One = Left 1
evalT Zero = Left 0
evalT ETrue = Right True
evalT EFalse = Right False
evalT (Plus e1 e2) =
case (evalT e1, evalT e2) of
(Left i1, Left i2) -> Left (i1 + i2)
evalT (Mult e1 e2) =
case (evalT e1, evalT e2) of
(Left i1, Left i2) -> Left (i1 * i2)
evalT (Or e1 e2) =
case (evalT e1) of
Right True -> Right True
_ -> evalT e2
Of course, we shall apply evalT
only on well-typed
expressions. Hence, before calling evalT
we need to call
typecheck
first.
We consider the task of simplifying expressions based on some algebraic laws. For example, we wish to enforce that law that “0 * x = 0”. This law can be enforced statically (and therefore might improve the run-time performance when evaluationg expressions).
Consider
simp :: E -> E
simp One = One
simp Zero = Zero
simp ETrue = ETrue
simp EFalse = EFalse
simp (Plus e1 e2) = Plus (simp e1) (simp e2)
simp (Mult Zero _) = Zero
simp (Mult e1 e2) = Mult (simp e1) (simp e2)
simp (Or e1 e2) = Or e1 e2
What do we observe when simplifying the following expressions?
There are two issues.
The first issue is that s2
is ill-typed but the
expression resulting from simp s2
is well-typed. The second
issue is that simplification is not exhaustive. For example,
simp s1
yields Mult Zero One
which could be
further simplified.
To cope with the first issue, we simply need to check if expressions are well-typed before simplification. To cope with the second issue, we repeatidly apply simplifications until no further simplifications are possible.
For example, simpFix s1
yields Zero
.
In Haskell, we can enforce well-typing of expressions when building ASTs. This is possible with the help of generalized algebraic data types (GADTs).
Here’s a GADT that enforces well-typing of arithmetic AST expressions.
data Exp a where
One_Exp :: Exp Int
Zero_Exp :: Exp Int
ETrue_Exp :: Exp Bool
EFalse_Exp :: Exp Bool
Plus_Exp :: Exp Int -> Exp Int -> Exp Int
Mult_Exp :: Exp Int -> Exp Int -> Exp Int
Or_Exp :: Exp Bool -> Exp Bool -> Exp Bool
For example, via constructor Plus_Exp
we build Integer
expressions composed of addition. The arguments of the constructor
demand that left and right operands must be well-typed. The result is a
well-typed Integer expressions. A similar condition is enforced via
constructor Or_Exp
for Boolean expressions.
As we can see, GADTs are a form of a parametric data type where the parameter instance may vary for each constructor. Pattern matching proceeds as before.
Based on GADTs, evaluation of well-typed (by construction) expressions is defined as follows.
evalExp :: Exp a -> a
evalExp One_Exp = 1
evalExp Zero_Exp = 1
evalExp ETrue_Exp = True
evalExp EFalse_Exp = False
evalExp (Plus_Exp e1 e2) = evalExp e1 + evalExp e2
evalExp (Mult_Exp e1 e2) = evalExp e1 * evalExp e2
evalExp (Or_Exp e1 e2) = evalExp e1 || evalExp e2
The type
states that any well-typed expression of type a
yields a
value of type a
. Our GADT covers Integer and Boolean
expressons. Hence, this statement applies to Int
and
Bool
.