Haskell: Data Types und User-Controlled Overloading

Martin Sulzmann

Overview

Today, we are creating our own data types:

Above are examples of algebraic data types Algebraic data types are build by composition via other types. In Haskell, the user can define its own set of (algebraic) data types via the data keyword.

Specify functions via pattern matching. This is very similar to why we have seen pattern matching over built-in lists and tuples.

Define overloaded functions which operate on data types. Overloading in Haskell is achieved via type classes.

We consider three extended examples (regular expressions, geometric objects and arithmetic expressions)

Example geometric objects and arithmetic expressions make use of two extensions of algebraic data types:

GADTs subsume EADTs.

Enumerations

data WeekDay = Mo
             | Tue
             | Wed
             | Thur
             | Friday
             | Sat
             | Sun

Functions operating on enumerations

printWD :: WeekDay -> String
printWD Mo = "Monday"
printWD Tue = "Tuesday"
printWD Wed = "Wednesday"
printWD Thur = "Thursday"
printWD Friday = "Friday"
printWD Sat = "Satursday"
printWD Sun = "Sunday"

eqWD:: WeekDay -> WeekDay -> Bool
eqWD Mo Mo         = True
eqWD Tue Tue       = True
eqWD Wed Wed       = True
eqWD Thur Thur     = True
eqWD Friday Friday = True
eqWD Sat Sat       = True
eqWD Sun Sun       = True
eqWD _ _           = False

Constructors with arguments

data Result = OK Int | Fail String
*Main> OK 1
OK 1
*Main> Fail "failure"
Fail "failure"
failSafeDiv :: Int -> Int -> Result
failSafeDiv n 0 = Fail "division by zero"
failSafeDiv n m = OK (n `div` m)

Parametric data types: Maybe + Either

data Maybe a = Just a | Nothing

data Either a b = Left a | Right b
type Result2 = Either Int String
failSafeDiv2 :: Int -> Int -> Result2
failSafeDiv2 n 0 = Right "division by zero"
failSafeDiv2 n m = Left (n `div` m)

Recursive data types: List

data List a = Nil | Cons a (List a)

Head, tail and map

hd :: List a -> a
hd (Cons x _) = x

tl :: List a -> List a
tl (Cons _ xs) = xs

mapL :: (a -> b) -> List a -> List b
mapL f Nil = Nil
mapL f (Cons x xs) = Cons (f x) (mapL f xs)

filterL p Nil = Nil
filterL p (Cons x xs)
   | p x       = Cons x (filterL p xs)
   | otherwise = filterL p xs

Pattern matching and case

sumL :: Num a => List a -> a
sumL Nil = 0
sumL (Cons x xs) = x + sumL xs
sumL2 :: Num a => List a -> a
sumL2 ys =
   case ys of
     Nil -> 0
     Cons x xs -> x + sumL xs

Converting to built-in lists

coerce :: List a -> [a]
coerce Nil = []
coerce (Cons x xs) = (:) x (coerce xs)

coerceF :: [a] -> List a
coerceF [] = Nil
coerceF (x:xs) = Cons x (coerceF xs)

Define map for List in terms of built-in map and lists.

mapL2 :: (a1 -> a2) -> List a1 -> List a2
mapL2 f =  coerceF . (map f) . coerce

Complete source code

data List a = Nil | Cons a (List a)

hd :: List a -> a
hd (Cons x _) = x

tl :: List a -> List a
tl (Cons _ xs) = xs

mapL :: (a -> b) -> List a -> List b
mapL f Nil = Nil
mapL f (Cons x xs) = Cons (f x) (mapL f xs)

filterL p Nil = Nil
filterL p (Cons x xs)
   | p x       = Cons x (filterL p xs)
   | otherwise = filterL p xs

sumL :: Num a => List a -> a
sumL Nil = 0
sumL (Cons x xs) = x + sumL xs

sumL2 :: Num a => List a -> a
sumL2 ys =
   case ys of
     Nil -> 0
     Cons x xs -> x + sumL xs


coerce :: List a -> [a]
coerce Nil = []
coerce (Cons x xs) = (:) x (coerce xs)

coerceF :: [a] -> List a
coerceF [] = Nil
coerceF (x:xs) = Cons x (coerceF xs)


mapL2 :: (a1 -> a2) -> List a1 -> List a2
mapL2 f =  coerceF . (map f) . coerce

type versus data versus newtype

Type synonyms

Recall the database example.

type Student = (String, Int, [Int])
type DB = [Student]

Type synonyms are introduced via type and are expanded before type checking programs.

New types

We can define “new” types via newtype.

newtype Student2 = MkStd (String, Int, [Int])
newtype DB2 = MkDB [Student]

The type Student2 is isomorphic to (String, Int, [Int]).

However, we cannot mix types Student2 and (String, Int, [Int]).

Data types

We can recast the above as follows.

data Student3 = MkStd3 (String, Int, [Int])
data DB3 = MkDB3 [Student]

Generally, we use data for types that have several cases (see List, …).

Motivation: User-controllable overloading

Define a member function to test if x ist an element of a list

member x Nil = False
member x (Cons y ys)
    | x == y    = True
    | otherwise = member x ys

testMember = member Mo (Cons Tue (Cons Mo Null))

Possible solutions

memberWD x Nil = False
memberWD x (Cons y ys)
    | eqWD x y    = True
    | otherwise = memberWD x ys

eqWD:: WeekDay -> WeekDay -> Bool
eqWD Mo Mo         = True
eqWD Tue Tue       = True
eqWD Wed Wed       = True
eqWD Thur Thur     = True
eqWD Friday Friday = True
eqWD Sat Sat       = True
eqWD Sun Sun       = True
eqWD _ _           = False
member2 eq x Nil = False
member2 eq x (Cons y ys)
    | eq x y    = True
    | otherwise = member2 eq x ys

testMember2 = member2 eqWD Mo (Cons Tue (Cons Mo Nil))

Type classes

The Eq class

Part of the Haskell Prelude (i.e. already defined).

class Eq a where
 (==) :: a -> a -> Bool

Recall

member x Nil = False
member x (Cons y ys)
    | x == y    = True
    | otherwise = member x ys

What’s the type?

*Main> :t member
member :: Eq a => a -> List a -> Bool

We can overload == for a specific type t by providing for an instance declaration of the Eq type class for t. Below we consider several sample instance declarations.

Type class instances

We provide for an instance of Eq for the type Weekday.

instance Eq WeekDay where
   (==) = eqWD

The following works now

testMember = member Mo (Cons Tue (Cons Mo Nil))

What about the following?

testMember3 = member (Cons Mo Nil) (Cons (Cons Tue Nil) Nil)

Requires an instance of Eq for List Weekday.

Here we go!

instance Eq a => Eq (List a) where
   Nil == Nil               = True
   (Cons x xs) == (Cons y ys) = x == y && xs == ys
                              --  x == y    requires Eq a
                              --  xs == ys  requires Eq (List a)
   _ == _                     = False

We can provide an instance of Eq for lists assuming we provide an instance for the element types.

Complete source code

data WeekDay = Mo
             | Tue
             | Wed
             | Thur
             | Friday
             | Sat
             | Sun


printWD :: WeekDay -> String
printWD Mo = "Monday"
printWD Tue = "Tuesday"
printWD Wed = "Wednesday"
printWD Thur = "Thursday"
printWD Friday = "Friday"
printWD Sat = "Satursday"
printWD Sun = "Sunday"

eqWD:: WeekDay -> WeekDay -> Bool
eqWD Mo Mo         = True
eqWD Tue Tue       = True
eqWD Wed Wed       = True
eqWD Thur Thur     = True
eqWD Friday Friday = True
eqWD Sat Sat       = True
eqWD Sun Sun       = True
eqWD _ _           = False

memberWD x Nil = False
memberWD x (Cons y ys)
    | eqWD x y    = True
    | otherwise = memberWD x ys

instance Eq WeekDay where
   (==) = eqWD


data List a = Nil | Cons a (List a)

member x Nil = False
member x (Cons y ys)
    | x == y    = True
    | otherwise = member x ys

testMember = member Mo (Cons Tue (Cons Mo Nil))

member2 eq x Nil = False
member2 eq x (Cons y ys)
    | eq x y    = True
    | otherwise = member2 eq x ys

testMember2 = member2 eqWD Mo (Cons Tue (Cons Mo Nil))


instance Eq a => Eq (List a) where
   Nil == Nil               = True
   (Cons x xs) == (Cons y ys) = x == y && xs == ys
                              --  x == y    requires Eq a
                              --  xs == ys  requires Eq (List a)
   _ == _                     = False


testMember3 = member (Cons Mo Nil) (Cons (Cons Tue Nil) Nil)

Further type class examples

Show type class

instance Show WeekDay where
   show Mo = "Monday"
   show Tue = "Tuesday"
   show Wed = "Wednesday"
   show Thur = "Thursday"
   show Friday = "Friday"
   show Sat = "Satursday"
   show Sun = "Sunday"

ghci always “shows” the result

*Main> Mo
Monday

Complete example

data WeekDay = Mo
             | Tue
             | Wed
             | Thur
             | Friday
             | Sat
             | Sun


instance Show WeekDay where
   show Mo = "Monday"
   show Tue = "Tuesday"
   show Wed = "Wednesday"
   show Thur = "Thursday"
   show Friday = "Friday"
   show Sat = "Satursday"
   show Sun = "Sunday"

data List a = Nil | Cons a (List a)

instance Show a => Show (List a) where
    show Nil = ""
    show (Cons x xs) = "(Cons " ++ show x ++ " " ++ show xs ++ ")"

l1 = Cons Mo (Cons Wed Nil)

run = putStrLn (show l1)

Type class features

Super classes

-- Ord class provided by the Prelude

class Eq a => Ord a where
   (<)  :: a -> a -> Bool
   (<=) :: a -> a -> Bool

For every instance of Ord there must be an instance of Eq as well.

It suffices to define (<=) as by default

  x < y  iff  x <= y && !(x == y)

Example.

data WeekDay = Mo
             | Tue
             | Wed
             | Thur
             | Friday
             | Sat
             | Sun


wdToInt :: WeekDay -> Int
wdToInt Mo =     0
wdToInt Tue =    1
wdToInt Wed =    2
wdToInt Thur =   3
wdToInt Friday = 4
wdToInt Sat =    5
wdToInt Sun =    6

instance Ord WeekDay where
   (<=) w1 w2 = (wdToInt w1) <= (wdToInt w2)


instance Eq WeekDay where
   (==) w1 w2 = (wdToInt w1) == (wdToInt w2)

Derivable instances

instance Show WeekDay where ... -- boring

instance Eq WeekDay where ... -- boring

...

Can be derived automatically

data WeekDay = Mo
             | Tue
             | Wed
             | Thur
             | Friday
             | Sat
             | Sun
             deriving (Show, Eq, Ord)

Works even for parametric data types

data List a = Nil | Cons a (List a) deriving (Show, Eq, Ord)

How type classes work

Type classes translate to dictionaries.

A dictionary holds the set of methods as described by the type class.

From type classes to dictionaries

class Eq a where
  (==) :: a -> a -> Bool

member x Nil = False
member x (Cons y ys)
    | x == y    = True
    | otherwise = member x ys

translates to

data DictEq a = DictEqVal (a->a->Bool)
unDict (DictEqVal f) = f

memberDict :: DictEq a -> a -> List a -> Bool
memberDict d x Nil = False
memberDict (DictEqVal eq) x (Cons y ys)
    | eq x y    = True
    | otherwise = memberDict (DictEqVal eq) x ys

From type classes instances to dictionary constructors

-- instance Eq Bool
dBool = DictEqVal (\x -> \y -> case (x,y) of
                                     (True,True) -> True
                                     (False,False) -> True
                                     _ -> False)

-- instance Eq t => Eq (List t)
dictEqList :: DictEq t -> DictEq (List t)
dictEqList (DictEqVal eq) =
   DictEqVal (\x -> \y ->
               case (x,y) of
                 (Nil, Nil) -> True
                 (Cons x xs, Cons y ys) ->
                            (eq x y) &&
                            (unDict (dictEqList (DictEqVal eq))) xs ys
                 _ -> False)

Insert dictionaries

exMem = member True (Cons False Nil)

exMemDict= memberDict dBool True (Cons False Nil)


exMem2 = member (Cons True Nil) (Cons (Cons True Nil) Nil)

exMemDict2 = memberDict (dictEqList dBool) (Cons True Nil) (Cons (Cons True Nil) Nil)

Further reading

More details on the dictionary-passing translation can be found in the paper Type classes in Haskell.

Example - Regular expressions

Syntax

In EBNF Syntax:


r,s ::= x | y | z | ...         Symbols aka letters taken from a finite alphabet
    |  epsilon                  Empty word
    |  phi                      Empty language
    |  r + r                    Alternatives
    |  r . r                    Sequence aka concatenation
    |  r*                       Kleene star

u,v,w ::= epsilon               empty word
      |  w . w                  concatenation

Sigma denotes the finite set of alphabet symbols.

Regular expressions are popular formalism to specify (infinitely many) patterns of input.

For example,

 (open . (read + write)* . close)*

specifies the valid access patterns of a resource usage policy.

We assume that open, read, write, close are the primitive events (symbols) which will be recorded during a program run.

Haskell representation

data R = Phi
       | Epsilon
       | Sym Char
       | Alt R R
       | Conc R R
       | Star R deriving Eq

instance Show R where
  show Phi = "PHI"
  show Epsilon = ""
  show (Sym x) = [x]
  show (Alt r s) = "(" ++ show r ++  "+" ++ show s ++ ")"
  show (Conc r s) = "(" ++ show r ++  "." ++ show s ++ ")"
  show (Star s) = "(" ++ show s ++ ")" ++ "*"

Adding further operators

Option

opt :: R -> R
opt r = Alt r Epsilon

One or more iterations

kleenePlus :: R -> R
kleenePlus r = Conc r (Star r)

Simplifying regular expressions

We we wish to apply the following laws:

  1. epsilon . r = r

  2. r + r = r

simp :: R -> R
simp r = let s = simp2 r
         in if s == r then r
            else simp s
     where
        simp2 (Conc Epsilon r) = simp2 r
        simp2 (Conc r s) = Conc (simp2 r) (simp2 s)
        simp2 (Alt r s)
           | r == s    = simp2 r
           | otherwise = Alt (simp2 r) (simp2 s)
        simp2 (Star r) = Star $ simp2 r
        simp2 r = r

We exhaustively apply the two laws

We use a fixpoint construction because application of one law may enable the other law

Membership test

A bit of theory

L(r) denotes the set of words represented by the regular expression r.

Standard (denotational) formulation of L(r):

L(x) = { x }

L(epsilon) = { epsilon }

L(phi) = { }

L(r + s) = { w | w in L(r) or w in L(s) }

L(r . s) = { v . w | v in L(r) and w in L(s) }

L(r*) = { epsilon } cap { w_1 . ... . w_n | w_1, ..., w_n in L(r) and n >=1 }

Membership Test via Derivatives

The classical approach is to turn the regular expression into an automata (for example via the Thompson NFA construction). Here, we consider an alternative method based on Brzozowski https://en.wikipedia.org/wiki/Brzozowski_derivative. He introduced a symbolic method to construct an automata from a regular expression based on the concept of derivatives.

Given some expression r and a symbol x, we obtain the derivative of r w.r.t. x, written d(r,x), by taking way the leading symbol x from r.

In semantic terms, d(r,x) can be described as follows:

L(d(r,x)) = x \ L(r)

Thus, it is easy to solve the word problem. Let w be a word consisting of symbols x1 . x2 .... xn-1 . xn.

Compute

d(r,x1) = r1
d(r1,x2) = r2
...
d(rn-1,xn) = rn

That is, we repeatidely build the derivative of r w.r.t symbols xi.

Check if the final expression rn is nullable. An expression s is nullable if epsilon in L(s).

Formalization

We write n(r) to denote the nullability test which yields a Boolean value (true/false).

n(x) where xi is a symbol never holds.

n(epsilon) always holds.

n(phi) never holds.

n(r + s) holds if n(r) holds or n(s) holds.

n(r . s) holds if n(r) holds and n(s) holds.

n(r*) always holds.

A similar approach (definition by structural recursion) works for the derivative operation.

We write d(r,x) to denote the derivative obtained by extracting the leading symbol x from expression r. For each derivative, we wish that the following holds: L(d(r,x)) = x \ L(r).

As in case of the nullability test, the derivative operation is defined by observing the structure of regular expression patterns. Instead of a Boolean value, we yield another expression (the derivative).

d(x,y) =   either epsilon if x == y or phi otherwise

d(epsilon,y) = phi

d(phi,y)     = phi

d(r + s, y)  = d(r,y) + d(s,y)

d(r . s, y)  =  if n(r)
                then d(r,y) . s +  d(s,y)
                else d(r,y) . s

d(r*, y)     = d(r,y) . r*

Let’s consider some examples to understand the workings of the derivative and nullability function.

We write r -x-> d(r,x) to denote application of the derivative on some expression r with respect to symbol x.

       x*
-x->   d(x*,x)
       = d(x,x) . x*
       = epsilon . x*

-x->   epsilon . x*
       = d(epsilon,x) . x* + d(x*,x)     -- n(epsilon) yields true
       = phi . x* + epsilon . x*

So repeated applicaton of the derivative on x*$ for input string "x.x" yieldsphi . x* + epsilon . x*`. Let’s carry out the nullability function on the final expression.

    n(phi . x* + epsilon . x*)
    = n(phi .x*) or n(epsilon . x*)
    = (n(phi) and n(x*)) or (n(epsilon) and n(x*))
    = (false and true) or (true and true)
    = false or true
    = true

The final expression phi . x* + epsilon . x* is nullable. Hence, we can argue that expression x* matches the input word “x.x”.

Implementation

nullable :: R -> Bool
nullable Phi = False
nullable Epsilon = True
nullable Sym{} = False
-- spelled out as the following.
-- nullable (Sym _) = False
nullable (Alt r s) = nullable r || nullable s
nullable (Conc r s) = nullable r && nullable s
nullable Star{} = True

deriv :: Char -> R -> R
deriv _ Epsilon = Phi
deriv _ Phi = Phi
deriv x (Sym y)
  | x == y    = Epsilon
  | otherwise = Phi
deriv x (Alt r s) = Alt (deriv x r) (deriv x s)
deriv x (Conc r s)
  | nullable r = Alt (Conc (deriv x r) s) (deriv x s)
  | otherwise =  Conc (deriv x r) s
deriv x (Star r) = Conc (deriv x r) (Star r)

membership :: String -> R -> Bool
membership xs r =
          go r xs
             where
               go r [] = nullable r
               go r (x:xs) = go (deriv x r) xs

Pretty much direct implementation of the theory.

Complete source code

{-

EBNF Syntax of regular expressions and words

 R ::= 'x' | 'y' | ...              alphabet symbols
     | epsilon                      represents the empty string
     | phi                          represents the empty language
     | R + R                        alternatives
     | R . R                        concatenation
     | R*                           Kleene star
     | (R)                          Grouping via parentheses


 W ::= epsilon | 'x' | 'y' | ...
   | W W

-}

-- Data type to represent regular expressions.
data R = Phi
       | Epsilon
       | Sym Char
       | Alt R R
       | Conc R R
       | Star R deriving Eq  -- automatically derive equality among regular expressions

{-
-- The following instance can be derived automatically.
instance Eq R where
   (==) Phi Phi = True
   (==) Epsilon Epsilon = True
   (==) (Sym x) (Sym y) = x == y
   (==) (Alt r1 r2) (Alt s1 s2) = r1 == s1 && r2 == s2
   (==) (Conc r1 r2) (Conc s1 s2) = r1 == s1 && r2 == s2
   (==) (Star r) (Star s) = r == s
-}

-- Show instance for regular expressions
instance Show R where
  show Phi = "PHI"
  show Epsilon = ""
  show (Sym x) = [x]
  show (Alt r s) = "(" ++ show r ++  "+" ++ show s ++ ")"
  show (Conc r s) = "(" ++ show r ++  "." ++ show s ++ ")"
  show (Star s) = "(" ++ show s ++ ")" ++ "*"


-- some regular expressions plus short-hands
x = Sym 'x'
y = Sym 'y'

r1 = Star x
r2 = Star $ Alt x y
r3 = Conc y r2
r4 = Alt (Conc Epsilon r1) r1

-- Epsilon or r.
opt :: R -> R
opt r = Alt r Epsilon

-- One or more iterations.
kleenePlus :: R -> R
kleenePlus r = Conc r (Star r)


-----------------------------------------------------------------------
-- Writing functions that operate on data types via pattern matching.


-- Yield True if epsilon is part of the language, otherwise, we find False.
nullable :: R -> Bool
nullable Phi = False
nullable Epsilon = True
nullable Sym{} = False
-- spelled out as the following.
-- nullable (Sym _) = False
nullable (Alt r s) = nullable r || nullable s
nullable (Conc r s) = nullable r && nullable s
nullable Star{} = True

-- Simplifying expressions:
-- epsilon . R = R
-- R + R = R
--
-- We build a "fixpoint"!
-- We apply helper simp2 until no further simplifications can be applied.
-- Why is this necessary? Hint. Consider example r4.
simp :: R -> R
simp r = let s = simp2 r
         in if s == r then r
            else simp s
     where
        simp2 (Conc Epsilon r) = simp2 r
        simp2 (Conc r s) = Conc (simp2 r) (simp2 s)
        simp2 (Alt r s)
           | r == s    = simp2 r
           | otherwise = Alt (simp2 r) (simp2 s)
        simp2 (Star r) = Star $ simp2 r
        simp2 r = r


------------------
-- https://en.wikipedia.org/wiki/Brzozowski_derivative


deriv :: Char -> R -> R
deriv _ Epsilon = Phi
deriv _ Phi = Phi
deriv x (Sym y)
  | x == y    = Epsilon
  | otherwise = Phi
deriv x (Alt r s) = Alt (deriv x r) (deriv x s)
deriv x (Conc r s)
  | nullable r = Alt (Conc (deriv x r) s) (deriv x s)
  | otherwise =  Conc (deriv x r) s
deriv x (Star r) = Conc (deriv x r) (Star r)


membership :: String -> R -> Bool
membership xs r =
          go r xs
             where
               go r [] = nullable r
               go r (x:xs) = go (deriv x r) xs


xs = Star (Sym 'x')

test0 = membership "xxxx" xs

aOrBs = Star (Alt (Sym 'a') (Sym 'b'))


test1 = membership "abba" aOrBs

test2 = membership "abbca" aOrBs

Example - Geometric objects

This example makes use of

and introduces an extension of algebraic data types known as existential data types.

We consider geometric objects such as squares and rectangles and how to represent these objects and some associated operations with the help of data types and type classes.

Here are some straightforward data type representations of squares and rectangles.

data Square = Square Int

data Rectangle = Rec { len :: Int, width :: Int }

The definition of Rectangle makes use of a data type feature where we can use labels to refer to individual components. The advantage is that we can use labels to unambiguously distinguish among components that are of the same type.

For each geometric object we wish to compute its area and provide for a scaling function. We represent this functionality via a type class.

class Show a => GeoShape a where
  area :: a -> Int
  scale :: a -> Int -> a

The above class declaration makes use of super classes. Every instance of GeoShape must also provide for an instance of Show.

Here are the instances.

instance GeoShape Square where
  area (Square x)    = x * x
  scale (Square x) s = Square (x * s)

instance GeoShape Rectangle where
  area r    = len r * width r
  scale r s = Rec { len = len r * s, width = width r * s }

instance Show Square where
  show (Square x) = "square(" ++ show x ++ ")"

instance Show Rectangle where
  show  (Rec {len = l, width = w }) = "rectangle(" ++ show l ++ "," ++ show w ++ ")"

Some examples that make use of the above functionality.

r = Rec { width = 2, len = 3 }

s = Square 4

r2 = scale r 3

ex1 = do
      let a = area r + area s + area r2
      putStrLn (unlines ["Sum of the area of",
                         show r,
                         show s,
                         show r2,
                         "equals",
                         show a])

It would be convenient if we could store geometric objects in a list for further processing.

shapes = [r, s, r2]

The above will not type check! Lists in Haskell are homogenous. That is, all elements must be of the same type.

Abstract representation for geometric objects

What we can do is to define a new data type that represents an abstract representation of a geometric object.

data Shape = forall a. GeoShape a => MkShape a

The above data type declaration makes use of existential data types. Why existential? There seems to be a universal (for all) quantor in the above definition? Yes, we universally quantify over a but this type is not part of the return type. Hence, the following logical equivalence

forall a. (I implies J)        =     (exists a. I) implies J

where J does not refer to a

which justifies the name existential data type.

In more detail, the constructor MkShape has the following type.

MkShape :: GeoShape a => a -> Shape

The argument of the constructor is any value of type a that is an instance of the GeoShape type class.

For example, we find that

s1 :: Shape
s1 = MkShape r

s2 :: Shape
s2 = MkShape s

By embedding square and rectangle into the new type Shape, we can store both in a list.

shapes = [s1, s2, MkShape r2]

What can do with elements of this list? We can apply all the operations that are available for geometric objects by lifting them to Shape.

instance Show Shape where
    show (MkShape x)    = show x

instance GeoShape Shape where
    area (MkShape x)    = area x
    scale (MkShape x) s =  MkShape (scale x s)

Here is an example.

ex2 = putStrLn (unlines (["Sum of the area of"] ++
                         map show shapes ++
                         [show (sum (map area shapes))]))

Complete source code

{-# LANGUAGE ExistentialQuantification #-}


data Square = Square Int

data Rectangle = Rec { len :: Int, width :: Int }


class Show a => GeoShape a where
  area :: a -> Int
  scale :: a -> Int -> a


instance GeoShape Square where
  area (Square x)    = x * x
  scale (Square x) s = Square (x * s)

instance GeoShape Rectangle where
  area r    = len r * width r
  scale r s = Rec { len = len r * s, width = width r * s }


instance Show Square where
  show (Square x) = "square(" ++ show x ++ ")"

instance Show Rectangle where
  show  (Rec {len = l, width = w }) = "rectangle(" ++ show l ++ "," ++ show w ++ ")"



r = Rec { width = 2, len = 3 }

s = Square 4

r2 = scale r 3


ex1 = do
      let a = area r + area s + area r2
      putStrLn (unlines ["Sum of the area of",
                         show r,
                         show s,
                         show r2,
                         "equals",
                         show a])

-- Does not type check
-- shapes = [r, s, r2]


-- In Haskell, constructors and data types do NOT share the same name space.
-- Therefore, it would be ligit to use the constructor name Shape instead of MkShape.
data Shape = forall a. GeoShape a => MkShape a


instance Show Shape where
    show (MkShape x)    = show x

instance GeoShape Shape where
    area (MkShape x)    = area x
    scale (MkShape x) s =  MkShape (scale x s)


s1 :: Shape
s1 = MkShape r

s2 :: Shape
s2 = MkShape s

shapes = [s1, s2, MkShape r2]


ex2 = putStrLn (unlines (["Sum of the area of"] ++
                         map show shapes ++
                         [show (sum (map area shapes))]))

Example - Arithmetic expressions

This example makes use of

and introduces an extension of algebraic data types known as generalized algebraic data types (GADTs).

Overview

We consider a simple language of arithmetic expressions consisting of Integer and Boolean expressions.

We wish to represent this language in terms of a Haskell data type, so that we can exploit pattern matching to write functions that operate on this data type.

Specifically, we wish to write an evaluator (interpreter) and a type checker.

Syntax

We first need to specify the syntax of our language of arithmetic expressions. From theoretical computer science, we recall context-free grammars.

Some context-free grammar rules to describe arithmetic and boolean expressions. For brevity, we only consider natural numbers 1 and 0.

E -> 1
E -> 0
E -> True
E -> False
E -> E + E
E -> E * E
E -> E || E
E -> (E)

We refer to E as a non-terminal symbol. Terminal symbols are 1, 0, etc.

The grammar is ambiguous as for 1 * 0 + 1 we find the following two derivations.

(1)

   E
-> E * E
-> 1 * E
-> 1 * E + E
-> 1 * 0 + E
-> 1 * 0 + 1

(2)

   E
-> E + E
-> E + 1
-> E * E + 1
-> E * 0 + 1
-> 1 * 0 + 1

For each derivation, we also give its parse tree.

(1)

             E
            /|\
           E * E
           |  /|\
           1 E + E
             |   |
             0   1

(2)

             E
            /|\
           E + E
          /|\  |
         E * E 1
         |   |
         1   0

As we can see parse trees are different. Hence, the above grammar is ambiguous.

The common rule to remove the above ambiguity is to assume that * binds tigher than +. This is important because we need to ensure that the meaning of 1 * 0 + 1 is unambiguous.

Haskell data type for expressions (syntax)

We seek for a representation of arithemtic expressions. As data types are perfect to represent tree-like structures such as parse trees, this is an easy task in Haskell. Instead of parse trees, we choose a more compact representation commonly referred to as the abstract syntax tree (AST). In the AST, there is no need to represent parantheses as the precedence among expressions is guaranteed by the tree structure.

Here is a Haskell data type representing the AST for arithmetic expressions.

data E = One | Zero | ETrue | EFalse
       | Plus E E | Mult E E
       | Or E E

Here is the Haskell AST representation for 1 * 0 + 1

ex = Plus (Mult One Zero) One

What about the following?

ex1 = Mult Zero (Plus Zero One)

The AST ex1 represents 1 * (0 + 1) whereas ex represents 1 * 0 + 1. Recall that in the source syntax we assume that * binds tighter than +. Hence, 1 * 0 + 1 corresponds to (1 * 0) + 1.

Pretty printing

We consider pretty printing of ASTs. Each AST shall be displayed in terms of the original source language which is a string.

We provide for a simple pretty printer via the following instance of type class Show.

instance Show E where
    show One = "1"
    show Zero = "0"
    show ETrue = "true"
    show EFalse = "false"
    show (Plus e1 e2) = "(" ++ show e1 ++ " + " ++ show e2 ++ ")"
    show (Mult e1 e2) = "(" ++ show e1 ++ " * " ++ show e2 ++ ")"
    show (Or e1 e2) = "(" ++ show e1 ++ " || " ++ show e2 ++ ")"

We find that

show ex   ==>   "((1 * 0) + 1)"

show ex1  ==>   "(1 * (0 + 1))"

Pretty-printing could be improved by removing as many parantheses as possible. We cannot remove all parantheses as otherwise the meaning of the expression changes in some unexpected way. Consider ex1 where the innermost parantheses are necessary.

Evaluator (interpreter)

We implement an evaluator (interpreter) for arithmetic expressions. There are a few things we need to take care of.

We assume that Integer and Boolean expressions cannot be mixed. For example, when trying to evaluate

-- 1 + true
ex2 = Plus One ETrue

we get stuck. That means, evaluation may return a result or nothing. This can be represented via the Maybe data type.

The result of evaluation may either be an Integer or Boolean value. This can be represented via the Either data type.

The evaluator traverses over the structure of the AST. We evaluate operands (from left to right) and then combine the result. This can elegantly expressed via pattern matching.

eval :: E -> Maybe (Either Int Bool)
eval One = Just (Left 1)
eval Zero = Just (Left 0)
eval ETrue = Just (Right True)
eval EFalse = Just (Right False)
eval (Plus e1 e2) =
     let r1 = eval e1
         r2 = eval e2
     in case (r1, r2) of
         (Just (Left i1), Just (Left i2)) -> Just (Left (i1 + i2))
         (_,_)  -> Nothing
eval (Mult e1 e2) =
     let r1 = eval e1
         r2 = eval e2
     in case (r1, r2) of
         (Just (Left i1), Just (Left i2)) -> Just (Left (i1 * i2))
         (_,_)  -> Nothing
eval (Or e1 e2) =
     case (eval e1) of
        Nothing -> Nothing
        (Just Left{}) -> Nothing
        (Just (Right True)) -> Just (Right True)
        _                   -> case (eval e2) of
                                 Nothing -> Nothing
                                 (Just Left{}) -> Nothing
                                 (Just (Right True)) -> Just (Right True)
                                 _                   -> Just (Right False)

In case of Or, we apply short-circuit evaluation. If the left operand evaluates to true we immediately return true. Otherwise, we might need to evaluate the left operand as well.

Take note. We first check for Nothing and then (Just Left{}).

Consider

-- 1 + true
ex2 = Plus One ETrue

-- false || true
ex3 = Or EFalse ETrue

-- false || 1
ex4 = Or EFalse One

-- true || 1
ex5 = Or ETrue One

where we find that

eval ex2      ==>   Nothing

eval ex3      ==>   Just (Right True)

eval ex4      ==>   Just (Right False)

eval ex5      ==>   Just (Right True)

Static type-checker

Next, we consider the task of type checking. The purpose of type checking is to ensure that operands and operations are compatible.

An arithmetic expression is well-typed if the operands of Integer addition/multiplication are only Integer expressions and the operands of Boolean disjunction are only Boolean expressions. Unlike the above evaluation, there is no need to actually run (execute) expressions to decide if an expression is well-typed. Hence, we say that type checking is done statically.

The type checking rules follow the structure of the evaluation rules. The crucial difference is that we abstract away from any concrete values and only distinguish between an ill-typed expression, an expression of Integer type and an expression of Boolean type.

data Type = TInt | TBool deriving Show

typecheck :: E -> Maybe Type
typecheck Zero = Just TInt
typecheck One = Just TInt
typecheck ETrue = Just TBool
typecheck EFalse = Just TBool
typecheck (Plus e1 e2) =
     case (typecheck e1, typecheck e2) of
       (Just TInt, Just TInt) -> Just TInt
       (_, _) -> Nothing
typecheck (Mult e1 e2) =
     case (typecheck e1, typecheck e2) of
       (Just TInt, Just TInt) -> Just TInt
       (_, _) -> Nothing
typecheck (Or e1 e2) =
     case (typecheck e1, typecheck e2) of
       (Just TBool, Just TBool) -> Just TBool
       (_, _) -> Nothing

For the above examples, we find that

typecheck ex2        ==>     Nothing

typecheck ex3        ==>     Just TBool

typecheck ex4        ==>     Nothing

typecheck ex5        ==>     Nothing

Notice that evaluation of ex4 succeeds but type checking reports that ex4 is ill-typed.

Evaluation of well-typed expressions

Our current evaluator has to perform some amount of type checking at run-time. For example, consider the following case.

eval (Mult e1 e2) =
     let r1 = eval e1
         r2 = eval e2
     in case (r1, r2) of
         (Just (Left i1), Just (Left i2)) -> Just (Left (i1 * i2))
         (_,_)  -> Nothing

We evaluate the left and right operands and via the pattern match “Just (Left …)” we effectively check that operands yield an Integer value.

For well-typed expressions, it is possible to optimize our evaluator. There is no need to cover cases that involve “Just” and “Nothing”.

evalT :: E -> Either Int Bool
evalT One = Left 1
evalT Zero = Left 0
evalT ETrue = Right True
evalT EFalse = Right False
evalT (Plus e1 e2) =
    case (evalT e1, evalT e2) of
       (Left i1, Left i2) -> Left (i1 + i2)
evalT (Mult e1 e2) =
    case (evalT e1, evalT e2) of
       (Left i1, Left i2) -> Left (i1 * i2)
evalT (Or e1 e2) =
     case (evalT e1) of
       Right True -> Right True
       _          -> evalT e2

Of course, we shall apply evalT only on well-typed expressions. Hence, before calling evalT we need to call typecheck first.

Simplification

We consider the task of simplifying expressions based on some algebraic laws. For example, we wish to enforce that law that “0 * x = 0”. This law can be enforced statically (and therefore might improve the run-time performance when evaluationg expressions).

Consider

simp :: E -> E
simp One = One
simp Zero = Zero
simp ETrue = ETrue
simp EFalse = EFalse
simp (Plus e1 e2) = Plus (simp e1) (simp e2)
simp (Mult Zero _) = Zero
simp (Mult e1 e2) = Mult (simp e1) (simp e2)
simp (Or e1 e2) = Or e1 e2

What do we observe when simplifying the following expressions?

-- (0 * 1) * 1
s1 = Mult (Mult Zero One) One

-- 0 * (0 * False)
s2 = Mult Zero (Or Zero EFalse)

There are two issues.

The first issue is that s2 is ill-typed but the expression resulting from simp s2 is well-typed. The second issue is that simplification is not exhaustive. For example, simp s1 yields Mult Zero One which could be further simplified.

To cope with the first issue, we simply need to check if expressions are well-typed before simplification. To cope with the second issue, we repeatidly apply simplifications until no further simplifications are possible.

simpFix :: E -> E
simpFix e = let e2 = simp e
            in if e2 == e then e
               else simpFix e2

For example, simpFix s1 yields Zero.

Enforcing well-typed by construction via GADTs

In Haskell, we can enforce well-typing of expressions when building ASTs. This is possible with the help of generalized algebraic data types (GADTs).

Here’s a GADT that enforces well-typing of arithmetic AST expressions.

data Exp a where
   One_Exp :: Exp Int
   Zero_Exp :: Exp Int
   ETrue_Exp :: Exp Bool
   EFalse_Exp :: Exp Bool
   Plus_Exp :: Exp Int -> Exp Int -> Exp Int
   Mult_Exp :: Exp Int -> Exp Int -> Exp Int
   Or_Exp :: Exp Bool -> Exp Bool -> Exp Bool

For example, via constructor Plus_Exp we build Integer expressions composed of addition. The arguments of the constructor demand that left and right operands must be well-typed. The result is a well-typed Integer expressions. A similar condition is enforced via constructor Or_Exp for Boolean expressions.

As we can see, GADTs are a form of a parametric data type where the parameter instance may vary for each constructor. Pattern matching proceeds as before.

Based on GADTs, evaluation of well-typed (by construction) expressions is defined as follows.

evalExp :: Exp a -> a
evalExp One_Exp = 1
evalExp Zero_Exp = 1
evalExp ETrue_Exp = True
evalExp EFalse_Exp = False
evalExp (Plus_Exp e1 e2) = evalExp e1 + evalExp e2
evalExp (Mult_Exp e1 e2) = evalExp e1 * evalExp e2
evalExp (Or_Exp e1 e2) = evalExp e1 || evalExp e2

The type

evalExp :: Exp a -> a

states that any well-typed expression of type a yields a value of type a. Our GADT covers Integer and Boolean expressons. Hence, this statement applies to Int and Bool.