Martin Sulzmann
Types to separate pure from impure (side effecting) computations
(Imperative) Programming with side effects in Haskell
do
notation
Mutable state, File I/O, …
Monads for
representing computations/side effects
composition (sequencing) of computations
capturing side effects
Roll your own monad: The parser monad
What does the following type tell us?
Not much. All of the below implementations match the above function type
int foo(int x) {
// Besides returning the incremented value,
// other things that could happen during execution:
// (a) read file
// (b) fire missile
// ...
static int i = 0;
i++;
if (i >= 5) {
exit();
}
return x+1;
}
Pure = Functions behavior solely defined by its inputs
A pure function will always compute the same result for the same set of inputs (no side effects)
In Haskell we capture side-effects (mutable state, i/o,…) via monads
What is a monad?
Short and complicated answer.
A monad is a mathematical strucure known from category theory. See Kleisli triple, …
Short and simple answer. Monads allow for the systematic control of side effects where
Computations and their results are described via types, and
side-effecting computations are build via functional composition of primitive monad functions.
Thus, we can hide details such as (hidden) state etc.
example4 :: String -> IO String
example4 x = do let y = x ++ "Hallo"
z <- getLine
print (y++z)
return y
or written in the alternative notation
example4b :: String -> IO String
example4b x = do { let y = x ++ "Hallo"
; z <- getLine
; print (y++z)
; return y }
The state (side-effect) is implicit but captured via the type
IO
is a monad
The result type IO String
tells us that the result
is a string but comes with a side-effect
The type IO String
represents a monadic
computation
The access the result of a monadic computation we need to use the
“left-arrow” <-
Above is syntactic sugar for the following.
example4c :: String -> IO String
example4c x =
let y = x ++ "Hallo"
in getLine
>>=
(\z -> print (y++z)
>>=
(\_ -> return z))
The bind operator >>=
composes two state
transforming functions.
Operations >>=
and return
can be
overloaded to deal with different kind of side effects.
Applied to the above example.
example4c :: String -> IO String
example4c x =
let y = x ++ "Hallo"
in getLine -- getLine :: IO String
>>= -- >>= :: IO String -> (String -> IO String) -> IO String
(\z -> print (y++z) -- z :: String and print (y++z) :: IO ()
>>= -- >>= :: IO () -> (() -> IO String) -> IO String
(\_ -> return z)) -- return z :: IO String
Consider the pure function
The following will not type check.
addSomeTxt
expects a String
example4
yields IO String
The following works.
test :: String -> IO String
test x = do z <- example4 x
let y = addSomeTxt z
return y -- embeds the "pure" result inside an impure computation
Types to distinguish pure and impure computations
Impure computations are written in command-style notation
Impure computations are captured via a monadic type
For example, the type IO String
captures some
IO
computation that yields a `String.
We refer to IO
as a monad
Command-style programs are expressed via functional composition of monadic computations
We can embed pure into impure computations (but not the other way around)
C example.
Haskell supports imperative style programming. Here’s a simple example.
inc :: IORef Int -> IO Int
inc x = do
v <- readIORef x
writeIORef x (v+1)
return (v+1)
run :: IO ()
run = do r <- newIORef 1
x <- inc r
print x
y <- readIORef r
print y
We find that
*Main> run
2
2
Module Data.IORef
supports the following (monadic)
interface.
Monad = “the thing that Haskell uses to represent/capture side effects”
IORef a
is mutable reference to a value of type
a
newIORef
creates a new references with some initial
value.
readIORef
dereferences.
writeIORef
overwrites the value stored in a
reference.
All three operations have a side effect.
Hence, the result type of all three operation is of the form
IO t
.
Imperative programming in Haskell feels like Assembler!
Looks much nicer in C (because references are directly supported by the language).
By overloading operators and some other methods, Haskell can provide some “fancy” interface to make look imperative programming nicer.
We consider processing of a file of comma separated values (CSV). For marking purposes, some lecturer maintains for each exam a CSV file of the following form.
Each row is of the form
StudentID , Mark1, ..., MarkN
where StudentID, Mark1, …, MarkN are natural numbers.
Here’s a sample file.
145454, 12 , 14, 0 , 5
5234, 4, 4, 8, 3
We wish to process each file to compute the sum of marks for each student. For the above input, we obtain the following output.
145454 , 31
5234 , 19
Here’s a possible solution to this problem.
sumCSVFile :: FilePath -> IO ()
sumCSVFile fileName = do
content <- readFile fileName -- (1)
let rows = lines content
let nrows = [ process row | row <- rows ]
where
process row = let (studentID:ws) = splitOn "," row
s = sum (map (\w -> (read w) :: Int) ws)
in studentID ++ " , " ++ (show s)
let outFile = (takeBaseName fileName) ++ "-sum.csv"
writeFile outFile (unlines nrows) -- (2)
What’s interesting is the use of file primitives (with side effects) and some string processing functions (no side effects). The entire computations run in “IO” because of file access. See locations marked (1) and (2). The remaining functionality performs some (pure) string computations.
Monadic interface of file primitives (with side effects):
Pure string processing functions (no side effects):
-- Prelude functions
lines :: String -> [String]
unlines :: [String] -> String
read :: Read a => String -> a
show :: Show a => a -> String
-- Some library functions
splitOn :: Eq a => [a] -> [a] -> [[a]]
takeBaseName :: FilePath -> String
import Data.IORef
import System.FilePath.Posix(takeBaseName)
import Data.List.Split (splitOn)
-- Mutable state
inc :: IORef Int -> IO Int
inc x = do
v <- readIORef x
writeIORef x (v+1)
return (v+1)
run :: IO ()
run = do r <- newIORef 1
x <- inc r
print x
y <- readIORef r
print y
-- Input is a file of comma separated values (CSV).
-- Each row is of the following form:
-- "StudentID , Mark1, ..., MarkN"
-- where StudentID, Mark1, ..., MarkN are natural numbers
-- Output is a csv file that contains for each student the sum of her marks.
sumCSVFile :: FilePath -> IO ()
sumCSVFile fileName = do
content <- readFile fileName -- (1)
let rows = lines content
let nrows = [ process row | row <- rows ]
where
process row = let (studentID:ws) = splitOn "," row
s = sum (map (\w -> (read w) :: Int) ws)
in studentID ++ " , " ++ (show s)
let outFile = (takeBaseName fileName) ++ "-sum.csv"
writeFile outFile (unlines nrows) -- (2)
We introduce some “fancy” methods to make look imperative programming nicer (and closer to the style found in C).
inc :: IORef Int -> IO Int
inc x = do
v <- (!) x
x .=. v + 1
return (v+1)
inc2 :: IORef Int -> IO Int
inc2 x = do
x .=. x .+. (1 :: Int)
v <- (!) x
return v
Haskell is a great language to embed other languages
“Fancy” IO relies on type classes with multiple parameters and functional dependencies.
-- Some wrappers to access newIORef etc.
new :: a -> IO (IORef a)
new n = newIORef n
(!) :: IORef a -> IO a
(!) ref = readIORef ref
-- Assignment is mapped to writeIORef.
-- We use type class overloading to support
-- pure and impure right-hand sides.
class Assign a b where
(.=.) :: a -> b -> IO ()
instance Assign (IORef a) a where
(.=.) ref v = writeIORef ref v
instance Assign (IORef a) (IO a) where
(.=.) ref x = do v <- x
writeIORef ref v
-- Addition and multiplication.
-- We use type class overloading to support
-- pure and impure operands.
class Plus a b c | a b -> c where
(.+.) :: a -> b -> IO c
instance Plus (IORef Int) Int Int where
(.+.) r x = do v <- readIORef r
return (v + x)
instance Plus Int Int Int where
(.+.) x y = return (x + y)
class Mult a b c | a b -> c where
(.*.) :: a -> b -> IO c
instance Mult (IORef Int) Int Int where
(.*.) r x = do v <- readIORef r
return (v * x)
instance Mult Int Int Int where
(.*.) x y = return (x * y)
-- We yet need to cover references as right operands etc.
We can also emulate while loops.
while :: IO Bool -> IO () -> IO ()
while cond body = do
c <- cond
if c
then do body
while cond body
else return ()
The following computes the factorial of positive numbers.
fac :: Int -> IO Int
fac n = do
acc <- new n
p <- new n
while (do i <- (!) p
return (i > 1))
(do i <- (!) p
p .=. i - 1
acc .=. acc .*. (i - 1))
res <- (!) acc
return res
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances, FlexibleContexts, UndecidableInstances #-}
import Data.IORef
-- Faking imperative programming in Haskell
-- Give this operator a lower precedence than the arithmetic operators
-- so we can avoid some parentheses.
infix 0 .=.
infix 1 .+.
infix 2 .*.
-- Some wrappers to access newIORef etc.
new :: a -> IO (IORef a)
new n = newIORef n
(!) :: IORef a -> IO a
(!) ref = readIORef ref
-- Assignment is mapped to writeIORef.
-- We use type class overloading to support
-- pure and impure right-hand sides.
class Assign a b where
(.=.) :: a -> b -> IO ()
instance Assign (IORef a) a where
(.=.) ref v = writeIORef ref v
instance Assign (IORef a) (IO a) where
(.=.) ref x = do v <- x
writeIORef ref v
-- Addition and multiplication.
-- We use type class overloading to support
-- pure and impure operands.
class Plus a b c | a b -> c where
(.+.) :: a -> b -> IO c
instance Plus (IORef Int) Int Int where
(.+.) r x = do v <- readIORef r
return (v + x)
instance Plus Int Int Int where
(.+.) x y = return (x + y)
class Mult a b c | a b -> c where
(.*.) :: a -> b -> IO c
instance Mult (IORef Int) Int Int where
(.*.) r x = do v <- readIORef r
return (v * x)
instance Mult Int Int Int where
(.*.) x y = return (x * y)
-- We yet need to cover references as right operands etc.
-- Emulating while loops.
-- We rely here on lazy evaluation!
-- Expressions cond and body will on be evaluated when needed.
while :: IO Bool -> IO () -> IO ()
while cond body = do
c <- cond
if c
then do body
while cond body
else return ()
-- Mutable state
inc :: IORef Int -> IO Int
inc x = do
v <- (!) x
x .=. v + 1
return (v+1)
inc2 :: IORef Int -> IO Int
inc2 x = do
x .=. x .+. (1 :: Int) -- Annotation is necessary.
v <- (!) x -- Otherwise, type is too general.
return v
run :: IO ()
run = do r <- newIORef 1
x <- inc r
print x
y <- readIORef r
print y
facPure :: Int -> Int
facPure 1 = 1
facPure n = n * facPure (n-1)
fac :: Int -> IO Int
fac n = do
acc <- new n
p <- new n
while (do i <- (!) p
return (i > 1))
(do i <- (!) p
p .=. i - 1
acc .=. acc .*. (i - 1))
res <- (!) acc
return res
We consider the Maybe monad
-- (x/y)/z
example x y z = case (divSafe x y) of
Just res -> case (divSafe res z) of
Just res2 -> "Just " ++ show res2
Nothing -> "Nothing"
This looks ugly!
Checking the result “manually” is rather clumsy!
exampleM x y z = do res <- divSafe x y
res2 <- divSafe res z
return res2
-- "do" Syntax desugared in a sequence of "bind" (>>=) operations
-- In our case,
-- (>>=) :: Maybe a -> (a -> Maybe b) -> Maybe b
exampleM2 x y z = (divSafe x y)
>>=
(\res -> divSafe res z
>>=
(\res -> return res))
import Control.Applicative
import Control.Monad
-- Function with side-effect.
-- We raise an exception if we attempt to divide by zero.
divInt :: Int -> Int -> Int
divInt _ 0 = error "can't divide by zero"
divInt n m = n `div` m
divSafe :: Int -> Int -> Maybe Int
divSafe _ 0 = Nothing
divSafe n m = Just $ n `div` m
-- Maybe is a monad, can use fail/return instead of Nothing/Just.
divSafe2 :: Int -> Int -> Maybe Int
divSafe2 _ 0 = fail ""
divSafe2 n m = return $ n `div` m
example x y z = case (divSafe x y) of
Just res -> case (divSafe res z) of
Just res2 -> "Just " ++ show res2
Nothing -> "Nothing"
exampleM x y z = do res <- divSafe x y
res2 <- divSafe res z
return res2
-- "do" Syntax desugared in a sequence of "bind" (>>=) operations
-- In our case,
-- (>>=) :: Maybe a -> (a -> Maybe b) -> Maybe b
exampleM2 x y z = (divSafe x y)
>>=
(\res -> divSafe res z
>>=
(\res -> return res))
runMaybe comp = case comp of
Just x -> "The result is: " ++ show x
Nothing -> "Failure"
-- Defining our own simple 'maybe' monad
data Res a = Something a | Fail String deriving Show
instance Monad Res where
return x = Something x
p >>= f = case p of
(Fail e) -> Fail e
(Something x) -> f x
instance Applicative Res where
pure = return
x <*> y = do f <- x
a <- y
return (f a)
instance Functor Res where
fmap f x = do a <- x
return (f a)
The concept of monads found in Haskell is connected to Kleisli categories.
A parser is a function which takes a string and yields a result (of
some type a
) and the remaining string. Our parser even
yields a list of possible parse results, although, we will generally
always favor the first successful parse result.
The below gives a simplified account of parsec: Monadic parser combinators.
Running the parser is simple. Apply the parse function on some input string.
A practical use of this parser appears rather clumsy. As it seems we need to explicitly thread through the parser function and its parser state. Fortunately, we can capture the essence of our parser in a monad.
instance Monad Parser where
return a = Parser (\cs -> [(a,cs)])
p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a,cs') <- parse p cs])
parse (Parser p) = p
The return
function returns its arguments and leaves
the input unchanged.
The >>=
takes as its first argument a parse
result Parser a
and as its second argument a function
b -> Parser b
which consumes the result and produces a
new parser.
Helper concat
combines the individual results and
parse
extracts the parsing function.
Thanks to the do
syntax the specification of a parser
becomes straightforward as we will see shortly.
Since GHC 7.10
, each monad is also an applicative
functor (but not every function and applicative function is a monad).
Hence, we need to provide the following additional instances.
instance Functor Parser where
fmap = liftM -- default
{-
fmap f x = do a <- x
return (f a)
-}
instance Applicative Parser where
pure = return
(<*>) = ap -- default
{-
(<*>) x y = do f <- x
a <- y
return (f a)
-}
Our parser shall supports further features. These features can be
defined as instances of Alternative
and
MonadPlus
.
instance MonadPlus Parser where
mzero = Parser (\cs -> [])
mplus (Parser f1) (Parser f2) = Parser (\cs -> f1 cs ++ f2 cs)
instance Alternative Parser where
empty = mzero
(<|>) = mplus
many p = many1 p +++ return []
Now, the fun starts. Let’s define some parser abstractions.
-- One or more repetitions
many1 :: Parser a -> Parser [a]
many1 p = do {a <- p; as <- many p; return (a:as)}
-- Deterministic (first) choice
(+++) :: Parser a -> Parser a -> Parser a
(+++) p q = Parser ( \cs -> case parse (p `mplus` q) cs of
[] -> []
(x:xs) -> [x] )
-- Option
option :: Parser a -> Parser [a]
option p = (do x <- p
return [x])
+++ return []
-- Unconditionally parse a single character
item :: Parser Char
item = Parser (\cs -> case cs of
"" -> []
(c:cs) -> [(c,cs)])
(+++)
implements a deterministic first choice among
several parse results (e.g. could be used in an LL-style parser with
arbitrary look-ahead)
option
attempts to parse or simply skips the input
otherwise
item
unconditionally parse a single
character
many
and many1
attempt to repeatedly a
parser zero/at least once or more times.
sat :: (Char -> Bool) -> Parser Char
sat p = do { c <- item; if p c then return c else nil }
char :: Char -> Parser Char
char c = sat (c ==)
string :: String -> Parser String
string "" = return ""
string (c:cs) = do {char c; string cs; return (c:cs)}
sat
only parses a character if a condition is
satisfied
char
is conveniently defined in terms of
sat
string
recurses over the given string
space :: Parser String
space = many (sat (\c -> case c of
' ' -> True
'\t' -> True
_ -> False))
token :: Parser a -> Parser a
token p = do {space; a <- p; space; return a}
In parsing, we are generally only interested in character sequences which are not separated by white-space (commonly referred to as a token).
token
parses a token and throws away any
leading/trailing spaces
First, we define a parser function for digits where the digits are immediately turned into their integer representation.
digit c
| '0' <= c && c <= '9' = True
| otherwise = False
parseDigits :: Parser Int
parseDigits = do
i <- token $ many $ sat digit
return (read i :: Int)
-- read never fails here
In the above, we make use of operator $
to specify
function applications by omitting parentheses. The expression
token $ many $ sat digit
is equivalent to
token (many (sat digit))
In regular expression notation, parseDigits
expresses
['0'-'9]+
Additionally, in function parseDigits
we ignore
white-space and convert the sequence of digits into an integer
value.
Next, we consider parsing of a single line consisting of a student ID and a sequence of marks.
studentID = parseDigits
mark = parseDigits
delimiter = token $ sat (==',')
-- Parses a single line, optionally terminated by '\n'
-- We immediately sum up the marks
parseAndSumLine :: Parser (Int, Int)
parseAndSumLine = do
id <- studentID
marks <- many1 $ do delimiter
m <- mark
return m
token $ option $ sat (=='\n')
return (id,sum marks)
In regular expression notation, the above is (roughly) equivalent to
['0'-'9']+ (, ['0'-'9']+)+ '\n'
Additionally, function parseAndSumLine
extracts the
student ID and immediately sums up the list of marks.
Parsing the complete file is straightforward.
Here’s the a variant of the earlier function sumCVSFile
which uses our simple CSV parser.
import Control.Applicative
import Control.Monad (Monad, MonadPlus, liftM, ap, mplus, mzero)
import System.FilePath.Posix(takeBaseName)
------------------------------------------------------
-- The Parser Monad
newtype Parser a = Parser (String -> [(a,String)])
run :: Parser t -> String -> [(t, String)]
run (Parser p) s = p s
parse (Parser p) = p
instance Functor Parser where
fmap = liftM -- default
{-
fmap f x = do a <- x
return (f a)
-}
instance Applicative Parser where
pure = return
(<*>) = ap -- default
{-
(<*>) x y = do f <- x
a <- y
return (f a)
-}
instance Monad Parser where
return a = Parser (\cs -> [(a,cs)])
p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a,cs') <- parse p cs])
instance MonadPlus Parser where
mzero = Parser (\cs -> [])
mplus (Parser f1) (Parser f2) = Parser (\cs -> f1 cs ++ f2 cs)
instance Alternative Parser where
empty = mzero
(<|>) = mplus
many p = many1 p +++ return []
------------------------
-- Some combinators
-- One or more repetitions
many1 :: Parser a -> Parser [a]
many1 p = do {a <- p; as <- many p; return (a:as)}
-- Deterministic (first) choice
(+++) :: Parser a -> Parser a -> Parser a
(+++) p q = Parser ( \cs -> case parse (p `mplus` q) cs of
[] -> []
(x:xs) -> [x] )
-- Unconditionally parse a single character
item :: Parser Char
item = Parser (\cs -> case cs of
"" -> []
(c:cs) -> [(c,cs)])
sat :: (Char -> Bool) -> Parser Char
sat p = do { c <- item; if p c then return c else mzero }
char :: Char -> Parser Char
char c = sat (c ==)
-- Parse a specific string
string :: String -> Parser String
string "" = return ""
string (c:cs) = do {char c; string cs; return (c:cs)}
-- Option
option :: Parser a -> Parser [a]
option p = (do x <- p
return [x])
+++ return []
-- Skip spaces
space :: Parser String
space = many (sat (\c -> case c of
' ' -> True
'\t' -> True
_ -> False))
-- Skip spaces before and after
token :: Parser a -> Parser a
token p = do {space; a <- p; space; return a}
-------------------------------------
-- Application: parse csv file
digit c
| '0' <= c && c <= '9' = True
| otherwise = False
parseDigits :: Parser Int
parseDigits = do
i <- token $ many $ sat digit
return (read i :: Int)
-- read never fails here
studentID = parseDigits
mark = parseDigits
delimiter = token $ sat (==',')
-- Parses a single line, optionally terminated by '\n'
-- We immediately sum up the marks
parseAndSumLine :: Parser (Int, Int)
parseAndSumLine = do
id <- studentID
marks <- many1 $ do delimiter
m <- mark
return m
token $ option $ sat (=='\n')
return (id,sum marks)
parseCSV = many1 parseAndSumLine
sumCSVFile fileName = do
let outFile = (takeBaseName fileName) ++ "-sum.csv"
content <- readFile fileName
case (run parseCSV content) of
((xs,_):_) -> do
let output = unlines $
map (\(id,m) -> show id ++ " , " ++ show m) xs
writeFile outFile output
[] -> error "parse failure"
Hidden state.
Types do not say much about the actual computations involved.
Must deal with side effects (such as state changes) explicitly
inc :: [(String, Int)] -> [(String, Int)]
inc s =
let val = (case lookup "x" s of -- read value of x in s
Just v -> v
Nothing -> 0)
+
1
-- The above corresponds to the right-hand side "x+1"
in -- output is the resulting state where
-- we overwrite the value stored for x by y
[ (y,v) | (y,v) <- s, y /= "x"] ++ [("x",val)]
Systematic approach to deal with side effects.
Haskell’s type class overloading helps us to hide “all” details.
Monads to capture arbitrary side effects/computations
“Fancy” IO is an example of a shallow EDSL (embedded domain-specific language).
Using monads is fairly easy
Roll your own monad, see parser monad example
Monads versus Exceptions
Not all computational effects are monads, e.g. see arrows, …
Monads also apply to other languages and are helpful tool to structure programs (identify precisely what type of computations take place)
import Data.IORef
-- Mutable state
-- IORef Int, declaraes a mutable value of type Int
-- we can overwrite the value
-- IO Int, describes a computation that produces some Int value,
-- as part of the computation, some IO side effects took place.
inc :: IORef Int -> IO Int
inc x = do
let z :: Int
z = 1 -- introduces an immutable variable
-- we can't overwrite the variable
-- readIORef x, dereferences x,
-- meaning we access the value, x points to in memory
-- In C, we simpy write (*x)
-- let v = readIORef x -- why can't we write that
-- not allowed in Haskell, cause readIORef
-- has a side effect, we access the memory
v <- readIORef x -- some clever syntax to state the following:
-- carry out some side effecting computation,
-- here, dereference x and obtain the value
-- x points to.
-- But we would like to bind the dereferenced value
-- to some variable.
-- Haskell uses
-- v <- readIORef x
-- to bind the dereferenced value to z.
-- Variable z is immutable.
-- The reason for using "<-":
-- We can't use "=" cause it's a keyword.
-- In Haskell, we can "overload" many/most symbols.
-- So, "<-" is such a symbol that has been giving a special meaning.
-- Summarize.
-- let v = readIORef x
-- may be easier to read, but would require to change the Haskell front-end.
writeIORef x (v+z) -- In C, we would simply write
-- *x = v + z
return (v+z) -- we return some Int value
inc2 :: IORef Int -> Int
inc2 x = 1
run :: IO ()
run = do r <- newIORef 1 -- introduces a mutable Variable r of type IORef Int
-- with initial value (in memory) 1
x <- inc r
print x
y <- readIORef r
print y
{-
In C, side effects are NOT captured via types.
int inc(int* x) {
const int z = 1;
x = (int*)(5000); // possible in C
*x = *x + z;
return *x;
}
int inc2(int* x) {
const int z = 1;
return z;
}
run() {
int* r;
r = new(1);
int x = inc(r);
...
delete r; // Haskell has automatic GC, no delete needed
}
-}