Martin Sulzmann
Consider the following code snippet.
transfer (from Account, to Account, amount int) {
if (from.balance() >= amount && amount < Max) {
from.withdraw(amount)
to.deposit(amount)
}
}
There is a potential data race. Two concurrent transfers may lead to inconsistent data. We need to guarantee that withdraw
and deposit
are executed atomically.
transfer (from Account, to Account, amount int) {
lock()
if (from.balance() >= amount && amount < Max) {
from.withdraw(amount)
to.deposit(amount)
}
unlock()
}
All good? The issue is that any concurrent transfer operations will be executed sequentially, even if the accounts involved differ!
Instead of a global lock for all accounts, we introduce a lock for each account.
transfer (from Account, to Account, amount int) {
if (from.balance() >= amount && amount < Max) {
from.lock()
to.lock()
from.withdraw(amount)
to.deposit(amount)
to.unlock()
from.unlock()
}
}
So, two concurrent transfers operating on different accounts can possibly be executed in parallel. However, there's a subtle (deadlock) problem. Can you see the problem?
Consider the following situation.
go transfer(from,to,50) // T1
transfer(to,from,50) // T2
There's a cycle!
T1 tries to acquire from.lock and then to.lock
T2 tries to acquire to.lock and then from.lock
So it is possible that T1 gets stuck because T2 holds to.lock and T2 gets stuck because T1 holds from.lock
We encounter a deadlock!
We impose a strict order in which fine-grained locks are acquired.
transfer (from Account, to Account, amount int) {
if (from.balance() >= amount && amount < Max) {
if from.lock.order() > to.lock.order() {
from.lock()
to.lock()
from.withdraw(amount)
to.deposit(amount)
to.unlock()
from.unlock()
} else {
to.lock()
from.lock()
from.withdraw(amount)
to.deposit(amount)
from.unlock()
to.unlock()
}
}
}
Phew! Problem solved. But it gets really tricky. Too much to handle for mere mortals.
There's yet another issue.
Suppose we wish to carry out two transfer operations atomically, let's say among four accounts.
We cannot (re)use the above transfer operation which manages two accounts.
We have to build a specialized transfer operation for the four account case.
Idea:
Transfer data-base transactions to the world of programming languages.
Keep track of mutuable state (shared variables) and ensure that changes to that state are protected and visible to all threads.
transfer (from Account, to Account, amount int) {
atomically {
if (from.balance() >= amount && amount < Max) {
from.withdraw(amount)
to.deposit(amount)
}
}
}
We introduce a new keyword atomically
to mark code sections whose mutuable state operations shall be protected.
atomically
supports the ACI properties known from database transactions (but not D). Recall A=Atomicity, C=Consistency, I=Isolation, D=Durability.
Durability is dropped. RAM is not persistent
In case there is a conflict among several transactions, at least one of the transactions can be completed whereas the other will be rollbacked.
No intermediate state is observable from the outside.
The cool thing about STM is that it's damn easy to use. Like garbage collection (GC) is damn easy to use. It just works!
So, are all typicall concurrency problems solved by using STM?
Deadlock and livelock cannot happen because the STM run-time guarantees that at least one transaction succeeds in case of conflict. [NOTE: Such issues may reappear once we introduce further STM features.]
Starvation may still happen because there's no fairness guarantee. That is, it's entirely possible that the same transaction will be forced to rollback over and over again.
"Stop the world" by simply using a global lock! See our bank transfer example.
In general this is bad because all concurrent transactions will be forced to execute sequentially.
Use a transaction log to record reading and writing of shared variables.
Validate that reads are still the same.
Commit any writes from the transaction log to the memory store.
The transaction log guarantees that two concurrent transactions can operate in isolation. If there are no interferences both transactions can be executed in parallel.
For consistency, it's critical that the validation and commit step do not lead to interferences with other concurrent validates and commits. There is quite a bit of design space how this in detail is dealt with.
A possible (fairly optimistic) approach is
to allow for concurrent validation steps,
but there can be at most one commmit step where
any conncurrent validation step is forced to rollback.
Possible improvements are:
Allow for concurrent commits as long as they don't interfere. For example, we could use a fine-grained locking approach and to lock the to be commited memory locations based on the order of their location.
Rollbacks are only detected during the validation step instead of immediatly forcing a rollback if a committer invalidates some other currently active transaction log.
There are two strategies to perform the 'commit' step.
Lazy versioning (pretty much as described above)
Record updates in transactional log
On commit, update actual memory locations
Cons: Slow commit
Pro: Fast abort (only need to cancel the 'local' transactional log)
Eager versioning
Update memory locations directly
Maintain undo information in transactional log
Pro: Fast commit
Cons: Slow abort
STM exists for many languages. Haskell has very good support for STM as will see shortly.
"Lock-free" approach to concurrent programming. As easy to use as GC. Clever STM implementations take care of everthing.
"Optimistic" implementation approach works well for frequent reads and less frequent writes.
// Example 1
atomically {
...
fireMissile();
}
What happens in case of a rollback? All modifications to (transactional) variables are rolled back. This is possible because updates only took place in the transactional log. What about fireMissile()
? Such side-effecting operations will be carried out immediately! In case of several rollbacks, we will call fireMissile()
multiple times!
//thread 1
atomically {
...
x = x + 1
...
}
// thread 2
y = x
// thread 3
x++
The questions here are as follows. Shall transactions be atomic with respect to other transactions and non-transactional code? Or shall transactions only be atomic with respect to other transactions? The first choice is referred to as strong atomicity whereas the second choice is referred to as weak atomicity.
Next, we will take a look at STM in Haskell. The good news is that by construction there cannot be any "conflicts" between transactional and non-transactional code.
C versus Haskell.
int x = 1;
x = x + 1;
x <- newIORef (1::Int)
v <- readIORef x
writeIORef x (v+1)
Coding in-place updates in Haskell feels like Assembler (three-address code).
True. It would be nice to have some better syntactic sugar (sometimes possible via EDSLs).
However, the advantage of the Haskell versus the C version is that we can make precise statements about the side effects that take place.
Here is the in-place update interface in Haskell.
newIORef :: a -> IO (IORef a)
readIORef :: IORef a -> IO a
writeIORef :: IORef a -> a -> IO ()
Notice that readIORef
yields IO a
and not a
.
The IO
bit indicates that some side effect (in-place update) took place.
This can be of advantage if several kind of side effects take place! Hint: Thinking of mixing any kind of IO side effect with STM.
module Control.Concurrent.STM
data STM a
data TVar a
newTVarIO :: a -> IO (TVar a)
newTVar :: a -> STM (TVar s)
readTVar :: TVar a -> STM a
writeTVar :: TVar a -> a -> STM ()
atomically :: STM a -> IO a
retry :: STM a
orElse :: STM a-> STM a-> STM a
We will shortly make use of the above STM operations. One important aspect of STM in Haskell is that IO and STM cannot be mixed.
TVar a
identifies a transactional variable which holds a value of type a
IORef a
!STM
identifies a transactional computation
IO
!Hence, the following code is not type correct, hence, rejected.
fireMissile :: IO ()
someCode :: IO ()
atomically $
do ...
fireMissile
...
atomically
expects STM
and not IO
STM
computations may only affect transactional variablesIO
actions like fireMissile
atomically
on IO
actionstransfer :: TVar Int -> TVar Int -> Int -> IO ()
transfer fromAcc toAcc amount =
atomically $ do
f <- readTVar fromAcc
if f <= amount
then return ()
else do
writeTVar fromAcc (f - amount)
t <- readTVar toAcc
writeTVar toAcc (t + amount)
Consider the following.
transferSTM :: TVar Int -> TVar Int -> Int -> STM ()
transferSTM fromAcc toAcc amount = do
f <- readTVar fromAcc
if f <= amount
then return ()
else do
writeTVar fromAcc (f - amount)
t <- readTVar toAcc
writeTVar toAcc (t + amount)
transferAtomicFourAccounts :: TVar Int -> TVar Int -> Int
-> TVar Int -> TVar Int -> Int -> IO ()
transferAtomicFourAccounts from1 to1 a1 from2 to2 a2 = do
atomically $ do
transferSTM from1 to1 a1
transferSTM from2 to2 a2
Capture the essential functionality in terms of STM
Glue together STM
operations.
Here, we build atomic transfer involving four accounts as the compostion of a basic transfer operation.
Recall (fine-grained) locks are not composable.
The 'trick' is that the STM run-time maintains a 'global' view and can thus guarantee that always a strict 'locking' order is used.
transfer :: TVar Int -> TVar Int -> Int -> IO ()
transfer fromAcc toAcc amount =
atomically $ do
f <- readTVar fromAcc
if f <= amount
then retry
else do
writeTVar fromAcc (f - amount)
t <- readTVar toAcc
writeTVar toAcc (t + amount)
retry
retry
transfer2 :: TVar Int -> TVar Int -> Int -> STM ()
transfer2 fromAcc toAcc amount =
(do f <- readTVar fromAcc
if f <= amount
then retry
else do
writeTVar fromAcc (f - amount)
t <- readTVar toAcc
writeTVar toAcc (t + amount) )
`orElse`
transfer2 fromAcc toAcc (amount-50)
orElse
if a transaction is retriedtype Sem = TVar Int
newSem :: Int -> IO Sem
newSem n = newTVarIO n
p :: Sem -> STM ()
p sem = do n <- readTVar sem
if n > 0
then writeTVar sem (n-1)
else retry
v :: Sem -> STM ()
v sem = do n <- readTVar sem
writeTVar sem (n+1)