Martin Sulzmann
Building reliable software relies on many factors. One important condition is that the software implemented by some contractor matches the requirements as stated by the customer. We take a look at formal specifications described in terms of operational semantics rules to ensure that the meaning of the ‘system’ is described unambiguously. We then consider how to turn the operational semantics rules into executable code.
We all learned about roman numbers in school. We wish to describe the syntax and semantics of roman numbers in terms of Haskell.
A roman number is a sequence of letters I
,
V
, X
, L
, C
,
D
and M
. In Haskell, we can represent roman
numbers as follows.
data RomanLetter =
I -- 1
| V -- 5
| X -- 10
| L -- 50
| C -- 100
| D -- 500
| M -- 1000
deriving (Eq, Ord, Show)
type RomanNumber = [RomanLetter]
For example, the roman number “VXI” is represented in Haskell as follows.
We wish to provide the user with a more convenient interface where a
sequence of symbols (string) is transformed into the Haskell type
RomanNumber
.
This transformation process is commonly referred to as parsing and consists of two parts:
Check the sequence of symbols represents a valid roman number
If yes, the output shall be a value of type
RomanNumber
.
Here’s a simple parser.
primMap = [('I',I),('V',V),('X',X),('L',L),('C',C),('D',D),('M',M)]
parse :: String -> Maybe RomanNumber
parse [] = Nothing
parse [x] = case lookup x primMap of
Nothing -> Nothing
Just r -> Just [r]
parse (x:xs) =
case lookup x primMap of
Nothing -> Nothing
Just r -> case parse xs of
Nothing -> Nothing
Just rs -> Just (r:rs)
The syntax rule are very simple here. In essence, a mapping from
characters to RomanLetter
, see primMap
.
Nothing
).Monads support the systematic control of side effects. In our case,
distinguishing between success or failure of parsing. The
Maybe
data type is a monad. Hence, the above parser can be
written in the following equivalent (monadic) form.
Let’s consider the semantic meaning of roman numbers.
Each roman letter is associated with an integer value. In Haskell, this can be described via the following function.
value :: RomanLetter -> Int
value I = 1
value V = 5
value X = 10
value L = 50
value C = 100
value D = 500
value M = 1000
what’s the meaning of [V,M,X]
. Do we simply sum up the
individual values? No, it is more difficult.
Here are some rules formulated as English sentences that (try to) explain how to obtain the value connected to a roman number
When a symbol appears after a larger (or equal) symbol it is added
But if the symbol appears before a larger symbol it is subtracted
What do we mean by “larger”? Well, that is simple. We assume
I < V < X < L < C < D < M
.
Unfortunately, the above rules are insufficient and ambiguous.
By insufficient, we mean that when trying to derive the semantic
meaning of a roman number we get stuck. Consider the roman number “VI”.
Based on the first rule, we can argue that “I” implies +1
.
But how to interpreter “V”. There is no rule that covers this case.
By ambiguous, we mean that the same roman number can be interpreted
differently and the differences lead to different results. Consider
“VIX”. Based on the first rule, “I” implies +1
but based on
the second rule, we could also argue for -1
.
Disclaimer. The following semantic rules are meant to provide for a possible interpretation of roman numbers. We have no intention to describe the meaning of roman numbers as used thousands of years ago.
We define the semantics of roman numbers via operational semantics
rules of the form r => i
where r
is a value
of type RomanNumber
and i
is its integer
value.
We require three rules where we assume that the sequence of roman letters is processed from left to right.
(R1) [x] => value(x)
(R2) (x:y:ys) => value(x) + n if x >= y and (y:ys) => n
(R3) (x:y:ys) => n - value(x) if x < y and (y:ys) => n
The first rule covers the (base) case that the roman number consists of a single letter only.
The second and third rule compare the two leading roman letters. For both rules, we are using Haskell pattern matching. If the first leading letter is strictly smaller than the second letter, we subtract the value of the leading letter. See rule (R3). Otherwise, we add the value of the leading letter. See rule (R2). In both cases, we assume that the meaning of the remaining roman letters is independent of the leading roman letter. This then leads to clear and concise semantic specification of the meaning of roman numbers.
By construction, the above operational semantic rules are unambiguous.
However, two distinct roman numbers may yield the same value. For example, take “VVV” and “XV”.
There exists a wide range of different forms of operational semantics rules. The above form is also known under the name interpreter-style evaluation rules Such rules can easily be implemented in Haskell.
eval :: RomanNumber -> Int
eval [x] = value x
eval (x:y:ys)
| x >= y = value x + eval (y:ys)
| otherwise = eval (y:ys) - value x
Instead of specifying the semantic rules in terms of “if” clauses, it is common use natural deduction style proof rules.
x >= y (y:ys) => n
(N2) ----------------------------
(x:y:ys) => value(x) + n
x < y (y:ys) => n
(N3) ----------------------------
(x:y:ys) => n - value(x)
Rules (N2) and (N3) correspond to rules (R2) and (R3).
If you are interested. Check out Operational semantics, Types and Programming Languages.
Formal specifications are necessary to avoid semantic ambiguities
Operational semantics rules are one possibility
Haskell is fun for semantic engineering. Here’s some onlinegdb runnable code that implements roman numbers
Here are some further examples where operational semantics rules are used: