From formal specifications to executable code (“roman numbers”)

Martin Sulzmann

Overview

Building reliable software relies on many factors. One important condition is that the software implemented by some contractor matches the requirements as stated by the customer. We take a look at formal specifications described in terms of operational semantics rules to ensure that the meaning of the ‘system’ is described unambiguously. We then consider how to turn the operational semantics rules into executable code.

Example: Roman numbers

We all learned about roman numbers in school. We wish to describe the syntax and semantics of roman numbers in terms of Haskell.

Syntax of roman numbers

A roman number is a sequence of letters I, V, X, L, C, D and M. In Haskell, we can represent roman numbers as follows.

data RomanLetter  =
               I   -- 1
             | V   -- 5
             | X   -- 10
             | L   -- 50
             | C   -- 100
             | D   -- 500
             | M   -- 1000
             deriving (Eq, Ord, Show)

type RomanNumber = [RomanLetter]

For example, the roman number “VXI” is represented in Haskell as follows.

r1 :: RomanNumber
r1 = [V,X,I]

We wish to provide the user with a more convenient interface where a sequence of symbols (string) is transformed into the Haskell type RomanNumber.

This transformation process is commonly referred to as parsing and consists of two parts:

Check the sequence of symbols represents a valid roman number
If yes, the output shall be a value of type RomanNumber.

Here’s a simple parser.

primMap = [('I',I),('V',V),('X',X),('L',L),('C',C),('D',D),('M',M)]

parse :: String -> Maybe RomanNumber
parse [] = Nothing
parse [x] = case lookup x primMap of
              Nothing -> Nothing
              Just r -> Just [r]
parse (x:xs) =
     case lookup x primMap of
       Nothing -> Nothing
       Just r -> case parse xs of
                   Nothing -> Nothing
                   Just rs -> Just (r:rs)

The syntax rule are very simple here. In essence, a mapping from characters to RomanLetter, see primMap.

We repeatedly check each input symbol.
If the symbol is valid we continue.
Otherwise, we report failure (Nothing).

Monadic parser

Monads support the systematic control of side effects. In our case, distinguishing between success or failure of parsing. The Maybe data type is a monad. Hence, the above parser can be written in the following equivalent (monadic) form.

parseM :: String -> Maybe RomanNumber
parseM [] = Nothing
parseM [x] = do r <- lookup x primMap
                return [r]
parseM (x:xs) = do
     r <- lookup x primMap
     rs <- parseM xs
     return (r:rs)

Semantics of roman numbers

Let’s consider the semantic meaning of roman numbers.

The easy part

Each roman letter is associated with an integer value. In Haskell, this can be described via the following function.

value :: RomanLetter -> Int
value I = 1
value V = 5
value X = 10
value L = 50
value C = 100
value D = 500
value M = 1000

The tricky part

what’s the meaning of [V,M,X]. Do we simply sum up the individual values? No, it is more difficult.

Here are some rules formulated as English sentences that (try to) explain how to obtain the value connected to a roman number

When a symbol appears after a larger (or equal) symbol it is added
But if the symbol appears before a larger symbol it is subtracted

What do we mean by “larger”? Well, that is simple. We assume I < V < X < L < C < D < M.

Unfortunately, the above rules are insufficient and ambiguous.

By insufficient, we mean that when trying to derive the semantic meaning of a roman number we get stuck. Consider the roman number “VI”. Based on the first rule, we can argue that “I” implies +1. But how to interpreter “V”. There is no rule that covers this case.

By ambiguous, we mean that the same roman number can be interpreted differently and the differences lead to different results. Consider “VIX”. Based on the first rule, “I” implies +1 but based on the second rule, we could also argue for -1.

Operational semantics to the rescue

Disclaimer. The following semantic rules are meant to provide for a possible interpretation of roman numbers. We have no intention to describe the meaning of roman numbers as used thousands of years ago.

We define the semantics of roman numbers via operational semantics rules of the form r => i where r is a value of type RomanNumber and i is its integer value.

We require three rules where we assume that the sequence of roman letters is processed from left to right.

(R1)   [x] => value(x)

(R2)   (x:y:ys) => value(x) + n  if x >= y and (y:ys) => n

(R3)   (x:y:ys) => n - value(x)  if x < y and (y:ys) => n

The first rule covers the (base) case that the roman number consists of a single letter only.

The second and third rule compare the two leading roman letters. For both rules, we are using Haskell pattern matching. If the first leading letter is strictly smaller than the second letter, we subtract the value of the leading letter. See rule (R3). Otherwise, we add the value of the leading letter. See rule (R2). In both cases, we assume that the meaning of the remaining roman letters is independent of the leading roman letter. This then leads to clear and concise semantic specification of the meaning of roman numbers.

By construction, the above operational semantic rules are unambiguous.
However, two distinct roman numbers may yield the same value. For example, take “VVV” and “XV”.

Interpreter-style evaluation rules

There exists a wide range of different forms of operational semantics rules. The above form is also known under the name interpreter-style evaluation rules Such rules can easily be implemented in Haskell.

eval :: RomanNumber -> Int
eval [x] = value x
eval (x:y:ys)
   | x >= y    = value x + eval (y:ys)
   | otherwise = eval (y:ys) - value x

Natural deduction proof rules

Instead of specifying the semantic rules in terms of “if” clauses, it is common use natural deduction style proof rules.

         x >= y    (y:ys) => n
(N2)   ----------------------------
        (x:y:ys) => value(x) + n


         x < y     (y:ys) => n
(N3)   ----------------------------
        (x:y:ys) => n - value(x)

Rules (N2) and (N3) correspond to rules (R2) and (R3).

Summary

Formal specifications are necessary to avoid semantic ambiguities
Operational semantics rules are one possibility
Haskell is fun for semantic engineering. Here’s some onlinegdb runnable code that implements roman numbers
Here are some further examples where operational semantics rules are used: