Martin Sulzmann
Building reliable software relies on many factors. One important condition is that the software implemented by some contractor matches the requirements as stated by the customer. We take a look at formal specifications described in terms of operational semantics rules to ensure that the meaning of the ‘system’ is described unambiguously. We then consider how to turn the operational semantics rules into executable Go code.
Go syntax is close to C but more modern. For example, strongly typed, package system (no more header files), automatic garbage collection etc.
We all learned about roman numbers in school. We wish to describe the syntax and semantics of roman numbers in terms of Go.
A roman number is a sequence of letters I
,
V
, X
, L
, C
,
D
and M
. In Go, we can represent roman numbers
as follows.
type RomanLetter int
const (
I RomanLetter = 1
V RomanLetter = 5
X RomanLetter = 10
L RomanLetter = 50
C RomanLetter = 100
D RomanLetter = 500
M RomanLetter = 1000
)
The above effectively corresponds to an enum known from C and C++.
For example, the roman number “VXI” is represented in Go as follows.
Go has a keyword (var
) for introducing variables. The
type []RomanLetter
denotes a slice. A slice is a
dynamically-sized collection of elements where elements are of the same
type.
Like other languages such as C++ and Java, Go supports a simple form
of type inference via the :=
keyword.
We wish to provide the user with a more convenient interface where a
sequence of symbols (string) is transformed into the Go type
[]RomanLetter
.
This transformation process is commonly referred to as parsing and consists of two parts:
Check the sequence of symbols represents a valid roman number
If yes, the output shall be a value of type
[]RomanLetter
.
Here’s a simple parser.
var romans = map[byte]RomanLetter{
'I': I,
'V': V,
'X': X,
'L': L,
'C': C,
'D': D,
'M': M,
}
func parse(s string) ([]RomanLetter, bool) {
var ls []RomanLetter
for i := 0; i < len(s); i++ {
r, ok := romans[s[i]]
if !ok {
return []RomanLetter{}, false
}
ls = append(ls, r)
}
return ls, true
}
Instead of char
we find byte
in Go.
We use a map to assign each roman character its internal representation.
Functions in Go are declared via func
. The return type
(if any) comes at the end. Variable/parameter declarations first
introduce the name of the variable/parameter and then its type.
Go supports multiple return values. This is commonly used to signal failure. Go does not support Java/C++ style exception handling.
Processing of the input is very simple.
roman
map.append
the roman
letter to the sequence of roman letters.false
).The other direction is equally simple to implement.
Let’s consider the semantic meaning of roman numbers.
Each roman letter is associated with an integer value. In Go, this can be described via the following function.
what’s the meaning of []RomanLetter{V,M,X}
. Do we simply
sum up the individual values? No, it is more difficult.
Here are some rules formulated as English sentences that (try to) explain how to obtain the value connected to a roman number
When a symbol appears after a larger (or equal) symbol it is added
But if the symbol appears before a larger symbol it is subtracted
What do we mean by “larger”? Well, that is simple. We assume
I < V < X < L < C < D < M
.
Unfortunately, the above rules are insufficient and ambiguous.
By insufficient, we mean that when trying to derive the semantic
meaning of a roman number we get stuck. Consider the roman number “VI”.
Based on the first rule, we can argue that “I” implies +1
.
But how to interpreter “V”. There is no rule that covers this case.
By ambiguous, we mean that the same roman number can be interpreted
differently and the differences lead to different results. Consider
“VIX”. Based on the first rule, “I” implies +1
but based on
the second rule, we could also argue for -1
.
Disclaimer. The following semantic rules are meant to provide for a possible interpretation of roman numbers. We have no intention to describe the meaning of roman numbers as used thousands of years ago.
We define the semantics of roman numbers via operational semantics
rules of the form r => i
where r
is a value
of type []RomanLetter
and i
is its integer
value.
We require three rules where we assume that the sequence of roman letters is processed from left to right.
We write [x1,...,xn]
to represent a sequence of letters
x1
,…,xn
.
(R1) [x] => value(x)
(R2) [x1,x2,...xn] => value(x1) + n if x1 >= x2 and [x2,...,xn] => n
(R3) [x1,x2,...xn] => n - value(x1) if x1 < x2 and [x2,...,xn] => n
The first rule covers the (base) case that the roman number consists of a single letter only.
The second and third rule compare the two leading roman letters. If the first leading letter is strictly smaller than the second letter, we subtract the value of the leading letter. See rule (R3). Otherwise, we add the value of the leading letter. See rule (R2). In both cases, we assume that the meaning of the remaining roman letters is independent of the leading roman letter. This then leads to clear and concise semantic specification of the meaning of roman numbers.
By construction, the above operational semantic rules are unambiguous.
However, two distinct roman numbers may yield the same value. For example, take “VVV” and “XV”.
What is the result of evaluating the following roman numbers.
[V,I] => ?
[X,I] => ?
[LMC] => ?
What about the following case.
[I,I,I,I] => ?
Four I
s in a row are unnecessary and can be represented
by [I,V]
.
Is there any roman number that evaluates to a negative number?
There exists a wide range of different forms of operational semantics rules. The above form is also known under the name interpreter-style evaluation rules Such rules can easily be implemented in Go.
func eval(ls []RomanLetter) int {
switch {
case len(ls) == 1:
return val(ls[0])
case len(ls) > 1 && ls[0] >= ls[1]:
return val(ls[0]) + eval(ls[1:])
case len(ls) > 1 /*&& ls[0] < ls[1]*/ :
return eval(ls[1:]) - val(ls[0])
default: // should never happen
return 0
}
}
The following functions ties together all the bits we have seen so far.
func run() {
var input string
fmt.Printf("\nPlease provide a roman number\n")
fmt.Scanln(&input)
ls, b := parse(input)
if !b {
fmt.Printf("\nSorry, incorrect format, please try again")
run()
}
fmt.Printf("\nYou have keyed in the following roman number: ")
fmt.Println(show(ls))
fmt.Printf("\nThe roman number corresponds to the numeric value %d", eval(ls))
}
Here’s some onlinegdb runnable Go code that implements roman numbers as described above.
FYI. The Go playground does not support user input. Hence, we’re using onlinegdb.
Instead of specifying the semantic rules in terms of “if” clauses, it is common use natural deduction style proof rules.
x1 >= x2 [x2,...,xn] => n
(N2) ----------------------------
[x1,x2,...,xn] => value(x1) + n
x1 < x2 [x2,...,xn] => n
(N3) ----------------------------
[x1,x2,...,xn] => n - value(x1)
Rules (N2) and (N3) correspond to rules (R2) and (R3).
If you are interested. Check out Operational semantics, Types and Programming Languages.
As one of the tasks here is to learn more about Go, we introduce yet another Go implementation of “roman numbers”.
Go is considered an “OO” language but does not support Java/C++ style classes. Go supports instead
We use these features to describe the “behavior” of roman numbers.
In Go, an interface describes a collection of methods.
val
yields the underlying value for each roman
letterconv
converts the roman letter into byte
representationgtThan
checks if one roman letter is greater than the
other roman letter (based on their numerical representation)Each methods operates on a receiver (aka objects). The receiver is implicit in the above declaration
We describe the behavior of roman letters by providing (overloaded)
implementations of methods val
, conv
and
gtThan
.
type I struct{}
type V struct{}
type X struct{}
type L struct{}
type C struct{}
type D struct{}
type M struct{}
func (x I) val() int { return 1 }
func (x V) val() int { return 5 }
func (x X) val() int { return 10 }
func (x L) val() int { return 50 }
func (x C) val() int { return 100 }
func (x D) val() int { return 500 }
func (x M) val() int { return 1000 }
func (x I) conv() byte { return 'I' }
func (x V) conv() byte { return 'V' }
func (x X) conv() byte { return 'X' }
func (x L) conv() byte { return 'L' }
func (x C) conv() byte { return 'C' }
func (x D) conv() byte { return 'D' }
func (x M) conv() byte { return 'M' }
func (x I) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x V) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x X) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x L) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x C) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x D) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x M) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
Lots of boilerplate code. We don’t claim that this implementation is
“better” compared to the earlier implementation. This implementation is
“different” and has certain advantages. For example, we can easily add a
“new” roman letter (say N
standing for ten thousand).
Compared to Java/C++ there are no class hierarchies. Hence, there are no (nominal) subclass relations.
In Go, a (struct) type T
is a subtype of interface
I
if all methods declared in I
are defined for
T
. This form of subtyping is referred to semantic
subtyping.
In our case, we conclude that I
is a subtype of
RomanLetter
and therefore the code below type checks.
var romans = map[byte]RomanLetter{
'I': I{},
'V': V{},
'X': X{},
'L': L{},
'C': C{},
'D': D{},
'M': M{},
}
Here are the remaining bits of our alternative implementation of “roman numbers”. The differences are minor. Instead of function we use method calls.
func show(ls []RomanLetter) string {
var bs []byte
for _, l := range ls {
bs = append(bs, l.conv())
}
return string(bs)
}
func parse(s string) ([]RomanLetter, bool) {
var ls []RomanLetter
for i := 0; i < len(s); i++ {
r, ok := romans[s[i]]
if !ok {
return []RomanLetter{}, false
}
ls = append(ls, r)
}
return ls, true
}
func eval(ls []RomanLetter) int {
switch {
case len(ls) == 1:
return ls[0].val()
case len(ls) > 1 && ls[0].gtThan(ls[1]):
return ls[0].val() + eval(ls[1:])
case len(ls) > 1:
return eval(ls[1:]) - ls[0].val()
default: // should never happen
return 0
}
}
func run() {
var input string
fmt.Printf("\nPlease provide a roman number\n")
fmt.Scanln(&input)
ls, b := parse(input)
if !b {
fmt.Printf("\nSorry, incorrect format, please try again")
run()
}
fmt.Printf("\nYou have keyed in the following roman number: ")
fmt.Println(show(ls))
fmt.Printf("\nThe roman number corresponds to the numeric value %d", eval(ls))
}
Here’s some onlinegdb runnable Go code that provides for an alternative implementation of roman numbers as described above.
Formal specifications are necessary to avoid semantic ambiguities
Operational semantics rules are one possibility
Go is an easy to learn language with some interesting features. By example, we have some features such as
fmt.Printf(...)
)Here are some further examples where operational semantics rules are used: