Martin Sulzmann
Building reliable software relies on many factors. One important condition is that the software implemented by some contractor matches the requirements as stated by the customer. We take a look at formal specifications described in terms of operational semantics rules to ensure that the meaning of the ‘system’ is described unambiguously. We then consider how to turn the operational semantics rules into executable Go code.
Go syntax is close to C but more modern. For example, strongly typed, package system (no more header files), automatic garbage collection etc.
We all learned about roman numbers in school. We wish to describe the syntax and semantics of roman numbers in terms of Go.
A roman number is a sequence of letters I,
V, X, L, C,
D and M. In Go, we can represent roman numbers
as follows.
type RomanLetter int
const (
I RomanLetter = 1
V RomanLetter = 5
X RomanLetter = 10
L RomanLetter = 50
C RomanLetter = 100
D RomanLetter = 500
M RomanLetter = 1000
)The above effectively corresponds to an enum known from C and C++.
For example, the roman number “VXI” is represented in Go as follows.
Go has a keyword (var) for introducing variables. The
type []RomanLetter denotes a slice. A slice is a
dynamically-sized collection of elements where elements are of the same
type.
Like other languages such as C++ and Java, Go supports a simple form
of type inference via the := keyword.
We wish to provide the user with a more convenient interface where a
sequence of symbols (string) is transformed into the Go type
[]RomanLetter.
This transformation process is commonly referred to as parsing and consists of two parts:
Check the sequence of symbols represents a valid roman number
If yes, the output shall be a value of type
[]RomanLetter.
Here’s a simple parser.
var romans = map[byte]RomanLetter{
'I': I,
'V': V,
'X': X,
'L': L,
'C': C,
'D': D,
'M': M,
}
func parse(s string) ([]RomanLetter, bool) {
var ls []RomanLetter
for i := 0; i < len(s); i++ {
r, ok := romans[s[i]]
if !ok {
return []RomanLetter{}, false
}
ls = append(ls, r)
}
return ls, true
}Instead of char we find byte in Go.
We use a map to assign each roman character its internal representation.
Functions in Go are declared via func. The return type
(if any) comes at the end. Variable/parameter declarations first
introduce the name of the variable/parameter and then its type.
Go supports multiple return values. This is commonly used to signal failure. Go does not support Java/C++ style exception handling.
Processing of the input is very simple.
roman
map.append the roman
letter to the sequence of roman letters.false).The other direction is equally simple to implement.
var romansFrom = map[RomanLetter]byte{
I: 'I',
V: 'V',
X: 'X',
L: 'L',
C: 'C',
D: 'D',
M: 'M',
}
func show(ls []RomanLetter) string {
var bs []byte
for _, l := range ls {
bs = append(bs, romansFrom[l])
}
return string(bs)
}
Let’s consider the semantic meaning of roman numbers.
Each roman letter is associated with an integer value. In Go, this can be described via the following function.
what’s the meaning of []RomanLetter{V,M,X}. Do we simply
sum up the individual values? No, it is more difficult.
Here are some rules formulated as English sentences that (try to) explain how to obtain the value connected to a roman number
When a symbol appears after a larger (or equal) symbol it is added
But if the symbol appears before a larger symbol it is subtracted
What do we mean by “larger”? Well, that is simple. We assume
I < V < X < L < C < D < M.
Unfortunately, the above rules are insufficient and ambiguous.
By insufficient, we mean that when trying to derive the semantic
meaning of a roman number we get stuck. Consider the roman number “VI”.
Based on the first rule, we can argue that “I” implies +1.
But how to interpreter “V”. There is no rule that covers this case.
By ambiguous, we mean that the same roman number can be interpreted
differently and the differences lead to different results. Consider
“VIX”. Based on the first rule, “I” implies +1 but based on
the second rule, we could also argue for -1.
Disclaimer. The following semantic rules are meant to provide for a possible interpretation of roman numbers. We have no intention to describe the meaning of roman numbers as used thousands of years ago.
We define the semantics of roman numbers via operational semantics
rules of the form r => i where r is a value
of type []RomanLetter and i is its integer
value.
We require three rules where we assume that the sequence of roman letters is processed from left to right.
We write [x1,...,xn] to represent a sequence of letters
x1,…,xn.
(R1) [x] => value(x)
(R2) [x1,x2,...xn] => value(x1) + n if x1 >= x2 and [x2,...,xn] => n
(R3) [x1,x2,...xn] => n - value(x1) if x1 < x2 and [x2,...,xn] => n
The first rule covers the (base) case that the roman number consists of a single letter only.
The second and third rule compare the two leading roman letters. If the first leading letter is strictly smaller than the second letter, we subtract the value of the leading letter. See rule (R3). Otherwise, we add the value of the leading letter. See rule (R2). In both cases, we assume that the meaning of the remaining roman letters is independent of the leading roman letter. This then leads to clear and concise semantic specification of the meaning of roman numbers.
By construction, the above operational semantic rules are unambiguous.
However, two distinct roman numbers may yield the same value. For example, take “VVV” and “XV”.
What is the result of evaluating the following roman numbers.
[V,I] => ?
[X,I] => ?
[L,M,C] => ?
What about the following case.
[I,I,I,I] => ?
Four Is in a row are unnecessary and can be represented
by [I,V].
Is there any roman number that evaluates to a negative number?
There exists a wide range of different forms of operational semantics rules. The above form is also known under the name interpreter-style evaluation rules Such rules can easily be implemented in Go.
func eval(ls []RomanLetter) int {
switch {
case len(ls) == 1:
return val(ls[0])
case len(ls) > 1 && ls[0] >= ls[1]:
return val(ls[0]) + eval(ls[1:])
case len(ls) > 1 /*&& ls[0] < ls[1]*/ :
return eval(ls[1:]) - val(ls[0])
default: // should never happen
return 0
}
}Points to note.
val(...) is necessary:
For example, ls[0] yields a value of type RomanLetter
This type is not compatible with the type int (unless we apply some type cast)
ls[0] >= ls[1] versus
val(ls[0]) >= val(ls[1]):
When comparing values of type RomanLetter there is no need to cast them to int
Of course, we cannot compare an int against a RomanLetter (unless we apply some cast)
The following functions ties together all the bits we have seen so far.
func run() {
var input string
fmt.Printf("\nPlease provide a roman number\n")
fmt.Scanln(&input)
ls, b := parse(input)
if !b {
fmt.Printf("\nSorry, incorrect format, please try again")
run()
}
fmt.Printf("\nYou have keyed in the following roman number: ")
fmt.Println(show(ls))
fmt.Printf("\nThe roman number corresponds to the numeric value %d", eval(ls))
}Here’s some onlinegdb runnable Go code that implements roman numbers as described above.
FYI. The Go playground does not support user input. Hence, we’re using onlinegdb.
Instead of specifying the semantic rules in terms of “if” clauses, we use a “slightly” different notation. Instead of
conclusion if premise
we write
premise
-------------------
conclusion
The “conclusion” represents the desired outcome and the “premise” represents the conditions that need to be satisfied.
Here comes the reformulation of our operational semantic rules.
(N1) [x] => value(x)
[x2,...,xn] => n x1 >= x2
(N2) ----------------------------
[x1,x2,...,xn] => value(x1) + n
[x2,...,xn] => n x1 < x2
(N3) ----------------------------
[x1,x2,...,xn] => n - value(x1)
Rules (N1), (N2) and (N3) correspond to rules (R1), (R2) and (R3). This style of writing rules is commonly referred to as natural deduction style.
Thus, evaluation of a roman number can be written as an evaluation tree where
the leaf node at the top applies rule (N1)
intermediate nodes follow from applications of rule (N2) and (N3)
from the root node at the bottom we can derive the decimal value of the roman number.
Here is an example. Slight change in notation. We write VILC instead of [V,I,L,C] to reduce the syntactic noise.
Here is the proof for the statement VILC => 54.
(N1) C => 100 L < C Leaf node
(N3)---------------------------
LC => 50 = 100 - 50 I < L Intermediate node
(N3)---------------------------
ILC => 49 = 50 - 1 V >= I Intermediate node
(N2)---------------------------
VILC => 54 = 5 + 49 Root node
Here are some further examples.
(N1) I => 1 V >= I
(N2)---------------------------
VI => 6 = 5 + 1
(N1) C => 100 M >= C
(N2)---------------------------
MC => 1100 = 1000 + 100 L < M
(N3)---------------------------
LMC => 1050 = 1100 - 50
(N1) C => 100 V < C
(N3)---------------------------
VC => 95 = 100 - 5 I < V
(N3)---------------------------
IVC => 94 = 95 - 1 L >= I
(N2)---------------------------
LIVC => 144 = 50 + 94 L >= L
(N2)---------------------------
LLIVC => 194 = 50 + 144 X < L
(N3)---------------------------
XLLIVC => 184 = 194 - 10 M >= X
(N2)---------------------------
MXLLIVC => 1184 = 1000 + 184
Consider the evaluation tree for VILC.
(N1) C => 100 L < C
(N3)---------------------------
LC => 50 = 100 - 50 I < L
(N3)---------------------------
ILC => 49 = 50 - 1 V >= I
(N2)---------------------------
VILC => 54 = 5 + 49
Ultimately, we are only interested in the final evalution step where
we derive VILC => 54.
But how do we know that VILC => 54 is the correct
result? For example, we might have done some miscalculations, …
What are our options to verify that the result is correct? We could
apply eval interpreter on VILC. But what if
our implementation of eval has a bug?
In the end, we need to verify that the individual evaluation steps to obtain “VILC => 54” are correct. This means that we require a proof of the individual derivation steps that lead to the evaluation tree!
Idea:
Record the final derivation step, and
all earlier derivation steps.
A derivation is a representative for each rule application. In terms Go code, we use the following structure for derivations.
ls and v represent the conclusion
(claimed statement)
premise represents some assumptions
r tells us how to reach the conclusion based on the
assumptions
Each derivation connected to rules (N2) and (N3) can be viewed as a node in the evaluation tree. The derivation connected to rule (N1) corresponds to the leaf nude.
The following program generates a proof in terms of derivations.
func evalDeriv(ls []RomanLetter) Derivation {
var d Derivation
switch {
case len(ls) == 1:
d = Derivation{premise: nil, ls: append([]RomanLetter(nil), ls...), v: val(ls[0]), r: N1}
case len(ls) > 1 && ls[0] >= ls[1]:
d2 := evalDeriv(ls[1:])
d = Derivation{premise: &d2, ls: append([]RomanLetter(nil), ls...), v: val(ls[0]) + d2.v, r: N2}
case len(ls) > 1 /*&& ls[0] < ls[1]*/ :
d2 := evalDeriv(ls[1:])
d = Derivation{premise: &d2, ls: append([]RomanLetter(nil), ls...), v: d2.v - val(ls[0]), r: N3}
}
return d
}Here is quick and basic sanity check to see if the derivation is sensible. We do NOT check if the result is correct. We just make sure that the structure of the derivation steps is sensible.
func sanityCheck(d Derivation) bool {
switch {
case d.r == N1:
if d.premise == nil && len(d.ls) != 1 {
return false
}
return true
case d.r == N2:
if d.premise == nil || len(d.ls) <= 1 || !(d.ls[0] >= d.ls[1]) {
return false
}
return sanityCheck(*d.premise)
case d.r == N3:
if d.premise == nil || len(d.ls) <= 1 || !(d.ls[0] < d.ls[1]) {
return false
}
return sanityCheck(*d.premise)
}
return false
}Could we generate a proof that is correct (and certified) by construction? Yes, but that’s story to be told in some seminar, project, and elective course.
Here is some program code to play with.
If you are interested. Check out Operational semantics, Types and Programming Languages.
As one of the tasks here is to learn more about Go, we introduce yet another Go implementation of “roman numbers”.
Go is considered an “OO” language but does not support Java/C++ style classes. Go supports instead
We use these features to describe the “behavior” of roman numbers.
In Go, an interface describes a collection of methods.
val yields the underlying value for each roman
letterconv converts the roman letter into byte
representationgtThan checks if one roman letter is greater than the
other roman letter (based on their numerical representation)Each methods operates on a receiver (aka objects). The receiver is implicit in the above declaration
We describe the behavior of roman letters by providing (overloaded)
implementations of methods val, conv and
gtThan.
type I struct{}
type V struct{}
type X struct{}
type L struct{}
type C struct{}
type D struct{}
type M struct{}
func (x I) val() int { return 1 }
func (x V) val() int { return 5 }
func (x X) val() int { return 10 }
func (x L) val() int { return 50 }
func (x C) val() int { return 100 }
func (x D) val() int { return 500 }
func (x M) val() int { return 1000 }
func (x I) conv() byte { return 'I' }
func (x V) conv() byte { return 'V' }
func (x X) conv() byte { return 'X' }
func (x L) conv() byte { return 'L' }
func (x C) conv() byte { return 'C' }
func (x D) conv() byte { return 'D' }
func (x M) conv() byte { return 'M' }
func (x I) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x V) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x X) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x L) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x C) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x D) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x M) gtThan(y RomanLetter) bool { return x.val() >= y.val() }Lots of boilerplate code. We don’t claim that this implementation is
“better” compared to the earlier implementation. This implementation is
“different” and has certain advantages. For example, we can easily add a
“new” roman letter (say N standing for ten thousand).
Compared to Java/C++ there are no class hierarchies. Hence, there are no (nominal) subclass relations.
In Go, a (struct) type T is a subtype of interface
I if all methods declared in I are defined for
T. This form of subtyping is referred to semantic
subtyping.
In our case, we conclude that I is a subtype of
RomanLetter and therefore the code below type checks.
var romans = map[byte]RomanLetter{
'I': I{},
'V': V{},
'X': X{},
'L': L{},
'C': C{},
'D': D{},
'M': M{},
}Here are the remaining bits of our alternative implementation of “roman numbers”. The differences are minor. Instead of function we use method calls.
func show(ls []RomanLetter) string {
var bs []byte
for _, l := range ls {
bs = append(bs, l.conv())
}
return string(bs)
}
func parse(s string) ([]RomanLetter, bool) {
var ls []RomanLetter
for i := 0; i < len(s); i++ {
r, ok := romans[s[i]]
if !ok {
return []RomanLetter{}, false
}
ls = append(ls, r)
}
return ls, true
}
func eval(ls []RomanLetter) int {
switch {
case len(ls) == 1:
return ls[0].val()
case len(ls) > 1 && ls[0].gtThan(ls[1]):
return ls[0].val() + eval(ls[1:])
case len(ls) > 1:
return eval(ls[1:]) - ls[0].val()
default: // should never happen
return 0
}
}
func run() {
var input string
fmt.Printf("\nPlease provide a roman number\n")
fmt.Scanln(&input)
ls, b := parse(input)
if !b {
fmt.Printf("\nSorry, incorrect format, please try again")
run()
}
fmt.Printf("\nYou have keyed in the following roman number: ")
fmt.Println(show(ls))
fmt.Printf("\nThe roman number corresponds to the numeric value %d", eval(ls))
}Here’s some onlinegdb runnable Go code that provides for an alternative implementation of roman numbers as described above.
Formal specifications are necessary to avoid semantic ambiguities
Operational semantics rules are one possibility
Go is an easy to learn language with some interesting features. By example, we have some features such as
fmt.Printf(...))Here are some further examples where operational semantics rules are used: