From formal specifications to executable Go code (“roman numbers”)

Martin Sulzmann

Overview

Building reliable software relies on many factors. One important condition is that the software implemented by some contractor matches the requirements as stated by the customer. We take a look at formal specifications described in terms of operational semantics rules to ensure that the meaning of the ‘system’ is described unambiguously. We then consider how to turn the operational semantics rules into executable Go code.

Go syntax is close to C but more modern. For example, strongly typed, package system (no more header files), automatic garbage collection etc.

Example: Roman numbers

We all learned about roman numbers in school. We wish to describe the syntax and semantics of roman numbers in terms of Go.

Syntax of roman numbers

A roman number is a sequence of letters I, V, X, L, C, D and M. In Go, we can represent roman numbers as follows.

type RomanLetter int

const (
    I RomanLetter = 1
    V RomanLetter = 5
    X RomanLetter = 10
    L RomanLetter = 50
    C RomanLetter = 100
    D RomanLetter = 500
    M RomanLetter = 1000
)

The above effectively corresponds to an enum known from C and C++.

For example, the roman number “VXI” is represented in Go as follows.

    var r1 []RomanLetter
    r1 = []RomanLetter{V, X, I}

Go has a keyword (var) for introducing variables. The type []RomanLetter denotes a slice. A slice is a dynamically-sized collection of elements where elements are of the same type.

Like other languages such as C++ and Java, Go supports a simple form of type inference via the := keyword.

    r2 := []RomanLetter{V, X, I}

User-friendly input

From string to roman letters

We wish to provide the user with a more convenient interface where a sequence of symbols (string) is transformed into the Go type []RomanLetter.

This transformation process is commonly referred to as parsing and consists of two parts:

Check the sequence of symbols represents a valid roman number
If yes, the output shall be a value of type []RomanLetter.

Here’s a simple parser.

var romans = map[byte]RomanLetter{
    'I': I,
    'V': V,
    'X': X,
    'L': L,
    'C': C,
    'D': D,
    'M': M,
}

func parse(s string) ([]RomanLetter, bool) {

    var ls []RomanLetter

    for i := 0; i < len(s); i++ {
        r, ok := romans[s[i]]
        if !ok {
            return []RomanLetter{}, false
        }
        ls = append(ls, r)
    }

    return ls, true

}

Instead of char we find byte in Go.

We use a map to assign each roman character its internal representation.

Functions in Go are declared via func. The return type (if any) comes at the end. Variable/parameter declarations first introduce the name of the variable/parameter and then its type.

Go supports multiple return values. This is commonly used to signal failure. Go does not support Java/C++ style exception handling.

Processing of the input is very simple.

We repeatedly check each input symbol.
We perform a lookup of the symbol in the roman map.
If the symbol is valid we continue and append the roman letter to the sequence of roman letters.
Otherwise, we report failure (false).

From roman letters to string

The other direction is equally simple to implement.

var romansFrom = map[RomanLetter]byte{
    I: 'I',
    V: 'V',
    X: 'X',
    L: 'L',
    C: 'C',
    D: 'D',
    M: 'M',
}

func show(ls []RomanLetter) string {
    var bs []byte
    for _, l := range ls {
        bs = append(bs, romansFrom[l])
    }
    return string(bs)
}

Semantics of roman numbers

Let’s consider the semantic meaning of roman numbers.

The easy part

Each roman letter is associated with an integer value. In Go, this can be described via the following function.

func val(l RomanLetter) int {
    return (int)(l)
}

The tricky part

what’s the meaning of []RomanLetter{V,M,X}. Do we simply sum up the individual values? No, it is more difficult.

Here are some rules formulated as English sentences that (try to) explain how to obtain the value connected to a roman number

When a symbol appears after a larger (or equal) symbol it is added
But if the symbol appears before a larger symbol it is subtracted

What do we mean by “larger”? Well, that is simple. We assume I < V < X < L < C < D < M.

Unfortunately, the above rules are insufficient and ambiguous.

By insufficient, we mean that when trying to derive the semantic meaning of a roman number we get stuck. Consider the roman number “VI”. Based on the first rule, we can argue that “I” implies +1. But how to interpreter “V”. There is no rule that covers this case.

By ambiguous, we mean that the same roman number can be interpreted differently and the differences lead to different results. Consider “VIX”. Based on the first rule, “I” implies +1 but based on the second rule, we could also argue for -1.

Operational semantics to the rescue

Disclaimer. The following semantic rules are meant to provide for a possible interpretation of roman numbers. We have no intention to describe the meaning of roman numbers as used thousands of years ago.

We define the semantics of roman numbers via operational semantics rules of the form r => i where r is a value of type []RomanLetter and i is its integer value.

We require three rules where we assume that the sequence of roman letters is processed from left to right.

We write [x1,...,xn] to represent a sequence of letters x1,…,xn.

(R1)   [x] => value(x)

(R2)   [x1,x2,...xn] => value(x1) + n  if x1 >= x2 and [x2,...,xn] => n

(R3)   [x1,x2,...xn] => n - value(x1)  if x1 < x2 and [x2,...,xn] => n

The first rule covers the (base) case that the roman number consists of a single letter only.

The second and third rule compare the two leading roman letters. If the first leading letter is strictly smaller than the second letter, we subtract the value of the leading letter. See rule (R3). Otherwise, we add the value of the leading letter. See rule (R2). In both cases, we assume that the meaning of the remaining roman letters is independent of the leading roman letter. This then leads to clear and concise semantic specification of the meaning of roman numbers.

By construction, the above operational semantic rules are unambiguous.
However, two distinct roman numbers may yield the same value. For example, take “VVV” and “XV”.

Examples

What is the result of evaluating the following roman numbers.

[V,I] => ?

[X,I] => ?

[LMC] => ?

What about the following case.

[I,I,I,I] => ?

Four Is in a row are unnecessary and can be represented by [I,V].

Is there any roman number that evaluates to a negative number?

Interpreter-style evaluation rules in Go

There exists a wide range of different forms of operational semantics rules. The above form is also known under the name interpreter-style evaluation rules Such rules can easily be implemented in Go.

func eval(ls []RomanLetter) int {
    switch {
    case len(ls) == 1:
        return val(ls[0])
    case len(ls) > 1 && ls[0] >= ls[1]:
        return val(ls[0]) + eval(ls[1:])
    case len(ls) > 1 /*&& ls[0] < ls[1]*/ :
        return eval(ls[1:]) - val(ls[0])
    default: // should never happen
        return 0
    }

}

Putting things together

The following functions ties together all the bits we have seen so far.

func run() {
    var input string
    fmt.Printf("\nPlease provide a roman number\n")
    fmt.Scanln(&input)
    ls, b := parse(input)
    if !b {
        fmt.Printf("\nSorry, incorrect format, please try again")
        run()

    }
    fmt.Printf("\nYou have keyed in the following roman number: ")
    fmt.Println(show(ls))
    fmt.Printf("\nThe roman number corresponds to the numeric value %d", eval(ls))

}

Here’s some onlinegdb runnable Go code that implements roman numbers as described above.

FYI. The Go playground does not support user input. Hence, we’re using onlinegdb.

Natural deduction proof rules

Instead of specifying the semantic rules in terms of “if” clauses, it is common use natural deduction style proof rules.

         x1 >= x2    [x2,...,xn] => n
(N2)   ----------------------------
        [x1,x2,...,xn] => value(x1) + n


         x1 < x2     [x2,...,xn] => n
(N3)   ----------------------------
        [x1,x2,...,xn] => n - value(x1)

Rules (N2) and (N3) correspond to rules (R2) and (R3).

Yet another Go implementation of “roman numbers”

As one of the tasks here is to learn more about Go, we introduce yet another Go implementation of “roman numbers”.

Go is considered an “OO” language but does not support Java/C++ style classes. Go supports instead

method overloading
interfaces
semantic subtyping

We use these features to describe the “behavior” of roman numbers.

Roman interface

In Go, an interface describes a collection of methods.

type RomanLetter interface {
    val() int
    conv() byte
    gtThan(RomanLetter) bool
}

val yields the underlying value for each roman letter
conv converts the roman letter into byte representation
gtThan checks if one roman letter is greater than the other roman letter (based on their numerical representation)

Each methods operates on a receiver (aka objects). The receiver is implicit in the above declaration

Method overloading

We describe the behavior of roman letters by providing (overloaded) implementations of methods val, conv and gtThan.

type I struct{}
type V struct{}
type X struct{}
type L struct{}
type C struct{}
type D struct{}
type M struct{}

func (x I) val() int { return 1 }
func (x V) val() int { return 5 }
func (x X) val() int { return 10 }
func (x L) val() int { return 50 }
func (x C) val() int { return 100 }
func (x D) val() int { return 500 }
func (x M) val() int { return 1000 }

func (x I) conv() byte { return 'I' }
func (x V) conv() byte { return 'V' }
func (x X) conv() byte { return 'X' }
func (x L) conv() byte { return 'L' }
func (x C) conv() byte { return 'C' }
func (x D) conv() byte { return 'D' }

func (x M) conv() byte { return 'M' }
func (x I) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x V) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x X) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x L) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x C) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x D) gtThan(y RomanLetter) bool { return x.val() >= y.val() }
func (x M) gtThan(y RomanLetter) bool { return x.val() >= y.val() }

Lots of boilerplate code. We don’t claim that this implementation is “better” compared to the earlier implementation. This implementation is “different” and has certain advantages. For example, we can easily add a “new” roman letter (say N standing for ten thousand).

Semantic subtyping

Compared to Java/C++ there are no class hierarchies. Hence, there are no (nominal) subclass relations.

In Go, a (struct) type T is a subtype of interface I if all methods declared in I are defined for T. This form of subtyping is referred to semantic subtyping.

In our case, we conclude that I is a subtype of RomanLetter and therefore the code below type checks.

var romans = map[byte]RomanLetter{
    'I': I{},
    'V': V{},
    'X': X{},
    'L': L{},
    'C': C{},
    'D': D{},
    'M': M{},
}

Remaining bits

Here are the remaining bits of our alternative implementation of “roman numbers”. The differences are minor. Instead of function we use method calls.

func show(ls []RomanLetter) string {
    var bs []byte
    for _, l := range ls {
        bs = append(bs, l.conv())
    }
    return string(bs)
}

func parse(s string) ([]RomanLetter, bool) {

    var ls []RomanLetter

    for i := 0; i < len(s); i++ {
        r, ok := romans[s[i]]
        if !ok {
            return []RomanLetter{}, false
        }
        ls = append(ls, r)
    }

    return ls, true

}

func eval(ls []RomanLetter) int {
    switch {
    case len(ls) == 1:
        return ls[0].val()
    case len(ls) > 1 && ls[0].gtThan(ls[1]):
        return ls[0].val() + eval(ls[1:])
    case len(ls) > 1:
        return eval(ls[1:]) - ls[0].val()
    default: // should never happen
        return 0
    }

}

func run() {
    var input string
    fmt.Printf("\nPlease provide a roman number\n")
    fmt.Scanln(&input)
    ls, b := parse(input)
    if !b {
        fmt.Printf("\nSorry, incorrect format, please try again")
        run()

    }
    fmt.Printf("\nYou have keyed in the following roman number: ")
    fmt.Println(show(ls))
    fmt.Printf("\nThe roman number corresponds to the numeric value %d", eval(ls))

}

Here’s some onlinegdb runnable Go code that provides for an alternative implementation of roman numbers as described above.

Summary

Formal specifications are necessary to avoid semantic ambiguities
Operational semantics rules are one possibility
Go is an easy to learn language with some interesting features. By example, we have some features such as
- Package system (fmt.Printf(...))
- Typed constants
- Slices
- Simple type inference
- Simple control structures (for loop and switch-case)
- Functions with multiple return values
- Method overloading
- Interfaces
- Semantic subtyping
Here are some further examples where operational semantics rules are used: