Expressive Types

Martin Sulzmann

Overview

Software has bugs
Build software that is correct by construction
Make use of expressive types to achieve this goal

By expressive types we mean:

Exploit the programming language’s type system to catch bugs at compile-time instead of run-time

We consider expressive types in Go, Haskell, C++ and Agda.

Background: Types versus terms

By “terms” we refer to expressions/constructs found in some programming language. Via types we wish to express that terms (programs) satisfy some properties. So far, we have seen the following.

Terms that depend on terms.

For example, consider the following Go function that takes another function as an argument.

func apply(i int, f func(int) int) int {
    return f(i)
    }

Types that depend on types.

For example, slices used in the function below. The type of a slice depends on the type of the underlying elements.

func mapInt(f func(int) int, xs []int) []int {
    ys := make([]int, len(xs))
    for i, e := range xs {
        ys[i] = f(e)
    }

    return ys
}

Further examples are collection types as found in Java. Take the Vec type (constructor/operator) which depends on the type of the underlying elements. For example, Vec<int> describes a vector of Integer values.

Terms that depend on types.

Consider

func f_int(x int) int {
   var y int
   y = x + 1
   return y
}

where + depends on the types assigned to the program variables. Hence, we find here Integer addition.

If we change the types, use Floats instead of Integer, the meaing of + changes.

func f_float(x float32) float32 {
   var y float32
   y = x + 1
   return y
}

In case terms depend on types, we usually find a form of polymorphism. In the above, we find overloading (a special form of ad-hoc polymorphism).

Let’s consider parametric polymorphism (“generics”). Here is a recast of the mapInt example.

func map[A any,B any[(f func(A) B, xs []A) []B {
    ys := make([]<B>, len(xs))
    for i, e := range xs {
        ys[i] = f(e)
    }

    return ys
}

This is yet another example for terms depending on types.

There’s a fourth dependency: Types that depend on terms.

Here’s a (contrived) example using Go notation.

func concat[N,M](xs []int[N], ys []int[M]) []int[N+M] {
    zs := make([]int, len(xs)+len(ys))
    var i int
    for i = 0; i < len(xs); i++ {
        zs[i] = xs[i]
    }
    for i = len(xs); i < len(xs)+len(ys); i++ {
        zs[i] = ys[i-len(xs)]
    }

    return zs
}

We find value parameters N and M. The type []int[N] describes a slice of Integer elements of length N. This is an example of a type depending on a term.

Function concat appends two slices. Hence, the length of the resulting slice is the sum of the length of the two input slices. This is expressed via the return type []int[N+M].

The above example shows that

via dependent types we can capture non-trivial program properties
type checking with dependent types becomes more involved because we need to carry out certain computations already at compile-time (when type checking the program).

Type checking for dependent types is undecidable in general. Hence, dependently typed languages such as Agda and Idris require the user to supply proofs for dependent type properties.

Note. The Halting problem is undecidable but for a specific function the user can verify (by providing a proof) that the function terminates.

Can we mimic dependent types in Go? Yes (too some extent)!

Expressive types in Go

We take a look at a series of Go examples. The complete program code can be found here.

Example: Physical units

Here is a standard school book exericse:

Given some rectangle where the length and height are record by variables l and h.
Compute the rectangle’s perimeter based on the formula 2 * (l + h).

Below is an implementation in Go.

func perimeterRec(l int, h int) int {
    return l + l + h + h
}

func ex_rec() {
    length := 10 // measured in feet

    height := 5 // measured in meters

    p := perimeterRec(length, height)

    fmt.Printf("\np = %d", p)
}

Any issues?

Length is measured in feet
Height is measured in meters
The physical units do not match

Hence, we might get some bogus result but the program runs just fine and the bug (physical unit mismatch) goes undetected.

Expressive types to the rescue

Idea:

Each variable that stores a distance carries additional information about its physical unit
Arithmetic operation ensure that physical units are consistent with each other

Here’s how we achieve this in Go.

Physical units aware integer variables

type dist[A any] struct {
    val int
}

func add[A any](x dist[A], y dist[A]) dist[A] {
    return dist[A]{x.val + y.val}
}

The actual distance is stored in an integer variable (val)
The type parameter A represents some physical unit
When building a concrete distance value, type parameter A will be instantiated
The type of the add function ensures that the physical units of the to be added value match

Here’s a recast of the above example where we assume the physical units feet and meters.

type meters struct{}
type feet struct{}


func perimeterRec_safe(l dist[meters], h dist[meters]) dist[meters] {
    return add(l, add(l, add(h, h)))
}

func ex_rec_safe() {
    length := dist[meters]{10}

    height := dist[meters]{5}

    p := perimeterRec_safe(length, height)

    fmt.Printf("\np = %d", p.val)
}

/*

func ex_rec_wont_compile() {
    length := dist[feet]{10}

        height := dist[meters]{5}

        p := perimeterRec_safe(length, height)

        fmt.Printf("\np = %d", p.val)
    }

*/

Extensions

As you have noticed, instead of 2 * (l + h) we use the (equivalent) expression l+l+h+h.

If we include multiplication, matters become more complicated.

We need to represent phyisical units such as meters / feet
We need to take care of algebraic unit laws such as (meters * meters) / meters equals meters

It’s possible to represent some more complex phyiscal units in Go but things become more complicated.

Example: Extracting the head element of a slice

Consider the following program text.

func head(xs []int) int {
    return xs[0]
}

func ex_hd() {
    xs := []int{1, 2, 3}
    // The following variant yields panic.
    /// xs := []int{}
    x := head(xs)

    fmt.Printf("\n%d", x)

}

Here’s a variant where we catch the bug (extracting the head of an empty slice) at run-time.

func head_rt(xs []int) (int, bool) {
    if len(xs) == 0 {
        return 0, false
    }

    return xs[0], true
}

func ex_hd_rt() {
    xs := []int{1, 2, 3}

    x, b := head_rt(xs)
    if !b {
        fmt.Println("some panic, do something")
    }
    fmt.Printf("\n%d", x)

}

Challenge:

We wish to catch “head applied on empty slice” at compile-time

Expressive types to the rescue

Idea:

Encode natural numbers at the level of types
Each slice maintains some type information that correspends to its length.

Encoding numbers at the level of types

type zero struct{}

type succ[T nat] struct {
    succ T
}

type nat interface {
    length() int
}

func (x zero) length() int {
    return 0
}

func (x succ[T]) length() int {
    return 1 + x.succ.length()
}

Length-indexed lists

type list[T any, L nat] struct {
    list []T
}

func head_list[T any, L nat](xs list[T, succ[L]]) T {
    return xs.list[0]
}

func push_back[T any, L nat](xs list[T, L], x T) list[T, succ[L]] {
    var ls []T
    ls = append(xs.list, x)
    return list[T, succ[L]]{ls}
}

func mk_list1[T any](x T) list[T, succ[zero]] {
    xs := []T{x}
    return list[T, succ[zero]]{xs}
}

func mk_list0[T any]() list[T, zero] {
    xs := []T{}
    return list[T, zero]{xs}
}

Here’s a recast of the above example where we make use of length-indexed lists.

func ex_hd_list() {
    xs := mk_list0[int]()
    ys := push_back(xs, 1)
    x := head_list(ys)

    fmt.Printf("\n%d", x)
}

Expressive types in Haskell

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE TypeFamilies #-}

-- Simulating dependent types in Haskell.

-- Natural numbers.
-- data Nat = Zero | Succ Nat
data Zero = Zero
data Succ n = Succ n

-- Length-indexed lists.
data List a n where
    Nil :: List a Zero
    Cons :: a -> List a m -> List a (Succ m)

-- Haskell's "printf" via overloading.
instance Show a => Show (List a n) where
   show Nil = "[]"
   show (Cons x Nil) = "[" ++ show x ++ "]"
   show (Cons x xs) = "[" ++ show x ++ "," ++ tail (show xs)

-- List examples.
listNonEmpty1 = Cons False Nil
listNonEmpty2 = Cons True (Cons False Nil)
listEmpty = Nil

-- Safe operations on list.
headSafe :: List a (Succ n) -> a
headSafe (Cons x _) = x

tailSafe :: List a (Succ n) -> List a n
tailSafe (Cons _ xs) = xs

-- Reversing a list won't change its length.
reverseProp :: List a n -> List a n
reverseProp Nil = Nil
reverseProp (Cons x xs) = snoc (reverseProp xs) x

-- Helper snoc takes a list [x1,...,xn] and an element x
-- and yields the list [x1,...,xn,x].
snoc :: List a n -> a -> List a (Succ n)
snoc Nil x = Cons x Nil
snoc (Cons x xs) y = Cons x (snoc xs y)


-- Appending two lists of length l and m, yields a list of length l+m.
-- 1. We need to find a representation for the sum of l and m.
-- 2. The user needs to implement append as well as providing a proof that the resulting length is l+m.
-- 3. The type checker will verify that the property "l+m" is satisfied.
--
-- The property "l+m" needs to be represented at the level of types.
-- For this purpose, we introduce the following data type.
data Sum x y z where
  Base :: Sum Zero x x
       -- Fact:
       --  0 + x     = x
  Step :: Sum x y z -> Sum (Succ x) y (Succ z)
       -- Rule:
       -- If x + y = z
       -- then (x+1) + y = (z+1)
       --
       -- The above rule is formulated inductively.


-- Insight: Proofs are programs.
-- Consider the property that 1 + 2 = 3
-- where the below program represents a proof for this property.
onePlusTwoEqualsThree :: Sum (Succ Zero) (Succ (Succ Zero)) (Succ (Succ (Succ Zero)))
onePlusTwoEqualsThree = Step Base


-- Let's consider the append function.
-- As an additional (first) argument we find a proof for the to be verified property ("l+m").
-- We make use of the proof during the construction of the program.
-- The type checker ensures that the program satisfies the property.
appendProp :: Sum l m n -> List a l -> List a m -> List a n
appendProp Base Nil xs = xs
appendProp (Step p) (Cons x xs) ys = Cons x (appendProp p xs ys)


example = appendProp onePlusTwoEqualsThree listNonEmpty1 listNonEmpty2


-- We make use of some Haskell extension to let the compiler
-- infer the proof of the "l+m" property for us.

type family Plus x y
type instance Plus Zero x = x
type instance Plus (Succ x) y = Succ (Plus x y)

-- The proof for the propery "l+m" is now implicit.
-- The compiler will carry out the proof steps for us
-- and insert the proof into the program text.
appendProp2 :: List a l -> List a m -> List a (Plus l m)
appendProp2 Nil xs = xs
appendProp2 (Cons x xs) ys = Cons x (appendProp2 xs ys)

Expressive types in C++

// Simulating dependent types in C++11 with templates.

#include <iostream>
#include <string>
using namespace std;

// Natural numbers.
class Zero {};

template<typename N>
class Succ{
  public:
  Succ(N x) {}
};

enum ListKind {
  NilKind,
  ConsKind };

// Length indexed lists expressed as a template class.
// Each list carries its kind (nil or non-empty).
template<typename A, typename N>
class List {
   public:
   ListKind k;
};

// We distinguish between nil (empty) and non-empty list.
// Non-empty list are built via Cons and consist of a head and tail.
// The list [x1,...,xn] is represented as Cons x1 (... (Cons xn Nil) ...).
template<typename A>
class Nil : public List<A,Zero> {
  public:
  Nil() { this->k = NilKind; }
};

template<typename A, typename N>
class Cons : public List<A,Succ<N>> {
  public:
  A x;
  List<A,N>* xs;
  Cons(A _x, List<A,N>* _xs) {
    this->k = ConsKind;
    x = _x;
    xs = _xs;
  }
};

// Show functions.
// Specific definitions for subclasses and length-indexed lists.
string show(string s) {
  return s;
}

string show(bool b) {
  if(b) {
    return "true";
  }
  return "false";
}

template<typename A>
string show(Nil<A> ys) {
  return "[]";
}

template<typename A>
string show(Cons<A,Zero> ys) {
    return ("[" + show(ys.x) + "]");
}

template<typename A, typename N>
string show(Cons<A,Succ<N>> ys) {

  string s = show(ys.xs);

  return ("[" + show(ys.x) + "," + s.substr(1,s.length()-1));
}

template<typename A>
string show(List<A,Zero>* ys) {
  Nil<A>* zs = (Nil<A>*)ys;

  return show(*zs);
}

template<typename A, typename N>
string show(List<A,Succ<N>>* ys) {
  Cons<A,N>* zs = (Cons<A,N>*)ys;

  if(zs->xs->k == NilKind) {
    return ("[" + show(zs->x) + "]");
  }

  string s = show(zs->xs);
  return ("[" + show(zs->x) + "," + s.substr(1,s.length()-1));
}

// Safe list functions.

// Extracting the head requires a non-empty list.
template<typename A, typename N>
A headSafe(List<A,Succ<N>>* ys) {
    Cons<A,N>* zs = (Cons<A,N>*)ys;

    return zs->x;
}

// Extracting the tail requires a non-empty list.
template<typename A, typename N>
List<A,N>* tailSafe(List<A,Succ<N>>* ys) {
    Cons<A,N>* zs = (Cons<A,N>*)ys;

    return zs->xs;
}

// Snoc [x1,...,xn] x = Cons (x1 (... (Cons x Nil) ...))
// "Snoc" expressed via overloaded functions.
template<typename A>
List<A,Succ<Zero>>* Snoc(List<A,Zero>* ys, A y) {
  return (new Cons<A,Zero>(y,ys));
}

template<typename A, typename N>
List<A,Succ<Succ<N>>>* Snoc(List<A,Succ<N>>* ys, A y) {
   Cons<A,N>* zs = (Cons<A,N>*)ys;

   return (new Cons<A,Succ<N>>(zs->x,Snoc(zs->xs,y)));
}


// The "reverse" function where
// we guarantee that the length of the list remains the same.
template<typename A>
List<A,Zero>* reverseProp(List<A,Zero>* ys) {
  return ys;
}

template<typename A, typename N>
List<A,Succ<N>>* reverseProp(List<A,Succ<N>>* ys) {
    Cons<A,N>* zs = (Cons<A,N>*)ys;

    return Snoc(reverseProp(zs->xs),zs->x);
}

// Next, we consider appending two lists of length l and m.
// We (statically) guarantee that the length of the resulting list is the sum of l and m.

// We express X+Y=Z via a template class
// and partial template specialization.
template<typename X, typename Y, typename Z>
class Sum{};

template<typename X>
class Sum<Zero,X,X>{};

template<typename X, typename Y, typename Z>
class Sum<Succ<X>,Y,Succ<Z>>{
   public:
      Sum(Sum<X,Y,Z> _x) { p = _x; }
      Sum<X,Y,Z> p;
};

// We represent append as static member and
// use again partial template specialization to cover the various cases.
template<typename A, typename X, typename Y, typename Z>
class Append {
  public:
  static List<A,Z>* append(Sum<X,Y,Z>, List<A,X>*, List<A,Y>*);
};

template<typename A, typename X>
class Append<A,Zero,X,X> {
  public:
  static List<A,X>* append(Sum<Zero,X,X> p, List<A,Zero>* xs, List<A,X>* ys) {
    return ys;
  }
};

template<typename A, typename X, typename Y, typename Z>
class Append<A,Succ<X>,Y,Succ<Z>> {
  public:
  static List<A,Succ<Z>>* append(Sum<Succ<X>,Y,Succ<Z>> s, List<A,Succ<X>>* ys, List<A,Y>* zs) {
     Cons<A,X>* vs = (Cons<A,X>*)ys;

     return (new Cons<A,Y>(vs->x, Append<A,X,Y,Z>::append(s.p,vs->xs,zs)));
  }
};


// Examples.
int main() {

  // Lists of Booleans.
  {
  List<bool,Succ<Zero>>* l1 = new Cons<bool,Zero>(true,new Nil<bool>());
  List<bool,Succ<Succ<Zero>>>* l2 = new Cons<bool,Succ<Zero>>(false,l1);


  cout << "\n" << show(l1);

  cout << "\n" << show(l2);

  cout << "\n" << show(reverseProp(l2));

  // 1 + 1 = 2
  Sum<Succ<Zero>,Succ<Zero>,Succ<Succ<Zero>>> p
    = Sum<Succ<Zero>,Succ<Zero>,Succ<Succ<Zero>>>(Sum<Zero,Succ<Zero>,Succ<Zero>>());

  cout << "\n" << show(Append<bool,Succ<Zero>,Succ<Zero>,Succ<Succ<Zero>>>::append(p,l1,l1));

  }

  // Lists of strings.
  {
  auto l1 = Nil<string>();

  auto l2 = Cons<string,Zero>("a",new Nil<string>());
  List<string,Succ<Zero>>* l3 = &l2;
  List<string,Succ<Succ<Zero>>>* l4 = new Cons<string,Succ<Zero>>("b",l3);

  cout << "\n" << show(l1);

  cout << "\n" << show(l3);

  cout << "\n" << show(l4);
  }

}

Agda

data N : Set where
  zero : N
  succ : N -> N

add : N -> N -> N
add zero x = x
add (succ x) y = succ (add x y)

data List (A : Set) : (n : N) -> Set where
  Nil : List A zero
  Cons : {n : N} ->  A -> List A n -> List A (succ n)

head : forall {A n} -> List A (succ n) -> A
head (Cons x xs) = x

tail : forall {A n} -> List A (succ n) -> List A n
tail (Cons x xs) = xs

snoc : forall {A m} -> List A m -> A -> List A (succ m)
snoc Nil x = Cons x Nil
snoc (Cons x xs) y = Cons x (snoc xs y)

reverse : forall {A m} -> List A m -> List A m
reverse Nil = Nil
reverse (Cons x xs) = snoc (reverse xs) x

append : forall {A m n} -> List A m -> List A n -> List A (add m n)
append Nil xs = xs
append (Cons x xs) ys = Cons x (append xs ys)

Summary and some references

Expressive types are an old idea (they also appear under many names, e.g., phantom types, …)

It’s possible to encode expressive types in today’s programming languages (Go, Java, …)

The future:

We no longer write programs
Instead we specify properties via types and use proof assistants to establish that the properties hold

Some references

Robert Harper, Type systems for programming languages

Benjamin Pierce’s book collection: