How I turned a binary search tree into a generic data structure with go2go

Some time ago I wrote about how to create a balanced binary search tree. The search keys and the data payload were both plain strings. Now it is time to get rid of this limitation. go2go lets us do that while waiting for the official generics release.

I started creating a new course on Go concurrency!
Join the wait list to stay in the loop and get a discount.

Warning: This article is super boring! It turned out that converting a container type into a generic container type is quite straightforward with go2go and shows no surprises.

Which is actually a good sign.

It is a good sign because adding generic data types and functions to a programming language is dead easy… to get wrong. Hence the Go team went to great lengths, and took all possible precautions, to design generics that don’t suck. And IMHO, the current proposal should appeal even to the ones who were skeptical about adding generics to Go at all.

With the current generics design, it would seem fairly easy to create new generic data structures and generic functions, but what about sifting through old code to make it generic? Will there be any footguns?

Let’s find out.

Generic Trees

The status quo of the search tree code

In this article, I created a binary tree, and in another article, I turned the tree into a balanced tree (with AVL balancing logic). Both the search key and the payload data are of type string.

type Node struct {
	Value  string
	Data   string
	Left   *Node
	Right  *Node
	height int
}

What to change

Obviously, we need to change the types of the fields Value and Data.

Then, all functions that take or return either of these two fields, or that take a Node and access the fields through the Node struct, need to be adjusted. This applies to functions like Insert() or min(), for example.

Let’s walk through the code and adjust it as required.

Step 1: create generic types

First, we need to let the compiler know which types supports comparison operators (<, >=, etc). For this, we create an interface type that lists all comparable types, using the keyword type to distinguish it from a behavioral interface. This type of interface is called a type constraint.

As always, the code starts with package and import statements, as the whole blog article is generated from a single, compilable Go source file.

package main

import (
	"fmt"
	"strings"
)

Looks like a classic interface but the type keyword inside reveals that this is a type constraint.

type Ordered interface {
	type int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, uintptr, float32, float64, string
}

A generic type has type parameters that are substituted by a concrete type when a new variable is declared based on the generic type.

Here, we use two generic types: Value and Data.

Value is the search value and hence has to be a comparable type.

Datais the payload and can be any type.

This is expressed by adding appropriate type constraints to the declaration.

type (

Type Value must be one of the types listed in the Ordered type constraint.

	Value[T Ordered] T

Type Data can be any type.

	Data[T any] T
)

Now we can write declarations like

var v Value[int]
var d Data[[]string]

to instantiate variables from a generic type (such as Value), using a real type (such as int). Here, we just created an int as the search value and a slice of strings as data.

Step 2: Change existing types

Now we take the Node struct shown above, and change the Value and Data fields from string to the new generic Value and Data types.

This turns the Node struct itself into a generic type that we must declare with appropriate type parameters. In general, any generic types declared inside a struct bubble up to the struct declaration.

Note that the *Node pointer types inside the struct also need to be properly parameterized.

type Node struct {
Value string
Data string
Right *Node
Left *Node

type Node[Value Ordered, Data any] struct {
	Value  Value
	Data   Data
	Left   *Node[Value, Data]
	Right  *Node[Value, Data]
	height int
}

(In the comment block, this is how the struct looked before.)

When instantiating a Node, concrete types for the Value and Data parameters must be supplied. Then the fields Value and Data get instantiated to the given concrete types.

Example: n := *Node{uint16, []byte}

Step 3: change functions and methods

Now let’s look through all the functions and methods and make them polymorphic.

Wherever a function receives a Node value, or a value string or data string, we need to change this to the respective generic type, for example, Node[Value, Data].

The same applies to method receivers.

Here, we can see why we need an Ordered constraint. Type T must support comparison operations, otherwise a > b would fail at runtime if T is instantiated with a non-comparable type.

func max[T Ordered](a, b T) T {
	if a > b {
		return a
	}
	return b
}

Besides the receiver type, nothing needs to be changed here. *Node becomes *Node[Value, Data].
Later, when instantiating a struct of type Node, concrete types need to be supplied for Value and Data.

func (n *Node[Value, Data]) Height() int {
	if n == nil {
		return 0
	}
	return n.height
}

func (n *Node[Value, Data]) Bal() int {
	return n.Right.Height() - n.Left.Height()
}

Here is the first occurrence of generic parameters and return types.
value, data string is now
value Value, data Data.
The function body remains untouched, as all operations on value, data, n.Value, or n.Data work the same, even though the concrete types for Value and Data are not known yet.
Especially, == and < work fine for the Value type because of the Ordered type constraint.

func (n *Node[Value, Data]) Insert(value Value, data Data) *Node[Value, Data] {
	if n == nil {
		return &Node[Value, Data]{
			Value:  value,
			Data:   data,
			height: 1,
		}
	}
	if n.Value == value {
		n.Data = data
		return n
	}

	if value < n.Value {
		n.Left = n.Left.Insert(value, data)
	} else {
		n.Right = n.Right.Insert(value, data)
	}

	n.height = max(n.Left.Height(), n.Right.Height()) + 1

	return n.rebalance()
}

From here onwards, the same pattern repeats. The function signatures receive generic parameters for the Node type, and the function bodies remain largely unmodified.
#boring

func (n *Node[Value, Data]) rotateLeft() *Node[Value, Data] {
	r := n.Right
	n.Right = r.Left
	r.Left = n
	n.height = max(n.Left.Height(), n.Right.Height()) + 1
	r.height = max(r.Left.Height(), r.Right.Height()) + 1
	return r
}

func (n *Node[Value, Data]) rotateRight() *Node[Value, Data] {
	l := n.Left
	n.Left = l.Right
	l.Right = n
	n.height = max(n.Left.Height(), n.Right.Height()) + 1
	l.height = max(l.Left.Height(), l.Right.Height()) + 1
	return l
}

func (n *Node[Value, Data]) rotateRightLeft() *Node[Value, Data] {
	n.Right = n.Right.rotateRight()
	n = n.rotateLeft()
	n.height = max(n.Left.Height(), n.Right.Height()) + 1
	return n
}

func (n *Node[Value, Data]) rotateLeftRight() *Node[Value, Data] {
	n.Left = n.Left.rotateLeft()
	n = n.rotateRight()
	n.height = max(n.Left.Height(), n.Right.Height()) + 1
	return n
}

func (n *Node[Value, Data]) rebalance() *Node[Value, Data] {
	switch {
	case n.Bal() < -1 && n.Left.Bal() == -1:
		return n.rotateRight()
	case n.Bal() > 1 && n.Right.Bal() == 1:
		return n.rotateLeft()
	case n.Bal() < -1 && n.Left.Bal() == 1:
		return n.rotateLeftRight()
	case n.Bal() > 1 && n.Right.Bal() == -1:
		return n.rotateRightLeft()
	}
	return n
}

func (n *Node[Value, Data]) Find(s Value) (Data, bool) {

	if n == nil {

Interesting detail: go2go has no dedicated expression for “zero value of type T” (yet). This is resolved here by instantiating a variable of type T and returning that variable.
An alternate way is shown below, and a third alternative is to use named return parameters and use a naked return statement.

		var zero Data
		return zero, false
	}

	switch {
	case s == n.Value:
		return n.Data, true
	case s < n.Value:
		return n.Left.Find(s)
	default:
		return n.Right.Find(s)
	}
}

func (n *Node[Value, Data]) Dump(i int, lr string) {
	if n == nil {
		return
	}
	indent := ""
	if i > 0 {
		indent = strings.Repeat(" ", (i-1)*4) + "+" + lr + "--"
	}
	fmt.Printf("%s%v[%d,%d]\n", indent, n.Value, n.Bal(), n.Height())
	n.Left.Dump(i+1, "L")
	n.Right.Dump(i+1, "R")
}

type Tree[Value Ordered, Data any] struct {
	Root *Node[Value, Data]
}

func (t *Tree[Value, Data]) Insert(value Value, data Data) {
	t.Root = t.Root.Insert(value, data)
	if t.Root.Bal() < -1 || t.Root.Bal() > 1 {
		t.rebalance()
	}
}

func (t *Tree[Value, Data]) rebalance() {
	if t == nil || t.Root == nil {
		return
	}
	t.Root = t.Root.rebalance()
}

func (t *Tree[Value, Data]) Find(s Value) (Data, bool) {
	if t == nil || t.Root == nil {

Same situation as in method Find above.
Here, we use new to create a zero value on the fly.
new returns a pointer, and hence we need to add the dereferencing operator.

		return *new(Data), false
	}
	return t.Root.Find(s)
}

func (t *Tree[Value, Data]) Traverse(n *Node[Value, Data], f func(*Node[Value, Data])) {
	if n == nil {
		return
	}
	t.Traverse(n.Left, f)
	f(n)
	t.Traverse(n.Right, f)
}

func (t *Tree[Value, Data]) PrettyPrint() {

	printNode := func(n *Node[Value, Data], depth int) {
		fmt.Printf("%s%v\n", strings.Repeat("  ", depth), n.Value)
	}
	var walk func(*Node[Value, Data], int)
	walk = func(n *Node[Value, Data], depth int) {
		if n == nil {
			return
		}
		walk(n.Right, depth+1)
		printNode(n, depth)
		walk(n.Left, depth+1)
	}

	walk(t.Root, 0)
}

func (t *Tree[Value, Data]) Dump() {
	t.Root.Dump(0, "")
}

How to use the new generic tree type

Now is the moment where we can instantiate the generic Tree[Value, Data] type into something tangible like Tree[int,string].

func main() {
	values := []string{"d", "b", "g", "g", "c", "e", "a", "h", "f", "i", "j", "l", "k"}
	data := []string{"delta", "bravo", "golang", "golf", "charlie", "echo", "alpha", "hotel", "foxtrot", "india", "juliett", "lima", "kilo"}

Here, Tree gets instantiated with the string type for both Value and Data. This is basically the same tree as in the original article about balanced trees.

	tree := &Tree[string, string]{}
	for i := 0; i < len(values); i++ {
		tree.Insert(values[i], data[i])
	}

	fmt.Print("\n*** Tree with string search values and string data ***\n\n")
	fmt.Print("Sorted values: | ")

As with *Tree above, *Node also needs to get instantiated with concrete types.

	tree.Traverse(tree.Root, func(n *Node[string, string]) { fmt.Print(n.Value, ": ", n.Data, " | ") })
	fmt.Println()

	fmt.Println("Pretty print (turned 90° anti-clockwise):")
	tree.PrettyPrint()

Let’s try the same with integers as search values.

	keys := []int{4, 2, 7, 7, 3, 5, 1, 8, 6, 9, 10, 12, 11}

No new data slice here. It remains the same slice of strings.


This time, Tree gets instantiated with int and string for Value and Data, respectively.

	intTree := &Tree[int, string]{}
	for i := 0; i < len(keys); i++ {
		intTree.Insert(keys[i], data[i])
	}

	fmt.Print("\n*** Tree with int search values and string data ***\n\n")
	fmt.Print("Sorted values: | ")
	intTree.Traverse(intTree.Root, func(n *Node[int, string]) { fmt.Print(n.Value, ": ", n.Data, " | ") })
	fmt.Println()

	fmt.Println("Pretty print")
	intTree.PrettyPrint()

How about creating a search tree of search trees?

Let’s feed a search tree with search trees as payload data.
Because why not?
And because doing this can answer an interesting question: Will the syntax of nested generic type instantiatons become unwieldy?

The search values shall be integers.

	keys = []int{3, 1, 2}

I am lazy here and use the existing “string, string” tree thrice.

	trees := []*Tree[string, string]{tree, tree, tree}

This is a nested instantiation of generic types. Nice detail: the syntax really remains readable.

	treeTree := &Tree[int, *Tree[string, string]]{}
	for i := 0; i < len(keys); i++ {
		treeTree.Insert(keys[i], trees[i])
	}

	fmt.Print("\n*** Tree with int search values and Tree[string, string] data ***\n\n")
	fmt.Print("Sorted values: | ")

As with *Tree above, *Node also needs to get instantiated with concrete types.

	treeTree.Traverse(treeTree.Root, func(n *Node[int, *Tree[string, string]]) { fmt.Print(n.Value, ": ", n.Data, " | ") })
	fmt.Println()

	fmt.Println("Pretty print:")
	treeTree.PrettyPrint()

	var val string
	subtree, found := treeTree.Find(2)
	if found {
		val, found = subtree.Find("b")
	}
	fmt.Printf("Find \"s\" in subtree 2: %v (found: %t)\n", val, found)

}

How to run the code

At the moment of this writing, a go2go installation is required to run this code locally. Hence I recommend using the go2go playground for a quick test.

Conclusion

Turning an existing container data type into a generic one has only few surprises. Hey, I told you it will be boring!

With a few checks in mind, you should be ready for generizing… generalizing… genericizing… genericking… uh, whatever… your existing container data types.

  • Review all the operations your code applies to the original types. If these operations apply to a certain kind of data type only, your generic type needs a type constraint.
  • Look through your fmt.Printf statements. Most likely, you will need to change a few type-specific placeholders to a general %v to avoid errors.
  • Look for return statements that return a zero value. Typically, these occur when returning a non-nil error.
    Example: return "", errors.New(...).
    Use one of the workaround shown above:
    • Workaround 1: declare a variable of type T, which defaults to the type’s zero value. Return that variable.
    • Workaround 2: use *new(T), which instantiates T, returns a pointer, and dereferences that pointer. The result is a zero value of T. Return that result.

(See the tree code above for working examples.)

In summary, I am pleased about how easy the conversion process turned out to be, and also how readable the result is. Once generics are included in an official release, workarounds like the ones I described in another article are not required anymore.

That’s it. Happy generic coding! ʕ◔ϖ◔ʔ


Trees and background image courtesy of artists at Pixabay

comments powered by Disqus