How I turned a binary search tree into a generic data structure with go2go
Some time ago I wrote about how to create a balanced binary search tree. The search keys and the data payload were both plain strings. Now it is time to get rid of this limitation. go2go lets us do that while waiting for the official generics release.
Update: Go type parameters have changed since go2go
. The article has been updated to match the syntax and semantics of type parameters in Go 1.18 and use the cmp
package of Go 1.21 instead of constraints
.
Warning: This article is super boring! It turned out that converting a container type into a generic container type is quite straightforward with go2go
and shows no surprises.
Which is actually a good sign.
It is a good sign because adding generic data types and functions to a programming language is dead easy… to get wrong. Hence the Go team went to great lengths, and took all possible precautions, to design generics that don't suck. And IMHO, the current proposal should appeal even to the ones who were skeptical about adding generics to Go at all.
With the current generics design, it would seem fairly easy to create new generic data structures and generic functions, but what about sifting through old code to make it generic? Will there be any footguns?
Let's find out.
The status quo of the search tree code
In
this article, I created a binary tree, and in
another article, I turned the tree into a balanced tree (with AVL balancing logic). Both the search key and the payload data are of type string
.
type Node struct {
Value string
Data string
Left *Node
Right *Node
height int
}
What to change
Obviously, I need to change the types of the fields Value
and Data
.
Then, all functions that take or return either of these two fields, or that take a Node
and access the fields through the Node
struct, need to be adjusted. This applies to functions like Insert()
or min()
, for example.
Let's walk through the code and adjust it as required.
package main
import (
"cmp"
"fmt"
"strings"
)
Step 1: Change existing types
First, I take the Node
struct shown above, and change the Value
and Data
fields
from string
to the new generic Value
and Data
types. While the Value type must be ordered, the Data type can be anything.
This turns the Node struct itself into a generic type that I now must declare with appropriate type parameters. In general, any generic types declared inside a struct bubble up to the struct type declaration.
Note that the *Node
pointer types inside the struct also need to be properly parameterized.
Value string
Data string
Right *Node
Left *Node
type Node[Value cmp.Ordered, Data any] struct {
Value Value
Data Data
Left *Node[Value, Data]
Right *Node[Value, Data]
height int
}
(In the comment block, this is how the struct looked before.)
When instantiating a Node
, concrete types for the Value and Data parameters must be supplied.
Then the fields Value
and Data
get instantiated to the given concrete types.
Example: n := *Node{uint16, []byte}
Step 2: change functions and methods
Now let's look through all the functions and methods and make them polymorphic.
Wherever a function receives a Node
value, or a value string or data string,
I need to change this to the respective generic type, for example, Node[Value, Data]
.
The same applies to method receivers.
Ordered
constraint.
Type T
must support comparison operations, otherwise a > b
would fail
at runtime if T is instantiated with a non-comparable type.func max[T cmp.Ordered](a, b T) T {
if a > b {
return a
}
return b
}
*Node
becomes *Node[Value, Data]
.Later, when instantiating a struct of type
Node
, concrete types
need to be supplied for Value
and Data
.func (n *Node[Value, Data]) Height() int {
if n == nil {
return 0
}
return n.height
}
func (n *Node[Value, Data]) Bal() int {
return n.Right.Height() - n.Left.Height()
}
value, data string
is nowvalue Value, data Data
.The function body remains untouched, as all operations on
value
, data
, n.Value
, or n.Data
work the same, even though the concrete types for Value
and Data
are not known yet.Especially,
==
and <
work fine for the Value
type because of the Ordered
type constraint.func (n *Node[Value, Data]) Insert(value Value, data Data) *Node[Value, Data] {
if n == nil {
return &Node[Value, Data]{
Value: value,
Data: data,
height: 1,
}
}
if n.Value == value {
n.Data = data
return n
}
if value < n.Value {
n.Left = n.Left.Insert(value, data)
} else {
n.Right = n.Right.Insert(value, data)
}
n.height = max(n.Left.Height(), n.Right.Height()) + 1
return n.rebalance()
}
#boring
func (n *Node[Value, Data]) rotateLeft() *Node[Value, Data] {
r := n.Right
n.Right = r.Left
r.Left = n
n.height = max(n.Left.Height(), n.Right.Height()) + 1
r.height = max(r.Left.Height(), r.Right.Height()) + 1
return r
}
func (n *Node[Value, Data]) rotateRight() *Node[Value, Data] {
l := n.Left
n.Left = l.Right
l.Right = n
n.height = max(n.Left.Height(), n.Right.Height()) + 1
l.height = max(l.Left.Height(), l.Right.Height()) + 1
return l
}
func (n *Node[Value, Data]) rotateRightLeft() *Node[Value, Data] {
n.Right = n.Right.rotateRight()
n = n.rotateLeft()
n.height = max(n.Left.Height(), n.Right.Height()) + 1
return n
}
func (n *Node[Value, Data]) rotateLeftRight() *Node[Value, Data] {
n.Left = n.Left.rotateLeft()
n = n.rotateRight()
n.height = max(n.Left.Height(), n.Right.Height()) + 1
return n
}
func (n *Node[Value, Data]) rebalance() *Node[Value, Data] {
switch {
case n.Bal() < -1 && n.Left.Bal() == -1:
return n.rotateRight()
case n.Bal() > 1 && n.Right.Bal() == 1:
return n.rotateLeft()
case n.Bal() < -1 && n.Left.Bal() == 1:
return n.rotateLeftRight()
case n.Bal() > 1 && n.Right.Bal() == -1:
return n.rotateRightLeft()
}
return n
}
func (n *Node[Value, Data]) Find(s Value) (Data, bool) {
if n == nil {
go2go
has no dedicated expression for “zero value of type T” (yet).
This is resolved here by instantiating a variable of type T and returning that variable.An alternate way is shown below, and a third alternative is to use named return parameters and use a naked
return
statement. var zero Data
return zero, false
}
switch {
case s == n.Value:
return n.Data, true
case s < n.Value:
return n.Left.Find(s)
default:
return n.Right.Find(s)
}
}
func (n *Node[Value, Data]) Dump(i int, lr string) {
if n == nil {
return
}
indent := ""
if i > 0 {
indent = strings.Repeat(" ", (i-1)*4) + "+" + lr + "--"
}
fmt.Printf("%s%v[%d,%d]\n", indent, n.Value, n.Bal(), n.Height())
n.Left.Dump(i+1, "L")
n.Right.Dump(i+1, "R")
}
type Tree[Value cmp.Ordered, Data any] struct {
Root *Node[Value, Data]
}
func (t *Tree[Value, Data]) Insert(value Value, data Data) {
t.Root = t.Root.Insert(value, data)
if t.Root.Bal() < -1 || t.Root.Bal() > 1 {
t.rebalance()
}
}
func (t *Tree[Value, Data]) rebalance() {
if t == nil || t.Root == nil {
return
}
t.Root = t.Root.rebalance()
}
func (t *Tree[Value, Data]) Find(s Value) (Data, bool) {
if t == nil || t.Root == nil {
Find
above.Here, we use
new
to create a zero value on the fly.new
returns a pointer, and hence we need to add the dereferencing operator. return *new(Data), false
}
return t.Root.Find(s)
}
func (t *Tree[Value, Data]) Traverse(n *Node[Value, Data], f func(*Node[Value, Data])) {
if n == nil {
return
}
t.Traverse(n.Left, f)
f(n)
t.Traverse(n.Right, f)
}
func (t *Tree[Value, Data]) PrettyPrint() {
printNode := func(n *Node[Value, Data], depth int) {
fmt.Printf("%s%v\n", strings.Repeat(" ", depth), n.Value)
}
var walk func(*Node[Value, Data], int)
walk = func(n *Node[Value, Data], depth int) {
if n == nil {
return
}
walk(n.Right, depth+1)
printNode(n, depth)
walk(n.Left, depth+1)
}
walk(t.Root, 0)
}
func (t *Tree[Value, Data]) Dump() {
t.Root.Dump(0, "")
}
How to use the new generic tree type
Now is the moment where I can instantiate the generic Tree[Value, Data]
type into something tangible like Tree[int,string]
.
func main() {
values := []string{"d", "b", "g", "g", "c", "e", "a", "h", "f", "i", "j", "l", "k"}
data := []string{"delta", "bravo", "golang", "golf", "charlie", "echo", "alpha", "hotel", "foxtrot", "india", "juliett", "lima", "kilo"}
string
type for both Value and Data.
This is basically the same tree as in the original article about balanced trees. tree := &Tree[string, string]{}
for i := 0; i < len(values); i++ {
tree.Insert(values[i], data[i])
}
fmt.Print("\n*** Tree with string search values and string data ***\n\n")
fmt.Print("Sorted values: | ")
*Tree
above, *Node
also needs to get instantiated with concrete types. tree.Traverse(tree.Root, func(n *Node[string, string]) { fmt.Print(n.Value, ": ", n.Data, " | ") })
fmt.Println()
fmt.Println("Pretty print (turned 90° anti-clockwise):")
tree.PrettyPrint()
keys := []int{4, 2, 7, 7, 3, 5, 1, 8, 6, 9, 10, 12, 11}
data
slice here. It remains the same slice of strings.
int
and string
for Value and Data, respectively. intTree := &Tree[int, string]{}
for i := 0; i < len(keys); i++ {
intTree.Insert(keys[i], data[i])
}
fmt.Print("\n*** Tree with int search values and string data ***\n\n")
fmt.Print("Sorted values: | ")
intTree.Traverse(intTree.Root, func(n *Node[int, string]) { fmt.Print(n.Value, ": ", n.Data, " | ") })
fmt.Println()
fmt.Println("Pretty print")
intTree.PrettyPrint()
### How about creating a search tree of search trees?
Let's feed a search tree with search trees as payload data. \
Because why not?\
And because doing this can answer an interesting question: Will the syntax of nested generic type instantiatons become unwieldy?
keys = []int{3, 1, 2}
trees := []*Tree[string, string]{tree, tree, tree}
treeTree := &Tree[int, *Tree[string, string]]{}
for i := 0; i < len(keys); i++ {
treeTree.Insert(keys[i], trees[i])
}
fmt.Print("\n*** Tree with int search values and Tree[string, string] data ***\n\n")
fmt.Print("Sorted values: | ")
*Tree
above, *Node
also needs to get instantiated with concrete types. treeTree.Traverse(treeTree.Root, func(n *Node[int, *Tree[string, string]]) { fmt.Print(n.Value, ": ", n.Data, " | ") })
fmt.Println()
fmt.Println("Pretty print:")
treeTree.PrettyPrint()
var val string
subtree, found := treeTree.Find(2)
if found {
val, found = subtree.Find("b")
}
fmt.Printf("Find \"s\" in subtree 2: %v (found: %t)\n", val, found)
}
How to run the code
This code runs with Go 1.21 or later. It also runs fine in the Go Playground.
Conclusion
Turning an existing container data type into a generic one has only few surprises. Hey, I told you it will be boring!
With a few checks in mind, you should be ready for generizing… generalizing… genericizing… genericking… uh, whatever… your existing container data types.
- Review all the operations your code applies to the original types. If these operations apply to a certain kind of data type only, your generic type needs a type constraint.
- Look through your
fmt.Printf
statements. Most likely, you will need to change a few type-specific placeholders to a general%v
to avoid errors. - Look for return statements that return a zero value. Typically, these occur when returning a non-nil error.
Example:return "", errors.New(...)
.
Use one of the workaround shown above:- Workaround 1: declare a variable of type T, which defaults to the type's zero value. Return that variable.
- Workaround 2: use
*new(T)
, which instantiates T, returns a pointer, and dereferences that pointer. The result is a zero value of T. Return that result.
(See the tree code above for working examples.)
In summary, I am pleased about how easy the conversion process turned out to be, and also how readable the result is. Once generics are included in an official release, workarounds like the ones I described in another article are not required anymore.
That's it. Happy generic coding! ʕ◔ϖ◔ʔ
Trees and background image courtesy of artists at Pixabay
Changelog
2023-08-22
- Updated the code to work with Go 1.21. New: The
cmp
package. Obsolete: theconstraints
package. Link to the playground updated accordingly. - Added missing link to the github repo of this article.
2022-01-04
- Updated the code from go2go version of May 2021 to the current dev branch (which is a pre-release version of Go 1.18). The code is now compatible with Go 1.18. The playground link now opens the current dev branch rather than the (obsolete) go2go Playground.
- I also took the chance to change “we” to “I” to match the title of the article.
Note the import of the ‘cmp’ package (added in Go 1.21). This package provides types and functions for comparing ordered values, including the
Ordered
constraint that I need for being able to compare and sort the nodes.