Balancing a binary search tree

Only a well-balanced search tree can provide optimal search performance. This article adds automatic balancing to the binary search tree from the previous article.

Now let’s insert a new node below the left child of n.

Two scenarios can happen:

1. The new node was inserted as the left child of n’s left child.

Since n has no right children, its balance factor is now -2. (Remember, the balance is defined as “height of right tree minus height of left tree”.) This is an easy case. All we have to do is to “rotate” the tree:

  1. Make the left child node the root node.
  2. Make the former root node the new root node’s right child.

Here is a visualization of these steps (click “Rotate”):

The balance is restored, and the tree’s sort order is still intact.

Easy enough, isn’t it? Well, only until we look into the other scenario…

2. The new node was inserted as the right child of n’s left child.

This looks quite similar to the previous case, so let’s try the same rotation here. Click “Single Rotation” in the diagram below and see what happens:

The tree is again unbalanced; the root node’s balance factor changed from -2 to +2. Obviously, a simple rotation as in case 1 does not work here.

Now try the second button, “Double Rotation”. Here, the unbalanced node’s left subtree is rotated first, and now the situation is similar to case 1. Rotating the tree to the right finally rebalances the tree and retains the sort order.

Two more cases and a summary

The two cases above assumed that the unbalanced node’s balance factor is -2. If the balance factor is +2, the same cases apply in an analogous way, except that everything is mirror-reversed.

To summarize, here is a scenario where all of the above is included - double rotation as well as reassigning a child node/tree to a rotated node.

The Code

Now, after all this theory, let’s see how to add the balancing into the code from the previous article.

First, we set up two helper functions, min and max, that we will need later.

Imports, helper functions, and globals

package main

import (
	"fmt"
	"strings"
)

min is like math.Min but for int.

func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}

max is math.Max for int.

func max(a, b int) int {
	if a > b {
		return a
	}
	return b
}

Node gets a new field, height, to store the height of the subtree at this node.

type Node struct {
	Value  string
	Data   string
	Left   *Node
	Right  *Node
	height int

TODO bal int // height(n.Right) - height(n.Left)

}

Height returns the height value. Wait, what’s the point? Well, the zero value of *Node is nil. If a child node is nil, there is no heightfield available; however, it is possible to call a method of a nil struct value! As a Go proverb says, “Make the zero value useful”.

func (n *Node) Height() int {
	if n == nil {
		return 0
	}
	return max(n.Right.Height(), n.Left.Height())
}

Bal returns the balance of a node’s subtrees: 0 for a balanced node, +n if the right subtree is n nodes taller than the left, -n if the left subtree is n nodes taller than the right.

func (n *Node) Bal() int {
	return n.Right.Height() - n.Left.Height()
}

The modified Insert function

Insert takes a search value and some data and inserts a new node (unless a node with the given search value already exists, in which case Insert only replaces the data).

It returns:

  • true if the height of the tree has increased.
  • false otherwise.
func (n *Node) Insert(value, data string) bool {

The following actions depend on whether the new search value is equal, less, or greater than the current node’s search value.

	switch {
	case value == n.Value:
		n.Data = data
		return false // Node already exists, nothing changes
	case value < n.Value:

If there is no left child, create a new one.

		if n.Left == nil {

Create a new node.

			n.Left = &Node{Value: value, Data: data, height: 1}
		} else {

The left child is not nil. Continue in the left subtree.

			if n.Left.Insert(value, data) {

If the subtree’s balance factor has become either -2 or 2, the subtree must be rebalanced.

				if n.Left.Bal() < -1 || n.Left.Bal() > 1 {
					n.rebalance(n.Left)
				}
			}
		}

This case is analogous to value < n.Value, except that everything is mirrored.

	case value > n.Value:
		if n.Right == nil {
			n.Right = &Node{Value: value, Data: data, height: 1}
		} else {
			if n.Right.Insert(value, data) {
				if n.Right.Bal() < -1 || n.Right.Bal() > 1 {
					n.rebalance(n.Right)
				}
			}
		}
	}
	if n.Bal() != 0 {
		return true
	}

No more adjustments to the ancestor nodes required.

	return false
}

The new rebalance() method and its helpers rotateLeft(), rotateRight(), rotateLeftRight(), and rotateRightLeft.

Important note: Many of the assumptions about balances, left and right children, etc, as well as much of the logic usde in the functions below, apply to the Insert operation only. For Delete operations, different rules and operations apply. As noted earlier, this article focuses on Insert only, to keep the code short and clear.

rotateLeft takes a child node and rotates the child node’s subtree to the left.

func (n *Node) rotateLeft(c *Node) {
	fmt.Println("rotateLeft " + c.Value)

Save c’s right child.

	r := c.Right

r’s left subtree gets reassigned to c. This also reduces the height of c’s right subtree by 1, hence also c’s own height by 1.

	c.Right = r.Left
	c.height -= 1

c becomes the left child of r.

	r.Left = c

Make the parent node (that is, the current one) point to the new root node. The balanced subtree’s height is reduced by one, and n’s balance needs to be adjusted accordingly.

	if c == n.Left {
		n.Left = r
	} else {
		n.Right = r
	}
}

rotateRight is the mirrored version of rotateLeft.

func (n *Node) rotateRight(c *Node) {
	fmt.Println("rotateRight " + c.Value)
	l := c.Left
	c.Left = l.Right
	l.Right = c
	if c == n.Left {
		n.Left = l
	} else {
		n.Right = l
	}
	c.bal = 0
	l.bal = 0
}

rotateRightLeft first rotates the right child of c to the right, then c to the left.

func (n *Node) rotateRightLeft(c *Node) {

rotateRight assumes that the left child has a left child, but as part of the rotate-right-left process, the left child of c.Right is a leaf. We therefore have to tweak the balance factors before and after calling rotateRight. If we did not do that, we would not be able to reuse rotateRight and rotateLeft.

	c.Right.Left.bal = 1
	c.rotateRight(c.Right)
	c.Right.bal = 1
	n.rotateLeft(c)
}

rotateLeftRight first rotates the left child of c to the left, then c to the right.

func (n *Node) rotateLeftRight(c *Node) {
	c.Left.Right.bal = -1 // The considerations from rotateRightLeft also apply here.
	c.rotateLeft(c.Left)
	c.Left.bal = -1
	n.rotateRight(c)
}

rebalance brings the (sub-)tree with root node c back into a balanced state.

func (n *Node) rebalance(c *Node) {
	fmt.Println("rebalance " + c.Value)
	c.Dump(0, "")
	switch {

Left subtree is too high, and left child has a left child.

	case c.bal == -2 && c.Left.bal == -1:
		n.rotateRight(c)

Right subtree is too high, and right child has a right child.

	case c.bal == 2 && c.Right.bal == 1:
		n.rotateLeft(c)

Left subtree is too high, and left child has a right child.

	case c.bal == -2 && c.Left.bal == 1:
		n.rotateLeftRight(c)

Right subtree is too high, and right child has a left child.

	case c.bal == 2 && c.Right.bal == -1:
		n.rotateRightLeft(c)
	}
}

Find stays the same as in the previous article.

func (n *Node) Find(s string) (string, bool) {

	if n == nil {
		return "", false
	}

	switch {
	case s == n.Value:
		return n.Data, true
	case s < n.Value:
		return n.Left.Find(s)
	default:
		return n.Right.Find(s)
	}
}

Dump dumps the structure of the subtree starting at node n, including node search values and balance factors. Parameter i sets the line indent. lr is a prefix denoting the left or the right child, respectively.

func (n *Node) Dump(i int, lr string) {
	if n == nil {
		return
	}
	indent := ""
	if i > 0 {

indent = strings.Repeat(” “, (i-1)*4) + “+” + strings.Repeat(“-”, 3)

		indent = strings.Repeat(" ", (i-1)*4) + "+" + lr + "--"
	}
	fmt.Printf("%s%s[%d]\n", indent, n.Value, n.bal)
	n.Left.Dump(i+1, "L")
	n.Right.Dump(i+1, "R")
}

Tree

Changes to the Tree type:

  • Insert now takes care of rebalancing the root node if necessary.
  • A new method, Dump, exist for invoking Node.Dump.
  • Delete is gone.

type Tree struct {
	Root *Node
}

func (t *Tree) Insert(value, data string) {
	if t.Root == nil {
		t.Root = &Node{Value: value, Data: data}
		return
	}
	t.Root.Insert(value, data)

If the root node gets out of balance,

	if t.Root.bal < -1 || t.Root.bal > 1 {
		t.rebalance()
	}
}

Node’s rebalance method is invoked from the parent node of the node that needs rebalancing. However, the root node of a tree has no parent node. Therefore, Tree’s rebalance method creates a fake parent node for rebalancing the root node.

func (t *Tree) rebalance() {
	if t == nil || t.Root == nil {

Nothing to balance here.

		return
	}
	fakeParent := &Node{Left: t.Root, Value: "fakeParent"}
	fakeParent.rebalance(t.Root)

Fetch the new root node from the fake parent node

	t.Root = fakeParent.Left
}

func (t *Tree) Find(s string) (string, bool) {
	if t.Root == nil {
		return "", false
	}
	return t.Root.Find(s)
}

func (t *Tree) Traverse(n *Node, f func(*Node)) {
	if n == nil {
		return
	}
	t.Traverse(n.Left, f)
	f(n)
	t.Traverse(n.Right, f)
}

PrettyPrint prints the tree at a 90° angle, with the root to the left and the leaves to the right.

func (t *Tree) PrettyPrint() {
	walk := func(n *Node, print func(n *Node)) {
		if n == nil {
			return
		}
		walk(n.Right, print)
		print(n)
		walk(n.Left, print)
	}
	walk(t.Root, func(n *Node) {
		fmt.Printf("%s%s\n", strings.Repeat("  ", n.Height()-1))
	})
}

Dump dumps the tree structure.

func (t *Tree) Dump() {
	t.Root.Dump(0, "")
}

A demo

Using the Dump method plus some fmt.Print... statements at relevant places, we can watch the code how it inserts new values, rebalancing the subtrees where necessary.

The output of the final Dump call should look like this:

g[1]
+L--d[0]
    +L--b[0]
        +L--a[0]
        +R--c[0]
    +R--e[1]
        +R--f[0]
+R--i[1]
    +L--h[0]
    +R--k[0]
        +L--j[0]
        +R--l[0]

The small letters are the search values. “L” and “R” denote if the child node is a left or a right child. The number in brackets is the balance factor.

If everything works correctly, the Traverse method should finally print out the nodes in alphabetical sort order.

func main() {

The values are sorted in a way that causes two single rotations and a double rotation.

	values := []string{"d", "b", "g", "g", "c", "e", "a", "h", "f", "i", "j", "l", "k"}
	data := []string{"delta", "bravo", "golang", "golf", "charlie", "echo", "alpha", "hotel", "foxtrot", "india", "juliett", "lima", "kilo"}

	tree := &Tree{}
	for i := 0; i < len(values); i++ {
		fmt.Println("Insert " + values[i] + ": " + data[i])
		tree.Insert(values[i], data[i])
		tree.Dump()
		fmt.Println()
	}

	fmt.Print("Sorted values: | ")
	tree.Traverse(tree.Root, func(n *Node) { fmt.Print(n.Value, ": ", n.Data, " | ") })
	fmt.Println()

}

As always, the code is available on GitHub. Using the -d flag with go get to avoid that the binary gets auto-installed into $GOPATH/bin.

go get -d github.com/appliedgo/balancedtree
cd $GOPATH/src/github.com/appliedgo/balancedtree
go build
./balancedtree

The code is also available on the Go Playground. (Subject to availabilty of the Playground service.)

Conclusion

Keeping a binary search tree in balance is a bit more involved as it might seem at first. In this article, I have broken down the rebalancing to the bare minimum by removing the Delete operation entirely. If you want to dig deeper, here are a couple of useful readings:

Wikipedia on Tree Rotation: Richly illustrated, concise discussion of the rotation process.

German Wikipedia on AVL Trees: Sorry, this is German only, but when you scroll down to section 4, “Rebalancierung”, there are a couple of detailed diagrams on single and double rotation. Here you can see how the subtree heights change after each rotation.

GitHub search for Go AVL libs: For advanced study :)

That’s it. Happy tree planting!