Green tea makes your life easier (if you're a garbage collector)
Steaming hot news: A new garbage collector design for Go is under development, and you can try it out now using gotip
.
What's the problem?
Go's garbage collector is perhaps one of the finest GCs around. Years of ongoing improvement have minimized stop-the-world phases to a minimum and into the sub-millisecond range. Why did the Go team set out to redesign this tested-and-proven piece of technology?
Modern CPU cores achieve high speeds that memory cannot keep up with. Multi-core systems compensate for the speed gap through a hierarchy of fast, core-local memory caches. As a result, the location of a data entity in memory matters quite much for efficiency. Imagine a slice that's loaded into the L1 cache of CPU core 0 and some code executing on core 7 wants to access this slice. The result is a cache invalidation and a relocation of the current state of the slice to core 7's L1 cache.
Traditional garbage collection algorithms, like the tricolor-marking algorithm that Go uses, are unaware of such memory details; they consider memory as an abstract, flat address space that they can freely read from and write to, causing cache misses. Moreover, the larger the number of cores is, the more system threads fight for limited memory bandwidth.
In short, traditional GC algorithms waste time waiting for memory because they don't optimize for non-uniform memory topologies.
Desperately needed: a topology-aware GC algorithm
A garbage collector that meets the requirements of modern multicore architectures with their cache hierarchies and memory bandwidth challenges must have an idea about the architecture it operates on.
The Green Tea algorithm that the Go team has now made public in gotip
takes a new approach. Instead of scanning objects randomly, it groups them into memory blocks (“spans”) for processing.
This approach reduces three performance-hampering factors at once:
- Cache misses by processing “nearby” data together
- Contention by having cores work on separate blocks
- Memory stalls by keeping activity inside the core-local caches, reducing access to main memory
In selected GC-heavy benchmarks, the Go team observes a 10–50% reduction in CPU cost for garbage collection compared to the existing GC. Other benchmarks, however, either showed no effect or even regressed. Microbenchmarks are always tricky and have to be interpreted with a good dose of skepticism.
With the release of the prototype as a GOEXPERIMENT in gotip
, the Go team seeks feedback from real-world benchmarks that target complete programs. So if you know of a GC-heavy program or a program that is sensitive to GC overhead, consider testing the new Green Tea algorithm on it and sharing your insights with the Go team. ❤️