The Unexamined Code: Four Steps to Get to Know Your Codebase

At his trial for impiety and corrupting the youth in 399 BC, the Greek philosopher Socrates supposedly said,

The unexamined life is not worth living.

In his pursuit of wisdom, Socrates considered a life unworthy if it lacked introspection, self-reflection, and critical thinking. Or, as Yafeng Shan has framed it, “Philosophers are not only concerned with metaphysical, epistemological, conceptual, ethical, and aesthetic issues of things around us; they also pay serious attention to the nature, value, methods, and development of philosophy itself.”

Nothing could be truer for software developers! We are concerned not only with the goals our software design and development must achieve, but also with the nature of our code, the inner workings of libraries, and the various mental models cast into programming language paradigms.

The brain gets occupied with questions

So, when we turn our attention to our code, questions may pop up in our heads.

Is my code clear enough to read? Clear enough to maintain? Is it still idiomatic Go, or did some Java paradigms silently invade the codebase?

How's the architecture of my project doing? Is it still as clean as it was when the project started, or has it begun to take on the shape of a slime mold, spreading in all directions, impossible to keep at bay?

Do I even still understand the code and the project, years after it started? Do I understand my coworkers' code, or the code of that third-party library I'm planning to review for inclusion in our security-sensitive project?

Now you could go the hard way and work through the code line by line, trying to build a mental model of what the code is doing… or you could summon some genies for help.

Step 1: Keep your code readable

Go already has a clear syntax, free of noisy semicolons and other distractions. But that's not all. In a genius move, the Go team created gofmt, a code formatting tool that gives users no choice: The style it produces is mandatory, which puts an end to unproductive code style discussions.

So the first, and imperative, step is to make sure gofmt takes care of your code. Most Go-aware IDEs already call gofmt when you save a file (either via Go's language server gopls or otherwise), or at least let you install an extension or plugin to that end.

That said, you might want to take a look at gofumpt, which enforces even stricter rules than gofmt. Its rules are compatible with gofmt's, so running gofmt on code already formatted with gofumpt changes nothing.

Step 2: Reduce complexity

The next thing to check is semantics. Code that uses complicated or unidiomatic constructs lacks the patterns seasoned gophers are familiar with and can thus be hard to read despite looking clean at first sight. To fight brittle and dangerous constructs, make a habit of running tools like go vet and staticcheck (or let the CI pipeline do it for you). I wrote about these tools in the previous Spotlight, so I won't go into details here.

Instead, let me zoom out a bit. Looking at functions as a whole reveals another level of complexity to keep an eye on. Large functions, often with many if/else branches and loops inside, are difficult to comprehend and a breeding ground for bugs. But when is a function “too large” or “too complex”?

To answer this, Thomas J. McCabe, Sr. developed a metric called cyclomatic complexity, which counts the number of linearly independent paths through a piece of source code. (The original measure targeted whole programs, because in 1976, programs were rather small compared to today's; nowadays, it is more common to look at individual functions.)
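In McCabe's formulation, the metric M is computed from a function's control-flow graph with E edges, N nodes, and P connected components (P = 1 for a single function). As a worked example, take a function whose body is a single if/else: its graph has four nodes (condition, then-branch, else-branch, exit) and four edges (from the condition to each branch, and from each branch to the exit):

```latex
\begin{aligned}
M &= E - N + 2P \\
M_{\text{if/else}} &= 4 - 4 + 2 \cdot 1 = 2
\end{aligned}
```

The result, 2, matches the two possible paths through the function. For code built from structured control flow with two-way decisions, the formula reduces to the number of decision points plus one, which is the shortcut tools typically take.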

Try it out: gocyclo examines the functions in your code and returns a complexity score for each one.

I ran gocyclo on my newsletter helper tool (to which Claude has contributed quite a bit of code recently, so I suspected that the complexity had gotten a bit out of hand) and got this top-10 list:

> gocyclo -top 10 .
61 blogs parse news/sources/blogs/blogs.go:98:1
34 content parseMeta news/sources/content/meta.go:46:1
27 news publish news/publish.go:163:1
23 news publishWithStore news/publish.go:587:1
22 web (*Server).reviewGet news/web/review.go:25:1
20 content fetchGitHubReadmeUnlocked news/sources/content/github.go:66:1
20 content (*ContentFetcher).FetchPending news/sources/content/fetcher.go:55:1
20 service AddManualCapture news/service/add.go:199:1
20 rss TestReadOpmlFile news/feeds/rss/opml_test.go:8:1
19 sqlite (*SQLiteStorage).savePost news/storage/sqlite/sqlite.go:121:1

The vast majority of functions remain below 20 (the average is 4.42), but a score of 61 no longer looks healthy. Where exactly complexity starts to threaten the stability, maintainability, and security of code is certainly debatable, but McCabe suggested that “high risk” starts at a score of 20. So obviously, there's work ahead for me…

Step 3: Examine the architecture

At this point, I must clearly say: If you can get hold of architectural documents, like a functional specification or a C4 model, read them.

Starting from plain source code, you can only reconstruct the lower levels of an architectural design. High-level design is almost never visible at the code level, just as a meal served in a restaurant barely reveals the recipe used to create it.

Here is what you can do right away with basic tools:

1. Review the repository layout

The folder structure alone can tell you a lot about a project. Display it (excluding uninteresting subtrees) with the tree command:

tree -d -I 'vendor|node_modules|.git'

If you're only interested in the package structure of a repo, a simple

go list ./...

does the trick.

2. Reveal dependencies

The dependency graph uncovers code that's invisible when you look at the file structure. A simple go mod graph prints the entire module requirement graph, direct and indirect dependencies alike, but the list grows long quickly, even for moderately sized projects.

There is no --depth flag for go mod graph, but this one-liner prints only the direct imports of each package in the project, one edge per line:

# Get the module name
MOD=$(go list -m)

go list -f '{{range .Imports}}{{printf "\"%s\" -> \"%s\"\n" $.ImportPath .}}{{end}}' ./... | grep "$MOD"

However, not all packages are equally important. With a bit of filtering and sorting, you can list the 20 packages of your own module that are imported most often, sorted by number of imports (this reuses the MOD variable from above):

go list -f '{{range .Imports}}{{println .}}{{end}}' ./... | grep "$MOD" | sort | uniq -c | sort -rn | head -20

3. Check interfaces

Interfaces sit at the seams of an architecture and define the contracts between layers. Grepping for interface definitions is a good start:

grep -rn '^type.*interface' --include='*.go' .

4. Find concurrency boundaries

See which parts of the code run independently of each other. Two quick greps reveal goroutines, synchronization points, and communication channels:

grep -rn '^[[:space:]]*go ' --include='*.go' .
grep -rn 'sync\.\|chan ' --include='*.go' .

The shell commands above are handy for getting a first impression of an app's inner structure. They provide meaningful entry points for exploring the code, at least if the codebase isn't too big.

Step 4: Book a guided tour

When the steps above aren't enough to make sense of some code, because it's old or completely foreign to you and you need a quick way to get a well-founded first impression, let a bot explain the code to you.

My favorite technique for this is a linear walkthrough. The idea is straightforward: Ask an LLM to give you a guided tour of the code, explaining the details and showing how the parts work together. This doesn't require a carefully engineered mega-prompt; in fact, the prompt is as simple as the idea itself:

Read the source and then plan a linear walkthrough of the code that explains how it all works in detail

Pass this prompt to a frontier LLM that has access to the code, and it will act as a Socratic interlocutor, walking you gently through the code and answering any questions you ask.

For larger projects, it's probably best to let the LLM walk you through the individual sections of the project separately and then synthesize a meta-walkthrough from the individual walkthroughs. This strategy respects the limits of the LLM's context window and avoids the costs of maxing it out.

And with the techniques from Step 3, you already know how to split an app into its parts.

Familiar code is a power tool

The most maintainable, extensible, and optimizable code is the code you're familiar with. If you face unfamiliar code, your code exploration toolkit and your growing experience help you become familiar with it quickly.