On testing: "Does it work?" vs. "Does it fulfill the contract?"

How do you write unit tests for a function? It's tempting to look at the internals of the function and write the tests to confirm that the algorithm works “by the letter.” But often, the more important question is: Does the function deliver what the callers expect? These two approaches to testing are so distinct that it's worth taking a closer look at the differences, why they matter, and why the latter kind is preferable (but not in every case).

Testing “if it works”

If you test “if the code works,” you test implementation details. I'll refer to this kind of testing as “implementation testing” in this article. Implementation tests examine the exact algorithmic behavior of the code. They live in the same package as the code under test, have full access to all package internals, and make use of that access.
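To make this concrete, here is a minimal sketch of an implementation test. It assumes a hypothetical insertionSort function that increments an unexported swapCount counter, which only code inside the same package can see. In a real project, the test function would sit in a _test.go file of that package; here, everything lives in one package main file so the sketch stays self-contained.

```go
package main

import "testing"

// swapCount is a hypothetical unexported counter. Only code in this
// package, including same-package tests, can read it.
var swapCount int

// insertionSort sorts s in place in ascending order and counts each swap.
func insertionSort(s []int) {
	for i := 1; i < len(s); i++ {
		for j := i; j > 0 && s[j-1] > s[j]; j-- {
			s[j-1], s[j] = s[j], s[j-1]
			swapCount++
		}
	}
}

// TestInsertionSortSwaps is an implementation test: it pins down an
// internal detail (the number of swaps) that callers never see and that
// a different sorting algorithm would not reproduce.
func TestInsertionSortSwaps(t *testing.T) {
	swapCount = 0
	s := []int{3, 1, 2}
	insertionSort(s)
	if swapCount != 2 {
		t.Errorf("swapCount = %d, want 2", swapCount)
	}
}
```

The test passes today, but it would fail after switching to any algorithm that needs a different number of swaps, even if the output is still correctly sorted.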

Testing for contract fulfillment

A different approach is what I'll call “contract testing” here. A contract test verifies the behavior of a function that is visible to a caller. The key question is whether the function behaves the way a caller expects it to:

- Whether expected input results in correct output
- Whether wrong or unexpected input triggers a well-defined reaction: an error status, some default output, or even an intentionally undefined behavior

(Side note: reacting to input with undefined behavior is legitimate if the situation neither allows specifying a deterministic output nor warrants entering an error condition. For example, if a sorting algorithm receives a key/value list with duplicate keys, and the values have no meaningful sort order, the algorithm may return entries with equal keys in any order, because every such order fulfills the requirement “sort this list by key.” The caller cannot expect to read the values of the returned list in any particular sequence.)
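The key/value example from the side note translates directly into a contract test. The sketch below assumes a hypothetical Pair type and a SortByKey function built on the standard library's sort.Slice, which is documented as not guaranteed to be stable. The test checks only what the contract promises: keys come out in ascending order, and no pair is lost or invented. It deliberately says nothing about the order of values that share a key.

```go
package main

import (
	"sort"
	"testing"
)

// Pair is a hypothetical key/value record.
type Pair struct {
	Key   int
	Value string
}

// SortByKey sorts pairs by Key in ascending order. Its contract does not
// specify the relative order of pairs with equal keys.
func SortByKey(pairs []Pair) {
	sort.Slice(pairs, func(i, j int) bool { return pairs[i].Key < pairs[j].Key })
}

// sortedByKey reports whether the keys are in non-decreasing order.
func sortedByKey(pairs []Pair) bool {
	for i := 1; i < len(pairs); i++ {
		if pairs[i-1].Key > pairs[i].Key {
			return false
		}
	}
	return true
}

// samePairs reports whether a and b hold the same pairs, ignoring order.
func samePairs(a, b []Pair) bool {
	if len(a) != len(b) {
		return false
	}
	counts := make(map[Pair]int)
	for _, p := range a {
		counts[p]++
	}
	for _, p := range b {
		counts[p]--
		if counts[p] < 0 {
			return false
		}
	}
	return true
}

// TestSortByKeyContract checks key order and completeness only.
// It does not check whether {2, "b"} ends up before or after {2, "a"}.
func TestSortByKeyContract(t *testing.T) {
	in := []Pair{{2, "b"}, {1, "x"}, {2, "a"}, {1, "y"}}
	out := append([]Pair(nil), in...)
	SortByKey(out)
	if !sortedByKey(out) {
		t.Errorf("keys not ascending: %v", out)
	}
	if !samePairs(in, out) {
		t.Errorf("pairs changed: got %v, want a permutation of %v", out, in)
	}
}
```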

A contract in this context is a specification of the allowed input values and the expected results. Obviously, you need to specify a contract to write a test against. I'll come to that later.

Why would you prefer testing contracts over testing algorithms?

Now, what's the point of contract testing? What do you gain by treating the implementation as a black box and looking only at the input and output?

Consider these two points:

- Algorithms can change while contracts remain stable. Imagine you have a function for sorting data, and you replace the current quicksort algorithm with merge sort. Implementation tests that depend on the particular behavior of quicksort could break when run against merge sort, even though the result is the same in both cases. The contract test only verifies whether the function returns the data sorted by the predefined criteria, and thus works for all kinds of sorting algorithms.
- Refactoring can break implementation tests. Even basic refactoring can invalidate an implementation test that is too tightly coupled to the code.
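A contract test does not care which algorithm runs underneath. The following sketch checks one contract (“the output is sorted and is a permutation of the input”) against two implementations: a hypothetical hand-written insertionSort and the standard library's sort.Ints. The helper name checkSortContract is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"testing"
)

// insertionSort is one hypothetical implementation of "sort ints ascending".
func insertionSort(s []int) {
	for i := 1; i < len(s); i++ {
		for j := i; j > 0 && s[j-1] > s[j]; j-- {
			s[j-1], s[j] = s[j], s[j-1]
		}
	}
}

// checkSortContract returns an error if out is not an ascending
// permutation of in. It looks only at input and output.
func checkSortContract(in, out []int) error {
	if !sort.IntsAreSorted(out) {
		return fmt.Errorf("not sorted: %v", out)
	}
	counts := make(map[int]int)
	for _, v := range in {
		counts[v]++
	}
	for _, v := range out {
		counts[v]--
	}
	for v, c := range counts {
		if c != 0 {
			return fmt.Errorf("input/output count mismatch for element %d", v)
		}
	}
	return nil
}

// TestSortContract runs the same contract check against every
// implementation. Swapping one algorithm for another needs no test changes.
func TestSortContract(t *testing.T) {
	impls := map[string]func([]int){
		"insertion": insertionSort,
		"stdlib":    sort.Ints,
	}
	inputs := [][]int{nil, {1}, {3, 1, 2}, {5, 5, -1, 0, 5}}
	for name, sortFn := range impls {
		for _, in := range inputs {
			out := append([]int(nil), in...)
			sortFn(out)
			if err := checkSortContract(in, out); err != nil {
				t.Errorf("%s(%v): %v", name, in, err)
			}
		}
	}
}
```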

What do you need for testing contract fulfillment?

Well, that's the tiny caveat here: you need a contract. In the context of function and package APIs, a contract basically defines three things:

- The allowed kind of input to a function
- If input is valid, the guaranteed output
- If input is invalid, how the function reacts to this input

The idea of API contracts in software goes back to Bertrand Meyer, who designed the language Eiffel around contracts (“Design by Contract”).

The better you design the contract, the better the tests you can write against it. A knack for finding edge cases certainly helps.
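Here is a sketch of a small contract written down as a Go doc comment, followed by a table-driven test that covers all three parts of a contract: allowed input, guaranteed output, and the reaction to invalid input. Mean and ErrEmpty are hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"testing"
)

// ErrEmpty is returned by Mean when the input has no elements.
var ErrEmpty = errors.New("mean: empty input")

// Mean returns the arithmetic mean of xs.
//
// Contract:
//   - Allowed input: any []float64, including nil.
//   - Valid input (at least one element): the mean and a nil error.
//   - Invalid input (no elements): 0 and ErrEmpty.
func Mean(xs []float64) (float64, error) {
	if len(xs) == 0 {
		return 0, ErrEmpty
	}
	var sum float64
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs)), nil
}

// TestMeanContract checks each clause of the contract; it never looks
// at how Mean computes its result.
func TestMeanContract(t *testing.T) {
	tests := []struct {
		name    string
		in      []float64
		want    float64
		wantErr error
	}{
		{"single value", []float64{4}, 4, nil},
		{"several values", []float64{1, 2, 3}, 2, nil},
		{"nil slice", nil, 0, ErrEmpty},
		{"empty slice", []float64{}, 0, ErrEmpty},
	}
	for _, tt := range tests {
		got, err := Mean(tt.in)
		if !errors.Is(err, tt.wantErr) {
			t.Errorf("%s: err = %v, want %v", tt.name, err, tt.wantErr)
		}
		if got != tt.want {
			t.Errorf("%s: got %v, want %v", tt.name, got, tt.want)
		}
	}
}
```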

How Go helps

Go already supplies fundamental unit testing capabilities that help with writing contract tests:

- Black-box testing: In Go, you can place tests either in the same package as the tested code or in a separate package (in the same directory, a package whose name carries the suffix _test, such as mypkg_test). Placing them in the same package gives you access to all package internals. If your tests rely on accessing package internals, you are doing white-box testing. Package or function contracts don't know, and don't care, about internals. To reliably write contract tests, you therefore need to place your tests in a separate package, where the code under test appears as a black box, with only the exported types and function signatures visible. Hence the name black-box tests.
- Fuzzing: Suppose you have carefully searched for edge cases and designed tests to catch them. Still, you can't be sure that your tests cover all edge cases. At this point, you can employ fuzz tests to extend your tests to automatically generated input. (Long before the first iPhone came out, a “personal digital assistant” named Palm Pilot was quite popular. It had a touch screen and a pen and offered a calendar, a note app, and a few more apps. Why do I mention the Palm Pilot? Because app developers could use a special testing mode called “Gremlins” that simulated random pen taps on the display until the app ran into an error or the test ran out of time. I like to think “Gremlins” when working with or talking about fuzz tests.)
- Property testing: Contracts may describe functionality in a way that's not easy to formulate as classic “if input A then output B” tests. Rather, they may describe properties such as “Sorting an already sorted list by the same sort criteria must not change the sort order.” To test such a property, a test can generate inputs, compute the corresponding outputs, and validate the property against each input/output pair. (Find more details and an example in the Spotlight linked at the beginning of this bullet point, including how to use testing/quick for property testing.)
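A sketch of the last two techniques, assuming a hypothetical SortInts function that returns a sorted copy of its input. In a real project, the test code would sit in a file such as sorting_test.go that declares package sorting_test and calls sorting.SortInts, which makes it a black-box test; here, function and tests share one package main file so the sketch is self-contained. The property test hands the idempotence property to quick.Check from testing/quick, which generates random []int inputs. The fuzz test converts the fuzzer's []byte input into integers, because f.Fuzz accepts only a limited set of argument types and []int is not among them.

```go
package main

import (
	"sort"
	"testing"
	"testing/quick"
)

// SortInts is a hypothetical function under test. Contract: it returns
// an ascending copy of s and leaves s itself unchanged.
func SortInts(s []int) []int {
	out := append([]int(nil), s...)
	sort.Ints(out)
	return out
}

// sortIsIdempotent expresses the property "sorting an already sorted
// list by the same criteria must not change its order".
func sortIsIdempotent(s []int) bool {
	once := SortInts(s)
	twice := SortInts(once)
	if len(once) != len(twice) {
		return false
	}
	for i := range once {
		if once[i] != twice[i] {
			return false
		}
	}
	return true
}

// TestSortIsIdempotent lets testing/quick generate random []int inputs
// and reports an input for which the property does not hold.
func TestSortIsIdempotent(t *testing.T) {
	if err := quick.Check(sortIsIdempotent, nil); err != nil {
		t.Error(err)
	}
}

// FuzzSortInts checks contract properties against fuzzer-generated input.
// f.Fuzz does not accept []int arguments, so each byte becomes one int.
func FuzzSortInts(f *testing.F) {
	f.Add([]byte{3, 1, 2}) // seed corpus entry
	f.Fuzz(func(t *testing.T, data []byte) {
		in := make([]int, len(data))
		for i, b := range data {
			in[i] = int(b) - 128 // shift the range to include negative numbers
		}
		out := SortInts(in)
		if len(out) != len(in) {
			t.Fatalf("length changed: got %d, want %d", len(out), len(in))
		}
		if !sort.IntsAreSorted(out) {
			t.Errorf("output not sorted: %v", out)
		}
	})
}
```

Running go test executes the fuzz target only against its seed corpus; go test -fuzz=FuzzSortInts starts actual fuzzing, which runs until it finds a failure or you stop it.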

Pitfalls

Writing tests against a contract isn't much different from writing unit tests in general. What needs particular care are the contracts themselves. Designing contracts can go wrong in two directions:

- Over-specifying the contract: A contract should never let implementation details leak in, such as (a silly example) prescribing a specific error message when an error type would suffice.
- Under-specifying the contract: For example, if a function has a pointer argument (or a method has a pointer receiver), the contract should specify the behavior if the pointer is nil. “Undefined behavior” is the least desirable reaction to unexpected input.
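Both pitfalls can be addressed in code. This sketch uses a hypothetical Counter type whose doc comments state what happens on a nil receiver, so the nil case is part of the contract rather than left undefined. The test checks for the sentinel error ErrNilCounter (also a hypothetical name) with errors.Is instead of comparing the error message, so the message text remains an implementation detail.

```go
package main

import (
	"errors"
	"testing"
)

// ErrNilCounter is the error value the contract names for a nil receiver.
var ErrNilCounter = errors.New("counter: nil receiver")

// Counter is a hypothetical type with pointer-receiver methods.
type Counter struct{ n int }

// Inc adds one to the counter.
// Contract: on a nil *Counter, Inc returns ErrNilCounter and does not panic.
func (c *Counter) Inc() error {
	if c == nil {
		return ErrNilCounter
	}
	c.n++
	return nil
}

// Value returns the current count.
// Contract: a nil *Counter reports 0.
func (c *Counter) Value() int {
	if c == nil {
		return 0
	}
	return c.n
}

// TestCounterNilReceiver checks the nil-receiver clauses of the contract.
func TestCounterNilReceiver(t *testing.T) {
	var c *Counter
	// Compare against the error value, not err.Error(): the message text
	// is an implementation detail and may change.
	if err := c.Inc(); !errors.Is(err, ErrNilCounter) {
		t.Errorf("Inc on nil: err = %v, want ErrNilCounter", err)
	}
	if got := c.Value(); got != 0 {
		t.Errorf("Value on nil = %d, want 0", got)
	}
}
```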

Also, while contract testing has obvious advantages, don't neglect implementation testing entirely. For example, when testing security-relevant or performance-critical code, implementation tests can reveal flawed code that happens to deliver a “correct” result.
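One kind of implementation test that a contract test cannot replace is a check on resource usage. A minimal sketch, assuming a hypothetical performance-critical sumInts function that must not allocate: testing.AllocsPerRun from the standard library reports the average number of heap allocations per call. A variant of sumInts that, say, copied its input into a freshly made slice would still return the correct sum, but would likely fail this test.

```go
package main

import "testing"

// sink stores the result so the call's outcome is used.
var sink int

// sumInts is a hypothetical hot-path function that is expected not to allocate.
func sumInts(s []int) int {
	total := 0
	for _, v := range s {
		total += v
	}
	return total
}

// allocsPerSum reports the average number of heap allocations per call to sumInts.
func allocsPerSum(s []int) float64 {
	return testing.AllocsPerRun(100, func() {
		sink = sumInts(s)
	})
}

// TestSumIntsDoesNotAllocate is an implementation test: the sum would be
// correct either way, but an allocation signals an unwanted change.
func TestSumIntsDoesNotAllocate(t *testing.T) {
	if n := allocsPerSum([]int{1, 2, 3}); n != 0 {
		t.Errorf("sumInts allocated %v times per call, want 0", n)
	}
}
```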

Effective and well-balanced testing

I hope this Spotlight helps raise awareness of the difference between implementation testing (a.k.a. white-box testing) and contract testing (a.k.a. black-box testing). The latter is more robust against refactoring or deeper changes to the implementation, such as swapping out an algorithm for another that fulfills the same contract.

Still, you need to weigh the pros and cons of contract testing. For example, security-critical or performance-critical code might need additional implementation tests.

Finally, if you get into the habit of designing API contracts, you can write tests for them before writing any actual code. The tests then act as natural guardrails for implementing the contracts. If this approach sounds familiar to you, that's because it's the basis of Test-Driven Development (TDD).