Continuous refresh, or: how to keep your API client authorized
An access token should be initialized and refreshed from a central place, yet be available to umpteen client sessions. Dynamic futures to the rescue.
If web apps could sweat, they would.
On one end, hordes of client sessions request a continuous flow of data. On the other end, third-party APIs set rigorous access rules that require using short-lived access tokens.
Let an API access token expire, and chaos ensues. All open client sessions would run into errors! Bad user experience. So better keep the token fresh.
However, do not let each client session request a new access token for the same API. Only the last request wins; all the others receive a token that is immediately invalidated by the subsequent refresh requests.
So if many client sessions access the same API, you need to manage the token refresh centrally and distribute the current token to client sessions.
I am sure that there are many ways to do this, but here is one that is quick to implement and easy to understand. I call this approach “dynamic futures”, for lack of a standard (or at least a better) term.
Dynamic futures
If the word “futures” makes you think of our planet's potential destinies rather than programming paradigms, the article about futures in Go gets you back on track.
TL;DR: A future is a variable whose value does not exist initially but will be available after it gets computed. In Go, a future can be trivially implemented as a goroutine with a result channel.
```go
result := make(chan int)
go func() {
	result <- compute()
}()
// other activity...
future := <-result
```
In this article, I want to leverage this mechanism to create a future that updates itself in the background. I pose several requirements on the implementation:
- All concurrent work shall be hidden from the client. No channels or other concurrency constructs shall leak to the client side. The client shall be able to always get a current and valid access token by a simple method call.
- Use standard Go concurrency ingredients only.
- Use the standard library only. (This rules out using `singleflight` or similar packages.)
The challenge: orchestrating token refreshes and client requests
Scenario: A web app serves many clients. It also needs to call a third-party web API to fetch various data. This third-party API requires an access token that expires after a certain time. The web app must refresh the token on time, or else it would not be able to fetch more data from the API, and the client sessions would be blocked.
Furthermore, the web app must abide by the following rules:
- The solution must not provide a client with an expired token
- No client session should be blocked for an unacceptable amount of time
Let's see if we can implement all of these requirements and rules with minimal effort.
The starting point: a future that can be read more than once
With a slight modification, we can make the “future” code from above return the computed result as often as we want.
```go
c := make(chan int)
n := 256
go func(input int, result chan<- int) {
	value := calculate(input)
	for {
		result <- value
	}
}(n, c)
```
Clients can now read the channel as often as they want.
```go
value1 := <-c
value2 := <-c // Read again, get the same value again
```
How to serve a value and compute a new one in the background
The above future implementation computes the result only once. To update the result in the background, we need a way of reading and writing the result in a concurrency-safe way.
Did anyone shout, “mutex!”?
Mutexes do work, but there is a better way: a `select` statement.
As a quick recap, a `select` statement is like a `switch` statement, except that the `case` conditions are channel read or write operations. That is, every `case` condition attempts to either read from or write to a channel. If a channel is not ready, the affected `case` blocks until the channel is ready. If more than one channel becomes ready for reading or writing, the `select` statement randomly selects one of the unblocked `case`s for execution. The other `case`s remain blocked until the running `case` completes.
This mechanism provides an elegant way of serializing access to resources and avoiding data races, without having to use mutexes.
Here is how the `select` statement can help implement our “dynamic future”:
```go
go func(ctx context.Context, token chan<- string) {
	tok, lifeSpan := authorize()
	expired := time.After(lifeSpan - safetyMargin)
	for {
		select {
		case token <- tok:
			// NOP
		case <-expired:
			tok, lifeSpan = authorize()
			expired = time.After(lifeSpan - safetyMargin)
		case <-ctx.Done():
			return
		}
	}
}(ctx, token) // pass in a context and the shared token channel
```
What happens here?
- When the goroutine starts, it fetches an initial access token by calling the function `authorize()` (which I will not define here; let's assume some black-box token refreshing logic). The function returns a new token and the token's lifespan (as a duration).
- Then, the goroutine starts a timer that expires shortly before the token expires.
- The goroutine enters an infinite loop that consists of a `select` statement with three cases.
- The first case unblocks when a client attempts to read the `token` channel. The case sends the current token to the channel, and the loop starts over. Everything happens in the case condition, hence the case body is empty.
- The second case unblocks when the timer fires. It fetches a new token and sets a new timer.
- The `context` parameter `ctx` is a standard mechanism in Go for managing multiple goroutines inside a common context. Passing a cancelable `context` value to the goroutine allows stopping the goroutine automatically when the context is canceled. That's what the third `case` is for.
That's it! That's all the magic of our self-updating future.
Why does this work?
The code makes use of the fundamental property of the `select` statement that I mentioned above: the `select` statement randomly selects one of the unblocked `case`s for execution. It is not possible that two `case`s run at the same time. No data race can happen from simultaneous access to `tok` from both cases.
Hence, if a client reads the `token` channel, it can do so undisturbed. And if the timer fires and the code calls the authorization API to refresh the access token, no client can read a token that might become invalidated at any moment during the refresh.
But theory is cheap, so let's walk through a working demo.
The code
```go
package main

// Not all of these imports are required for the token refreshing code itself.
// Several are only used by the code that simulates the token refresh API, by
// the test code, and for printing out what's going on.
import (
	"context"
	"crypto/rand"
	"fmt"
	"log"
	rnd "math/rand"
	"sync"
	"sync/atomic"
	"testing"
	"time"
)
```
```go
const (
	// The token must be refreshed before it expires. The lifeSpanSafetyMargin
	// duration shall provide enough time for this. For the simulation, it is
	// set to an unrealistically small value, to make the test run fast.
	lifeSpanSafetyMargin = 10 * time.Millisecond
	retryDelay           = 11 * time.Millisecond
)
```
```go
// Fetching a token can fail, so we combine the token and a potential error
// into a tokenResponse and pass the result on to the client.
type tokenResponse struct {
	Token string
	Err   error
}

// Token represents an access token. It refreshes itself in the background by
// calling the API's authorization endpoint before the current token expires.
type Token struct {
	// The accessToken channel is used to send the new access token to the client.
	accessToken chan tokenResponse
	// The authorize field allows setting a custom authorization function that
	// implements the call to the actual authorization endpoint.
	authorize func() (string, time.Duration, error)
}
```
```go
// refreshToken fetches a new access token from the authorization API if there
// is none yet or if the current one expires. It sends the results (a token or
// an error) to the accessToken channel.
func (a *Token) refreshToken(ctx context.Context) {
	var token string
	var expiration time.Duration
	var err error

	// authorize() is defined below. Its purpose is to fetch a new token from
	// an authorization endpoint, including the token's lifespan.
	token, expiration, err = a.authorize()
	expired := time.After(expiration - lifeSpanSafetyMargin)
	for {
		select {
		// When a client reads from the accessToken channel, this case
		// condition sends the current token (or error). It does nothing else,
		// hence the body of the case is empty.
		case a.accessToken <- tokenResponse{Token: token, Err: err}:
		// The timer fires on channel expired.
		case <-expired:
			log.Println("Token expired")
			token, expiration, err = a.authorize()
			if err != nil {
				log.Println("Error refreshing token:", err)
				expiration = retryDelay
			} else {
				log.Println("Token refreshed")
			}
			expired = time.After(expiration - lifeSpanSafetyMargin)
		case <-ctx.Done():
			return
		}
	}
}
```
```go
// NewToken creates a Token and starts the refresher goroutine in the background.
func NewToken(ctx context.Context, auth func() (string, time.Duration, error)) *Token {
	a := &Token{
		accessToken: make(chan tokenResponse),
		authorize:   auth,
	}
	go a.refreshToken(ctx) // Runs until the context is canceled.
	return a
}
```
```go
// Get returns the current token or an error.
func (a *Token) Get() (string, error) {
	t := <-a.accessToken
	return t.Token, t.Err
}
```
Simulating an authorization endpoint
Next, let me implement a flaky authorization function that we can pass to the `Token` constructor.

The function `authFunc()` simulates fetching a new access token that expires after the duration `lifespan`.

But the simulated authorization endpoint is not very stable. With a probability of `apiFailureRate`, the call to the endpoint fails, and the failure then persists for `apiErrorDuration`.

When this happens, the function simulates a half-hearted backoff strategy (“sleep, then try just once again”) that fails half of the time. (Exponential backoff with jitter, anyone? Take it as a homework assignment.)

Feel free to skip reading through that part of the code; it is not relevant for the implementation. For any production purposes, you would insert a real API call here.
```go
const (
	tokenLifeSpan       = 100 * time.Millisecond
	averageCallDuration = 8 * time.Millisecond
	apiFailureRate      = 0.2
	apiErrorDuration    = tokenLifeSpan * 150 / 100
)

// tempError is set to true during a simulated transient API/network outage.
// To avoid races, it is an atomic value.
var tempError atomic.Bool

// authFunc simulates fetching a new access token that expires after lifespan.
func authFunc() (token string, lifespan time.Duration, err error) {
	b := make([]byte, 8)
	_, err = rand.Read(b)
	if err != nil {
		return "", tokenLifeSpan, err
	}
	time.Sleep(averageCallDuration)
	// Simulate a transient API error with a probability of apiFailureRate.
	// The error lasts for apiErrorDuration, then disappears.
	// The code pretends to do a backoff strategy that fails half of the time.
	if !tempError.Load() && rnd.Float64() < apiFailureRate {
		log.Println("API error")
		log.Println("Backing off...")
		time.Sleep(tokenLifeSpan)
		if rnd.Float64() < 0.5 {
			log.Println("API is still not back, giving up")
			tempError.Store(true)
			// Let the simulated error disappear after apiErrorDuration.
			go func() {
				<-time.After(apiErrorDuration)
				log.Println("API error disappeared")
				tempError.Store(false)
			}()
		} else {
			log.Println("API error disappeared during backoff")
		}
	}
	if tempError.Load() {
		return "", tokenLifeSpan, fmt.Errorf("temporary API error")
	}
	return fmt.Sprintf("%x", b), tokenLifeSpan, nil
}
```
Testing
Instead of building a complete web app with an HTTP server, handlers, and more, let's just write a simple test that simulates a few clients requesting new tokens at regular intervals.
```go
func TestTokenGet(t *testing.T) {
	log.SetFlags(0) // no extra log info
	log.Println("Starting test")
	ctx, cancel := context.WithCancel(context.Background())
	a := NewToken(ctx, authFunc)
	defer cancel()
	client := func(n int, token *Token, done <-chan struct{}, wg *sync.WaitGroup) {
		for {
			select {
			case <-done:
				wg.Done()
				log.Printf("Client %d done\n", n)
				return
			default:
				t, err := token.Get()
				log.Printf("Client %d token: %s, err: %v\n", n, t, err)
				time.Sleep(tokenLifeSpan / 5)
			}
		}
	}
	var wg sync.WaitGroup
	done := make(chan struct{})
	log.Println("Starting clients")
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go client(i, a, done, &wg)
	}
	<-time.After(5 * time.Second)
	log.Println("Stopping clients")
	close(done)
	log.Println("Waiting for clients to clean up")
	wg.Wait()
}
```
Exploring an alternative using mutexes
How did this article arrive at the presented solution using channels and `select`?
I started from modeling futures in Go, especially futures that deliver the computed result continuously. The beauty of that approach is that you do not need to share memory that you'd have to protect with mutexes at every access path.
I developed this “continuous future” further into a future that can update itself in the background.
I added a timer to signal when it's time to update the token, and a `select` statement that listens to the timer and to consumers. This `select` statement takes the role of a mutex for guarding access to the token data.
How would the same code look when using mutexes?
Don't communicate by sharing memory…
…except if the scenario is simple enough. So let's try a version that uses shared memory protected by mutexes.
```go
type MToken struct {
	token     string
	tokenErr  error
	mu        sync.RWMutex
	authorize func() (string, time.Duration, error)
	ctx       context.Context
}
```
```go
func NewMToken(ctx context.Context, auth func() (string, time.Duration, error)) *MToken {
	m := &MToken{
		authorize: auth,
		ctx:       ctx,
	}
	go m.refreshToken(ctx) // Runs until the context is canceled.
	return m
}
```
```go
// Get returns the current token or an error. An RWMutex guards access to
// m.token. Locking the mutex for reading (through RLock()) allows multiple
// readers to read the protected value at the same time. If a writer requests
// a write lock through Lock(), then RLock() does not let any new readers
// acquire a read lock until the write lock is released.
func (m *MToken) Get() (string, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.token, m.tokenErr
}
```
```go
// In refreshToken, the tokenResponse channel is replaced by a direct,
// mutex-protected write to m.token.
func (m *MToken) refreshToken(ctx context.Context) {
	var expiration time.Duration
	m.mu.Lock()
	m.token, expiration, m.tokenErr = m.authorize()
	m.mu.Unlock()
	expired := time.After(expiration - lifeSpanSafetyMargin)
	for {
		select {
		// The timer fires on channel expired.
		case <-expired:
			log.Println("Token expired")
			m.mu.Lock()
			m.token, expiration, m.tokenErr = m.authorize()
			err := m.tokenErr
			m.mu.Unlock()
			if err != nil {
				log.Println("Error refreshing token:", err)
				expiration = retryDelay
			} else {
				log.Println("Token refreshed")
			}
			expired = time.After(expiration - lifeSpanSafetyMargin)
		case <-ctx.Done():
			return
		}
	}
}
```
Testing the mutex version
```go
func TestMTokenGet(t *testing.T) {
	log.SetFlags(0) // no extra log info
	log.Println("Starting mutex test")
	ctx, cancel := context.WithCancel(context.Background())
	m := NewMToken(ctx, authFunc)
	defer cancel()
	client := func(n int, token *MToken, done <-chan struct{}, wg *sync.WaitGroup) {
		for {
			select {
			case <-done:
				wg.Done()
				log.Printf("Mutex client %d done\n", n)
				return
			default:
				t, err := token.Get()
				log.Printf("Mutex client %d token: %s, err: %v\n", n, t, err)
				time.Sleep(tokenLifeSpan / 5)
			}
		}
	}
	var wg sync.WaitGroup
	done := make(chan struct{})
	log.Println("Starting mutex clients")
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go client(i, m, done, &wg)
	}
	<-time.After(5 * time.Second)
	log.Println("Stopping mutex clients")
	close(done)
	log.Println("Waiting for mutex clients to clean up")
	wg.Wait()
}
```
Run the code
The code runs fine in the Go playground and therefore is just one click away.

Alternatively, clone the code repository and run `go test`:

```sh
git clone https://github.com/appliedgo/refresh
cd refresh
go test -v .
```
Thoughts on using the code in real life
The code shown here is meant for demo purposes. To use it in production, consider a couple of improvements.
Add a good backoff strategy
The above code's backoff “strategy” is to sleep once and try again.

A tried-and-tested backoff strategy for production code is “exponential backoff with jitter”. Exponential means that the time between retries becomes exponentially longer (for example, the first delay is 1 second, the second is 2 seconds, the third is 4, the fourth is 8, and so on). Jitter means adding a random amount of time to each delay, so that clients that happen to go into backoff at the same time do not all retry the call at the same moments.
Which approach is faster, channels or mutexes?
You probably don't need to care. The performance difference between sharing data via guarded access versus sharing data via channels is certainly negligible, given that the shared data, an access token, is used for calling an API endpoint (or even multiple endpoints), which is orders of magnitude slower than reading a string shared via mutex-locking or a channel.
But for the sake of it, I have run benchmarks on `Token.Get()` and `MToken.Get()`. (The benchmark code is available in the repository only.) The results speak for themselves.
```
> go test -run NONE -bench=.
goos: darwin
goarch: arm64
pkg: github.com/appliedgo/refresh
BenchmarkToken_Get-12     4263207    296.5 ns/op
BenchmarkMToken_Get-12    4758291    317.2 ns/op
PASS
ok    github.com/appliedgo/refresh    4.552s

> go test -run NONE -bench=.
goos: darwin
goarch: arm64
pkg: github.com/appliedgo/refresh
BenchmarkToken_Get-12     3404556    336.8 ns/op
BenchmarkMToken_Get-12    3348738    315.0 ns/op
PASS
ok    github.com/appliedgo/refresh    3.054s

> go test -run NONE -bench=.
goos: darwin
goarch: arm64
pkg: github.com/appliedgo/refresh
BenchmarkToken_Get-12     4190270    302.7 ns/op
BenchmarkMToken_Get-12    4691931    295.7 ns/op
PASS
ok    github.com/appliedgo/refresh    4.108s
```
Neither of the two variants is considerably faster than the other. And by “considerably”, I mean by at least an order of magnitude.
Why do you use a timer-based token refresher?
A typical API client would call into the API until it gets a token expiration error. Then the client would refresh the token and continue where it left off.
In contrast to this, our web app receives session-less requests from an unknown number of site visitors. It could be one user or a thousand visitors requesting a transfer search or booking at the same time. The web app does not create a separate API connection for each of them. It maintains a single one.
In this scenario, there is no point in letting thousands of requests run into the token timeout. They all make use of the same API connection, so the better approach is to refresh the token once for all visitors, before the timeout.
But keep in mind that this approach also has a downside. `refreshToken()` is a goroutine that enters an infinite loop. It may therefore outlive the context it was created in, unless it gets stopped from the outside.

This is a potential goroutine leak. If an app needs a lot of separate API connections, it would create many `Token` values, and each of them would spawn a never-ending goroutine that continues to consume resources.
Remember:
Always think about your goroutines’ lifetime.
As a rule of thumb, keep goroutines short-lived wherever possible. Do not let goroutines enter infinite loops without ensuring they can—and will—be stopped when their job is done.
The current solution uses a cancelable context, but the app must make sure to eventually cancel the context. That's an additional burden for the `Token` consumer, unless the consumer already uses a cancelable context that the token refresher can be hooked into.
Conclusion
We went from a simple channel-based future to a self-refreshing value. We compared a channel-based approach with an approach using mutexes, and we had a critical look at the token refresher that uses a timer instead of letting API calls run into the token timeout. I hope you had fun reading this article and trying out the code. Thank you!
Links
- Futures in Go • Applied Go
- The Singleflight package
- Select statements (Go language specification)
- This code on GitHub
- Go playground
Happy coding!
Changelog:
2023-10-25: Add context and an atomic value to address a potential goroutine leak and a race condition.
2023-10-26: Add a mutex version and benchmarks.
2023-10-28: Replace footnotes with Commonmark-compliant link references.
Imports and globals
The package name must be `main` to make the code run in the Go playground. In real life, the package would be a library package named, for example, `auth`.