You Could Have Invented the Continuation Monad

With apologies to sigfpe

I remember the first time I heard about continuations.

[Image: Machinist’s Apprentice, Emma Stebbins, c. 1859]

I was learning to write programs in Scheme at the time and kept seeing references to something called “continuation passing style”. I didn’t understand most of what I read then, but continuations were especially confusing: somehow they represent “the rest of the program” in a way that can be used to implement interesting control structures. After a halfhearted attempt to understand how they worked, I set them aside, since there was (and is) no shortage of other interesting things to learn about.

I wasn’t ready yet.

Recently, though, while thinking about something else, I ran face-first into a problem that continuations solve in a nifty way. In this post I’ll try to show what that problem is and how applying some straightforward FP design techniques makes the continuation monad pop out naturally. I’m using Haskell because it’s my favorite, but in principle the ideas apply in any language with first-class functions.

Resources

Very often our programs need to interact with scarce resources: things like file handles, network connections, or graphics contexts. Typically such resources have to be acquired from an underlying resource manager, perhaps supplied by the operating system. Once acquired we can interact with the resource, and when we’re done using it we should release it back to the resource manager so other programs can use it.

When working with resources of this kind, two rules are vitally important:

  • Only interact with the resource after it’s been acquired and before it’s been released. Attempting to interact with a resource before acquisition or after release will at best simply not work, and at worst do horrible things.
  • When we’re done interacting with a resource, be sure to release it. This applies even if our program errors out while using the resource. Neglecting to release results in a resource leak wherein a long-running program may eventually consume all the available resources on the system.

So our program (at least, the part that uses resources) will need to look something like this:

acquire :: IO Resource
use :: Resource -> IO ()
release :: Resource -> IO ()

program :: IO ()
program = do
  resource <- acquire
  use resource
  release resource

Simple enough.

Unfortunately this program structure has some serious flaws if we have to reproduce it by hand every time. First, nothing prevents us from trying to use resource again after releasing it. Even worse, nothing guarantees that release will be invoked at all: if use resource throws an exception, release is never reached. And we can easily forget to include the release line in the first place.

One standard way to fix these flaws is to write that program structure correctly once, and bundle it into a function that takes the details of acquire, use, and release as arguments. Enter…

The Bracket Pattern

Haskell’s standard library includes just such a function. Nestled deep in Control.Exception is a little utility called bracket which has the following signature:

bracket
  :: IO a        -- thing to do first ("acquire")
  -> (a -> IO b) -- thing to do last NO MATTER WHAT ("release")
  -> (a -> IO c) -- thing to do in between ("use")
  -> IO c

(Specialized instances of this pattern are scattered throughout the standard library, usually behind functions named withSomething.) Compare the argument types of bracket to our acquire, release, and use. The actual implementation of bracket is close to program, although the types are slightly more general and there’s a bit more machinery to make sure the release computation runs even if the use computation throws an exception.
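
As a concrete example of the withSomething convention, System.IO’s withFile is essentially bracket specialized to file handles. A sketch (the library’s real definition is equivalent up to bookkeeping details):

import Control.Exception (bracket)
import System.IO (Handle, IOMode, hClose, openFile)

-- acquire the handle, hand it to the callback, close it no matter what
withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withFile path mode = bracket (openFile path mode) hClose

With bracket, we could rewrite program like so: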

program :: IO ()
program =
  bracket acquire release use

Which is… kind of anticlimactic! Of course the purpose of bracket is precisely to represent the structure of program, so that’s to be expected. All we’ve done so far is syntactically “rearrange” the steps of the acquire-use-release sequence. To see what this indirection buys us we need something more complicated.

First of all, with use in the last argument position we can replace it with an anonymous function.

program2 :: IO ()
program2 =
  bracket acquire release $ \resource -> do
    putStrLn "Look at me, using a resource"
    use resource

We can also use more than one resource, confident that they will be acquired and released in nested order.

program3 :: IO ()
program3 =
  bracket acquire1 release1 $ \resource1 -> do
    bracket acquire2 release2 $ \resource2 -> do
      putStrLn "Ooh, now I'm using TWO resources"
      useBoth resource1 resource2

Because the “inner” computation of bracket is the last argument, we can chain it as many levels deep as we want. Each resource will be acquired in order and released in reverse order, even in the presence of exceptions.

(I should note here that this pattern is restricted to acquiring and releasing in a LIFO fashion. There are other patterns for interleaved resource management but until you know you need them they probably aren’t worth the extra complexity.)

If One Is Good, Many Is Better

So we have a succinct way to safely interact with a scarce resource, and we can nest it.

Now suppose we need to work with a lot of resources.

This isn’t an academic question, by the way; my motivating problem here has to do with using the Vulkan API for GPU programming, which notoriously requires roughly 18 different memory resources (which all have to be freed) just to draw a single colored triangle on the screen.

No problem; bracket can handle this…

program4 :: IO ()
program4 =
  bracket acquire1 release1 $ \resource1 -> do
    bracket acquire2 release2 $ \resource2 -> do
      bracket acquire3 release3 $ \resource3 -> do
        bracket acquire4 release4 $ \resource4 -> do
          bracket acquire5 release5 $ \resource5 -> do
            bracket acquire6 release6 $ \resource6 -> do
              bracket acquire7 release7 $ \resource7 -> do
                ...
Not like that.

Yikes! bracket is great, but deeply nesting bracket is not great. The more resources we need to use, the more deeply nested the chain of function applications. We could fix that by breaking out the different use callbacks as top level functions or let-bindings, but it would be nice if we didn’t have to do that.

Fortunately, we have another functional design tool in our toolbox that is sometimes useful against code patterns that march past the right margin like this: monads. Recall how monads tamed nested case expressions on Maybe and Either, and nested functions on State and Writer. It’s not the point of monads, but one way to think of them is as tools for turning a nested syntactic expression into a linear one, by interpreting the nesting mechanism (in this case function application) as a monadic bind.
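
For instance, here’s a quick reminder of the Maybe case; lookupBoth is a hypothetical helper, invented purely for illustration. First with explicit nested case analysis, then flattened by do notation:

-- explicit nesting: each lookup forces a new level of case analysis
lookupBoth :: Eq k => k -> k -> [(k, v)] -> Maybe (v, v)
lookupBoth k1 k2 env =
  case lookup k1 env of
    Nothing -> Nothing
    Just v1 ->
      case lookup k2 env of
        Nothing -> Nothing
        Just v2 -> Just (v1, v2)

-- the same function, linearized by Maybe's monad instance
lookupBoth' :: Eq k => k -> k -> [(k, v)] -> Maybe (v, v)
lookupBoth' k1 k2 env = do
  v1 <- lookup k1 env
  v2 <- lookup k2 env
  pure (v1, v2)

Same program, linear shape: the nesting has been delegated to >>=. Let’s think about how to do the same thing with bracket.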

What Is the Meaning of This?

We’ll start off really concrete, with the bracket example. First we’ll abstract a single application of bracket into two parts: the acquire/release part and the use part.

-- (bracket acquire release) use

g = use                     :: a -> IO c
f = bracket acquire release :: (a -> IO c) -> IO c

We want to turn the expression f g into an expression like f' >>= g' in some as-yet-unknown monad. I say “like”, because to get the types to work out we’ll need to do some wrapping and unwrapping of IO values. So suppose we have a monad M and functions h, k, and m (these are the wrappers) such that

expr1 :: IO c
expr1 =
  f g

  === (1)

expr2 :: IO c
expr2 = m $ do
  a <- h f
  k g a

  === (2)

expr2 :: IO c
expr2 =
  m (h f >>= k g)

Equality (2) comes from unpacking do notation; equality (1) is the one we really care about. We can now do a little type inference by hand:

h :: ((a -> IO c) -> IO c) -> M a    -- hint hint :)
k :: (a -> IO c) -> a -> M c
m :: M c -> IO c

A useful strategy for working with types like this is to just see what we can build with them, ignoring what the function bodies might be. For instance, note that k takes two arguments: a function that turns values of type a into IO c computations, and an a. We can combine these to get just an IO c, and now k needs to produce an M c. We have a standard class for monads into which IO values can be lifted, with the method liftIO :: IO a -> m a. We could do worse than to assume this is all k is doing – something like k g a = liftIO $ g a. Sit on that for now.

Next consider h. This looks an awful lot like a constructor for M — it takes a single input, and produces an M a. What if it is just the constructor for M? Something like

newtype M' a = M' { h :: (a -> IO c) -> IO c }

Ah, but there’s a problem here. We have two free type variables (a and c) on the right side of this definition, but only one on the left. There are two ways to fix this: we could parameterize M by c on the left-hand side, or we could quantify over c on the right-hand side. It turns out that both of these work and lead to subtly different monads; I’m going to go with the second strategy (hiding c via quantification).

newtype M a = M { unM :: forall c. (a -> IO c) -> IO c }

And just like that, the weird definition of continuations appears. Actually, this shape is usually called codensity (here, the codensity monad of IO), and the other choice for getting rid of the return type variable (parameterization) yields the usual continuation monad.
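
For comparison, the parameterized choice looks like this; it’s the shape of the Cont type found (up to the ContT generalization) in Control.Monad.Trans.Cont:

-- c hidden by quantification: codensity (our M, specialized to IO)
newtype M a = M { unM :: forall c. (a -> IO c) -> IO c }

-- the answer type exposed as a parameter: the usual continuation monad
newtype Cont r a = Cont { runCont :: (a -> r) -> r }

With r as a visible parameter, the program itself can mention and manipulate the final answer type; hiding it under a forall, as M does, gives up some of that flexibility in exchange for a value that can be run at whatever answer type we like.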

A (Not So) Brief Detour

[Image: A Road in Louveciennes, Auguste Renoir, c. 1870]

Before moving on, we should see if it’s possible to find a monad instance for M. To do that, we need to (1) find definitions for return and >>= and then (2) show they satisfy the monad laws. That means writing some proofs! First we’ll try to find a definition for return. We’ll be using a technique called type driven development: this is where you write down the type of the function you want with a placeholder for the body and then “unpack” the placeholder using what you know about its type.

return :: a -> M a
return a
  = _             -- _ :: M a
  = M _           -- _ :: (a -> IO c) -> IO c
  = M (\g -> _)   -- _ :: IO c, g :: a -> IO c
  = M (\g -> g a)

Note how at each unpacking step we did the most “obvious” thing. For instance, in the first line we need an M a, and this type has only one constructor — there’s only one way to unpack it. This is a surprisingly effective way to write Haskell code if we follow two rules of thumb. First, make your types as general as possible: the more general the type, the less we can do with it, so the fewer choices we have to make. Second, when defining a function body, try to use all of the inputs at least once.

So anyway we have a return, and I suspect it’s the only possible implementation. Now for >>=:

(>>=) :: M a -> (a -> M b) -> M b
(M x) >>= f       -- x :: ∀d. (a -> IO d) -> IO d, f :: a -> M b
  = _             -- _ :: M b
  = M _           -- _ :: (b -> IO c) -> IO c
  = M (\g -> _)   -- _ :: IO c, g :: b -> IO c
  = M (\g -> x _) -- _ :: a -> IO c, g :: b -> IO c
  = M (\g -> x (\a -> _))
                  -- _ :: IO c, g :: b -> IO c, a :: a
                  -- unM (f a) :: ∀d. (b -> IO d) -> IO d
  = M (\g -> x (\a -> unM (f a) _))
                  -- _ :: b -> IO c, g :: b -> IO c
  = M (\g -> x (\a -> unM (f a) g))
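
Before verifying the laws, it’s worth collecting everything into actual instance declarations. Here’s a sketch (GHC needs the RankNTypes extension for the forall; the Functor and Applicative instances, which the derivation skipped, fall out of the same kind of type driven reasoning):

{-# LANGUAGE RankNTypes #-}

newtype M a = M { unM :: forall c. (a -> IO c) -> IO c }

instance Functor M where
  -- precompose f onto the continuation
  fmap f (M x) = M (\g -> x (g . f))

instance Applicative M where
  pure a = M (\g -> g a)
  M xf <*> M xa = M (\g -> xf (\f -> xa (g . f)))

instance Monad M where
  M x >>= f = M (\g -> x (\a -> unM (f a) g))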

It’s less clear whether this is the only possible implementation of >>=, but that doesn’t matter too much as long as the monad laws hold. We can verify those with some equational reasoning. This is by far my favorite feature of a programming language, by the way: referential transparency means we can manipulate program fragments as algebraic expressions. First, the left identity law:

return a >>= f
  === (expand return)
(M (\g -> g a)) >>= f
  === (expand >>=, with some alpha conversions)
M (\h -> (\g -> g a) (\b -> unM (f b) h))
  === (apply function)
M (\h -> (\b -> unM (f b) h) a)
  === (apply function)
M (\h -> unM (f a) h)
  === (eta reduction)
M (unM (f a))
  === (M . unM == id)
f a

And the right identity law:

(M x) >>= return
  === (expand >>=)
M (\g -> x (\a -> unM (return a) g))
  === (expand return)
M (\g -> x (\a -> unM (M (\h -> h a)) g))
  === (unM . M == id)
M (\g -> x (\a -> (\h -> h a) g))
  === (apply function)
M (\g -> x (\a -> g a))
  === (eta reduction)
M (\g -> x g)
  === (eta reduction)
M x

And the associativity law:

((M x) >>= f) >>= g
  === (expand >>=, with some alpha conversion)
(M (\h -> x (\a -> unM (f a) h))) >>= g
  === (expand >>=)
M (\k -> (\h -> x (\a -> unM (f a) h)) (\b -> unM (g b) k))
  === (apply function)
M (\k -> x (\a -> unM (f a) (\b -> unM (g b) k)))
  === (eta expansion)
M (\k -> x (\a -> (\h -> unM (f a) (\b -> unM (g b) h)) k))
  === (unM . M == id)
M (\k -> x (\a -> unM (M (\h -> unM (f a) (\b -> unM (g b) h))) k))
  === (condense >>=)
M (\k -> x (\a -> unM ((M (unM (f a))) >>= g) k))
  === (M . unM == id)
M (\k -> x (\a -> unM (f a >>= g) k))
  === (eta expansion)
M (\k -> x (\a -> unM ((\c -> f c >>= g) a) k))
  === (condense >>=)
(M x) >>= (\c -> f c >>= g)

(That one looks tricky — the secret is to write the proof from the outside in, rather than from the top down.)

So M really is a monad. Ok! Detour over!

Finishing the Translation

Writing those proofs was so much fun I almost forgot what we were trying to do! We’re translating the bracket pattern in IO to our newly defined monad M, and all that remains is to figure out a definition for m :: M a -> IO a. Trying our type driven tricks again, we get this:

m :: M a -> IO a
m (M x)      -- x :: ∀c. (a -> IO c) -> IO c
  = _        -- _ :: IO a
  = x _      -- _ :: a -> IO a
  = x return

Nice! A value of type M a is a function expecting another function — which we call a continuation — as input. The continuation is just an ordinary arrow into IO, and the M a value applies it to a value of type a that it supplies. We can reduce an M a to a plain IO a by passing in the trivial continuation, played here by return. This is such a natural thing to do that we’ll give it a name: execM.
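
Spelled out, here’s a sketch of execM along with the MonadIO instance we assumed for k back during the hand type inference:

import Control.Monad.IO.Class (MonadIO (..))

-- run an M computation by handing it the trivial continuation
execM :: M a -> IO a
execM (M x) = x return

-- liftIO just feeds the IO action's result to the continuation
instance MonadIO M where
  liftIO io = M (\g -> io >>= g)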

Putting it together, the bracket pattern recast in terms of M looks something like this:

expr2 :: IO ()
expr2 = execM $ do
  a <- M $ bracket acquire release
  liftIO $ use a

We get both a nice linear syntax with do notation and disciplined resource management. Multiple brackets are no problem:

expr3 :: IO ()
expr3 = execM $ do
  a1 <- M $ bracket acquire1 release1
  a2 <- M $ bracket acquire2 release2
  a3 <- M $ bracket acquire3 release3
  liftIO $ useAllThree a1 a2 a3
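
To watch the ordering guarantee in action, here’s a minimal runnable demo; acquireNamed and releaseNamed are made-up stand-ins, and it assumes the M, execM, and liftIO definitions sketched above:

import Control.Exception (bracket)

acquireNamed :: String -> IO String
acquireNamed name = putStrLn ("acquire " ++ name) >> pure name

releaseNamed :: String -> IO ()
releaseNamed name = putStrLn ("release " ++ name)

main :: IO ()
main = execM $ do
  r1 <- M $ bracket (acquireNamed "one") releaseNamed
  r2 <- M $ bracket (acquireNamed "two") releaseNamed
  liftIO $ putStrLn ("using " ++ r1 ++ " and " ++ r2)

-- prints:
--   acquire one
--   acquire two
--   using one and two
--   release two
--   release one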

Next Steps

[Image: Mishima Pass in Kai Province, from the series Thirty-six Views of Mount Fuji, Katsushika Hokusai, c. 1830–32]

The bracket pattern is a powerful tool for making our programs good stewards of system resources. We saw that it’s possible to turn deeply nested bracket expressions into a nicely linear sequence of monadic binds. Moreover, the translation is mechanical, falling out of some widely applicable and well understood design tools: equational reasoning and type driven development. And what falls out is a well-known variant on continuations, the codensity monad, whose motivation is otherwise not necessarily obvious.

This doesn’t use the full power of M, however. We’ve got a monad that uses continuations behind the scenes to let us write monadic code “out of order”, but we don’t yet have a way to access or manipulate those continuations ourselves. Looking back on my early Scheme efforts, I recall that the truly killer feature of continuations was a special operator — call/cc — that did exactly this.

Exploring the continuation operators in more depth will have to wait for another post. But for now I think I’ve made good progress toward understanding the continuation monad and its motivation.
