Resources, Laziness, and Continuation-Passing Style
This is a question posed by Aaron Levin on Reddit:
In WAI,
Application
has the typeRequest -> (Response -> IO ResponseReceived) -> IO ResponseReceived
. There is a note about how historically this wasRequest -> ResourceT IO Response
. There is the following note on Hackage about this change:Note that, since WAI 3.0, this type is structured in continuation passing style to allow for proper safe resource handling. This was handled in the past via other means (e.g., ResourceT).
A naive implementation might have
type Application = Request -> IO Response
. I would assume one could still handle appropriate resource cleanup (both on the client side and on the Wai/warp side) in this naive implementation, but I'm assuming I'm wrong in thinking that. So:
- Why is Application defined in CPS style?
- Is there an example of how resources cannot be cleaned up appropriately using a naive implementation above?
- Are there other reasons for the CPS style? (E.g. allows for more efficiency as network resources can be freed while your handler is running)
Like my last post, this is a piece of folklore that's been described in papers, mailing lists, &c, but I don't know of a single resource to point people to, so I'll explain it here.
I'm gonna explain some preliminary details about laziness, the trace
function, and the bracket
function; if you're already familiar with these, go ahead and skip to the section A Naïve Application Type. Alternately, if you want to puzzle it out on your own, you can skip this blog post entirely try running the code in this gist and figure out why you get the output you do—-all the code examples in this post are modified versions of the code that appears in that gist.
Some Preliminary Explanations
A big part of this post involves the fact that Haskell is lazy: when we write something in Haskell, it won't be evaluated until it's absolutely necessary that it be evaluated—-which might be never! For example, an expression like
>>> let f x y = x in f 5 undefined
5
will happily ignore our use of undefined
, as we never actually use it. If we want to make sure that a given expression is evaluated, we need to use it somehow. One way of using a value is to pattern-match on it: in order to pattern-match against a constructor, we need to evaluate it at least enough to know what constructor was used.1 We can change the example above to pattern-match on the currently unused value:
>>> let f x y = case y of { () -> x } in f 5 undefined
*** Exception: Prelude.undefined
Now, we're pattern-matching on y
, which forces it—-that is, evaluating the pattern-match causes y
to be evaluated before it can determine whether or not it matches the pattern, which means we evaluate undefined
and get a runtime error.
In general—-with the exception of code that loops forever, or inherently unsafe values like undefined
—-laziness should be a detail we barely notice! In pure code that doesn't loop forever, evaluation order doesn't matter at all, and imperative code in Haskell uses monads like IO
. Because each step in a monad has to be evaluated before the next step can proceed, those computations appear to be strict, and computations that aren't in a monad generally don't have observable side-effects that we can use to peek at the exact process Haskell is using to evaluate them.
However, Haskell has a few features we can use to peek into what's going on under the surface. One of them is the trace
function, found in Debug.Trace
, which has the signature
trace :: String -> a -> a
This takes a string and any other value, and will print the string as soon as the value is forced—-not before, not after. That makes it of sometimes questionable utility for debugging, as it's possible that our calls to trace
won't run in the order we expect, or even won't run at all:
>>> let f x y = x in f 5 (trace "Hello!" undefined)
5
But because trace
will print exactly when its result is needed, we can use it to peek into how Haskell chooses to evaluate our code.
Bracketing for Resources
The module Control.Exception.Base
exports a function called bracket
:
bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c
This function is used for managing resources that we need to clean up, like file handles or large pieces of data we don't want to keep in memory. The first argument is an IO
computation that produces a resource, the second one is the cleanup function for that resource, and the third one is a function that uses that resource to produce a value prior to cleanup:
main :: IO ()
main = bracket
(openFile "tmp.txt" ReadMode) -- opening a resource: here, a file
hClose -- closing the resource
(\ handle -> do -- what to do with the resource
contents <- hGetContents handle
putStrLn contents)
The common function withFile
is implemented straightforwardly in terms of bracket
almost exactly like I used it above. The major utility of bracket
is that it knows how to correctly handle exceptions that occur while running computations: if an exception gets raised while the resource is being used, it will nevertheless ensure that the resource gets closed appropriately.
A Naïve Application Type
Let's try writing a toy version of WAI that doesn't actually do anything. We're going to pretend that we're writing some kind of server where some responses might depend on scarce resources, so we want to clean up those resource as soon as we possibly can.
Let's start with importing the functions above and defining some dummy types:
import Control.Exception.Base (bracket)
import Debug.Trace (trace)
-- Our dummy requests and responses
data Request = Request
data Response = Response
-- Our dummy resource
data SomeResource = SomeResource
-- Our naive @App@ type
type App = Request -> IO Response
Let's write our App
handler first. Because we're trying to mimic a real application, we pattern-match on the Response
value to force Haskell to evaluate it before returning:
runApp :: App -> IO ()
runApp app = do
response <- app Request
case response of
Response -> putStrLn "Sending response"
Now, let's create an App
that uses some “resource”. In this case, I'm going to fake it as well. Additionally, I'm going to include a call to trace
so that we can be sure of when our Response
value actually gets evaluated:
myApp :: App
myApp request = bracket createResource destroyResource respond
where createResource = do
putStrLn "Creating resource"
return SomeResource
destroyResource _ = do
putStrLn "Destroying resource"
respond SomeResource = do
return (trace "EVALUATING RESPONSE" Response)
And now let's wrap this all up:
main :: IO ()
main = runApp myApp
When I run this program on my machine here's what I get:
$ runhaskell myapp.hs Creating resource Destroying resource EVALUATING RESPONSE Sending response
What does this mean? Well, this means that we end up destroying the resource we are trying to use even before the response gets constructed. That's not good! In this case, the resource is just a dummy value, but if it were instead, say, a file handle, then this program might close the file handle before we ever tried to read from it!2
But we're using bracket
and everything! That should mean that the action we're passing to bracket
—-the respond
function we defined up above—-should run before the resource gets destroyed! Shouldn't it?
As a matter of fact, it is. The IO
action is getting run, and the IO
steps are getting evaluated eagerly. If we rewrite the myApp
function slightly to insert a print statement there
myApp' :: App
myApp' request = bracket createResource destroyResource respond
where createResource = do
putStrLn "Creating resource"
return SomeResource
destroyResource _ = do
putStrLn "Destroying resource"
respond SomeResource = do
putStrLn "Responding to request" -- A new print statement
return (trace "EVALUATING RESPONSE" Response)
and then run it again, we'll see that the respond
function is getting properly bracketed between resource creation and destruction:
$ runhaskell myapp.hs Creating resource Responding to request Destroying resource EVALUATING RESPONSE Sending response
The specific problem is that while IO
steps are evaluated eagerly, the values that get return
ed are subject to the normal rules of laziness: they aren't forced until we use them. In this case, we don't actually force the Response
value until the case
-expression in runApp
, which doesn't happen until after the entire myApp
function—-bracket
and all—-has finished running.
So the bigger problem is that, while bracket
can ensure that IO
actions are properly bookended with resource-related code, it can't make the same guarantees with respect to the values inside it. We need some way of forcing the value before the cleanup code gets run. We can rewrite myApp
to do this:
myStricterApp :: App
myStricterApp request = bracket createResource destroyResource respond
where createResource = do
putStrLn "Creating resource"
return SomeResource
destroyResource _ = do
putStrLn "Destroying resource"
respond SomeResource = do
putStrLn "Responding to request" -- A new print statement
let response = trace "EVALUATING RESPONSE" Response
case response of -- pattern-match to force evaluation
Response -> return response
This produces the output we want:
$ runhaskell myapp.hs Creating resource Responding to request EVALUATING RESPONSE Destroying resource Sending response
But this has a lot of drawbacks: it's ugly, it inserts apparently extraneous case statements, and I've papered over the fact that, if we had a more complicated data structure, we'd need to force evaluation of the entire thing, not just the outermost structure. But even worse, it's not a general solution: WAI is a library that exports the equivalent of runApp
and expects users to write the equivalent of myApp
, which means we'd be forcing every single user to remember to force their Response
values. If a user forgets to insert the ugly and opaque code, then their program might start crashing or—-even worse—-running but with nonsensical results. What we want is some way of ensuring that any App
must force its Response
before cleaning up any resources.
CPS To The Rescue
The core idea of continuation-passing is that a continuation is a function that represents “what to do next”: a continuation is a callback function that expects the result of a previous computation as argument. Let's rewrite our program but with an App
type that looks like the CPS-ey type WAI actually uses:
-- Our modified @App@ type
type App = Request -> (Response -> IO ()) -> IO ()
Now, our App
takes two arguments—-the Request
, as before, as well as a function that takes a Response
and produces an IO
action. The logic of “sending” the response now gets moved into the callback function we pass to the App
:
runApp :: App -> IO ()
runApp app =
app Request
(\ response -> case response of
Response -> putStrLn "Sending response")
Our implementation of myApp
is almost the same, except we take an extra callback parameter, and instead of returning our Response
, we pass it to that callback:
myApp :: App
myApp request cb = bracket createResource destroyResource respond
where createResource = do
putStrLn "Creating resource"
return SomeResource
destroyResource _ = do
putStrLn "Destroying resource"
respond SomeResource = do
cb (trace "EVALUATING RESPONSE" Response)
If I run this version of the program, I get this output:
$ runhaskell myapp.hs Creating resource EVALUATING RESPONSE Sending response Destroying resource
This is much better! This ensures that the resource outlives the Response
, and because now the callback is responsible for forcing the Response
, it can guarantee that it gets done. This lets us ensure that propery of any App
! This is something we couldn't ensure with the previous one, because it would have required some way of reaching “inside” the App
to force the return value. By having a callback that the user has to call, we can force it ourselves without having to hope that the user does!
...well, except there's that remaining little issue: using the implementation above, the user doesn't have to call the callback, so we could write an App
that produces no Response
at all:
myBadApp :: App
myBadApp _ _ = return ()
We can fix this by creating a new type—-WAI calls it ResponseReceived
—-and hiding every way to construct it except using the callback. That way, the user has to use the callback at least once in order to get a ResponseReceived
value, which they can return from their App
. (We will have the problem that we can call the callback more than once: we could keep working on solutions, but we'd have to move to more complicated solutions and WAI ends up suggesting that you just not do that.)
So this is why the CPS style is indeed necessary for resource management here: it's not enough to use bracket
, you have to use bracket
and have the appropriate amount of strictness3, and the CPS style helps you achieve that.
- The more technical way of saying this is that we're evaluating it to Weak Head Normal Form.
This is related to the problems with lazy IO in general, and we can reproduce the problematic behavior like this:
import Control.Exception.Base (bracket) import System.IO (IOMode(ReadMode), openFile, hClose, hGetContents) main = do cs <- bracket (openFile "tmp.txt" ReadFile) hClose hGetContents putStrLn cs
If we run this program in a directory that contains “tmp.txt”, we'll get the error message
tmp.txt: hGetContents: illegal operation (delayed read on closed handle)
This is because
hGetContents
only returns a string lazily, which might be okay in some cases... but here, we aren't using the return value until after the call tobracket
, so again, the file will already be closed!- This is also the problem with lazy IO in general, and it's no coincidence that the iteratee/pipe/conduit approaches to resource management also look reasonably CPS-ey.