Infinite Negative Utility

Earlier today I pasted a snippet of Haskell code into our work chat that contained a multi-parameter type class instance that looked more or less like this:

instance a ~ b => C a b where {- ... -}

The constraint a ~ b in this declaration is a type equality constraint: it means that a and b must be the same type. One of my coworkers saw this and remarked, more or less, “Why did you write it in that obtuse way? You could have just declared an instance for C a a, which means the same thing.”1

Those two declarations actually don't mean the same thing—-but I can't blame said coworker for thinking they did, because the difference between the two is subtle and probably won't come up unless you've done a fair amount of fiddling with type classes. To that end, I want to show an example where those two differ, and then explain why.

The Example

Let's say we have this (more than a little contrived) multi-parameter type class (which will require the MultiParamTypeclasses extension):

class PairOf a b where
  thePair :: (a, b)

I then define an instance that looks like this (which will require the FlexibleInstances extension):

instance Monoid a => PairOf a a where
  thePair = (mempty, mempty)

So far, so good! Now let's try a particular use of thePair in a (probably even more contrived) definition:

obtuseMempty :: Monoid t => t
obtuseMempty = fst thePair

What do I expect to happen in this code snippet? I would hope that obtuseMempty ends up being the same as mempty. In particular, I want thePair to resolve to the instance I defined above, and consequently evaluate to (mempty, mempty), and then, fst (mempty, mempty) == mempty. What actually happens when I compile this?

• Could not deduce (PairOf a b0) arising from a use of ‘thePair’ from the context: Monoid a bound by the type signature for: obtuseMempty :: Monoid a => a at sample.hs:11:1-29 The type variable ‘b0’ is ambiguous Relevant bindings include obtuseMempty :: a (bound at sample.hs:12:1) These potential instance exist: instance Monoid a => PairOf a a — Defined at sample.hs:8:10 • In the first argument of ‘fst’, namely ‘thePair’ In the expression: fst thePair In an equation for ‘obtuseMempty’: obtuseMempty = fst thePair

Well, that's not what I had wanted, but I guess it makes sense: thePair has to be a pair (because of the use of fst), but since we never use the second element in thePair, there's no reason for GHC to believe that it's the same type as the first. However, let's try tweaking the definition a bit: instead of PairOf a a, let's define it as PairOf a b and then include a constraint that keeps a and b equal (which will require either GADTs or TypeFamilies enabled):

instance (Monoid a, a ~ b) => PairOf a b where
  thePair = (mempty, mempty)

If we try compiling obtuseMempty with this definition, it works without any errors! But why does one work while the other doesn't? What's the difference between the two? The answer has to do with a subtlety of how type class resolution works, a subtlety that can be hard to observe right until it stares you in the face.

How Type Class Resolution Works

When GHC searches for an appropriate instance of a type class, it has to check against all the instances you've defined and find one that “matches” your use of the type class. This is pretty easy when you define instances for a basic type like Int or Bool. For example, using the following (pretty useless) type class:

class MyClass t where
  myValue :: t

instance MyClass Int where myValue = 0
instance MyClass Bool where myValue = True

If I use myValue in a context where it's being interpreted as Int, then Haskell will search through the available instances and find our declaration of MyClass Int. That's simple enough! Things get a bit more complicated when we add types that include type parameters. For example, this is a definition of MyClass for pairs of values:

instance (MyClass a, MyClass b) => MyClass (a, b) where
  myValue = (myValue, myValue)

As a point of vocabulary, everything before the => in this definition is called the context, and everything after it is called the head. So now, if I use myValue in a context where it's being interpreted as (Int, Bool), then GHC will find the matching instance MyClass (a, b), and then in turn tries to search for two other instances: one for MyClass Int and another for MyClass Bool.

Notice the precise way I phrased that, because this is where the basic intuition around type class instances can diverge from the actual implementation: it looks like the instance definition is saying, “If we have an instance for MyClass a and an instance for MyClass b, then we can also have an instance for MyClass (a, b).” This is a reasonable intuition, but it's precisely backwards from what Haskell is actually doing. What happens instead is that, when it sees myValue being used as a pair, GHC will first select the instance MyClass (a, b) without having even examined the constraints in the context. Only after it's already committed to using the instance MyClass (a, b) will it then examine the context and try to satisfy the MyClass a and MyClass b constraints. Or, to say it in a more concise but jargon-heavy way, it first matches the head, and once it has done so, attempts to satisfy the context.

Consider what happens if write the following code without having defined an instance of the form MyClass ():

blah :: (Int, ())
blah = myValue

The precise error GHC gives us is: “No instance for (MyClass ()) arising from a use of myVal.” GHC has already settled on the MyClass (a, b) instance, has found a satisfying instance for MyClass Int, but then cannot find an instance for MyClass ().

In this case, making that distinction seems a little bit pedantic. In code like this, it amounts to simply an implementation detail! It shouldn't really matter what order GHC is using underneath the surface: either way, the problem is that we don't have an instance for MyClass (), and as long as GHC reports that error to us in some way, it doesn't matter whether it first tries to satify the context before committing to to MyClass (a, b), or if it commits first and then tries to satisfy the context: we'll run into the same problem either way! But knowing the precise way GHC tries to instantiate type classes can be very useful in other contexts.

Repeated Types in Instance Heads

Let's look at our first problematic PairOf instance:

instance Monoid a => PairOf a a where
  thePair = (mempty, mempty)

Remember: GHC has to match the instance head first, and after that it solves the remaining constraints from the context! So, in the definition of obtuseMempty, what is the inferred type of the use of thePair?

obtuseMempty :: Monoid t => t
obtuseMempty = fst thePair

It should be obvious, but for good measure, let's replace thePair with a hole and find out what type GHC is assigning based on the surrounding context:

• Found hole: _ :: (a, b0) Where: ‘a’ is a rigid type variable bound by the type signature for: obtuseMempty :: forall a. Monoid a => a at sample.hs:11:17 ‘b0’ is an ambiguous type variable • In the first argument of ‘fst’, namely ‘_’ In the expression: fst _ In an equation for ‘obtuseMempty’: obtuseMempty = fst _

So GHC needs to find an instance of the form MyPair t b0. Let's look at our instances: do we have one that has a matching head? ...well, no, we only have MyPair a a.

“But wait!” you say. “We don't do anything with the second part of the pair, so why can't GHC choose MyPair a a as a valid instance?” But GHC can't read your mind, and there's nothing in your program to indicate that the type variables a and b0 are supposed to be the same. As soon as you somehow indicate that, GHC will comply and find your MyPair a a instance:

evenMoreObtuseMempty :: Monoid t => t
evenMoreObtuseMempty = let p = thePair in (fst p `mappend` snd p)

In this definition, it's clear that thePair is supposed to have two fields of the same type, because we're clearly using mappend to combine them. But is there a way of making it work even if we don't give GHC that extra nudge at the use site?

Type Equality to the Rescue

Let's move on to the second MyPair instance I described above:

instance (Monoid a, a ~ b) => PairOf a b where
  thePair = (mempty, mempty)

Using typical informal reasoning, it seems like this should be identical to the other instance definition: “It's an instance where both parameters are the same type, and that type needs to be a Monoid”. But remember how GHC chooses instances: first by matching the head, then by elaborating the constraints. And this definition is different in an important way: the fact that the two type parameters are the same type is no longer a feature of the head, but rather a feature of the context! So when we revisit our definition of obtuseMempty with this new definition, GHC behaves differently:

obtuseMempty :: Monoid t => t
obtuseMempty = fst thePair

The value thePair in this context still has the inferred type PairOf t b0 => (t, b0), but now we have an instance for PairOf a b, which means GHC can unify those and select the instance we want! And now that it has chosen that instance, GHC can take the constraints associated with that instance and try to solve them: the expression thePair now has the inferred type (Monoid t, t ~ b0) => (t, b0), which GHC can solve as: (Monoid t) => (t, t). After that, our function type-checks successfully!

So What's The Lesson?

A rough intuition is that you can think of type equality in code like this as helping you propagate the fact that types are equal in some contexts. In the examples above, we as programmers intended the two fields of thePair should have the same type, but the first definition—-the MyPair a a one—-meant we could only use myPair when GHC already knew those types were equal. By moving the type equality to the context, we can use myPair in places where the two types aren't already known to be distinct and introduce that fact. This isn't the only use of type equality—-type equality is very important in the context of type families, for example—-but it's nevertheless an important use.

GHC type class resolution is really very straightforward, even in the face of extensions like the ones we enabled in this post. In fact, part of the problem is that it's sometimes tempting to imagine GHC is doing something more clever than it actually is! Methodically stepping through the way GHC processes your code can really help understand situations like these, where intial intuitions sometimes lead you down the wrong path.

  1. Well, okay, they ACTUALLY assumed I wrote an excessively obtuse type signature as an elaborate way of trolling them. And to be fair, the code snippet I was pasting was a deliberately silly and ill-advised piece of code—-but the use of type equality wasn't part of the joke.

    ...well, if you MUST know, it was an instance definition very similar to the following:

    instance (a ~ b) => Num ((Integer -> a) -> b) where
      fromIntegral x f = f x
      {- the rest of this is not important here -}

    this allows you to use numeric literals as functions that take another function and apply that function to themselves. This gives you some cute syntactic sugar:

    newtype Pixels = Pixels { fromPixels :: Integer }
    width, height :: Pixels
    width = 320 Pixels
    height = 240 Pixels

    This is a disgusting and terrible, and please never do it.

Markov chains are reasonably common on the internet these days: they're often used as way of generating random data that looks similar—-at least, in a sense—-to some existing data set. A lot of the great examples of Markov chains come from combining two very different data sets and building a single Markov model from them:

  • King James Programming creates mashed-up versions of the King James Bible and the classic programming text The Structure And Interpretation of Computer Programs.
  • At The Modules of Madness creates mashed-up versions of the documentation for the Puppet tool and the eldritch horror novels of H.P. Lovecraft.
  • Erowid Recruiter creates mashed-up versions of recruiter emails from programming companies and people's accounts of taking various illicit substances.

I also have my own Markov bot, which right now is just trained on my own tweets1, and I might eventually decide to add my prose writing as a secondary source of data. It produces a few wonderful tweets and a whole lot of nonsensical crap.

I've just added a little feature which helps to visualize what's going on under the surface, so I wanted to explain how Markov chains work at a high level first, and then show how @guwan_aisamanra now visualizes the underlying mechanisms at work.

What is a Markov Chain?

There are various ways of thinking about Markov chains, but underlying them is the kind of abstract notion of steps in a sequence. You can think of these steps as being actions over time, or as words in a sentence, or as abstract symbols in any kind of sequence of symbols. It doesn't matter what: just a bunch of things one after another, in time or in space.

In many sequences like these, the next step might be random, but the odds of choosing a particular next step are often related in some way to the steps that came before. Let's take a simple contrived example: the capricious chef at your local cafeteria. Every day, the chef chooses one of three meal options: tacos, pasta, or soup. The chef chooses the next meal randomly, but is not allowed to repeat the same meal two days in a row. The chef is also inordinately fond of tacos2 and is twice as likely to pick tacos as any other option, but still won't make tacos two days in a row. A possible sequence of meal choices might go: “tacos, pasta, tacos, soup, pasta, tacos, pasta, tacos...”

This example has a very particular property where the choice of each step depends only on the previous step, but not any of the steps before the last one. If last lunch was tacos, then we have roughly equal chances of choosing pasta or soup as the next lunch because the chef has no preference between the two. On the other hand, if the last lunch was pasta or soup, then we have a two-thirds chance of choosing tacos as the next lunch, and one-third chance of choosing soup or pasta respectively, because the chef is allowed to make tacos and is more likely to make them than a different lunch. We can write a table that conveys the odds of a given meal choice given yesterday's meal choice:

yesterday's lunch today's lunch probability

tacos pasta ½ tacos soup ½ pasta tacos ⅔ pasta soup ⅓ soup tacos ⅔ soup pasta ⅓

The mathematical way of saying that the probability of a given lunch choice doesn't depend on any history of lunches except for the most recent one is that our sequence of lunches has the Markov property.

If some sequence of things—-words, mathematical objects, lunches, whatever—-has the Markov property, then we can build up a mathematical model of the underlying process. One convenient way of visualizing this model is as a graph with directed edges: each step is a node, and the directed edges between nodes are tagged with the probability that the steps of picking that step next. Consequently, the graph of the lunche pattern I described would look like this:

We can come up with a probably sequence of lunches by moving randomly starting in one state, and then following the edges from one state to another, choosing from our next edge according to the provided probabilities.

Implementing Markov Models

In software, we can implement a directed graph as a table that maps steps to a set of possible next steps, where the possible next steps are randomly chosen according to some set of probabilities. For simplicity in this implementation, I'm going to treat the set of next steps as a multiset from which we can choose any element with equal probability: if we choose at random from the multiset {tacos, tacos, pasta}, then we'll get tacos two-thirds of the time and pasta one-third of the time. This is a terrible representation in practice—-for certain situations, it can use a ridiculously wasteful amount of space—-but it makes the code in this blog post short and easy!

Let's implement a basic, inefficient-but-working Python version of a Markov model! We can represent our model in Python using a plain dictionary, with strings as our step names, and lists as our possible next steps:

model = {
  'tacos': ['pasta', 'soup'],
  # both of the following have double the chance
  # of choosing tacos
  'pasta': ['tacos', 'tacos', 'soup'],
  'soup':  ['tacos', 'tacos', 'pasta']

If we want to generate a sequence of lunches with a particular length, we can do so by starting with a random starting lunch, and then choosing our next lunch randomly according to the set of possible “next lunches” provided by the model:

import random

def generate_sequence(model, length=10):
    # choose our initial state randomly
    state = random.choice(model.keys())
    # our output sequence starts empty
    output = []

    # we loop equal to the desired length
    for _ in range(length):
        # each time, we add the current state
        # to the sequence
        # and choose a new state randomly from
        # the set of possible next states
        state = random.choice(model[state])

    # once we've gotten as many as we want, we
    # can return our sequence
    return output

If we run this model several times, we'll get various random sequences that conform to the patterns we'd expect:

>>> generate_sequence(model, length=5)
['pasta', 'tacos', 'soup', 'tacos', 'soup']
>>> generate_sequence(model, length=5)
['soup', 'tacos', 'soup', 'tacos', 'soup']
>>> generate_sequence(model, length=5)
['tacos', 'pasta', 'soup', 'tacos', 'pasta']
>>> generate_sequence(model, length=5)
['tacos', 'soup', 'tacos', 'soup', 'pasta']

In the example we've been using, our model was based on perfect mathematical knowledge of the underlying problem domain: we know exactly the logic the chef uses to determine the next lunch. However, in most cases, we don't have that simple knowledge, so instead we want to calculate a model based on some set of existing observed data. Using the same representation of Markov model, we can take a sequence of any kind of things and build a model out of it by looking at each pair of elements in the sequence, and building a model accordingly3:

def build_model(sequence):
    # our empty model knows nothing about anything
    model = {}

    # walking along the sequence, taking each step
    # as well as its subsequent step...
    for step, next_step in zip(sequence, sequence[1:]):

        # make sure that step has an entry in our model
        # data structure:
        if step not in model:
            model[step] = []

        # add the next_step to the list of possible next
        # steps

    return model

Now we can build a Markov model from a sequence of data even if we don't know what probabilistic relationships hold in that data set. This lets us take data we've found “in the wild” and build a model that resembles the raw data! Well, in a limited sense, as we'll see.

Markov Models of Non-Markov Sequences

It turns out that English text—-both on the level of individual letters, and on the level of sentences—-doesn't have the Markov property, because the choice of “next word” depends on a lot more than just the previous word: grammatical structures, semantic context, and so forth. Building and using a Markov mover from raw English text—-in the examples below, I'm using the Project Gutenberg text of Alice in Wonderland, stripped of everything but spaces and alphabetic characters turned lowercase—-produces some nonsense that's English-like from a distance but still mostly gibberish:

>>> alice = open('alice-in-wonderland.txt').read().lower()
>>> model = build_model([ch for ch in alice if ch.isalpha() or ch == ' '])
>>> ''.join(generate_sequence(model))
'xpr yoube '
>>> ''.join(generate_sequence(model))
'be che sof'
>>> ''.join(generate_sequence(model))
'ver athean'
>>> ''.join(generate_sequence(model))
'k wanigroo'
>>> ''.join(generate_sequence(model))
'jeado sshe'
>>> ''.join(generate_sequence(model))
'f tond mpo'

Similarly, a Markov chain run over the sequence of words in the text, rather than the characters, produces strings of English words which lack grammar or real sense:

>>> model = build_model(alice.split())
>>> ' '.join(generate_sequence(model))
'bat, and said alice, she wandered about in an opportunity'
>>> ' '.join(generate_sequence(model))
'coaxing tone, going to leave it had lost something; and'
>>> ' '.join(generate_sequence(model))
'oneself speak--and they are removed. of it; and then the'

Why does it produce such terrible nonsense? Well, because Markov chains have no memory! Each pair of words in sequence is sensible, but higher-level relationships like sentence structure don't get captured in the model, because they rely on larger amounts of context. Let's take a closer look at a four-word sequence generated by the Alice in Wonderland chain:

escape; so much evidence

We can look at as being composed of three overlapping sequential pairs of words, where each ordered pair of words must have appeared at least once in the source text, or else the model would not have provided a path between them. The first pair is escape; so, and it turns out that this word pair appears only once in the source text, in Chapter IV:

This seemed to Alice a good opportunity for making her escape; so she set off at once...

The pair so much appears at least nine times—-it is a very likely pair of words in English more generally, so this makes sense! I'll choose my favorite sentence in which that pairs appears, just as an arbitrary example:

'Curiouser and curiouser!' cried Alice (she was so much surprised, that for the moment she quite forgot how to speak good English)...

And the word pair much evidence appears only once, in Chapter XI:

Alice watched the White Rabbit as he fumbled over the list, feeling very curious to see what the next witness would be like, '—for they haven't got much evidence YET,' she said to herself.

In each case, training the Markov model produced the knowledge that these words appear in sequence somewhere in the source text, so it was okay for them to appear together in the output:

making her ESCAPE; SO she set off at once, and ran till s
ied alice (she was SO MUCH surprised, that for the moment
-for they haven't got MUCH EVIDENCE yet,' she said to her

But it's not really a full sentence, even if it seems more realistic than some of the others produced. Because we're choosing each word only based on the previous word, the model doesn't know about things like clauses or verbs or sentences, just word-pair frequency. I our training model contains a sentence like

behold the dishwasher, the greatest invention of mankind

then it might learn that it's okay to follow “the” with “dishwasher,” and also to follow “dishwasher,” with “the”, and produce a nonsensical sequence like this:

the dishwasher, the dishwasher, the dishwasher, the dishwasher,

(There's an even sillier example that arose from my own Markov bot: I once tweeted something that included the phrase “stabby stabby”, and consequently my Markov model learned that it was perfectly fine to follow the word “stabby” with “stabby”.)

Visualizing Markov Output Directly

It's sometimes very interesting and illustrative to compare the output of a Markov chain with the input provided to that Markov chain, because you start to see the places where the model “learned” particular patterns. To draw an example from my own bot: earlier today it tweeted [a short phrase]():

bad painter? it's happened in it

My bot is programmed to only start with words which appear at the start of a tweet elsewhere, so it chose to start with the word bad because it came at the start of this old tweet:

Bad idea: double-shot of espresso added to drip coffee (a “redeye”). OTOH now I can see infinity. So I got that going for me. Which is nice.

The word bad also appears in this more recent tweet, where it borrowed several pairs of words in sequence:

Did you know that in addition to being a bad programmer and a bad-opinion-haver, I am also a bad painter? It's true!

The word it's also appears in this tweet where it is followed by the word happened:

It usually happens with weird dream-like books. It's happened before with Dunsany and MacDonald. Last night, it was an R. A. Lafferty novel.

And the word happened also appears in this tweet, followed by a few more words:

Last time I had a fever, I watched all of the Tron cartoon. A+++ would feverishly watch again. idea what happened in it.

In this case, this isn't just a reconstruction of what might have influenced the model, like in the Alice example above. To facilitate exactly this kind of introspection of Markov chain output, I've modified my own Markov bot to consistently record not just which words it's selecting, but also keep track of where it learned of a relationship between two words. That data is presented in a human-readable form on a new web page generated for each tweet and every new tweet generated will be included on a chronological index page.

A Markov model is an almost embarassingly simple example of machine learning, but it's a good, approachable example to start with—-and is also a great object-lesson in the way that machine learning can only picks up on trends that appear in its training data, and the way that the model might not be nuanced enough to pick up on every kind of pattern.

  1. They are very bad tweets.
  2. And who isn't?
  3. Again, this is a terrible representation for the list of possible next steps: a better representation might be a set of pairs, each representing the next step with the number of times it occurred in that context.

This is a question posed by Aaron Levin on Reddit:

In WAI, Application has the type Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived. There is a note about how historically this was Request -> ResourceT IO Response. There is the following note on Hackage about this change:

Note that, since WAI 3.0, this type is structured in continuation passing style to allow for proper safe resource handling. This was handled in the past via other means (e.g., ResourceT).

A naive implementation might have type Application = Request -> IO Response. I would assume one could still handle appropriate resource cleanup (both on the client side and on the Wai/warp side) in this naive implementation, but I'm assuming I'm wrong in thinking that. So:

  1. Why is Application defined in CPS style?
  2. Is there an example of how resources cannot be cleaned up appropriately using a naive implementation above?
  3. Are there other reasons for the CPS style? (E.g. allows for more efficiency as network resources can be freed while your handler is running)

Like my last post, this is a piece of folklore that's been described in papers, mailing lists, &c, but I don't know of a single resource to point people to, so I'll explain it here.

I'm gonna explain some preliminary details about laziness, the trace function, and the bracket function; if you're already familiar with these, go ahead and skip to the section A Naïve Application Type. Alternately, if you want to puzzle it out on your own, you can skip this blog post entirely try running the code in this gist and figure out why you get the output you do—-all the code examples in this post are modified versions of the code that appears in that gist.

Some Preliminary Explanations

A big part of this post involves the fact that Haskell is lazy: when we write something in Haskell, it won't be evaluated until it's absolutely necessary that it be evaluated—-which might be never! For example, an expression like

>>> let f x y = x in f 5 undefined

will happily ignore our use of undefined, as we never actually use it. If we want to make sure that a given expression is evaluated, we need to use it somehow. One way of using a value is to pattern-match on it: in order to pattern-match against a constructor, we need to evaluate it at least enough to know what constructor was used.1 We can change the example above to pattern-match on the currently unused value:

>>> let f x y = case y of { () -> x } in f 5 undefined
*** Exception: Prelude.undefined

Now, we're pattern-matching on y, which forces it—-that is, evaluating the pattern-match causes y to be evaluated before it can determine whether or not it matches the pattern, which means we evaluate undefined and get a runtime error.

In general—-with the exception of code that loops forever, or inherently unsafe values like undefined—-laziness should be a detail we barely notice! In pure code that doesn't loop forever, evaluation order doesn't matter at all, and imperative code in Haskell uses monads like IO. Because each step in a monad has to be evaluated before the next step can proceed, those computations appear to be strict, and computations that aren't in a monad generally don't have observable side-effects that we can use to peek at the exact process Haskell is using to evaluate them.

However, Haskell has a few features we can use to peek into what's going on under the surface. One of them is the trace function, found in Debug.Trace, which has the signature

trace :: String -> a -> a

This takes a string and any other value, and will print the string as soon as the value is forced—-not before, not after. That makes it of sometimes questionable utility for debugging, as it's possible that our calls to trace won't run in the order we expect, or even won't run at all:

>>> let f x y = x in f 5 (trace "Hello!" undefined)

But because trace will print exactly when its result is needed, we can use it to peek into how Haskell chooses to evaluate our code.

Bracketing for Resources

The module Control.Exception.Base exports a function called bracket:

bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c

This function is used for managing resources that we need to clean up, like file handles or large pieces of data we don't want to keep in memory. The first argument is an IO computation that produces a resource, the second one is the cleanup function for that resource, and the third one is a function that uses that resource to produce a value prior to cleanup:

main :: IO ()
main = bracket
  (openFile "tmp.txt" ReadMode)  -- opening a resource: here, a file
  hClose                         -- closing the resource
  (\ handle -> do                -- what to do with the resource
       contents <- hGetContents handle
       putStrLn contents)

The common function withFile is implemented straightforwardly in terms of bracket almost exactly like I used it above. The major utility of bracket is that it knows how to correctly handle exceptions that occur while running computations: if an exception gets raised while the resource is being used, it will nevertheless ensure that the resource gets closed appropriately.

A Naïve Application Type

Let's try writing a toy version of WAI that doesn't actually do anything. We're going to pretend that we're writing some kind of server where some responses might depend on scarce resources, so we want to clean up those resource as soon as we possibly can.

Let's start with importing the functions above and defining some dummy types:

import Control.Exception.Base (bracket)
import Debug.Trace (trace)

-- Our dummy requests and responses
data Request = Request
data Response = Response

-- Our dummy resource
data SomeResource = SomeResource

-- Our naive @App@ type
type App = Request -> IO Response

Let's write our App handler first. Because we're trying to mimic a real application, we pattern-match on the Response value to force Haskell to evaluate it before returning:

runApp :: App -> IO ()
runApp app = do
  response <- app Request
  case response of
    Response -> putStrLn "Sending response"

Now, let's create an App that uses some “resource”. In this case, I'm going to fake it as well. Additionally, I'm going to include a call to trace so that we can be sure of when our Response value actually gets evaluated:

myApp :: App
myApp request = bracket createResource destroyResource respond
  where createResource = do
          putStrLn "Creating resource"
          return SomeResource
        destroyResource _ = do
          putStrLn "Destroying resource"
        respond SomeResource = do
          return (trace "EVALUATING RESPONSE" Response)

And now let's wrap this all up:

main :: IO ()
main = runApp myApp

When I run this program on my machine here's what I get:

$ runhaskell myapp.hs Creating resource Destroying resource EVALUATING RESPONSE Sending response

What does this mean? Well, this means that we end up destroying the resource we are trying to use even before the response gets constructed. That's not good! In this case, the resource is just a dummy value, but if it were instead, say, a file handle, then this program might close the file handle before we ever tried to read from it!2

But we're using bracket and everything! That should mean that the action we're passing to bracket—-the respond function we defined up above—-should run before the resource gets destroyed! Shouldn't it?

As a matter of fact, it is. The IO action is getting run, and the IO steps are getting evaluated eagerly. If we rewrite the myApp function slightly to insert a print statement there

myApp' :: App
myApp' request = bracket createResource destroyResource respond
  where createResource = do
          putStrLn "Creating resource"
          return SomeResource
        destroyResource _ = do
          putStrLn "Destroying resource"
        respond SomeResource = do
          putStrLn "Responding to request" -- A new print statement
          return (trace "EVALUATING RESPONSE" Response)

and then run it again, we'll see that the respond function is getting properly bracketed between resource creation and destruction:

$ runhaskell myapp.hs Creating resource Responding to request Destroying resource EVALUATING RESPONSE Sending response

The specific problem is that while IO steps are evaluated eagerly, the values that get returned are subject to the normal rules of laziness: they aren't forced until we use them. In this case, we don't actually force the Response value until the case-expression in runApp, which doesn't happen until after the entire myApp function—-bracket and all—-has finished running.

So the bigger problem is that, while bracket can ensure that IO actions are properly bookended with resource-related code, it can't make the same guarantees with respect to the values inside it. We need some way of forcing the value before the cleanup code gets run. We can rewrite myApp to do this:

myStricterApp :: App
myStricterApp request = bracket createResource destroyResource respond
  where createResource = do
          putStrLn "Creating resource"
          return SomeResource
        destroyResource _ = do
          putStrLn "Destroying resource"
        respond SomeResource = do
          putStrLn "Responding to request" -- A new print statement
          let response = trace "EVALUATING RESPONSE" Response
          case response of  -- pattern-match to force evaluation
            Response -> return response

This produces the output we want:

$ runhaskell myapp.hs Creating resource Responding to request EVALUATING RESPONSE Destroying resource Sending response

But this has a lot of drawbacks: it's ugly, it inserts apparently extraneous case statements, and I've papered over the fact that, if we had a more complicated data structure, we'd need to force evaluation of the entire thing, not just the outermost structure. But even worse, it's not a general solution: WAI is a library that exports the equivalent of runApp and expects users to write the equivalent of myApp, which means we'd be forcing every single user to remember to force their Response values. If a user forgets to insert the ugly and opaque code, then their program might start crashing or—-even worse—-running but with nonsensical results. What we want is some way of ensuring that any App must force its Response before cleaning up any resources.

CPS To The Rescue

The core idea of continuation-passing is that a continuation is a function that represents “what to do next”: a continuation is a callback function that expects the result of a previous computation as argument. Let's rewrite our program but with an App type that looks like the CPS-ey type WAI actually uses:

-- Our modified @App@ type
type App = Request -> (Response -> IO ()) -> IO ()

Now, our App takes two arguments—-the Request, as before, as well as a function that takes a Response and produces an IO action. The logic of “sending” the response now gets moved into the callback function we pass to the App:

runApp :: App -> IO ()
runApp app =
  app Request
      (\ response -> case response of
           Response -> putStrLn "Sending response")

Our implementation of myApp is almost the same, except we take an extra callback parameter, and instead of returning our Response, we pass it to that callback:

myApp :: App
myApp request cb = bracket createResource destroyResource respond
  where createResource = do
          putStrLn "Creating resource"
          return SomeResource
        destroyResource _ = do
          putStrLn "Destroying resource"
        respond SomeResource = do
          cb (trace "EVALUATING RESPONSE" Response)

If I run this version of the program, I get this output:

$ runhaskell myapp.hs Creating resource EVALUATING RESPONSE Sending response Destroying resource

This is much better! This ensures that the resource outlives the Response, and because now the callback is responsible for forcing the Response, it can guarantee that it gets done. This lets us ensure that propery of any App! This is something we couldn't ensure with the previous one, because it would have required some way of reaching “inside” the App to force the return value. By having a callback that the user has to call, we can force it ourselves without having to hope that the user does!

...well, except there's that remaining little issue: using the implementation above, the user doesn't have to call the callback, so we could write an App that produces no Response at all:

myBadApp :: App
myBadApp _ _ = return ()

We can fix this by creating a new type—-WAI calls it ResponseReceived—-and hiding every way to construct it except using the callback. That way, the user has to use the callback at least once in order to get a ResponseReceived value, which they can return from their App. (We will have the problem that we can call the callback more than once: we could keep working on solutions, but we'd have to move to more complicated solutions and WAI ends up suggesting that you just not do that.)

So this is why the CPS style is indeed necessary for resource management here: it's not enough to use bracket, you have to use bracket and have the appropriate amount of strictness3, and the CPS style helps you achieve that.

  1. The more technical way of saying this is that we're evaluating it to Weak Head Normal Form.
  2. This is related to the problems with lazy IO in general, and we can reproduce the problematic behavior like this:

    import Control.Exception.Base (bracket)
    import System.IO (IOMode(ReadMode), openFile, hClose, hGetContents)
    main = do
      cs <- bracket (openFile "tmp.txt" ReadFile)
      putStrLn cs

    If we run this program in a directory that contains “tmp.txt”, we'll get the error message

    tmp.txt: hGetContents: illegal operation (delayed read on closed handle)

    This is because hGetContents only returns a string lazily, which might be okay in some cases... but here, we aren't using the return value until after the call to bracket, so again, the file will already be closed!

  3. This is also the problem with lazy IO in general, and it's no coincidence that the iteratee/pipe/conduit approaches to resource management also look reasonably CPS-ey.

The 'WriterT space leak' is something that gets talked about a lot in Haskell folklore. Here's a short explanation of what that means.

Defining WriterT

The Writer monad represents computations with “extra output”: produced data, logging messages, and so forth. That “extra output” has some constraints: we need to know what it means to have an “empty” piece of that data (so we can construct values with no extra output) and we need to know how to combine or append the “extra output” (so we can run two distinct computations and combine them appropriately.) We can get those properties in Haskell by ensuring that the “extra output” parameter has a Monoid constraint.

The WriterT transformer is defined like this:

newtype WriterT w m a =
  WriterT { runWriterT :: m (a, w) }

which is to say: it wraps an underlying monad m, which represents a computation that produces a pair of a (the result of our computation) and w (the “extra output” parameter). The Monad instance (I'm eliding the Functor and Applicative instances) looks like this:

instance (Monoid w, Monad m) => Monad (WriterT w m) where
  -- return the provided value, filling in the "extra output"
  -- with an "mempty" value.
  return x = WriterT $ return (x, mempty)

  -- to chain two WriterT computations together...
  m >>= f = WriterT $ do

    -- A: run the first computation, fetching its result
    -- and extra output
    (x, w)  <- runWriterT m

    -- B: produce the second computation by applying f to the
    -- result from the first computation, and then run it,
    -- fetching its result and extra output
    (y, w') <- runWriterT (f x)

    -- C: return the final result with the pieces of
    -- extra output from each computation combined together.
    return (y, w `mappend` w')

The last thing we need is the tell operation, which is the function we use to create our “extra output”:

tell :: (Monoid w, Monad m) => w -> WriterT w m ()
tell o = WriterT $ return ((), o)

A simple contrivted WriterT computation looks like this:

sample :: WriterT (Sum Int) Identity ()
sample = do
  tell (Sum 2)
  tell (Sum 3)
  tell (Sum 4)
  return ()

running this produces a return value of () as well as “extra output” equivalent to Sum 2 <> Sum 3 <> Sum 4 === Sum (2 + 3 + 4) === Sum 9. Sure enough:

>>> runIdentity $ runWriterT sample
((), Sum 9)

In Space Leaks, No-One Can Hear You Scream

Well, what's the problem? Let's look more closely at the definition of >>=. In particular, notice that in order to start evaluating the call to mappend in step C, we need to have the output from both steps A and B. ...except step B represents evaluating the entire rest of the computation, including an arbitrarily large number of other calls to >>=.

If we started evaluating the first call to >>= in runWriterT sample, then step A represents evaluating tell (Sum 2), and step B represents evaluating tell (Sum 3) >>= (\_ -> tell (Sum 4) >>= (\_ -> return ())). We get our output value w from the first one, and then we have to hold on to it while running the entire second computation in order to get the second extra output value w' before we can mappend them in step C. This problem appears with every call to >>=: we have to finish the entire subsequent chain of binds before we get the extra output to combine.

This is especially silly for sample, where I've chosen to use Sum Int as the extra output. It's just an accumulator! But in order to actually start adding those numbers together, we have to go through the entire chain of binds and produce each number first, each of which has to be kept around until we get to the final return, and only then can we go back and start adding the numbers we've generated together. In this case, it's not gonna kill our heap, but consider a computation like

largerSample :: WriterT (Sum Int) Identity ()
largerSample = sequence [ tell (Sum 1) | _ <- enumTo 1000000 ]

This computation will contain 1000000 calls to >>=, which means we will generate 1000000 instances of the number 1 first, and only then can it start adding them together. This can't be changed with strictness annotations or modifications to evaluation stategy: we'll need to rearrange the way that data depends on other data.

Alleviating the Space Leak

The problem solution is described in short by Gabriel Gonzalez here and here on the mailing lists, but I'll explain the same thing slightly more verbosely:

What we really want is for the call to mappend to happen earlier, which means moving around the way our intermediate values depend on each other.1 We've written our WriterT so that the “extra output” is always returned from a computation, and never passed to a computation: that means a given computation doesn't “know” about the intermediate value of that value, and we can only run our mappend operation after we've finished it. Let's try writing an alternate WriterishT monad that differs by also taking the “extra output” from previous steps:

newtype WriterishT w m a =
  WriterishT { runWriterishT :: w -> m (a, w) }

The w on the left side of the arrow represents the intermediate value of the “extra output” that comes from previous steps. We can move the mappend step from the implementation of >>= into the tell function, which now combines the new value being “told” with the previous values:

tellish :: (Monoid w, Monad m) => w -> WriterishT w m ()
tellish w = WriterishT $ \w' -> return ((), w <> w')

The implementation of return used to return mempty, but now it's taking an intermediate value it might need to combine with the 'new state', so it would look like this:

return x = WriterishT $ \w -> (x, w <> mempty)

Luckily, the Monoid laws tell us that w <> mempty === w, so we can simplify that definition down a fair amount. This ends up removing all mention of the Monoidal nature of our data in the Monad instance, so we can omit the Monoid w constraint:

instance (Monad m) => Monad (WriterishT w m) where
  -- the motivation for this implementation is explained
  -- up above...
  return x = WriterishT $ \w -> (x, w)

  -- The bind instance looks pretty similar, but again, we
  -- need to take the current state of our accumulator:
  m >>= f = WriterishT $ \w -> do
    -- A: we pass that on to the first computation...
    (x, w')  <- runWriterishT m w
    -- B: and pass that on to the second computation...
    (y, w'') <- runWriterishT (f x) w'
    -- C: and return just the second state, because the
    -- call to 'mappend' was taken care of inside one
    -- of these.
    return (y, w'')

Now, the call to mappend have all the data they need as soon as tellish happens. Step C only relies on the “extra output” from B, which itself only relies on the “extra output” from A. That means that we've already run mappend before step B, and can throw away the older intermediate value.

...this is a little bit dishonest, though: the type that we've made is significantly more powerful than just a WriterT. One feature of WriterT is that there's no way for a computation to peek at or rely on the “extra output” from previous steps. However, if we aren't careful about the interface to our WriterishT monad—-in particular, if we don't hide the constructor—-a user could write code like this

getCurrentOutput :: WriterishT w m w
getCurrentOutput = WriterishT $ \w -> return (w, w)

poorlyBehaved :: WriterishT (Sum Int) Identity String
poorlyBehaved = do
  Sum n <- getCurrentOutput
  if n == 0
    then return "zero"
    else return "nonzero"

which would be impossible in WriterT! (If you can't see why, it's worth trying to write it and seeing what walls you come up against.) This is because, in point of fact, what I've been calling WriterishT is literally just StateT by a different name! That makes it much more powerful than WriterT, but accordingly it removes some of the guarantees you get by using a WriterT, as well.

So in summary, the WriterishT or StateT can alleviate the space-leak problems with WriterT, but at the cost of some of the desirable properties of WriterT. In most cases, the space leaks from WriterT won't turn out to be a huge problem, but it's good to know what's happening and why it comes about!

  1. This is a good way of thinking about a lot of performance problems in a lazy language: not in terms of 'steps', like in an imperative language, but in terms of dependencies among your subexpressions.

One current pain point for certain Haskell programs is Cabal's data-files feature. This feature is a little bit awkward and weird: it involves depending on a generated file but also filling in a default implementation for that file, and there are some obscure and unclear edge cases in its use. One difficult edge case is that, while most uses of data files are in applications, it's nonetheless possible for libraries to depend on data files, in which case it's very difficult to induce Cabal to produce final relocatable bundles that contain both executables and the data files on which they depend.

I was musing on a hypothetical future way of solving this problem: once the lightweight module system Backpack is fully implemented in Haskell, we could build a data-files-like mechanism in a more convenient way by modifying the current design to use module-level mixins. The rest of this post will sketch out what that design might look like.

I should stress that I'm describing a possible solution, and not the solution. I'm by no means indicating that this is the best way of solving present problems with data files, and I have absolutely no indication that anyone else would want to solve the problem like this. That said, I think this is an interesting and motivated design, and I'd be happy to discuss its strengths and weaknesses, as well as other possible design choices that address the same issues.

Right now, I'm using Edward Yang's blog post A Taste Of Cabalized Backpack as my primary guide to Backpack-in-practice. I don't have a Backpack-enabled GHC and Cabal on hand, and so I haven't actually run any of this: this should right now be treated effectively as pseudocode. I also assume familiarity with Cabal's data files support; if you're in need of an introduction or a refresher, you should read the post Adding Data Files Using Cabal.

An Abstract Signature for Data Files

In our hypothetical Backpack-enabled-data-files-support future, we start by creating a signature that corresponds to the generated Paths_whatever module. To this end, we can create an .hsig file with a declaration like this:

module Dist.DataFiles (getDataFileName) where
  getDataFileName :: FilePath -> IO FilePath

This defines an abstract module called Dist.DataFiles that exposes a single function, getDataFileName, with no actual implementation. We can expose this signature by creating a package, data-files-sig, that exposes only this signature:

    name:               data-files-sig
    version:            1.0
    indefinite:         True
    build-depends:      base
    exposed-signatures: Dist.DataFiles

This would be a standard package—maybe even part of base—that can be consistently and universally relied on by libraries that require some kind of data file support.

Creating A Library With Data Files

Now, let's create a library that needs a data file. In this case, the library will do nothing but read and return the contents of that data file:

module Sample.Library (getSampleFile) where

import Dist.DataFiles (getDataFileName)

getSampleFile :: IO String
getSampleFile = getDataFileName "sample-file" >>= readFile

Now we need to create a corresponding .cabal file for this library. Because we're using Dist.DataFiles, we need to import that signature from the data-files-sig module. Importantly, we still don't have an implementation for getDataFileName. Because of that, our package is still abstract, or in Backpack-speak, indefinite:

    name:            sample-library
    indefinite:      True
    build-depends:   base, data-files-sig
    exposed-modules: Sample.Library

Depending On A Library With Data Files

In order to write an application that uses sample-library, we need to give it a module that's a concrete implementation of the Dist.DataFiles signature. In this case, let's create an implementation manually as part of our application.

First, let's write a small application that uses sample-library:

module Main where

import Sample.Library (getSampleFile)

main :: IO ()
main = getSampleFile >>= putStrLn

We still don't have that concrete implementation for getDataFileName, though, so let's write a simple module that exports the same name with the same type:

module MyDataFilesImpl (getDataFileName) where

import System.FilePath ((</>))

getDataFileName :: FilePath -> IO FilePath
getDataFileName path = pure
  ("/opt/sample-application" </> path)

Now, when we write our .cabal file for this application, we also need to specify we want to use MyDataFilesImpl as the concrete implementation of Dist.DataFiles for sample-library. That means our .cabal file will look like this:

    name: sample-application
      sample-library (MyDataFilesImpl as Dist.DataFiles)

Now, all our abstract signatures are filled in, so this application is no longer indefinite, and we as developers have a convenient way of telling sample-library where we want it to look for its data files. In fact, one advantage of this system for data files is that we could import two libraries that both depend on the Dist.DataFiles signature but tell them to look in two different places for their data files, like this:

    name: other-application
      lib-one (OneDataFilesImpl as Dist.DataFiles),
      lib-two (AnotherDataFilesImpl as Dist.DataFiles)

If there are reasonable default implementations for Dist.DataFiles, we could also put those on Hackage and reuse them in much the same way.

A Final Sprinkling Of Magic

In this case, I'm still missing a major part of Cabal's data-files support: namely, we want to shunt the responsibility from the developer to Cabal, so that we have support for things like relocatable builds. So in a final bit of handwaving, let's stipulate that our tooling in this hypothetical future has a special case to deal with applications that expose an indefinite Dist.DataFiles signature: Cabal could notice this situation, and fill those signagutres in with sensible implementations based on the commands and configurations we're using.

For example, if my .cabal file for sample-application above didn't supply a concrete implementation for Dist.DataFiles, then a default one could be chosen for development that's equivalent to:

-- as automatically generated by cabal
module Dist.DataFiles (getDataFileName) where

getDataFileName :: FilePath -> IO FilePath
getDataFileName = pure

That is, the application will just look for the file in the current directory.

If the developer started preparing the package for release, and changed the configuration appropriately, then the automatically generated getDataFileName could be modified to reflect that, replacing the automatically generated code with something more like

-- as automatically generated by cabal
module Dist.DataFiles (getDataFileName) where

import System.FilePath ((</>))

getDataFileName :: FilePath -> IO FilePath
getDataFileName path =
  pure ("/usr/share/sample-application" </> path)

This would be admittedly a little bit “magical”, but it would be a small and easy-to-explain bit of magic, and it would have the advantage of affording a kind of flexibility that the current approach to data files lacks.

Is This How It's Actually Gonna Work?

Probably not! Backpack is still a ways out, and this would require opt-in from many parts of the Haskell ecosystem, and the problem it solves could probably also be solved in numerous other ways I haven't considered. But this post describes a point in the design space that I think is at least worth weighing!

I buy a fair number of comic books. Often I buy them as standalone graphic novels or as collected trade paperbacks, but if I really like an ongoing series1 I'll buy individual issues as they come out.

This poses a problem for me: I'm not collecting them, so I'm not really interested in keeping them in plastic sleeves in special boxes, but I do want to keep them in reasonable, readable condition and have them on-hand and organized. Some people will bind their comics into volumes, which is really cool, but not exactly what I want.

I leaned towards something like a slipcase for some number of issues. Some of these exist, but when I looked they were expensive (around \$20 per case) and kind of boring-looking, so I opted to make my own. I made some prototypes by hand, but I wasn't quite happy with the quality of my worksmanship on them, so: I finally got around to creating one via laser-cutting instead.

The Design

The basic design is obvious: a box, tall and wide enough for a comic issue and deep enough for some number of them. For the prototype, I chose five deep2, and did some measurements of a stack of five comics: I had to accomodate a volume of roughly 6.75 in × 10.25 in × 0.375 in. The only bits of the design I wasn't sure about were the edges and the front and back panels. The edges I wanted to have a cut-away tab so one could easily pull the comics out, but I figured I could use a shape that was more interesting and unusual than just a half-circle. The front and back would be bland blank: I considered engraving an image into them, or leaving them blank for later painting, but ended up deciding to cut a window into them so you could see glimpses of the front and back covers of the comics inside. Despite not knowing the exact shapes of either of those features, I went ahead with designing.

The Process

I created the blueprint as a Haskell program using the diagrams package. The code was reasonably straightforward: as I wanted to experiment with the pull-tab shape and the window shape, I represented those as arguments to a function comicBox which drew the rest of the slipcase, allowing me to easily create several variations on the details as I went.

I already decided to use Ponoko to cut the box, so I also included some utility functions that simply colored lines the appropriate color for cutting or scoring according to Ponoko's guidelines—-0x0000ff lines for cutting, and 0x00ff00 for scoring.

The code is actually not all that long:

import Diagrams.Prelude
import Diagrams.Backend.SVG.CmdLine

-- some helpers
toLine :: [(Double, Double)] -> Diagram B
toLine ls = fromOffsets (map r2 ls) # strokeLine

cut :: Diagram B -> Diagram B
cut x = x # lc (sRGB 0.0 0.0 1.0)

crease :: Diagram B -> Diagram B
crease x = x # lc (sRGB 0.0 1.0 0.0)

-- width and height of space for comics
cW, cH :: Double
cW = 6.75
cH = 10.25

-- a slipcase parameterized by a depth, and a window and edge shape
comicBox :: Double -> Diagram B -> Diagram B -> Diagram B
comicBox cD window edge = reflectX fullS ||| spine ||| fullS
  where spine = rect cD cH # explodeTrail
                           # zipWith ($) [crease,cut,crease,cut]
                           # mconcat
        side = (translate (r2(3.375,5.125)) $ (sShape # crease)
                 <> (edge # cut))
                 <> (window # cut)
        sShape = toLine [(-cW,0), (0,-cH), (cW,0)]
        top = toLine [(0,cD), (cW,0), (0,-cD)] # cut # centerXY
        fullS = beside unitY side top === reflectY top

-- The window shape is a bunch of squares, and the tab shape is
-- a kind of anglular \__/ thing.
main :: IO ()
main = mainWith (comicBox 0.375 windowShape edgeShape)
  where windowShape = let row = hsep 0.5 (replicate 3 (square 1))
                      in  centerXY (vsep 0.5 (replicate 4 row))
        edgeShape =
          let offs = [ (0,    -0.35)
                     , (-0.1, -0.1)
                     , (0,    -0.1)
                     , (0.1,  -0.1)
                     , (0,    -0.35)
          in toLine offs # scaleX cW # scaleY cH

I decided on a simple array of grid squares for the window shape, and an angular kind of tab shape, which you can see below.

This produced a nice clean image, but not one that fit all the specifications for Ponoko plans—-it needed to be a specific size, and lines for cutting needed to be a specific thickness. Instead of finding the right functions to get my program to emit the correct line thicknesses and image size, I decided to just clean it up in Inkscape and copy it onto the templates for cutting that Ponoko supplied. I might go back and rewrite the code to spit out a valid uploadable image at some point, but it wasn't a priority.

I opted for cardboard as a material: it's not as thick as the chipboard I'd prefer, but it means I could fold (and not have to assemble) the box, and it's relatively cheap. I considered bamboo or some kind of thin wood, but for the first prototype, at least, cardboard seemed like a simpler choice: I can always try other materials later.

I used some of the excess space on the board for an unrelated experiment that involved a lot of engraving, so it was more expensive than it would have been had I simply opted for the cutting and scoring. When I get more made, I won't use that excess space for engraving-heavy things, so the price should be much cheaper.

After cleaning it up the design in Inkscape, I was ready to upload it and wait about a week for it to arrive!

The Result

I'm pretty happy with the final result, which is shown here containing five issues of a comic. I didn't think to take a photo of the raw cardboard sheet before punching it out and gluing it together, but you can easily extrapolate what it must have looked like.

Reflections and Next Steps ==========================

The biggest problem is that the slipcase is too snug—-something I anticipated, but I wanted to see exactly how snug it would be before I scaled up. The next iteration will have some more breathing room, which will make it easier to put comics in and take them out.

The next iteration will also have the comic name engraved on the spine of the case: this will make it take slightly longer to engrave, but should be negligible overall.

I also am going to modify the windows on the front and back and the tab on the side in a thematically appropriate way: I am already experimenting with various motifs I can use for particular comics. For example, I could create a slip-case for storing issues of Bitch Planet that has a window on the front and back in the shape of the Non-Compliant emblem.

My only real annoyance during the project is that the diagrams library has some weird corners and odd API design choices3. It is also a truly massive library. At one point, I deleted my sandbox, started rebuilding, walked to a nearby café4, had a coffee and sat for a bit, walked back, and found it still compiling.

Still, for the most part, it all worked without a hitch. And it was a lot of fun: creating a useful physical object5 from a screenful of code is a pretty cool feeling!

  1. Some I've bought recently or semi-recently: Lumberjanes, The Midas Flesh, Prophet, The Wicked + The Divine, Bitch Planet, Casanova, Sex Criminals, Trees, and Supreme: Blue Rose. I don't usually buy Marvel or DC comics as issues, because they tend to be chock-full of ads, but otherwise I would buy issues of Hawkguy, Squirrel Girl, and Ms. Marvel.
  2. I arbitrarily decided that the first comic I would store like this would be Sex Criminals, which has five issues per trade paperback—-therefore, five issues for the first slipcase.
  3. For example: each diagram has its own local coordinate space, so each diagram has its own “origin”. Depending on how the shape was created, the “origin” can be in a very different place: paths created from lists of points start at their origin, but primitive shapes are centered at their origin. You can of course translate images relative to their local origin.

    On top of that, the basic way of putting diagrams next to each other is using either the === operator (for juxtaposing them vertically) or the ||| operator (for juxtaposing them horizontally.) These arbitrarily produce diagrams whose origin is the same as their left argument, unless the left argument is mempty, in which case the origin is the same as their right argument. All these choices together strike me as weird defaults: I'd have been less surprised by the library had all operations been biased towards working with diagrams whose local origin was in a consistent place, but === and ||| seem to assume that the local origin is in the top left, while square and other primitive shapes produce local origins in the center.

  4. I live in Portland, which means the café was only two blocks away. My point still stands.
  5. We have a 3D printer at my office, so I could create objects with that, but 3D printers—-especially low-end ones like the Makerbot—-are of remarkably limited utility as far as the objects you can create. 3D printing seems like magic at first, but after having created dozens of crapjects, you quickly realize that very few useful objects can be made out of small-ish chunks of rough plastic.

As part of recent type-refactoring efforts in Haskell, a discussion about adding Semigroup as a parent class of Monoid has been bouncing around the mailing list. From a theoretical point of view, this is a great idea: it is more flexible than the current approach that would allow for greater expressibility.

From a practical point of view, however, I am inclined to oppose it. Not because it is in itself a bad change—it's a very reasonable change that has advantages for new code—but because I have, in the past, had to update large systems written in Haskell after GHC updates, and therefore I know that this kind of change has a cost. The Applicative-Monad changes seem to have made way for the Foldable-Traversable Prelude, but those have in turn inspired a number of suggestions for modifications to the Haskell standard library, each one of which, taken on its own, is reasonable, but taken as a mass, mean much more work for maintainers. This is especially true if we continue on this path of making minor refactorings at each release: each year a project will need changes, or it will quickly bit-rot beyond utility.

Default Superclass Instances

There is, however, an alternative I would like to discuss—one which has also been brought up on the mailing list, but which I'd like to provide more concrete motivation for. Some of these changes—especially the Semigroup/Monoid split—seem to involve taking the functionality of a class and splitting it into multiple smaller classes. For example, we can think of the Semigroup/Monoid change as converting

class Monoid m where
  mempty  :: m
  mappend :: m -> m -> m


class Semigroup m where
  mappend :: m -> m -> m

class Semigroup m => Monoid m where
  mempty :: m

Something that has been proposed before (in a few different forms) and which I suggest be more actively considered if changes like these are to become common is to allow superclass instances to be declared within a subclass declaration. This would allow you to write a single instance declaration for a class, and in that body also include implementations for methods belong to a superclass of that class. As an example, perhaps we could be able to write2:

newtype MyId a = MyId a

instance Monad MyId where
  -- Functor method
  fmap f (MyId x) = MyId (f x)

  -- Applicative methods
  pure = MyId
  MyId f <*> MyId x = MyId (f x)

  -- Monad methods
  return = MyId
  MyId x >>= f = f x

For the Monoid/Semigroup proposal, this would mean that any Monoid instances that exist would continue to work unchanged, but new instances could (optionally) split apart their declarations. Under this proposal, either of these would be acceptable:

class Semigroup m where mappend :: m -> m -> m
class Semigroup m => Monoid m where mempty :: m

-- combined `instance` declarations:
instance Monoid [a] where
  mempty = []
  mappend = (++)

or, equivalently,

class Semigroup m where mappend :: m -> m -> m
class Semigroup m => Monoid m where mempty :: m

-- split apart `instance` declarations
instance Semigroup [a] where
  mappend = (++)

instance Monoid [a] where
  mempty = []

And because the Monoid declaration for [] is already written like the former, we can make the Semigroup/Monoid split without having to rewrite the instance declarations!

Exciting Additional Opportunities

Because this lowers the cost of updating for new versions, various other useful changes might be considered that would otherwise involve far too much breakage. For example, we could consider splitting Num apart into small constituent parts. With Default Superclass Instances, we could refactor Num into the following without changing any instance declarations:3

class Add a where (+) :: a -> a -> a
class Sub a where (-) :: a -> a -> a
class Add a => Zero a where zero :: a

class Mul a where (*) :: a -> a -> a
class Mul a => One a where one :: a

class (Zero a, One a) => FromInteger a where
  fromInteger :: Integer -> a

  instance Zero a where zero = fromInteger 0
  instance One a where one = fromInteger 1

class Signed a where
  negate :: a -> a
  abs    :: a -> a
  signum :: a -> a

class ( Eq a, Show a, Add a, Sub a
      , Mul a, FromInteger a, Signed a) => Num a where

which would allow certain numeric types to only implement a subset of the relevant operations:

data Nat = Zero | Succ Nat

instance Add Nat where
  Z   + y = y
  S x + y = S (x + y)

{- et cetera --- but no implementations for e.g. Signed,
 - which is not meaningful for `Nat`! -}

and also allow current Num functions to have a looser set of constraints than they do at present:

sum :: (Zero a, Add a) => [a] -> a
sum (x:xs) = x + sum xs
sum []     = zero

prod :: (One a, Mul a) => [a] -> a
prod (x:xs) = x * prod xs
prod []     = one

We could also consider splitting Arrow4 into distinct components, again without having to change any instance declarations:

class Category a => Pairwise a where
  first  :: a b c -> a (b, d) (c, d)
  second :: a b c -> a (d, b) (d, c)
  (***) :: a b c -> a b' c' -> a (b, b') (c, c')
  (&&&) :: a b c -> a b c' -> a b (c, c')

class Pairwise a => Arrow a where
  arr :: (b -> c) -> a b c

or even (dare I dream) splitting Bits into something that is not a 22-method monstrosity!

Potential Drawbacks

On the other hand, this proposal does have some down-sides:

Grepping for Instance Declarations

Right now, I can often find an instance declaration for a type T by grepping for instance C T (modulo some line noise) whereas with this change, it's possible that there is no declaration for instance C T, because all of C's methods are declared by a subclass C' instead. The compiler ought to be able to deal with this without problem, which means that tools like Haddock documentation should somewhat alleviate this problem, but a user might be confused.

Introduces New Possible Errors

The declarations below are of course nonsensical, and would be rejected by the compiler—-but the fact that this change would introduce new failure conditions at all is a drawback of the proposal.

instance Semigroup Int where
  mappend = (+)

instance Monoid Int where
  -- this conflicts with an existing declaration
  mappend = (*)
  mempty  = 1

A Pragma-Less Extension

In order to be really useful, we'd want to use this without a LANGUAGE pragma. After all, we're arguing for it on the basis of preserving backwards-compatibility, but that argument is much less compelling if we still have to change the source files to make use of it! On the other hand, that means we'd have included a GHC extension that takes effect despite not being enabled, which is also a worrying precedent!

It still might be a useful extension even if it had to be enabled by a LANGUAGE pragma, as it is easier to add said pragma to a source file than to manually break apart dozens of instance declarations, but it makes this feature less compelling in general.

In Conclusion

As I said before, my primary objection to typeclass changes like those above—-even if they are good changes for newly written code—-is that they break existing code. I don't want to have to modify a handful of miscellaneous instance declarations on a yearly basis as people discover new levels of abstraction or become dissatisfied with current choices, especially as those changes will grow more extensive as I build more projects in Haskell! But with an extension like this, we could grow the typeclass ecosystem gradually and fix what we see as past warts while maintaining backwards-compatibility, which would be a very powerful tool to have.

  1. This is perhaps more simplistic than we want: we can also use the existing Semigroup class from the semigroup package and then, in the Monoid class declaration, explain how to derive the methods of the superclass if no superclass instance is present. This, according to the linked proposal, would look like:

    class Semigroup m => Monoid m where
      mempty  :: m
      mappend :: m -> m -> m
      instance Semigroup m where
        (.++.) = mappend

    The example above is slightly simpler, which is why I relegated this version to a footnote.

  2. This isn't a concrete proposal, so maybe the actual syntax or semantics of these things should be changed! I want to focus on the feature and not the instantiation.
  3. For this example, I added Zero and One classes so that a given type might implement an additive and multiplicative unit while not necessarily having a sensible FromInteger implementation. For example, it might not make sense to implement fromInteger for a complex number, but complex numbers clearly have both an additive and multiplicative unit:

    data Complex = Complex Float Float deriving (Eq, Show)
    {- ... -}
    instance Zero Complex where zero = Complex 0.0 0.0
    instance One Complex  where one  = Complex 1.0 0.0

    This means that the Sum and Product monoids could be rewritten like:

    newtype Product a = Product { getProduct :: a }
      deriving (Eq, Show)
    instance (One a, Mul a) => Monoid (Product a) where
      mempty        = Product one
      x `mappend` y = Product (getProduct x * getProduct y)

    Notice that I added Zero and One in such a way that an existing Num instance declarations needn't be changed!

  4. I have on numerous occasions had reason to use the Arrow abstraction, but haven't had a sensible implementation of arr. To use a slightly contrived example, I could define a GADT that can describe the structure of boolean circuits in a way that resembles an Arrow, but has no way of expressing arr:

    data Circ a b where
      BoolFunc :: (Bool -> Bool) -> Circ Bool Bool
      Ident    :: Circ a a
      Compose  :: Circ a b -> Circ b c -> Circ a c
      First    :: Circ a b -> Circ (a, d) (b, d)
      Second   :: Circ a b -> Circ (d, a) (d, b)
      Parallel :: Circ a b -> Circ a' b' -> Circ (a, a') (b, b')
      Fanout   :: Circ a b -> Circ a  b' -> Circ a (b, b')
    instance Category Circ where
      id  = Ident
      (.) = flip Compose
    instance Arrow Circ where
      first  = First
      second = Second
      (***)  = Parallel
      (&&&)  = Fanout
      arr    = error "Nothing sensible to put here!"
    -- for example, using this definition, we can write xor:
    xor :: BoolCircuit (Bool, Bool) Bool
    xor = ((first Not >>> And) &&& (second Not >>> And)) >>> Or
    -- ...and using xor, write a half-adder:
    halfAdder :: BoolCircuit (Bool, Bool) (Bool, Bool)
    halfAdder = xor &&& And

    This is not an unreasonable definition—-it would be nice to abstract over such a definition using existing tools—-but the structure of the Arrow typeclass makes it difficult!

I basically always use some program in the daemontools family on my computers. My home laptop and desktop are booted with an init system (runit) based on daemontools, while many of the systems I set up elsewhere boot a vanilla distribution but immediately set up a daemontools service directory as a secondary service management tool. Quite frankly, it's one of the best examples of good Unix design and at this point I wouldn't want to go without it.

This is a high-level introduction to the idea of daemontools rather than a full tutorial: to learn how to set it up in practice, djb's own site as well as a handful1 of others are better references.

What is Daemontools?

The core of daemontools is just two programs: svscan and supervise. They're very straightforward: svscan takes a single optional argument, and supervise takes a single mandatory one.

svscan watches a directory (if none is specified, then it will watch the current working directory) and checks to see if new directories have been added. Any time a new directory is added, it starts an instance of supervise pointing at that new directory2.

And that's all that svscan does.

supervise switches to the supplied directory and runs a script there called ./run. If ./run stops running for any reason, it will be started again (after a short pause, to avoid hammering the system.) It will also not start the ./run script if a file called ./down exists in the same directory. Extra data about the running process gets stored in a subdirectory called ./supervise, and a few other tools can be used to prod and modify that data—-for example, to send certain signals to kill the running program, to temporarily stop it, or to see how long it has been running.

And that's almost all that supervise does.

One extra minor wrinkle is that if supervise is pointed at a directory that also contains a subdirectory called ./log, and ./log/run also exists, then it will monitor that executable as well and point the stdout of ./run to the stdin of ./log/run. This allows you to build a custom logging solution for your services if you'd like. The ./log directory is optional.

So, how does this run a system? Well, you point svscan at a directory that contains a subdirectory for each service you want to run. Those services are generally small shell scripts that call the appropriate daemon in such a way that it will stay in the foreground. For example, a script to run sshd might look like:


# redirecting stderr to stdout
exec 2>&1

# the -D option keeps sshd in the foreground
# and the -e option writes log information to stderr
exec /usr/sbin/sshd -D -e

And your directory structure might look like

    - service/
      |- ngetty/
      |  |- run
      |  |- log/
      |     |- run
      |- sshd/
      |  |- run
      |  |- log/
      |     |- run
      |- crond/
      |  |- run
      |  |- log/
      |     |- run

Once you point svscan at this, you end up having a process tree where svscan is managing multiple service instances which in turn manage their respective services and logging services:

            |         `-log-service
            |         `-log-service
            |         `-log-service

This design has some pretty amazing practical advantages, many of which are attributable to the fact that daemontools is written in terms of Unix idioms. The “Unix way” gets a fair amount of derision—-some well-deserved, some not—-but daemontools is a good example of how embracing the idioms of your system can produce better, more flexible software. Consider the following problems and their daemontools solutions:

Testing a Service Before You Start It

The ./run script is a plain executable. If it runs and stays in the foreground, doing what it should do, it's correct. If it doesn't, then there's a problem. That's also the only code path, which is a sharp contrast to the infamously difficult-to-write sysvinit scripts, where start and stop and status and so forth must all be tested in various system states3.

Starting and Stoping a Service

All you do is create or delete a service directory. The most common way of doing this is to create the service directory elsewhere, and then create a symlink into the service directory to start it. This lets you delete a symlink without deleting the main directory, and furthermore ensures that the 'creation' of the directory is atomic.

Another tool, svc, lets you send signals to the running processes (e.g. svc -p sends a STOP signal, and svc -d sends a TERM signal as well as telling supervise to hold off on restarting the service otherwise.)

Express Service Dependencies

The daemontools design allows for various helper tools. One of them is svok, which finds out whether a given service is running. This is just another Unix program that will exit with either 0 if the process is running, or 100 if it is not. That means we can write

svok postgres || (echo "waiting for postgres..." && exit 1)
exec 2>&1
exec python2

and the script will die (prompting svscan to wait a moment and then restart it) unless postgres is already running.

Express Resource Limits

daemontools has several other applications that can enforce various resource limits or permissions. These are not part of the service mechanism—-instead, they simply modify the current process and then exec some other command. That means that you can easily incorporate them into a service script

exec 2>&1
# change to the user 'sample', and then limit the stack segment
# to 2048 bytes, the number of open file descriptors to 3, and
# the number of processes to 1:
exec setuidgid sample \
     softlimit -n 2048 -o 3 -p 1 \
     some-small-daemon -n

These aren't actually special, and don't have anything to do with the daemontools service mechanism. Any shell script can incorporate setuidgid or softlimit, even if those scripts have nothing to do with service management!

Allow User-Level Services

If I want a given user to have their own services that are run as that user, all I need to do is have another svscan running as that user and pointing at another directory, which I can run as another top-level service:

exec 2>&1
exec setuidgid user \
     /usr/sbin/svscan /home/user/service


What I described above was vanilla daemontools. Other systems are designed for booting entire systems with this kind of service management. Variations on this basic design add various features:

  • The runit package extends supervise with the ability to execute a ./finish script if the ./run script fails, to do various kinds of cleanup. (runit renames svscan and supervise to runsvdir and runsv, respectively.)
  • The s6 package adds even more options to both core programs (which are here named s6-svscan and s6-supervise) to e.g. limit the maximum number of services or modify how often scanning is done. It additionally allows control of an s6-supervise instance through a directory of FIFOs called ./event.
  • The daemontools-encore package adds even more optional scripts: a ./start script which is run before the main ./run script and a ./stop script after the service is disabled, a ./notify script which is invoked when the service changes, and a few others.
  • The nosh package is designed as a drop-in replacement for systemd on platforms where systemd cannot run (i.e. any Unix that is not a modern Linux) and so has a lot of utilities that superficially emulate systemd as well as tools which can convert systemd units into nosh service directories. nosh is the most radically divergent of the bunch, but is clearly a daemontools descendant (and incorporates most of the changes from daemontools-encore, as well.)

Additionally, all these (except for daemontools-encore) have other capabilities used to set up a Unix system before starting the service-management portion. They also generally include other tools for running services (e.g. runit includes the swiss-army-knife chpst for modifying a process's state; s6 includes a plethora of other service helpers and tools for doing things like file-system locking or socket activation) while keeping the same guiding principles of daemontools intact.

The Takeaway

The whole daemontools family has two properties which I really appreciate:

  1. A strong commitment to never parsing anything.
  2. A strong commitment to using Unix as a raw material.

Why avoid parsing?

Parsing is a surprisingly difficult thing to get right. Techniques for writing parsers vary wildly in terms of how difficult they are, and parsing bugs are a common source of weird machines in computer security. Various techniques can make parsing easier and less bug-prone, but it's a dangerous thing to rely on.

One way to get around this is to just skip parsing altogether. This is difficult in Unix, where most tools consume and emit plain text (or plain binary.) In other systems, such as in individual programming environments or systems like Windows PowerShell, the everything-is-plain-text requirement is relaxed, allowing tools to exchange structured data without reserializing and reparsing.

The way to avoid parsing in Unix is to use various kinds of structure to your advantage. Take the file system: it can, used correctly, emulate a tree-like structure or a key-value store. For example, one supplementary daemontools utility is envdir, which reads in environment variables not by parsing a string of name=value pairs, but by looking at a directory and turning the filename-to-file-contents mapping into a variable-name-to-variable-content mapping.

You might argue that this is silly—-after all, parsing an environment variable declaration is as easy as name=value! Could a system really introduce a security bug in parsing something as simple as that? As it happens, the answer is yes.

So daemontools avoids parsing by using directories as an organizing principle, rather than using configuration files.4 This makes an entire class of bugs and vulnerabilities impossible, which is always a good design choice.

What is “Unix as a raw material”?

The building blocks of daemontools are the parts of Unix which are common to every modern Unix variant: directories and executables and Unix processes and (in some of its descendants) FIFOs. This means you have a universe of actions you can perform outside of the daemontools universe:

  • Your scripts can be written in anything you'd like, not just a shell language. You could even drop a compiled executable in, at the cost of later maintainability.
  • Similarly, daemontools services are trivially testable, because they're just plain ol' executables.
  • Lots of details get moved out of service management because they can be expressed in terms of other building blocks of the system. There's no need for a 'which user do I run as' configuration flag, because that can get moved into a script. (Although that script can also consult an external configuration for that, if you'd like!)
  • Your directories can be arranged in various ways, being split up or put back together however you'd like.5

In contrast, service management with upstart or systemd requires special configuration files and uses various other RPC mechanisms, which means that interacting with them requires using the existing tools and... isn't really otherwise possible. Testing a service with upstart or systemd requires some kind of special testing tool in order to parse the service description and set up the environment it requests. Dependency-management must be built in, and couldn't have been added in afterwards. The same goes for resource limits or process isolation. ...and so forth.

“Unix design” has sometimes been used to justify some very poor design choices. On the other hand, it's possible to embrace Unix design and still build elegant systems. A well-built Unix system has some aspects in common with a well-built functional program: small components with well-defined semantics and scope and a clear delineation of the side-effects of any given part, all of which can easily be used together or apart. The daemontools family of programs is a perfect example of Unix design done well.

  1. This one is about runit, not daemontools, but they are similar enough in principle.
  2. It does this not using inotify or some other mechanism, but rather just by waking up every five seconds and doing a quick traversal of everything in the directory. This is less efficient, but also makes fewer assumptions about the platform it's running on, which means daemontools can run just about anywhere.
  3. Of course, your daemon might still rely on state—-but that's the fault of your daemon, and no longer inherent in the service mechanism. Contrast this to sysvinit-style scripts, where the only possible API is a stateful one in which the script does different things depending on the process state.
  4. One might argue that this is a little bit disingenuous: after all, you're still invoking shell scripts! If one part of your system avoids parsing, but then you call out to a piece of software as infamously complicated and buggy as bash, all that other security is for naught. But there's no reason that you have to write your scripts in bash, and in fact, the creator of s6 has built a new shell replacement for that purpose: namely, execline, which is designed around both security and performance concerns. If you wanted, you could replace all those shell scripts with something else, perhaps something more like the shill language. Luckily, the daemontools way is agnostic as to what it's executing, so it is easy to adopt these tools as well!
  5. I personally tend to have a system-level /etc/sv for some services and a user-level /home/gdritter/sv for other services, regardless of whether those services are run in my user-level service tree in /home/gdritter/service or the root-level tree in /service.

Apparently, I've got a theme going of Weird Syntax. Let's run with it.

Konrad Zuse was an early pioneer in computer science, although his name is perhaps somewhat less well-known than others. Zuse holds the honor of having built the first programmable computer—-the Z3—-back in the 40's, as well as several other computing firsts1. Of particular interest to this blog post is his early unimplemented programming language, Plankalkül.

Plankalkül was, like the Z3, in many respects ahead of its time. Zuse's explicit goal was to be able to describe programs at a high level, which meant he included control structures and datatype definitions2 and other high-level constructs that were often missing in languages of the early years of computing. Zuse was working on Plankalkül at a time when his machines were not useable, which meant that his language work was more theoretical than it was technical, and consequently he allowed features that he wasn't entirely sure how to program. Despite his notes on it having been written in the mid-40's, they were not published until the 70's, and it was not implemented until the year 2000.

One thing that struck me, as I read programs in this notation that had been set down on a typewriter3, is that certain kinds of grouping were handled by explicit indication of scope: not via matched delimiters as in ALGOL-style languages, or via indentation in languages such as Python and Haskell, but by formatting the code so that a line bordered on the left of the scoped parts of the code:

This is meant to capture the way grouping works in the hand-written or typeset notation, with brackets spanning multiple lines:

I think this is notationally interesting: it's like Python's significant whitespace, but not, uh, whitespace. It would be incredibly tedious to type out, but still entirely compatible with current programming notation:

class Tet
 | @staticmethod
 | def new_tet()
 |  | n = randint(0, len(Tet.Tets) - 1)
 |  | for p in Tet.Tets[n]
 |  |  | if p in Board.permanent
 |  |  |  | Game.lose()
 |  | Game.current = Tet(Tet.Tets[n], Tet.TetColors[n])
 | def __init__(self, points, color)
 |  | self.points = points
 |  | self.color = color

and would be entirely amenable to beautifying via judicious application of Unicode:

class Tet
 ┃ @staticmethod
 ┃ def new_tet()
 ┃  ┃ n = randint(0, len(Tet.Tets) - 1)
 ┃  ┃ for p in Tet.Tets[n]
 ┃  ┃  ┃ if p in Board.permanent
 ┃  ┃  ┗  ┗ Game.lose()
 ┃  ┗ Game.current = Tet(Tet.Tets[n], Tet.TetColors[n])
 ┃ def __init__(self, points, color)
 ┃  ┃ self.points = points
 ┃  ┗ self.color = color

Looking at this notation, however, an interesting possibility struck me: a programmer could explicit annotate information about the kind of scope involved in a given line. In this Python-like example, I could, for example, distinguish class scope using double lines, function scope with thick lines, and control structure scope with thin lines:

class Tet
 ║ @staticmethod
 ║ def new_tet()
 ║  ┃ n = randint(0, len(Tet.Tets) - 1)
 ║  ┃ for p in Tet.Tets[n]
 ║  ┃  │ if p in Board.permanent
 ║  ┃  └  └ Game.lose()
 ║  ┗ Game.current = Tet(Tet.Tets[n], Tet.TetColors[n])
 ║ def __init__(self, points, color)
 ║  ┃ self.points = points
 ║  ┗ self.color = color

One advantage of this scheme is that a handful of lines, viewed in isolation, still give you a clear view of what surrounds them. For example, I can view these two lines in isolation and still tell that they are within a control structure used within a function declared within a class:

 ║  ┃  │ if p in Board.permanent
 ║  ┃  └  └ Game.lose()

You could also imagine a hypothetical language in which choice of scope delimiter is important. In Python, for and if do not form a new lexical scope. What if instead we could stipulate the kind of scope they form by this notational convention?

def okay()
 ┃ if True
 ┃  └ n = 5   # n is declared in function scope
 ┗ return n   # n leaks out of the if-scope

def not_okay()
 ┃ if True
 ┃  ┗ n = 5   # n is declared in the if's scope
 ┗ return n   # error: no n in scope here

That being said, there are a number of reasons that this notation is in inferior to existing notations:

  • It makes refactoring code much more difficult.
  • It requires that the programmer pay attention to the sequence of enclosing scopes on a line-by-line basis, which is generally too pedantic and not particularly useful for a programmer.
  • The ability to select “which kind of scope” is by no means only expressible by this notation, as other syntactic features such as keywords and delimiters could express the same thing.
  • There are only so many line-like characters which can serve as a scope marker, so this scheme is not very extensible.
  • It complicates parsing (especially by introducing an entirely new class of parse errors in which adjacent lines feature incompatible sequences of delimiting lines), and so it also...
  • Complicates parse error messages, which are an important part of a language's UI and should be considered seriously.

So, as in my previous post on grammatical case in programming languages, I urge readers not to use this notation as the concrete syntax for a programming language. This is merely an entertaining peek through the looking glass at a curious notational convention which was never adopted.

That said: this makes a very nice notation for viewing code, where the programmer does not have to explicitly draw ASCII art around their code; indeed, it bears more than a passing similarity to the graphical interface used in Scratch, and Sean McDirmid's Experiments in Code Typography features this very convention as an interactive ornament on code in a Python-like language.

  1. He even wrote A New Kind Of Science half a century before Stephen Wolfram published it. In spirit, anyway—-if you strip away Wolfram's self-aggrandizement and the pretty pictures, much that remains of the book resembles Zuse's 1969 book Rechnender Raum.
  2. Datatype definitions wouldn't be found in other programming languages until the late 50's. Zuse's treatment of them was quite sophisticated: Plankalkül had no notion of what we would now call sums, but did have products in the form of tuples. The primitive type was S0, which represented a single bit and had the values + and -, so a two-bit value might be represented as (S0,S0). Arrays were included of the form m×t, which stood for m repetitions of t, and variable-length arrays were encoded as □×t. Integers were included as a primitive, and floats were defined as (3×S0,7×S0,22×S0), which stood for three sign bits (which could also indicate whether the number was real or imaginary or zero), a seven-bit exponent, and a twenty-two-bit significand.
  3. The image given is from Knuth and Pardo's The Early History of Programming Languages, which surveys early programming languages—-implemented and not implemented—-and gives the same basic program implemented in each one.

Comparing a computer language to a human language is like comparing an operating system kernel to a popcorn kernel.


In general, human languages and formal languages—-of which programming languages are one variety—-don't have a whole lot in common. Many natural-language features don't have formal-language analogues1, and many formal-language properties don't carry over well to natural languages2.

On the other hand, it is sometimes fun to derive inspiration for one from the other. One idle idea I had involved applying grammatical case to programming languages and seeing what resulted.

Grammatical Case

In linguistics, case is a grammatical category applied to nouns based on their function in a sentence. Consider these English sentences:

I see the cat.

The cat sees me.

My house is large.

Each sentence contains a word which refers to the speaker (I, me, my) but the choice of which word to use depends on the purpose of that word in the sentence. Grammarians would say that I—which serves as the subject—is the nominative form of the first-person pronoun, that me—the object of verbs and prepositions—is the oblique form, and that my is the possessive form. In English, only pronouns are inflected this way, but in some other languages, all nouns behave this way. For example, the Russian translations of the first two sentences above:

ja viž-u  košk-u
I  see-1s cat-ACC

košk-a  vid-et menja
cat-NOM see-3s me

feature the words koška and košku, which are two different forms of the word for 'cat'—the former serving as the subject of the sentence, and the latter the object. Russian has far more than three cases, as well:

košk-a   nominative (subject)
košk-y   genitive ('of the cat')
košk-je  dative ('to the cat')
košk-u   accusative (object)
košk-oj  instrumental ('with the cat')
košk-je  prepositional

It's important to note that the case of a noun is determined by its grammatical role in the sentence, not by its semantic role (or what linguists would call a thematic role.) Consider these two sentences:

The cat ate the fish. The fish was eaten by the cat.

In both sentences, the cat is the agent, the one who performs the action, and the fish is the patient, the one to whom the action is being done. But in the first sentence the cat is the subject, while in the second the fish is is the subject, and would be inflected accordingly:

košk-a  jel-a   ryb-u
cat-NOM ate-FEM fish-ACC

ryb-a    byl-a   sjedena    košk-oj
fish-NOM was-FEM eat.PP.FEM cat-INSTR

Not all languages have grammatical case: Chinese, for example, doesn't even inflect pronouns based on their role in a sentence

wǒ kàn māo
I  see cat

māo kàn  wǒ
cat sees me

Some languages with case have relatively few cases—-Modern Greek, for example, has four: a nominative, an accusative, a genitive, and a vocative—-whereas others may have quite a few—-Finnish, as an extreme case, has fifteen.

Grammatical case, among other things, allows for freer word order. In English, for example, we would normally say

The cat ate the fish.

and we could probably rearrange that sentence a little bit while still making sense

The fish, the cat ate.

but there are obvious limits to what is allowed. We would not say

The fish ate the cat.

and yet mean that the cat ate the fish. Word order is still conventionally fixed in languages with case—-especially in conversation and formal, non-poetic writing—-but we can be freer with word order and still produce a comprehensible sentence. A Russian-speaker might find it odd, but the sentence

ryb-u    jel-a   košk-a
fish-ACC ate-FEM cat

is clearly communicating that it was the cat that ate the fish, despite the words being in the opposite order.

Applying Case To Programming Languages

As it turns out, this has already been done—-in a Perl extension called Lingua::Romana::Perligata which was exploring this very question. In Perligata (as I will call it for short), three cases are distinguished: an accusative, a dative, and a genitive. The accusative is used as the arguments to functions or the left-hand side of an assignment; the dative is used as the left-hand side of an assignment or as the 'mutable' argument of a function like push, and the genitive is used for indexing. So, the expression

pippo = pluto[paperino]


pippo plutorum paperini da

But I'd argue that Perligata, while interesting, is too tightly tied to Perl semantics to be useful to people outside of Perl. So, I'm going to start from scratch and not tie this to any particular language, but rather to some manner of imperative pseudocode and add various features as I go.

Basic Form

Assume we have nouns and we have verbs. Verbs correspond to functions or subroutines, and nouns correspond to names given to values, or to literal values themselves. The only kind of classification given to nouns here is case; we're going to omit other linguistic classifications like grammatical gender or number. Individual statements will be semicolon-terminated for clarity.

In constrast to spoken language or to Perligata, our fake language is going to use prefixed punctuation marks to indicate the case of a variable. It wouldn't be hard to mimic natural language more closely by tokenizing our values based on some ending

    subject := /[a-z_]+us/
    object  := /[a-z_]+um/
    index   := /[a-z_]+i/

but I won't do that here.


Let's start with functions that take a single argument and don't return anything interesting; say, some kind of print function. We can give it an argument in the accusative, which we'll mark with an exclamation point:

    print !x; // understood as print(x)

Because we're using indicating the grammatical relationships of our tokens with case, we wouldn't create ambiguity if we were to to modify the order of the tokens here:

    !x print; // still understood as print(x)

We could do a few different things for functions of greater arity. One of the simplest possibilities—-especially for functions like sum in which the ordering of the arguments don't matter—-is to simply list all the arguments in the accusative.

    !a !b !c sum; // sum(a, b, c)
    sum !a !b !c; // sum(a, b, c)
    !a !b sum !c; // sum(a, b, c)


We'll introduce a dative sigil now, using the at-sign, to stand for taking the result of a function. Let's assume we have an incr function which adds one to its argument:

    // all of these are x = incr(y)
    @x incr !y;
    incr !y @x;
    !y incr @x;

If we want to assign one variable to another, we can introduce another nominative form which has no sigil.

    @x y; // x = y
    y @x; // x = y


So far we've assumed that we have functions in a generic namespace, but there are advantages to scoping functions within a namespace or object of some kind. Let's introduce another case to indicate a namespace in which a verb is being looked up, indicated by an ampersand:

    // object.method()
    &object method;
    method &object;

    // x = point.add(otherPoint)
    add &point !otherPoint @x;
    @x !otherPoint &point add;


In natural languages, we have a generally fixed set of prepositions—-a linguist would say that the class of prepositions is closed. I'm going to play fast-and-loose and instead include in this language an open set of prepositions, which will correspond to keyword arguments in languages like Python.

Previously, when supplying arguments to a function, we listed them in accusative form. That's fine for commutative functions like addition, but for non-commutative functions, or even moreso for functions of heterogeneous arguments, we want something that lets us move arguments around. Additionally, some functions may have arbitrarily large numbers of arguments. This is where our “prepositions” might come in handy.

Our “prepositions” will be of the form preposition:name, with optional whitespace, so we can arguably think of the colon as the prepositional sigil here.3

    // p = myPoint.move(x=myX, y=myY)
    @p &myPoint move x:myX y:myY;
    @p &myPoint move y:myY x:myX;
    y:myY x:myX move @p &myPoint;

There's no reason why functions couldn't have prepositions as well as an accusative argument. For example, a map function might use a func preposition to indicate its function argument:

    // newSeq = map(mySeq, func=toString)
    @newSeq map func:toString !mySeq;
    !mySeq map @newSeq func:toString;


Lots of extant programming languages have the ability to avoid repeating certain common elements. For example, in JavaScript, one can use the with keyword to avoid naming the same element over and over:

    with (foo) { a(5); b(6); }
    // is equivalent to
    foo.a(5); foo.b(6);

We could mimic this pretty easily in our hypothetical language:

    &foo {
        a !5;
        b !6;

But a structure like this could allow us to factor out repetitions of any given case; for example, if we assign multiple times to the same value:

    @x {
        add !x !1;
        mul !x !2;
        sub !x by:3;
        mul !x by:4;
    // corresponds to:
    // x = add(x, 1);
    // x = mul(x, 1);
    // x = sub(x, 1);
    // x = div(x, 1);

And even that had a repeated element, so let's factor that out, too:

    @x !x {
        add !1;
        mul !2;
        sub by:3;
        div by:4;

We can use this to factor out repeated object accesses:

    &console {
        writeline !"What is your name?";
        readline @name;
        writeline "Hello, {name}!";

or even repeated prepositional arguments:

    x:5 width:10 height:10 init {
        @box1 y:10;
        @box2 y:20;
        @box3 y:30;
    // box1 = init(x=5, y=10, width=10, height=10);
    // box2 = init(x=5, y=20, width=10, height=10);
    // box3 = init(x=5, y=30, width=10, height=10);

Would this be useful?

Probably not. I suspect a general-purpose language with a syntax like this would be rather tedious. Possible! You could even use this syntax with some kind of static type sytem:

writeline: { &Handle, !String } –> () div: { !Float, by:Float } –> Float init: { x: Int, y: Int, width: Int, height: Int} –> Rectangle

but it'd also be quite verbose, and the flexibility afforded by this scheme is probably the flexibility that anybody needs.

One place I can imagine this being useful, however, is in a shell-like system. Commands tend to have 'prepositions' already in the form of optional arguments, input-output redirection, &c, so it's not a far cry from what already exists:

    >logfile {
      echo !"Setting up system";
      port:22 user:alex {
        scp {
          host:alpha !some_file;
          host:beta !other_file;
        host:gamma command:"./" ssh;
      echo !"Script completed";

In general, though, I suspect that—with the exception of keyword arguments, which have already been proven to be very useful—the ideas here are little more than a quaint curiosity.

As a final aside: I'd love a strongly-typed functional language which included keyword arguments as a design choice from the beginning. OCaml and Haskell both have optional keyword arguments, and in both languages, they don't quite behave like you'd expect, especially with respect to currying. And frankly—they don't have to be optional to be useful! A language that included the ability to name and reorder all the obligatory arguments to a function would still be a huge win for usability.

  1. There is nothing that says a formal language needs to have grammatical categories, or a well-defined phonology, or any of a number of other features of natural language.
  2. A good example of this is the common belief that natural language which does not mirror propositional logic is somehow “illogical”, e.g. that a double negative expressed in natural language follows the conventions of classical logic and cancels itself out. Natural language of course follows rules, but they are not the same rules as formal languages.
  3. This keyword system is very similar to SmallTalk and related languages, which use keywords for all function calls. The difference is that SmallTalk et al. consider the keywords to be an ordered sequence, and not a set, so if I define a constructor that is called like this:

    p := Person name: 'John' age: 22

    then the constructor corresponds to the method name name:age: and cannot be called with those keywords in any other order.