Infinite Negative Utility

This is apparently quite a contentious opinion, but after having worked with them for a while, I generally like Rust's modules. However, it's not contentious at all to say that they are complicated and hard to understand for someone looking at them for the first time. So, let's talk about Rust modules, building up some principles for understanding them as we go.

Important note: this blog post represents the state of Rust modules as of the time of writing. There are changes in store, and the module system as used in the Rust of 2019 is gearing up to be pretty different!1

Part One: Modules

Before I get to actual code snippets, let's talk about the first two basic principles:

  • Every named item in Rust—functions, types, values, modules—can be made public by prefixing it with the pub keyword.
  • Every compilation unit—a library or an executable, what Rust calls a crate—contains a 'root' module. For a library, this root module is defined in lib.rs, and for an executable, it's defined at main.rs.

Let's say I make a new fresh Cargo project that's a library. I get a file at src/lib.rs. Every item defined in that file lives in the root module, so let's write a simple function here:

// in src/lib.rs
pub fn hello() {
    println!("Hello!");
}

Inside of this module, we can refer to this function using the unqualified name hello, but we also can use a qualified name, which represents a path through our module hierarchy to get to the name. In this case, hello is exposed through the root module, so the qualified name for hello is ::hello, which leads us to another principle:

  • The root of the current compilation unit's module hierarchy is indicated in qualified paths with an initial ::.

So let's make a new module inside of the root module. Submodules are always indicated with the mod keyword, but (as we'll see) this keyword can be deployed in different ways depending on where we want to implement the module. For now, we'll define the contents of the module inside the same source file, in a curly-brace-delimited block headed by pub mod my_module_name:

// in src/lib.rs
pub mod english {
    // #1
    pub fn hello() {
        println!("Hello!");
    }
}

// #2
pub fn hello() {
    ::english::hello();
}

At this point, the root module contains two public items: ::hello, a function, and ::english, a module. In turn, the module ::english contains a public item: a function ::english::hello. In this case, I'm using fully qualified names for disambiguation: this is useful here because I have two different functions both named hello. Without explicit qualification, the name hello will refer to different things depending on the module scope: when referenced inside the english module, hello refers to the function I've indicated with the comment // #1, and outside of the english module, hello refers to the function I've indicated with the comment // #2.

While there is never 'ambiguity' from the point of view of the Rust compiler, having the same name in different scopes can be confusing for us as readers and writers of code. We can always be explicit and use fully qualified names to alleviate any confusion. For example, we can (although I don't know why you would) write the following code:

// in src/lib.rs
pub mod english {
    pub fn hello() {
        println!("Hello!");
    }

    pub fn outer_hello() {
        ::hello();
    }
}

pub fn hello() {
    ::english::hello();
}

In this example, we've added the new function ::english::outer_hello, which invokes ::hello using a fully qualified path, which in turn invokes ::english::hello using a fully qualified path.

So we've discovered a new principle:

  • Name lookup in expressions is relative to the module in which the expression appears unless the name is fully qualified.

There are also ways of traversing the module hierarchy with relative names. The above example can be rewritten using relative names like so:

// in src/lib.rs
pub mod english {
    pub fn hello() {
        println!("Hello!");
    }

    pub fn outer_hello() {
        super::hello();
    }
}

pub fn hello() {
    english::hello();
}

The expression english::hello() inside of the function ::hello is in the root module, and therefore the name is looked up relative to the root module. The declaration of ::english::outer_hello, however, is inside the module ::english, so if we want a relative reference to ::hello, we need to first move up a module level with the super keyword in order to access the item we want in the parent module. There's also a self keyword, which allows us to look up names relative to the current module. That keyword is redundant here—we're already looking up names relative to the current module by default!—but we'll find a use for it later on.

  • You can use super and self in qualified names to traverse the module hierarchy in a relative way.

(Note also that super and self only work at the beginning of a path: that means we can't write a perverse relative path like ::english::super::english::hello—not that you'd want to write that anyway!)

There are also two other ways of creating a module, but all of them use the mod keyword in some way. Right now, we've defined a module using a curly-brace-delimited block in a single file, but we can also create a module by placing the definitions of that module's items into a separate file with the same name as the module. In these examples, because our module is called english, we can put declarations into the file src/english.rs, so our two source files will look like

// in src/lib.rs
pub mod english;

pub fn hello() {
    self::english::hello();
}

and

// in src/english.rs
pub fn hello() {
    println!("Hello!");
}

pub fn call_outer_hello() {
    super::hello();
}

Notice that we've kept the pub mod english declaration in the root module, but we've removed the body of that declaration. At the same time, the contents of src/english.rs are essentially identical to the contents of the english module before (modulo indentation, the comment, and the new name for the outer function), and none of the rest of the code had to change at all. This reorganization is purely for our benefit: the code does not care whether we use an inline module or an external module.

There's a third way we could have organized our code, as well: instead of creating a file named after our module, we could have created a directory named after our module, and included a file called mod.rs inside that directory. Again, we could have equivalently written these modules:

// in src/lib.rs
pub mod english;

pub fn hello() {
    self::english::hello();
}

and

// in src/english/mod.rs
pub fn hello() {
    println!("Hello!");
}

pub fn call_outer_hello() {
    super::hello();
}

Notice that even less had to change this time: the contents of the files are identical to before, and only the on-disk organization has changed! Again, from Rust's point of view, the tree of modules and items we've created here is identical. This final on-disk organization method is more convenient if we want to create other nested submodules that are contained in their own files: if we wanted to have a module ::english::plurals, then we could define the module ::english in the file src/english/mod.rs, and then define the module ::english::plurals in src/english/plurals.rs2. If we used either of the other two on-disk organization methods, then we would either have to write nested module blocks like

pub mod english {
    pub mod plurals {
       // ...
    }
}

Or we would have to have a pub mod plurals { ... } block inside of the source file src/english.rs.
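With the directory layout, by contrast, the parent module just declares its submodule and the submodule's contents live in their own file. Here's a minimal sketch (the body of plurals is a placeholder, just so the file isn't empty):

// in src/english/mod.rs
pub mod plurals; // the contents live in src/english/plurals.rs

pub fn hello() {
    println!("Hello!");
}

and

// in src/english/plurals.rs
pub fn pluralize(word: &str) -> String {
    // a stand-in implementation for the sake of the example
    format!("{}s", word)
}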

However, like I said, all of these choices are mostly important to us as programmers structuring a project on-disk and maintaining it as a repository, but from Rust's point of view, they define an identical namespace. In summary:

  • Modules can be defined in three ways—using lexical pub mod my_module { ... } blocks, using a src/my_module.rs file, or using a src/my_module/mod.rs file—but the choice of approach is immaterial to the namespace structure being constructed.

This can be a powerful way of growing a project: a module might start life as one or two types and functions inside a source file, and as they grow, they can get moved first into another file, and then into a directory that itself contains several more files, and this can happen transparently, without any user of the code (inside or outside my crate) having to be aware of it.
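For example, here's a minimal sketch of what a downstream user of our library might write (the crate name greetings is hypothetical, and this uses the extern crate syntax covered in the next section); the call site stays the same no matter which of the three layouts we pick for the english module:

// in a downstream crate's src/main.rs
extern crate greetings;

fn main() {
    // this path is unaffected by whether ::english lives in lib.rs,
    // in english.rs, or in english/mod.rs
    greetings::english::hello();
}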

Now we can move on to

Part Two: Extern Crates

The extern crate declaration does two things at once: it specifies a dependency on an external named crate, and it includes that crate's root module as a named module in the current module's namespace.

Let's build some randomness into our example code. We will use the rand crate to get some random numbers, and use them to choose from a list of greetings.

// in src/lib.rs
// importing an external crate in the root
extern crate rand;

// our statically-chosen list of greetings
static GREETINGS: &[&'static str] = &[
    "Hello!", "Heya!", "Greetings!",
];

pub fn hello() {
    // choose a random valid index
    let index = ::rand::random::<usize>() % GREETINGS.len();
    // and print the chosen greeting
    println!("{}", GREETINGS[index]);
}

Notice that the library functions in the rand crate are now available as though they were defined in a module called rand in the root of our crate. By default, the name of the imported module is the same as the name of the crate—but we can change that! If we want a different module structure, we can use the as keyword to rename the imported module to a different name—in this case, let's call it my_rand:

// in src/lib.rs
// importing an external crate in the root, with renaming
extern crate rand as my_rand;

static GREETINGS: &[&'static str] = &[
    "Hello!", "Heya!", "Greetings!",
];

pub fn hello() {
    let index = ::my_rand::random::<usize>() % GREETINGS.len();
    println!("{}", GREETINGS[index]);
}

Now, we've separated out the specification of the external name of the crate we want (the rand crate) from the name we're giving its root module in our internal module hierarchy (::my_rand).

Unlike many programming languages, we can declare extern crate dependencies anywhere in the module hierarchy we want! So, for example, let's import rand underneath a new module. Notice that we need to include a pub qualifier to make sure the module it creates is accessible outside of the deps module. (Remember that items are private by default, and are only visible outside their module if they're marked with the pub keyword!)

// in src/lib.rs
// importing an external crate in a submodule
pub mod deps {
    pub extern crate rand as my_rand;
}

static GREETINGS: &[&'static str] = &[
    "Hello!", "Heya!", "Greetings!",
];

pub fn hello() {
    let index = ::deps::my_rand::random::<usize>() % ::GREETINGS.len();
    println!("{}", ::GREETINGS[index]);
}

This is a pretty stark departure from how most languages think of external dependencies: usually, they just get placed in some kind of root namespace, and you might be able to import them under a different name (like with Python's import module as name syntax). In Rust, you have full control over your own module hierarchy, and so you can insert the root module of an external crate wherever you'd like.

  • The extern crate declaration specifies a dependency on a named external crate and imports that crate's root module as a named module in the current module scope.
  • The name of a module created by extern crate defaults to the name of the crate, but can be explicitly specified with the as keyword.

As an aside: why would you want to do this? Well, one reason is that you might have a lot of external dependencies, and it would be nice to manage them. Imagine that you're writing a video game, and therefore depend on a lot of libraries for things like graphics and physics and so forth. You could therefore imagine wanting to both organize and rename your dependencies in order to manage them, something along the lines of:

pub mod graphics {
    pub extern crate opengl as gl;
    pub extern crate sdl;
}
pub mod img {
    pub extern crate png;
    pub extern crate libjpeg as jpeg;
}

Now you can organize your dependencies in a way that's convenient and structured!

Part Three: Uses

The last thing we haven't talked about is the use declaration, which is given a qualified path and creates a new local name for the item named by that path.

  • The use keyword pulls in names from another scope into the current one.

We often use this with functions or types defined by external crates, so we can pull them into scope with more convenient local names, but it works with any named item, including modules.

It's not uncommon to see declarations like use std::fs;, which allows the module ::std::fs to be accessed locally as fs without the std:: prefix. This brings a module, not a function or type, into the local scope.
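Here's a minimal sketch of that pattern (the open_log function is just a hypothetical example):

// a minimal sketch: bring the std::fs module itself into scope
use std::fs;

pub fn open_log() -> std::io::Result<fs::File> {
    // fs::File can now be named without the full std::fs:: prefix
    fs::File::open("log.txt")
}

Applying the same idea to our rand example from before, we can pull the random function itself into scope: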

// in src/lib.rs
extern crate rand as my_rand;

// ...but pulling the random function into scope here
use my_rand::random;

static GREETINGS: &[&'static str] = &[
    "Hello!", "Heya!", "Greetings!",
];

pub fn hello() {
    // we can now use random unqualified
    let index = random::<usize>() % GREETINGS.len();
    println!("{}", GREETINGS[index]);
}

Confusingly, use declarations work differently from expressions, because names in use declarations are always relative to the root. Consider the example below: because the extern crate rand as my_rand was declared in the root module, the fully qualified name of the random function is ::my_rand::random, but when I use that name inside the english module, I can write the path without a leading ::, and it's still looked up as though from the root.

// in src/lib.rs
extern crate rand as my_rand;

pub mod english {
    use my_rand::random;

    static GREETINGS: &[&'static str] = &[
        "Hello!", "Heya!", "Greetings!",
    ];

    pub fn hello() {
        let index = random::<usize>() % GREETINGS.len();
        println!("{}", GREETINGS[index]);
    }
}

pub fn hello() {
    english::hello();
}

If I do want to use a name that's relative to the current module, I can use the self keyword in the path, which starts me instead from the module containing the use declaration. In the below example, we've moved the extern crate line inside of the module, so now the my_rand module lives in ::english::my_rand, and we then use an explicit relative path to pull in the function we want.

// in src/lib.rs

pub mod english {
    extern crate rand as my_rand;

    use self::my_rand::random;

    static GREETINGS: &[&'static str] = &[
        "Hello!",
        "Heya!",
        "Greetings!",
    ];

    pub fn hello() {
        let index = random::<usize>() % GREETINGS.len();
        println!("{}", GREETINGS[index]);
    }
}

pub fn hello() {
    english::hello();
}

So our last principle is:

  • Name lookup in use declarations is relative to the root module unless the name is explicitly made relative with the self keyword.

I also can use the pub keyword on uses to pull names into a module while simultaneously exposing those names outside of that module. Sometimes I like to do this to carefully namespace a subset of external names I plan on using, allowing me to group several libraries together, or pick-and-choose items from separate but related compilation units into a single module for local use.

pub mod io {
    pub use std::fs::File;
    pub use std::io::*;
}

All The Principles In One Place

To summarize, I'm going to pull all the principles from the above text into a single list:

  • Every named item in Rust—functions, types, values, modules—can be made public by prefixing it with the pub keyword.
  • Every compilation unit—a library or an executable, what Rust calls a crate—contains a 'root' module. For a library, this root module is defined in lib.rs, and for an executable, it's defined at main.rs.
  • The root of the module hierarchy is indicated in qualified paths with an initial ::.
  • Name lookup in expressions is relative to the module in which the expression appears unless the name is fully qualified.
  • You can use super and self in qualified names to traverse the module hierarchy in a relative way.
  • Modules can be defined in three ways—using lexical pub mod my_module { ... } blocks, using a src/my_module.rs file, or using a src/my_module/mod.rs file—but the choice of approach is immaterial to the namespace structure being constructed.
  • The extern crate declaration specifies a dependency on a named external crate and imports that crate's root module as a named module in the current module scope.
  • The name of a module created by extern crate defaults to the name of the crate, but can be explicitly specified with the as keyword.
  • The use keyword pulls in names from another scope into the current one.
  • Name lookup in use declarations is relative to the root module unless the name is explicitly made relative with the self keyword.

This is a lot to take in at once, and it can be confusingly different from other languages, but the biggest difference is that your module tree is always explicit. The drawback is that it's verbose and sometimes offers flexibility that can be unnecessary or confusing, but the advantage is that what's happening is always explicit in the code, and that you can express exactly the hierarchy of names and namespaces that you want.


  1. Is it a pun to say 'gearing up' when talking about Rust?
  2. Or, in src/english/plurals/mod.rs, of course!

Over the weekend, a friend of mine who is currently learning Rust asked me a question. He was using serde for serialization, and he wanted a single function that took a file path as argument, opened the file, deserialized it as a particular type, and returned that value, where the return type could be inferred from context. (For this example, I'm going to ignore error-handling and just use .unwrap() to bail out on failures.) I hadn't used serde much myself, so I briefly assumed the function in question would just look like this:

fn load_from_file<T>(path: String) -> T
where
    T: serde::Deserialize
{
    let mut file = File::open(path).unwrap();
    serde_json::from_reader(&mut file).unwrap()
}

But it wasn't quite so easy, and, like most of the trickier parts of Rust, the problem comes down to lifetimes. The serde::Deserialize trait takes a single parameter: a lifetime which corresponds to the lifetime of the input source. In the above function, the input source is the file from which we're reading, but it might also be a str (using the from_str function) or a slice of bytes (using the from_slice function). In any case, the deserializer needs to know how long its input source lives, so we can make sure that the input source doesn't suddenly get closed or freed while it's in the middle of parsing.

Okay, that means we need a lifetime parameter to give to serde::Deserialize. Let's try the obvious thing and add a lifetime parameter to the signature of the function:

fn load_from_file<'de, T>(path: String) -> T
where
    T: serde::Deserialize<'de>
{
    let mut file = File::open(path).unwrap();
    serde_json::from_reader(&mut file).unwrap()
}

This looks nice at a glance, but it doesn't compile! You can plug this code snippet into rustc and it will (with its characteristic helpfulness) suggest exactly the correct code to write. But I'm less interested here in what to write, and more in why we're writing it: why does this not work, and why does the suggested fix work?

Let's step back to very basic Rust: when you have a generic parameter to (say) a function, what you're saying is that you want that particular implementation detail to be supplied by the use of the function. Here's an incredibly trivial example: I can write a polymorphic identity function, a function that simply returns the value given it, by giving it a type parameter like this:

fn identity<T>(x: T) -> T { x }

When we use the identity function with a value of a particular type—say, by calling identity(22u32)—we're also implicitly providing a concrete choice for the type T, which it can infer from the type of the argument. Rust also allows us to pass this type parameter explicitly, with the slightly unwieldy syntax identity::<u32>(22). Either way, the use site of identity provides the information necessary to choose a concrete type T.

The same goes for lifetime parameters: when we write a function like

fn ref_identity<'a, T>(x: &'a T) -> &'a T { x }

what we're saying is that the reference we're taking as argument lives some amount of time, but that amount of time is going to be known at the call site, and it'll get that information inferred from individual uses of ref_identity. When we call ref_identity(&foo), we're saying that the lifetime parameter 'a is going to be however long foo lives in that context.
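For instance, here's a quick sketch of a call site, assuming the ref_identity above is in scope:

fn main() {
    let foo = 5u32;
    // here, 'a is inferred to be the region in which foo is alive
    let r = ref_identity(&foo);
    println!("{}", r);
}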

Okay, so with that, let's look at our first pass at writing load_from_file:

fn load_from_file<'de, T>(path: String) -> T
where
    T: serde::Deserialize<'de>
{
    let mut file = File::open(path).unwrap();
    serde_json::from_reader(&mut file).unwrap()
}

What's the problem here? Well, our type doesn't actually reflect what the body of our function does. Remember that the lifetime parameter we give to Deserialize corresponds to the lifetime of the input source to the deserializer, and we already know the lifetime of our source: it's the lifetime of our file variable. That lifetime isn't something we need to get from the caller—for that matter, it's not a piece of information that the caller would even have access to! Expressing that as a parameter in this instance is blatantly incorrect.

...but then we've got a problem, because we need something to give Deserialize as the lifetime parameter, and Rust doesn't have a way of expressing, “The lifetime of this variable here in the scope I am currently defining.” We're caught: we need some lifetime parameter there, but it can't be an argument, and we can't name the specific individual lifetime that we know it should be.

So instead, one way of writing this is by saying, “This works for any lifetime we might care to give it.” The difference here is subtle but important: we aren't expressing the notion of “any lifetime you, the caller, want to give me”, we are expressing the notion of “any lifetime at all.”

It turns out that Rust has a syntax for this, which uses the for keyword:

fn load_from_file<T>(path: String) -> T
where
    T: for<'de> serde::Deserialize<'de>
{
    let mut file = File::open(path).unwrap();
    serde_json::from_reader(&mut file).unwrap()
}

If we try this, this function now works. And notice that the type parameter list now includes only one thing: the T type that we're deserializing to, which is exactly what we want!

The key here is the for<'a> T syntax: the for acts as a binder for one or more fresh lifetime parameters, which are then used in the following type. Here, we're introducing a new 'de lifetime and then using it as the parameter to Deserialize. Importantly, this lifetime is now quantified over all possible lifetimes, not merely a lifetime that the calling context might supply. And of course, 'all possible lifetimes' includes the lifetime of the file variable inside the function!

The for<'a> T syntax is a feature called Higher-Ranked Trait Bounds. It was added specifically to support unboxed closures, but it ends up being useful in other situations—like this one!
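As an aside: unless I'm misreading the serde documentation, serde also provides a convenience trait, serde::de::DeserializeOwned, which packages up exactly this higher-ranked bound, so the same function can be written without spelling out the for<'de> quantifier:

fn load_from_file<T>(path: String) -> T
where
    T: serde::de::DeserializeOwned
{
    let mut file = File::open(path).unwrap();
    serde_json::from_reader(&mut file).unwrap()
}

Either spelling expresses the same thing: T must be deserializable from an input source with any lifetime at all.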

Earlier today I pasted a snippet of Haskell code into our work chat that contained a multi-parameter type class instance that looked more or less like this:

instance a ~ b => C a b where {- ... -}

The constraint a ~ b in this declaration is a type equality constraint: it means that a and b must be the same type. One of my coworkers saw this and remarked, more or less, “Why did you write it in that obtuse way? You could have just declared an instance for C a a, which means the same thing.”1

Those two declarations actually don't mean the same thing—but I can't blame said coworker for thinking they did, because the difference between the two is subtle and probably won't come up unless you've done a fair amount of fiddling with type classes. To that end, I want to show an example where those two differ, and then explain why.

The Example

Let's say we have this (more than a little contrived) multi-parameter type class (which will require the MultiParamTypeClasses extension):

class PairOf a b where
  thePair :: (a, b)

I then define an instance that looks like this (which will require the FlexibleInstances extension):

instance Monoid a => PairOf a a where
  thePair = (mempty, mempty)

So far, so good! Now let's try a particular use of thePair in a (probably even more contrived) definition:

obtuseMempty :: Monoid t => t
obtuseMempty = fst thePair

What do I expect to happen in this code snippet? I would hope that obtuseMempty ends up being the same as mempty. In particular, I want thePair to resolve to the instance I defined above, and consequently evaluate to (mempty, mempty), and then, fst (mempty, mempty) == mempty. What actually happens when I compile this?

• Could not deduce (PairOf a b0) arising from a use of ‘thePair’
  from the context: Monoid a
    bound by the type signature for:
               obtuseMempty :: Monoid a => a
    at sample.hs:11:1-29
  The type variable ‘b0’ is ambiguous
  Relevant bindings include
    obtuseMempty :: a (bound at sample.hs:12:1)
  These potential instance exist:
    instance Monoid a => PairOf a a
      -- Defined at sample.hs:8:10
• In the first argument of ‘fst’, namely ‘thePair’
  In the expression: fst thePair
  In an equation for ‘obtuseMempty’: obtuseMempty = fst thePair

Well, that's not what I had wanted, but I guess it makes sense: thePair has to be a pair (because of the use of fst), but since we never use the second element in thePair, there's no reason for GHC to believe that it's the same type as the first. However, let's try tweaking the definition a bit: instead of PairOf a a, let's define it as PairOf a b and then include a constraint that keeps a and b equal (which will require either GADTs or TypeFamilies enabled):

instance (Monoid a, a ~ b) => PairOf a b where
  thePair = (mempty, mempty)

If we try compiling obtuseMempty with this definition, it works without any errors! But why does one work while the other doesn't? What's the difference between the two? The answer has to do with a subtlety of how type class resolution works, a subtlety that can be hard to observe right until it stares you in the face.

How Type Class Resolution Works

When GHC searches for an appropriate instance of a type class, it has to check against all the instances you've defined and find one that “matches” your use of the type class. This is pretty easy when you define instances for a basic type like Int or Bool. For example, using the following (pretty useless) type class:

class MyClass t where
  myValue :: t

instance MyClass Int where myValue = 0
instance MyClass Bool where myValue = True

If I use myValue in a context where it's being interpreted as Int, then Haskell will search through the available instances and find our declaration of MyClass Int. That's simple enough! Things get a bit more complicated when we add types that include type parameters. For example, this is a definition of MyClass for pairs of values:

instance (MyClass a, MyClass b) => MyClass (a, b) where
  myValue = (myValue, myValue)

As a point of vocabulary, everything before the => in this definition is called the context, and everything after it is called the head. So now, if I use myValue in a context where it's being interpreted as (Int, Bool), then GHC will find the matching instance MyClass (a, b), and will then in turn search for two other instances: one for MyClass Int and another for MyClass Bool.

Notice the precise way I phrased that, because this is where the basic intuition around type class instances can diverge from the actual implementation: it looks like the instance definition is saying, “If we have an instance for MyClass a and an instance for MyClass b, then we can also have an instance for MyClass (a, b).” This is a reasonable intuition, but it's precisely backwards from what Haskell is actually doing. What happens instead is that, when it sees myValue being used as a pair, GHC will first select the instance MyClass (a, b) without having even examined the constraints in the context. Only after it's already committed to using the instance MyClass (a, b) will it then examine the context and try to satisfy the MyClass a and MyClass b constraints. Or, to say it in a more concise but jargon-heavy way, it first matches the head, and once it has done so, attempts to satisfy the context.

Consider what happens if we write the following code without having defined an instance of the form MyClass ():

blah :: (Int, ())
blah = myValue

The precise error GHC gives us is: “No instance for (MyClass ()) arising from a use of myValue.” GHC has already settled on the MyClass (a, b) instance, has found a satisfying instance for MyClass Int, but then cannot find an instance for MyClass ().

In this case, making that distinction seems a little bit pedantic. In code like this, it amounts to simply an implementation detail! It shouldn't really matter what order GHC is using underneath the surface: either way, the problem is that we don't have an instance for MyClass (), and as long as GHC reports that error to us in some way, it doesn't matter whether it first tries to satisfy the context before committing to MyClass (a, b), or if it commits first and then tries to satisfy the context: we'll run into the same problem either way! But knowing the precise way GHC tries to instantiate type classes can be very useful in other contexts.

Repeated Types in Instance Heads

Let's look at our first problematic PairOf instance:

instance Monoid a => PairOf a a where
  thePair = (mempty, mempty)

Remember: GHC has to match the instance head first, and after that it solves the remaining constraints from the context! So, in the definition of obtuseMempty, what is the inferred type of the use of thePair?

obtuseMempty :: Monoid t => t
obtuseMempty = fst thePair

It should be obvious, but for good measure, let's replace thePair with a hole and find out what type GHC is assigning based on the surrounding context:

• Found hole: _ :: (a, b0)
  Where: ‘a’ is a rigid type variable bound by
           the type signature for:
             obtuseMempty :: forall a. Monoid a => a
           at sample.hs:11:17
         ‘b0’ is an ambiguous type variable
• In the first argument of ‘fst’, namely ‘_’
  In the expression: fst _
  In an equation for ‘obtuseMempty’: obtuseMempty = fst _

So GHC needs to find an instance of the form PairOf t b0. Let's look at our instances: do we have one that has a matching head? ...well, no, we only have PairOf a a.

“But wait!” you say. “We don't do anything with the second part of the pair, so why can't GHC choose PairOf a a as a valid instance?” But GHC can't read your mind, and there's nothing in your program to indicate that the type variables a and b0 are supposed to be the same. As soon as you somehow indicate that, GHC will comply and find your PairOf a a instance:

evenMoreObtuseMempty :: Monoid t => t
evenMoreObtuseMempty = let p = thePair in (fst p `mappend` snd p)

In this definition, it's clear that thePair is supposed to have two fields of the same type, because we're clearly using mappend to combine them. But is there a way of making it work even if we don't give GHC that extra nudge at the use site?

Type Equality to the Rescue

Let's move on to the second PairOf instance I described above:

instance (Monoid a, a ~ b) => PairOf a b where
  thePair = (mempty, mempty)

Using typical informal reasoning, it seems like this should be identical to the other instance definition: “It's an instance where both parameters are the same type, and that type needs to be a Monoid”. But remember how GHC chooses instances: first by matching the head, then by elaborating the constraints. And this definition is different in an important way: the fact that the two type parameters are the same type is no longer a feature of the head, but rather a feature of the context! So when we revisit our definition of obtuseMempty with this new definition, GHC behaves differently:

obtuseMempty :: Monoid t => t
obtuseMempty = fst thePair

The value thePair in this context still has the inferred type PairOf t b0 => (t, b0), but now we have an instance for PairOf a b, which means GHC can unify those and select the instance we want! And now that it has chosen that instance, GHC can take the constraints associated with that instance and try to solve them: the expression thePair now has the inferred type (Monoid t, t ~ b0) => (t, b0), which GHC can solve as: (Monoid t) => (t, t). After that, our function type-checks successfully!

So What's The Lesson?

A rough intuition is that you can think of type equality in code like this as a way of propagating the fact that two types are equal. In the examples above, we as programmers intended the two fields of thePair to have the same type, but the first definition—the PairOf a a one—meant we could only use thePair when GHC already knew those types were equal. By moving the type equality to the context, we can use thePair in places where the two types aren't yet known to be equal, and introduce that fact ourselves. This isn't the only use of type equality—it's very important in the context of type families, for example—but it's nevertheless an important one.

GHC type class resolution is really very straightforward, even in the face of extensions like the ones we enabled in this post. In fact, part of the problem is that it's sometimes tempting to imagine GHC is doing something more clever than it actually is! Methodically stepping through the way GHC processes your code can really help in understanding situations like these, where initial intuitions sometimes lead you down the wrong path.


  1. Well, okay, they ACTUALLY assumed I wrote an excessively obtuse type signature as an elaborate way of trolling them. And to be fair, the code snippet I was pasting was a deliberately silly and ill-advised piece of code—but the use of type equality wasn't part of the joke.

    ...well, if you MUST know, it was an instance definition very similar to the following:

    instance (a ~ b) => Num ((Integer -> a) -> b) where
      fromInteger x f = f x
      {- the rest of this is not important here -}
    

    this allows you to use numeric literals as functions that take another function and apply it to the literal's integer value. This gives you some cute syntactic sugar:

    newtype Pixels = Pixels { fromPixels :: Integer }
    
    width, height :: Pixels
    width = 320 Pixels
    height = 240 Pixels
    

    This is disgusting and terrible, and please never do it.

Markov chains are reasonably common on the internet these days: they're often used as a way of generating random data that looks similar—at least, in a sense—to some existing data set. A lot of the great examples of Markov chains come from combining two very different data sets and building a single Markov model from them:

  • King James Programming creates mashed-up versions of the King James Bible and the classic programming text The Structure And Interpretation of Computer Programs.
  • At The Modules of Madness creates mashed-up versions of the documentation for the Puppet tool and the eldritch horror novels of H.P. Lovecraft.
  • Erowid Recruiter creates mashed-up versions of recruiter emails from programming companies and people's accounts of taking various illicit substances.

I also have my own Markov bot, which right now is just trained on my own tweets1, and I might eventually decide to add my prose writing as a secondary source of data. It produces a few wonderful tweets and a whole lot of nonsensical crap.

I've just added a little feature which helps to visualize what's going on under the surface, so I wanted to explain how Markov chains work at a high level first, and then show how @guwan_aisamanra now visualizes the underlying mechanisms at work.

What is a Markov Chain?

There are various ways of thinking about Markov chains, but underlying them is the kind of abstract notion of steps in a sequence. You can think of these steps as being actions over time, or as words in a sentence, or as abstract symbols in any kind of sequence of symbols. It doesn't matter what: just a bunch of things one after another, in time or in space.

In many sequences like these, the next step might be random, but the odds of choosing a particular next step are often related in some way to the steps that came before. Let's take a simple contrived example: the capricious chef at your local cafeteria. Every day, the chef chooses one of three meal options: tacos, pasta, or soup. The chef chooses the next meal randomly, but is not allowed to repeat the same meal two days in a row. The chef is also inordinately fond of tacos2 and is twice as likely to pick tacos as any other option, but still won't make tacos two days in a row. A possible sequence of meal choices might go: “tacos, pasta, tacos, soup, pasta, tacos, pasta, tacos...”

This example has a very particular property where the choice of each step depends only on the previous step, but not any of the steps before the last one. If last lunch was tacos, then we have roughly equal chances of choosing pasta or soup as the next lunch because the chef has no preference between the two. On the other hand, if the last lunch was pasta or soup, then we have a two-thirds chance of choosing tacos as the next lunch, and one-third chance of choosing soup or pasta respectively, because the chef is allowed to make tacos and is more likely to make them than a different lunch. We can write a table that conveys the odds of a given meal choice given yesterday's meal choice:

yesterday's lunch    today's lunch    probability
tacos                pasta            ½
tacos                soup             ½
pasta                tacos            ⅔
pasta                soup             ⅓
soup                 tacos            ⅔
soup                 pasta            ⅓

The mathematical way of saying that the probability of a given lunch choice doesn't depend on any history of lunches except for the most recent one is that our sequence of lunches has the Markov property.

If some sequence of things—words, mathematical objects, lunches, whatever—has the Markov property, then we can build up a mathematical model of the underlying process. One convenient way of visualizing this model is as a graph with directed edges: each step is a node, and the directed edges between nodes are tagged with the probability of picking that step next. Consequently, the graph of the lunch pattern I described would have a node for each of tacos, pasta, and soup, with edges between them labeled with the probabilities from the table above.

We can come up with a plausible sequence of lunches by starting in one state chosen at random, and then following the edges from one state to another, choosing our next edge according to the provided probabilities.

Implementing Markov Models

In software, we can implement a directed graph as a table that maps steps to a set of possible next steps, where the possible next steps are randomly chosen according to some set of probabilities. For simplicity in this implementation, I'm going to treat the set of next steps as a multiset from which we can choose any element with equal probability: if we choose at random from the multiset {tacos, tacos, pasta}, then we'll get tacos two-thirds of the time and pasta one-third of the time. This is a terrible representation in practice—for certain situations, it can use a ridiculously wasteful amount of space—but it makes the code in this blog post short and easy!

Let's implement a basic, inefficient-but-working Python version of a Markov model! We can represent our model in Python using a plain dictionary, with strings as our step names, and lists as our possible next steps:

model = {
  'tacos': ['pasta', 'soup'],
  # both of the following have double the chance
  # of choosing tacos
  'pasta': ['tacos', 'tacos', 'soup'],
  'soup':  ['tacos', 'tacos', 'pasta']
}

If we want to generate a sequence of lunches with a particular length, we can do so by starting with a random starting lunch, and then choosing our next lunch randomly according to the set of possible “next lunches” provided by the model:

import random

def generate_sequence(model, length=10):
    # choose our initial state randomly
    state = random.choice(list(model.keys()))
    # our output sequence starts empty
    output = []

    # we loop equal to the desired length
    for _ in range(length):
        # each time, we add the current state
        # to the sequence
        output.append(state)
        # and choose a new state randomly from
        # the set of possible next states
        state = random.choice(model[state])

    # once we've gotten as many as we want, we
    # can return our sequence
    return output

If we run this model several times, we'll get various random sequences that conform to the patterns we'd expect:

>>> generate_sequence(model, length=5)
['pasta', 'tacos', 'soup', 'tacos', 'soup']
>>> generate_sequence(model, length=5)
['soup', 'tacos', 'soup', 'tacos', 'soup']
>>> generate_sequence(model, length=5)
['tacos', 'pasta', 'soup', 'tacos', 'pasta']
>>> generate_sequence(model, length=5)
['tacos', 'soup', 'tacos', 'soup', 'pasta']

In the example we've been using, our model was based on perfect mathematical knowledge of the underlying problem domain: we know exactly the logic the chef uses to determine the next lunch. However, in most cases, we don't have that simple knowledge, so instead we want to calculate a model based on some set of existing observed data. Using the same representation of a Markov model, we can take a sequence of any kind of thing and build a model out of it by looking at each adjacent pair of elements in the sequence3:

def build_model(sequence):
    # our empty model knows nothing about anything
    model = {}

    # walking along the sequence, taking each step
    # as well as its subsequent step...
    for step, next_step in zip(sequence, sequence[1:]):

        # make sure that step has an entry in our model
        # data structure:
        if step not in model:
            model[step] = []

        # add the next_step to the list of possible next
        # steps
        model[step].append(next_step)

    return model

Now we can build a Markov model from a sequence of data even if we don't know what probabilistic relationships hold in that data set. This lets us take data we've found “in the wild” and build a model that resembles the raw data! Well, in a limited sense, as we'll see.
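As a quick sanity check, we can feed a made-up log of lunch observations (this particular sequence is invented just for illustration) back through build_model and get a table with the same shape as the one we wrote by hand earlier; the exact ordering of the dictionary's keys may differ depending on your Python version:

>>> lunches = ['tacos', 'pasta', 'tacos', 'soup', 'pasta', 'tacos', 'pasta']
>>> build_model(lunches)
{'tacos': ['pasta', 'soup', 'pasta'], 'pasta': ['tacos', 'tacos'], 'soup': ['pasta']}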

Markov Models of Non-Markov Sequences

It turns out that English text—both on the level of individual letters, and on the level of sentences—doesn't have the Markov property, because the choice of "next word" depends on a lot more than just the previous word: grammatical structures, semantic context, and so forth. Building and using a Markov model from raw English text—in the examples below, I'm using the Project Gutenberg text of Alice in Wonderland, stripped of everything but spaces and alphabetic characters, all turned lowercase—produces some nonsense that's English-like from a distance but still mostly gibberish:

>>> alice = open('alice-in-wonderland.txt').read().lower()
>>> model = build_model([ch for ch in alice if ch.isalpha() or ch == ' '])
>>> ''.join(generate_sequence(model))
'xpr yoube '
>>> ''.join(generate_sequence(model))
'be che sof'
>>> ''.join(generate_sequence(model))
'ver athean'
>>> ''.join(generate_sequence(model))
'k wanigroo'
>>> ''.join(generate_sequence(model))
'jeado sshe'
>>> ''.join(generate_sequence(model))
'f tond mpo'

Similarly, a Markov chain run over the sequence of words in the text, rather than the characters, produces strings of English words which lack grammar or real sense:

>>> model = build_model(alice.split())
>>> ' '.join(generate_sequence(model))
'bat, and said alice, she wandered about in an opportunity'
>>> ' '.join(generate_sequence(model))
'coaxing tone, going to leave it had lost something; and'
>>> ' '.join(generate_sequence(model))
'oneself speak--and they are removed. of it; and then the'

Why does it produce such terrible nonsense? Well, because Markov chains have no memory! Each pair of words in sequence is sensible, but higher-level relationships like sentence structure don't get captured in the model, because they rely on larger amounts of context. Let's take a closer look at a four-word sequence generated by the Alice in Wonderland chain:

escape; so much evidence

We can look at it as being composed of three overlapping sequential pairs of words, where each ordered pair of words must have appeared at least once in the source text, or else the model would not have provided a path between them. The first pair is escape; so, and it turns out that this word pair appears only once in the source text, in Chapter IV:

This seemed to Alice a good opportunity for making her escape; so she set off at once...

The pair so much appears at least nine times—it is a very likely pair of words in English more generally, so this makes sense! I'll choose my favorite sentence in which that pair appears, just as an arbitrary example:

'Curiouser and curiouser!' cried Alice (she was so much surprised, that for the moment she quite forgot how to speak good English)...

And the word pair much evidence appears only once, in Chapter XI:

Alice watched the White Rabbit as he fumbled over the list, feeling very curious to see what the next witness would be like, '—for they haven't got much evidence YET,' she said to herself.

In each case, training the Markov model produced the knowledge that these words appear in sequence somewhere in the source text, so it was okay for them to appear together in the output:

making her ESCAPE; SO she set off at once, and ran till s
ied alice (she was SO MUCH surprised, that for the moment
-for they haven't got MUCH EVIDENCE yet,' she said to her
           ESCAPE; SO MUCH EVIDENCE

But it's not really a full sentence, even if it seems more realistic than some of the others produced. Because we're choosing each word only based on the previous word, the model doesn't know about things like clauses or verbs or sentences, just word-pair frequency. If our training text contains a sentence like

behold the dishwasher, the greatest invention of mankind

then it might learn that it's okay to follow “the” with “dishwasher,” and also to follow “dishwasher,” with “the”, and produce a nonsensical sequence like this:

the dishwasher, the dishwasher, the dishwasher, the dishwasher,

(There's an even sillier example that arose from my own Markov bot: I once tweeted something that included the phrase “stabby stabby”, and consequently my Markov model learned that it was perfectly fine to follow the word “stabby” with “stabby”.)

Visualizing Markov Output Directly

It's sometimes very interesting and illustrative to compare the output of a Markov chain with the input provided to that Markov chain, because you start to see the places where the model "learned" particular patterns. To draw an example from my own bot: earlier today it tweeted a short phrase:

bad painter? it's happened in it

My bot is programmed to only start with words which appear at the start of a tweet elsewhere, so it chose to start with the word bad because it came at the start of this old tweet:

Bad idea: double-shot of espresso added to drip coffee (a “redeye”). OTOH now I can see infinity. So I got that going for me. Which is nice.

The word bad also appears in this more recent tweet, where it borrowed several pairs of words in sequence:

Did you know that in addition to being a bad programmer and a bad-opinion-haver, I am also a bad painter? It's true!

The word it's also appears in this tweet where it is followed by the word happened:

It usually happens with weird dream-like books. It's happened before with Dunsany and MacDonald. Last night, it was an R. A. Lafferty novel.

And the word happened also appears in this tweet, followed by a few more words:

Last time I had a fever, I watched all of the Tron cartoon. A+++ would feverishly watch again. ...no idea what happened in it.

In this case, this isn't just a reconstruction of what might have influenced the model, like in the Alice example above. To facilitate exactly this kind of introspection of Markov chain output, I've modified my own Markov bot to record not just which words it's selecting, but also where it learned of a relationship between two words. That data is presented in a human-readable form on a new web page generated for each tweet, and every new tweet will be included on a chronological index page.
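I won't reproduce the bot's actual code here, but a minimal sketch of the bookkeeping looks something like this (the tweet IDs and the (tweet_id, text) input format are hypothetical, just to show the shape of the idea):

def build_model_with_sources(tweets):
    # tweets is a list of (tweet_id, text) pairs; the model maps each
    # word to a list of (next_word, tweet_id) pairs, so we can later
    # report where each word-to-word transition was learned
    model = {}
    for tweet_id, text in tweets:
        words = text.split()
        for step, next_step in zip(words, words[1:]):
            model.setdefault(step, []).append((next_step, tweet_id))
    return model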

A Markov model is an almost embarrassingly simple example of machine learning, but it's a good, approachable example to start with—and is also a great object lesson in the way that machine learning can only pick up on trends that appear in its training data, and the way that the model might not be nuanced enough to pick up on every kind of pattern.


  1. They are very bad tweets.
  2. And who isn't?
  3. Again, this is a terrible representation for the list of possible next steps: a better representation might be a set of pairs, each representing the next step with the number of times it occurred in that context.

This is a question posed by Aaron Levin on Reddit:

In WAI, Application has the type Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived. There is a note about how historically this was Request -> ResourceT IO Response. There is the following note on Hackage about this change:

Note that, since WAI 3.0, this type is structured in continuation passing style to allow for proper safe resource handling. This was handled in the past via other means (e.g., ResourceT).

A naive implementation might have type Application = Request -> IO Response. I would assume one could still handle appropriate resource cleanup (both on the client side and on the Wai/warp side) in this naive implementation, but I'm assuming I'm wrong in thinking that. So:

  1. Why is Application defined in CPS style?
  2. Is there an example of how resources cannot be cleaned up appropriately using a naive implementation above?
  3. Are there other reasons for the CPS style? (E.g. allows for more efficiency as network resources can be freed while your handler is running)

Like my last post, this is a piece of folklore that's been described in papers, mailing lists, &c, but I don't know of a single resource to point people to, so I'll explain it here.

I'm gonna explain some preliminary details about laziness, the trace function, and the bracket function; if you're already familiar with these, go ahead and skip to the section A Naïve Application Type. Alternately, if you want to puzzle it out on your own, you can skip this blog post entirely and try running the code in this gist to figure out why you get the output you do—all the code examples in this post are modified versions of the code that appears in that gist.

Some Preliminary Explanations

A big part of this post involves the fact that Haskell is lazy: when we write something in Haskell, it won't be evaluated until it's absolutely necessary that it be evaluated—which might be never! For example, an expression like

>>> let f x y = x in f 5 undefined
5

will happily ignore our use of undefined, as we never actually use it. If we want to make sure that a given expression is evaluated, we need to use it somehow. One way of using a value is to pattern-match on it: in order to pattern-match against a constructor, we need to evaluate it at least enough to know what constructor was used.1 We can change the example above to pattern-match on the currently unused value:

>>> let f x y = case y of { () -> x } in f 5 undefined
*** Exception: Prelude.undefined

Now, we're pattern-matching on y, which forces it—that is, evaluating the pattern-match causes y to be evaluated before it can determine whether or not it matches the pattern, which means we evaluate undefined and get a runtime error.

In general—with the exception of code that loops forever, or inherently unsafe values like undefined—laziness should be a detail we barely notice! In pure code that doesn't loop forever, evaluation order doesn't matter at all, and imperative code in Haskell uses monads like IO. Because each step in a monad has to be evaluated before the next step can proceed, those computations appear to be strict, and computations that aren't in a monad generally don't have observable side-effects that we can use to peek at the exact process Haskell is using to evaluate them.

However, Haskell has a few features we can use to peek into what's going on under the surface. One of them is the trace function, found in Debug.Trace, which has the signature

trace :: String -> a -> a

This takes a string and any other value, and will print the string as soon as the value is forced—not before, not after. That makes it of sometimes questionable utility for debugging, as it's possible that our calls to trace won't run in the order we expect, or even won't run at all:

>>> let f x y = x in f 5 (trace "Hello!" undefined)
5

But because trace will print exactly when its result is needed, we can use it to peek into how Haskell chooses to evaluate our code.

Bracketing for Resources

The module Control.Exception.Base exports a function called bracket:

bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c

This function is used for managing resources that we need to clean up, like file handles or large pieces of data we don't want to keep in memory. The first argument is an IO computation that produces a resource, the second one is the cleanup function for that resource, and the third one is a function that uses that resource to produce a value prior to cleanup:

main :: IO ()
main = bracket
  (openFile "tmp.txt" ReadMode)  -- opening a resource: here, a file
  hClose                         -- closing the resource
  (\ handle -> do                -- what to do with the resource
       contents <- hGetContents handle
       putStrLn contents)

The common function withFile is implemented straightforwardly in terms of bracket almost exactly like I used it above. The major utility of bracket is that it knows how to correctly handle exceptions that occur while running computations: if an exception gets raised while the resource is being used, it will nevertheless ensure that the resource gets closed appropriately.
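A rough sketch of that definition (not the exact code from base, but the same shape, assuming the usual System.IO imports):

-- a sketch of withFile in terms of bracket
withFile' :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withFile' path mode = bracket (openFile path mode) hClose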

A Naïve Application Type

Let's try writing a toy version of WAI that doesn't actually do anything. We're going to pretend that we're writing some kind of server where some responses might depend on scarce resources, so we want to clean up those resources as soon as we possibly can.

Let's start with importing the functions above and defining some dummy types:

import Control.Exception.Base (bracket)
import Debug.Trace (trace)

-- Our dummy requests and responses
data Request = Request
data Response = Response

-- Our dummy resource
data SomeResource = SomeResource

-- Our naive @App@ type
type App = Request -> IO Response

Let's write the runner for our App type first. Because we're trying to mimic a real application, we pattern-match on the Response value to force Haskell to evaluate it before returning:

runApp :: App -> IO ()
runApp app = do
  response <- app Request
  case response of
    Response -> putStrLn "Sending response"

Now, let's create an App that uses some “resource”. In this case, I'm going to fake it as well. Additionally, I'm going to include a call to trace so that we can be sure of when our Response value actually gets evaluated:

myApp :: App
myApp request = bracket createResource destroyResource respond
  where createResource = do
          putStrLn "Creating resource"
          return SomeResource
        destroyResource _ = do
          putStrLn "Destroying resource"
        respond SomeResource = do
          return (trace "EVALUATING RESPONSE" Response)

And now let's wrap this all up:

main :: IO ()
main = runApp myApp

When I run this program on my machine here's what I get:

$ runhaskell myapp.hs
Creating resource
Destroying resource
EVALUATING RESPONSE
Sending response

What does this mean? Well, this means that we end up destroying the resource we are trying to use even before the response gets constructed. That's not good! In this case, the resource is just a dummy value, but if it were instead, say, a file handle, then this program might close the file handle before we ever tried to read from it!2

But we're using bracket and everything! That should mean that the action we're passing to bracket—the respond function we defined up above—should run before the resource gets destroyed! Shouldn't it?

As a matter of fact, it is. The IO action is getting run, and the IO steps are getting evaluated eagerly. If we rewrite the myApp function slightly to insert a print statement there

myApp' :: App
myApp' request = bracket createResource destroyResource respond
  where createResource = do
          putStrLn "Creating resource"
          return SomeResource
        destroyResource _ = do
          putStrLn "Destroying resource"
        respond SomeResource = do
          putStrLn "Responding to request" -- A new print statement
          return (trace "EVALUATING RESPONSE" Response)

and then run it again, we'll see that the respond function is getting properly bracketed between resource creation and destruction:

$ runhaskell myapp.hs
Creating resource
Responding to request
Destroying resource
EVALUATING RESPONSE
Sending response

The specific problem is that while IO steps are evaluated eagerly, the values that get returned are subject to the normal rules of laziness: they aren't forced until we use them. In this case, we don't actually force the Response value until the case-expression in runApp, which doesn't happen until after the entire myApp function—-bracket and all—-has finished running.

So the bigger problem is that, while bracket can ensure that IO actions are properly bookended with resource-related code, it can't make the same guarantees with respect to the values inside it. We need some way of forcing the value before the cleanup code gets run. We can rewrite myApp to do this:

myStricterApp :: App
myStricterApp request = bracket createResource destroyResource respond
  where createResource = do
          putStrLn "Creating resource"
          return SomeResource
        destroyResource _ = do
          putStrLn "Destroying resource"
        respond SomeResource = do
          putStrLn "Responding to request" -- A new print statement
          let response = trace "EVALUATING RESPONSE" Response
          case response of  -- pattern-match to force evaluation
            Response -> return response

This produces the output we want:

$ runhaskell myapp.hs
Creating resource
Responding to request
EVALUATING RESPONSE
Destroying resource
Sending response

But this has a lot of drawbacks: it's ugly, it inserts apparently extraneous case statements, and I've papered over the fact that, if we had a more complicated data structure, we'd need to force evaluation of the entire thing, not just the outermost structure. But even worse, it's not a general solution: WAI is a library that exports the equivalent of runApp and expects users to write the equivalent of myApp, which means we'd be forcing every single user to remember to force their Response values. If a user forgets to insert the ugly and opaque code, then their program might start crashing or—-even worse—-running but with nonsensical results. What we want is some way of ensuring that any App must force its Response before cleaning up any resources.
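
For what it's worth, the usual tool for that deeper forcing is the deepseq package. Here's a hedged sketch, not part of the toy code above, of what fully forcing a Response could look like if we gave our dummy type an NFData instance:

import Control.DeepSeq (NFData (..), force)
import Control.Exception (evaluate)

-- a sketch only: give Response an NFData instance so that 'force'
-- can walk the entire structure...
instance NFData Response where
  rnf Response = ()

-- ...and then demand the fully-forced value inside IO, before any
-- cleanup code gets a chance to run
forceResponse :: Response -> IO Response
forceResponse r = evaluate (force r)

Even so, every App author would still have to remember to call something like this, which is exactly the burden we want to remove.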

CPS To The Rescue

The core idea of continuation-passing is that a continuation represents “what to do next”: it's a callback function that expects the result of a previous computation as its argument. Let's rewrite our program but with an App type that looks like the CPS-ey type WAI actually uses:

-- Our modified @App@ type
type App = Request -> (Response -> IO ()) -> IO ()

Now, our App takes two arguments—-the Request, as before, as well as a function that takes a Response and produces an IO action. The logic of “sending” the response now gets moved into the callback function we pass to the App:

runApp :: App -> IO ()
runApp app =
  app Request
      (\ response -> case response of
           Response -> putStrLn "Sending response")

Our implementation of myApp is almost the same, except we take an extra callback parameter, and instead of returning our Response, we pass it to that callback:

myApp :: App
myApp request cb = bracket createResource destroyResource respond
  where createResource = do
          putStrLn "Creating resource"
          return SomeResource
        destroyResource _ = do
          putStrLn "Destroying resource"
        respond SomeResource = do
          cb (trace "EVALUATING RESPONSE" Response)

If I run this version of the program, I get this output:

$ runhaskell myapp.hs
Creating resource
EVALUATING RESPONSE
Sending response
Destroying resource

This is much better! This ensures that the resource outlives the Response, and because the callback is now responsible for forcing the Response, it can guarantee that this gets done. That lets us ensure the property for any App! This is something we couldn't ensure with the previous App type, because it would have required some way of reaching “inside” the App to force the return value. By having a callback that the user has to call, we can force it ourselves without having to hope that the user does!

...well, except there's that remaining little issue: using the implementation above, the user doesn't have to call the callback, so we could write an App that produces no Response at all:

myBadApp :: App
myBadApp _ _ = return ()

We can fix this by creating a new type—-WAI calls it ResponseReceived—-and hiding every way to construct it except using the callback. That way, the user has to use the callback at least once in order to get a ResponseReceived value, which they can return from their App. (We still have the problem that the callback can be called more than once: we could keep refining the design, but the fixes get progressively more complicated, and WAI ends up suggesting that you just not do that.)
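
Sketched against our toy types, that change might look like this (WAI's real Application type has the same shape):

-- a sketch: the ResponseReceived constructor would not be exported,
-- so the only way for an App to produce one is to call the callback
data ResponseReceived = ResponseReceived

type App = Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived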

So this is why the CPS style is indeed necessary for resource management here: it's not enough to use bracket, you have to use bracket and have the appropriate amount of strictness3, and the CPS style helps you achieve that.


  1. The more technical way of saying this is that we're evaluating it to Weak Head Normal Form.
  2. This is related to the problems with lazy IO in general, and we can reproduce the problematic behavior like this:

    import Control.Exception.Base (bracket)
    import System.IO (IOMode(ReadMode), openFile, hClose, hGetContents)
    
    main = do
      cs <- bracket (openFile "tmp.txt" ReadMode)
                    hClose
                    hGetContents
      putStrLn cs
    

    If we run this program in a directory that contains “tmp.txt”, we'll get the error message

    tmp.txt: hGetContents: illegal operation (delayed read on closed handle)

    This is because hGetContents only returns a string lazily, which might be okay in some cases... but here, we aren't using the return value until after the call to bracket, so again, the file will already be closed!
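
    One hedged way to patch this particular example (using the same imports as above) is to force the whole string before bracket's cleanup runs, for instance by taking its length inside the third argument:

    main = do
      cs <- bracket (openFile "tmp.txt" ReadMode)
                    hClose
                    (\ handle -> do
                         contents <- hGetContents handle
                         length contents `seq` return contents)
      putStrLn cs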

  3. This is also the problem with lazy IO in general, and it's no coincidence that the iteratee/pipe/conduit approaches to resource management also look reasonably CPS-ey.

The 'WriterT space leak' is something that gets talked about a lot in Haskell folklore. Here's a short explanation of what that means.

Defining WriterT

The Writer monad represents computations with “extra output”: produced data, logging messages, and so forth. That “extra output” has some constraints: we need to know what it means to have an “empty” piece of that data (so we can construct values with no extra output) and we need to know how to combine or append the “extra output” (so we can run two distinct computations and combine them appropriately.) We can get those properties in Haskell by ensuring that the “extra output” parameter has a Monoid constraint.
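
As a concrete example of such an “extra output” type (and one I'll use below), here is a sketch of the Sum wrapper from Data.Monoid, written in the pre-Semigroup style this post assumes: it combines output by addition, and its empty value is Sum 0:

-- a sketch of Data.Monoid's Sum newtype and its Monoid instance
newtype Sum a = Sum { getSum :: a }
  deriving (Eq, Show)

instance Num a => Monoid (Sum a) where
  mempty                = Sum 0
  Sum x `mappend` Sum y = Sum (x + y)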

The WriterT transformer is defined like this:

newtype WriterT w m a =
  WriterT { runWriterT :: m (a, w) }

which is to say: it wraps an underlying monad m, which represents a computation that produces a pair of a (the result of our computation) and w (the “extra output” parameter). The Monad instance (I'm eliding the Functor and Applicative instances) looks like this:

instance (Monoid w, Monad m) => Monad (WriterT w m) where
  -- return the provided value, filling in the "extra output"
  -- with an "mempty" value.
  return x = WriterT $ return (x, mempty)

  -- to chain two WriterT computations together...
  m >>= f = WriterT $ do

    -- A: run the first computation, fetching its result
    -- and extra output
    (x, w)  <- runWriterT m

    -- B: produce the second computation by applying f to the
    -- result from the first computation, and then run it,
    -- fetching its result and extra output
    (y, w') <- runWriterT (f x)

    -- C: return the final result with the pieces of
    -- extra output from each computation combined together.
    return (y, w `mappend` w')

The last thing we need is the tell operation, which is the function we use to create our “extra output”:

tell :: (Monoid w, Monad m) => w -> WriterT w m ()
tell o = WriterT $ return ((), o)

A simple contrived WriterT computation looks like this:

sample :: WriterT (Sum Int) Identity ()
sample = do
  tell (Sum 2)
  tell (Sum 3)
  tell (Sum 4)
  return ()

running this produces a return value of () as well as “extra output” equivalent to Sum 2 <> Sum 3 <> Sum 4 === Sum (2 + 3 + 4) === Sum 9. Sure enough:

>>> runIdentity $ runWriterT sample
((), Sum 9)

In Space Leaks, No-One Can Hear You Scream

Well, what's the problem? Let's look more closely at the definition of >>=. In particular, notice that in order to start evaluating the call to mappend in step C, we need to have the output from both steps A and B. ...except step B represents evaluating the entire rest of the computation, including an arbitrarily large number of other calls to >>=.

If we started evaluating the first call to >>= in runWriterT sample, then step A represents evaluating tell (Sum 2), and step B represents evaluating tell (Sum 3) >>= (\_ -> tell (Sum 4) >>= (\_ -> return ())). We get our output value w from the first one, and then we have to hold on to it while running the entire second computation in order to get the second extra output value w' before we can mappend them in step C. This problem appears with every call to >>=: we have to finish the entire subsequent chain of binds before we get the extra output to combine.

This is especially silly for sample, where I've chosen to use Sum Int as the extra output. It's just an accumulator! But in order to actually start adding those numbers together, we have to go through the entire chain of binds and produce each number first, each of which has to be kept around until we get to the final return, and only then can we go back and start adding the numbers we've generated together. In this case, it's not gonna kill our heap, but consider a computation like

largerSample :: WriterT (Sum Int) Identity ()
largerSample = sequence_ [ tell (Sum 1) | _ <- [1 .. 1000000 :: Int] ]

This computation will contain 1000000 calls to >>=, which means we will generate 1000000 instances of the number 1 first, and only then can we start adding them together. This can't be changed with strictness annotations or modifications to evaluation strategy: we'll need to rearrange the way that data depends on other data.

Alleviating the Space Leak

The solution to the problem is described briefly by Gabriel Gonzalez here and here on the mailing lists, but I'll explain the same thing slightly more verbosely:

What we really want is for the call to mappend to happen earlier, which means moving around the way our intermediate values depend on each other.1 We've written our WriterT so that the “extra output” is always returned from a computation, and never passed to a computation: that means a given computation doesn't “know” about the intermediate value of that output, and we can only run our mappend operation after the computation has finished. Let's try writing an alternate WriterishT monad that differs by also taking the “extra output” from previous steps:

newtype WriterishT w m a =
  WriterishT { runWriterishT :: w -> m (a, w) }

The w on the left side of the arrow represents the intermediate value of the “extra output” that comes from previous steps. We can move the mappend step from the implementation of >>= into the tell function, which now combines the new value being “told” with the previous values:

tellish :: (Monoid w, Monad m) => w -> WriterishT w m ()
tellish w = WriterishT $ \w' -> return ((), w <> w')

The implementation of return used to fill in mempty for the “extra output”; now it also receives the intermediate value from previous steps, which it needs to combine with that empty output, so it would look like this:

return x = WriterishT $ \w -> return (x, w <> mempty)

Luckily, the Monoid laws tell us that w <> mempty === w, so we can simplify that definition down a fair amount. This ends up removing all mention of the Monoidal nature of our data in the Monad instance, so we can omit the Monoid w constraint:

instance (Monad m) => Monad (WriterishT w m) where
  -- the motivation for this implementation is explained
  -- up above...
  return x = WriterishT $ \w -> return (x, w)

  -- The bind instance looks pretty similar, but again, we
  -- need to take the current state of our accumulator:
  m >>= f = WriterishT $ \w -> do
    -- A: we pass that on to the first computation...
    (x, w')  <- runWriterishT m w
    -- B: and pass that on to the second computation...
    (y, w'') <- runWriterishT (f x) w'
    -- C: and return just the second state, because the
    -- call to 'mappend' was taken care of inside one
    -- of these.
    return (y, w'')

Now, each call to mappend has all the data it needs as soon as tellish happens. Step C only relies on the “extra output” from B, which itself only relies on the “extra output” from A. That means that we've already run mappend before step B, and can throw away the older intermediate value.

...this is a little bit dishonest, though: the type that we've made is significantly more powerful than just a WriterT. One feature of WriterT is that there's no way for a computation to peek at or rely on the “extra output” from previous steps. However, if we aren't careful about the interface to our WriterishT monad—-in particular, if we don't hide the constructor—-a user could write code like this

getCurrentOutput :: Monad m => WriterishT w m w
getCurrentOutput = WriterishT $ \w -> return (w, w)

poorlyBehaved :: WriterishT (Sum Int) Identity String
poorlyBehaved = do
  Sum n <- getCurrentOutput
  if n == 0
    then return "zero"
    else return "nonzero"

which would be impossible in WriterT! (If you can't see why, it's worth trying to write it and seeing what walls you come up against.) This is because, in point of fact, what I've been calling WriterishT is literally just StateT by a different name! That makes it much more powerful than WriterT, but accordingly it removes some of the guarantees you get by using a WriterT, as well.
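
To make the “by a different name” claim concrete, here's a minimal sketch of the correspondence in terms of StateT from the transformers package: the accumulator becomes the state, and tellish is just a strict modify:

import Data.Monoid ((<>))
import Control.Monad.Trans.State.Strict (StateT, modify')

-- a sketch: WriterishT is StateT with the accumulator as its state,
-- and tellish combines new output into that state
type WriterishT w = StateT w

tellish :: (Monoid w, Monad m) => w -> WriterishT w m ()
tellish w = modify' (w <>)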

So in summary, WriterishT (that is, StateT) can alleviate the space-leak problems of WriterT, but at the cost of some of WriterT's desirable properties. In most cases, the space leaks from WriterT won't turn out to be a huge problem, but it's good to know what's happening and why it comes about!


  1. This is a good way of thinking about a lot of performance problems in a lazy language: not in terms of 'steps', like in an imperative language, but in terms of dependencies among your subexpressions.

One current pain point for certain Haskell programs is Cabal's data-files feature. This feature is a little bit awkward and weird: it involves depending on a generated file but also filling in a default implementation for that file, and there are some obscure and unclear edge cases in its use. One difficult edge case is that, while most uses of data files are in applications, it's nonetheless possible for libraries to depend on data files, in which case it's very difficult to induce Cabal to produce final relocatable bundles that contain both executables and the data files on which they depend.

I was musing on a hypothetical future way of solving this problem: once the lightweight module system Backpack is fully implemented in Haskell, we could build a data-files-like mechanism in a more convenient way by modifying the current design to use module-level mixins. The rest of this post will sketch out what that design might look like.

I should stress that I'm describing a possible solution, and not the solution. I'm by no means indicating that this is the best way of solving present problems with data files, and I have absolutely no indication that anyone else would want to solve the problem like this. That said, I think this is an interesting and motivated design, and I'd be happy to discuss its strengths and weaknesses, as well as other possible design choices that address the same issues.

Right now, I'm using Edward Yang's blog post A Taste Of Cabalized Backpack as my primary guide to Backpack-in-practice. I don't have a Backpack-enabled GHC and Cabal on hand, and so I haven't actually run any of this: this should right now be treated effectively as pseudocode. I also assume familiarity with Cabal's data files support; if you're in need of an introduction or a refresher, you should read the post Adding Data Files Using Cabal.

An Abstract Signature for Data Files

In our hypothetical Backpack-enabled-data-files-support future, we start by creating a signature that corresponds to the generated Paths_whatever module. To this end, we can create an .hsig file with a declaration like this:

module Dist.DataFiles (getDataFileName) where
  getDataFileName :: FilePath -> IO FilePath

This defines an abstract module called Dist.DataFiles that exposes a single function, getDataFileName, with no actual implementation. We can expose this signature by creating a package, data-files-sig, that exposes only this signature:

    name:               data-files-sig
    version:            1.0
    indefinite:         True
    build-depends:      base
    exposed-signatures: Dist.DataFiles

This would be a standard package—maybe even part of base—that can be consistently and universally relied on by libraries that require some kind of data file support.

Creating A Library With Data Files

Now, let's create a library that needs a data file. In this case, the library will do nothing but read and return the contents of that data file:

module Sample.Library (getSampleFile) where

import Dist.DataFiles (getDataFileName)

getSampleFile :: IO String
getSampleFile = getDataFileName "sample-file" >>= readFile

Now we need to create a corresponding .cabal file for this library. Because we're using Dist.DataFiles, we need to import that signature from the data-files-sig package. Importantly, we still don't have an implementation for getDataFileName. Because of that, our package is still abstract, or in Backpack-speak, indefinite:

    name:            sample-library
    indefinite:      True
    build-depends:   base, data-files-sig
    exposed-modules: Sample.Library

Depending On A Library With Data Files

In order to write an application that uses sample-library, we need to give it a module that's a concrete implementation of the Dist.DataFiles signature. In this case, let's create an implementation manually as part of our application.

First, let's write a small application that uses sample-library:

module Main where

import Sample.Library (getSampleFile)

main :: IO ()
main = getSampleFile >>= putStrLn

We still don't have that concrete implementation for getDataFileName, though, so let's write a simple module that exports the same name with the same type:

module MyDataFilesImpl (getDataFileName) where

import System.FilePath ((</>))

getDataFileName :: FilePath -> IO FilePath
getDataFileName path = pure
  ("/opt/sample-application" </> path)

Now, when we write our .cabal file for this application, we also need to specify we want to use MyDataFilesImpl as the concrete implementation of Dist.DataFiles for sample-library. That means our .cabal file will look like this:

    name: sample-application
    build-depends:
      base,
      filepath,
      sample-library (MyDataFilesImpl as Dist.DataFiles)

Now, all our abstract signatures are filled in, so this application is no longer indefinite, and we as developers have a convenient way of telling sample-library where we want it to look for its data files. In fact, one advantage of this system for data files is that we could import two libraries that both depend on the Dist.DataFiles signature but tell them to look in two different places for their data files, like this:

    name: other-application
    build-depends:
      base,
      lib-one (OneDataFilesImpl as Dist.DataFiles),
      lib-two (AnotherDataFilesImpl as Dist.DataFiles)

If there are reasonable default implementations for Dist.DataFiles, we could also put those on Hackage and reuse them in much the same way.

A Final Sprinkling Of Magic

In this case, I'm still missing a major part of Cabal's data-files support: namely, we want to shunt the responsibility from the developer to Cabal, so that we have support for things like relocatable builds. So in a final bit of handwaving, let's stipulate that our tooling in this hypothetical future has a special case to deal with applications that expose an indefinite Dist.DataFiles signature: Cabal could notice this situation, and fill those signatures in with sensible implementations based on the commands and configurations we're using.

For example, if my .cabal file for sample-application above didn't supply a concrete implementation for Dist.DataFiles, then a default one could be chosen for development that's equivalent to:

-- as automatically generated by cabal
module Dist.DataFiles (getDataFileName) where

getDataFileName :: FilePath -> IO FilePath
getDataFileName = pure

That is, the application will just look for the file in the current directory.

If the developer started preparing the package for release, and changed the configuration appropriately, then the automatically generated getDataFileName could be modified to reflect that, replacing the automatically generated code with something more like

-- as automatically generated by cabal
module Dist.DataFiles (getDataFileName) where

import System.FilePath ((</>))

getDataFileName :: FilePath -> IO FilePath
getDataFileName path =
  pure ("/usr/share/sample-application" </> path)

This would be admittedly a little bit “magical”, but it would be a small and easy-to-explain bit of magic, and it would have the advantage of affording a kind of flexibility that the current approach to data files lacks.

Is This How It's Actually Gonna Work?

Probably not! Backpack is still a ways out, and this would require opt-in from many parts of the Haskell ecosystem, and the problem it solves could probably also be solved in numerous other ways I haven't considered. But this post describes a point in the design space that I think is at least worth weighing!

I buy a fair number of comic books. Often I buy them as standalone graphic novels or as collected trade paperbacks, but if I really like an ongoing series1 I'll buy individual issues as they come out.

This poses a problem for me: I'm not collecting them, so I'm not really interested in keeping them in plastic sleeves in special boxes, but I do want to keep them in reasonable, readable condition and have them on-hand and organized. Some people will bind their comics into volumes, which is really cool, but not exactly what I want.

I leaned towards something like a slipcase for some number of issues. Some of these exist, but when I looked they were expensive (around $20 per case) and kind of boring-looking, so I opted to make my own. I made some prototypes by hand, but I wasn't quite happy with the quality of my workmanship on them, so I finally got around to creating one via laser-cutting instead.

The Design

The basic design is obvious: a box, tall and wide enough for a comic issue and deep enough for some number of them. For the prototype, I chose five deep2, and did some measurements of a stack of five comics: I had to accommodate a volume of roughly 6.75 in × 10.25 in × 0.375 in. The only bits of the design I wasn't sure about were the edges and the front and back panels. For the edges, I wanted a cut-away tab so one could easily pull the comics out, but I figured I could use a shape that was more interesting and unusual than just a half-circle. The front and back risked being bland if left blank: I considered engraving an image into them, or leaving them blank for later painting, but ended up deciding to cut a window into them so you could see glimpses of the front and back covers of the comics inside. Despite not knowing the exact shapes of either of those features, I went ahead with designing.

The Process

I created the blueprint as a Haskell program using the diagrams package. The code was reasonably straightforward: as I wanted to experiment with the pull-tab shape and the window shape, I represented those as arguments to a function comicBox which drew the rest of the slipcase, allowing me to easily create several variations on the details as I went.

I had already decided to use Ponoko to cut the box, so I also included some utility functions that simply colored lines appropriately for cutting or scoring according to Ponoko's guidelines—-0x0000ff lines for cutting, and 0x00ff00 lines for scoring.

The code is actually not all that long:

import Diagrams.Prelude
import Diagrams.Backend.SVG.CmdLine

-- some helpers
toLine :: [(Double, Double)] -> Diagram B
toLine ls = fromOffsets (map r2 ls) # strokeLine

cut :: Diagram B -> Diagram B
cut x = x # lc (sRGB 0.0 0.0 1.0)

crease :: Diagram B -> Diagram B
crease x = x # lc (sRGB 0.0 1.0 0.0)

-- width and height of space for comics
cW, cH :: Double
cW = 6.75
cH = 10.25

-- a slipcase parameterized by a depth, and a window and edge shape
comicBox :: Double -> Diagram B -> Diagram B -> Diagram B
comicBox cD window edge = reflectX fullS ||| spine ||| fullS
  where spine = rect cD cH # explodeTrail
                           # zipWith ($) [crease,cut,crease,cut]
                           # mconcat
        side = (translate (r2(3.375,5.125)) $ (sShape # crease)
                 <> (edge # cut))
                 <> (window # cut)
        sShape = toLine [(-cW,0), (0,-cH), (cW,0)]
        top = toLine [(0,cD), (cW,0), (0,-cD)] # cut # centerXY
        fullS = beside unitY side top === reflectY top

-- The window shape is a bunch of squares, and the tab shape is
-- a kind of angular \__/ thing.
main :: IO ()
main = mainWith (comicBox 0.375 windowShape edgeShape)
  where windowShape = let row = hsep 0.5 (replicate 3 (square 1))
                      in  centerXY (vsep 0.5 (replicate 4 row))
        edgeShape =
          let offs = [ (0,    -0.35)
                     , (-0.1, -0.1)
                     , (0,    -0.1)
                     , (0.1,  -0.1)
                     , (0,    -0.35)
                     ]
          in toLine offs # scaleX cW # scaleY cH

I decided on a simple array of grid squares for the window shape, and an angular kind of tab shape, which you can see below.

This produced a nice clean image, but not one that fit all the specifications for Ponoko plans—-it needed to be a specific size, and lines for cutting needed to be a specific thickness. Instead of finding the right functions to get my program to emit the correct line thicknesses and image size, I decided to just clean it up in Inkscape and copy it onto the templates for cutting that Ponoko supplied. I might go back and rewrite the code to spit out a valid uploadable image at some point, but it wasn't a priority.

I opted for cardboard as a material: it's not as thick as the chipboard I'd prefer, but it means I could fold (and not have to assemble) the box, and it's relatively cheap. I considered bamboo or some kind of thin wood, but for the first prototype, at least, cardboard seemed like a simpler choice: I can always try other materials later.

I used some of the excess space on the board for an unrelated experiment that involved a lot of engraving, so it was more expensive than it would have been had I simply opted for the cutting and scoring. When I get more made, I won't use that excess space for engraving-heavy things, so the price should be much cheaper.

After cleaning up the design in Inkscape, I was ready to upload it and wait about a week for it to arrive!

The Result

I'm pretty happy with the final result, which is shown here containing five issues of a comic. I didn't think to take a photo of the raw cardboard sheet before punching it out and gluing it together, but you can easily extrapolate what it must have looked like.

Reflections and Next Steps

The biggest problem is that the slipcase is too snug—-something I anticipated, but I wanted to see exactly how snug it would be before I scaled up. The next iteration will have some more breathing room, which will make it easier to put comics in and take them out.

The next iteration will also have the comic name engraved on the spine of the case: this will make it take slightly longer to engrave, but should be negligible overall.

I also am going to modify the windows on the front and back and the tab on the side in a thematically appropriate way: I am already experimenting with various motifs I can use for particular comics. For example, I could create a slip-case for storing issues of Bitch Planet that has a window on the front and back in the shape of the Non-Compliant emblem.

My only real annoyance during the project was that the diagrams library has some weird corners and odd API design choices3. It is also a truly massive library. At one point, I deleted my sandbox, started rebuilding, walked to a nearby café4, had a coffee and sat for a bit, walked back, and found it still compiling.

Still, for the most part, it all worked without a hitch. And it was a lot of fun: creating a useful physical object5 from a screenful of code is a pretty cool feeling!


  1. Some I've bought recently or semi-recently: Lumberjanes, The Midas Flesh, Prophet, The Wicked + The Divine, Bitch Planet, Casanova, Sex Criminals, Trees, and Supreme: Blue Rose. I don't usually buy Marvel or DC comics as issues, because they tend to be chock-full of ads, but otherwise I would buy issues of Hawkguy, Squirrel Girl, and Ms. Marvel.
  2. I arbitrarily decided that the first comic I would store like this would be Sex Criminals, which has five issues per trade paperback—-therefore, five issues for the first slipcase.
  3. For example: each diagram has its own local coordinate space, so each diagram has its own “origin”. Depending on how the shape was created, the “origin” can be in a very different place: paths created from lists of points start at their origin, but primitive shapes are centered at their origin. You can of course translate images relative to their local origin.

    On top of that, the basic way of putting diagrams next to each other is using either the === operator (for juxtaposing them vertically) or the ||| operator (for juxtaposing them horizontally.) These arbitrarily produce diagrams whose origin is the same as their left argument, unless the left argument is mempty, in which case the origin is the same as their right argument. All these choices together strike me as weird defaults: I'd have been less surprised by the library had all operations been biased towards working with diagrams whose local origin was in a consistent place, but === and ||| seem to assume that the local origin is in the top left, while square and other primitive shapes produce local origins in the center.

  4. I live in Portland, which means the café was only two blocks away. My point still stands.
  5. We have a 3D printer at my office, so I could create objects with that, but 3D printers—-especially low-end ones like the Makerbot—-are of remarkably limited utility as far as the objects you can create. 3D printing seems like magic at first, but after having created dozens of crapjects, you quickly realize that very few useful objects can be made out of small-ish chunks of rough plastic.

As part of recent type-refactoring efforts in Haskell, a discussion about adding Semigroup as a parent class of Monoid has been bouncing around the mailing list. From a theoretical point of view, this is a great idea: it is more flexible than the current approach and would allow for greater expressibility.

From a practical point of view, however, I am inclined to oppose it. Not because it is in itself a bad change—it's a very reasonable change that has advantages for new code—but because I have, in the past, had to update large systems written in Haskell after GHC updates, and therefore I know that this kind of change has a cost. The Applicative-Monad changes seem to have made way for the Foldable-Traversable Prelude, and those have in turn inspired a number of suggestions for modifications to the Haskell standard library, each of which, taken on its own, is reasonable, but which taken together mean much more work for maintainers. This is especially true if we continue on this path of making minor refactorings at each release: each year a project will need changes, or it will quickly bit-rot beyond utility.

Default Superclass Instances

There is, however, an alternative I would like to discuss—one which has also been brought up on the mailing list, but which I'd like to provide more concrete motivation for. Some of these changes—especially the Semigroup/Monoid split—seem to involve taking the functionality of a class and splitting it into multiple smaller classes. For example, we can think of the Semigroup/Monoid change as converting

class Monoid m where
  mempty  :: m
  mappend :: m -> m -> m

into1

class Semigroup m where
  mappend :: m -> m -> m

class Semigroup m => Monoid m where
  mempty :: m

Something that has been proposed before (in a few different forms) and which I suggest be more actively considered if changes like these are to become common is to allow superclass instances to be declared within a subclass declaration. This would allow you to write a single instance declaration for a class, and in that body also include implementations for methods belonging to a superclass of that class. As an example, perhaps we would be able to write2:

newtype MyId a = MyId a

instance Monad MyId where
  -- Functor method
  fmap f (MyId x) = MyId (f x)

  -- Applicative methods
  pure = MyId
  MyId f <*> MyId x = MyId (f x)

  -- Monad methods
  return = MyId
  MyId x >>= f = f x

For the Monoid/Semigroup proposal, this would mean that any Monoid instances that exist would continue to work unchanged, but new instances could (optionally) split apart their declarations. Under this proposal, either of these would be acceptable:

class Semigroup m where mappend :: m -> m -> m
class Semigroup m => Monoid m where mempty :: m

-- combined `instance` declarations:
instance Monoid [a] where
  mempty = []
  mappend = (++)

or, equivalently,

class Semigroup m where mappend :: m -> m -> m
class Semigroup m => Monoid m where mempty :: m

-- split apart `instance` declarations
instance Semigroup [a] where
  mappend = (++)

instance Monoid [a] where
  mempty = []

And because the Monoid declaration for [] is already written like the former, we can make the Semigroup/Monoid split without having to rewrite the instance declarations!

Exciting Additional Opportunities

Because this lowers the cost of updating for new versions, various other useful changes might be considered that would otherwise involve far too much breakage. For example, we could consider splitting Num apart into small constituent parts. With Default Superclass Instances, we could refactor Num into the following without changing any instance declarations:3

class Add a where (+) :: a -> a -> a
class Sub a where (-) :: a -> a -> a
class Add a => Zero a where zero :: a

class Mul a where (*) :: a -> a -> a
class Mul a => One a where one :: a

class (Zero a, One a) => FromInteger a where
  fromInteger :: Integer -> a

  instance Zero a where zero = fromInteger 0
  instance One a where one = fromInteger 1

class Signed a where
  negate :: a -> a
  abs    :: a -> a
  signum :: a -> a

class ( Eq a, Show a, Add a, Sub a
      , Mul a, FromInteger a, Signed a) => Num a where

which would allow certain numeric types to only implement a subset of the relevant operations:

data Nat = Z | S Nat

instance Add Nat where
  Z   + y = y
  S x + y = S (x + y)

{- et cetera --- but no implementations for e.g. Signed,
 - which is not meaningful for `Nat`! -}

and also allow current Num functions to have a looser set of constraints than they do at present:

sum :: (Zero a, Add a) => [a] -> a
sum (x:xs) = x + sum xs
sum []     = zero

prod :: (One a, Mul a) => [a] -> a
prod (x:xs) = x * prod xs
prod []     = one

We could also consider splitting Arrow4 into distinct components, again without having to change any instance declarations:

class Category a => Pairwise a where
  first  :: a b c -> a (b, d) (c, d)
  second :: a b c -> a (d, b) (d, c)
  (***) :: a b c -> a b' c' -> a (b, b') (c, c')
  (&&&) :: a b c -> a b c' -> a b (c, c')

class Pairwise a => Arrow a where
  arr :: (b -> c) -> a b c

or even (dare I dream) splitting Bits into something that is not a 22-method monstrosity!
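
Purely as an illustration (the groupings and names here are mine, and the real Data.Bits.Bits class has many more methods than I'm showing), such a split might start out looking like this:

-- an illustrative sketch only, not a concrete proposal
class BitLogic a where
  (.&.), (.|.), xor :: a -> a -> a
  complement        :: a -> a

class BitLogic a => BitShift a where
  shiftL, shiftR :: a -> Int -> a

class (BitLogic a, BitShift a) => Bits a where
  testBit  :: a -> Int -> Bool
  popCount :: a -> Int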

Potential Drawbacks

On the other hand, this proposal does have some down-sides:

Grepping for Instance Declarations

Right now, I can often find an instance declaration for a type T by grepping for instance C T (modulo some line noise), whereas with this change, it's possible that there is no declaration for instance C T, because all of C's methods are declared by a subclass C' instead. The compiler ought to be able to deal with this without a problem, and tools like Haddock documentation should somewhat alleviate the issue, but a user grepping the source might still be confused.

Introduces New Possible Errors

The declarations below are of course nonsensical, and would be rejected by the compiler—-but the fact that this change would introduce new failure conditions at all is a drawback of the proposal.

instance Semigroup Int where
  mappend = (+)

instance Monoid Int where
  -- this conflicts with an existing declaration
  mappend = (*)
  mempty  = 1

A Pragma-Less Extension

In order to be really useful, we'd want to use this without a LANGUAGE pragma. After all, we're arguing for it on the basis of preserving backwards-compatibility, but that argument is much less compelling if we still have to change the source files to make use of it! On the other hand, that means we'd have included a GHC extension that takes effect despite not being enabled, which is also a worrying precedent!

It still might be a useful extension even if it had to be enabled by a LANGUAGE pragma, as it is easier to add said pragma to a source file than to manually break apart dozens of instance declarations, but it makes this feature less compelling in general.

In Conclusion

As I said before, my primary objection to typeclass changes like those above—-even if they are good changes for newly written code—-is that they break existing code. I don't want to have to modify a handful of miscellaneous instance declarations on a yearly basis as people discover new levels of abstraction or become dissatisfied with current choices, especially as those changes will grow more extensive as I build more projects in Haskell! But with an extension like this, we could grow the typeclass ecosystem gradually and fix what we see as past warts while maintaining backwards-compatibility, which would be a very powerful tool to have.


  1. This is perhaps more simplistic than we want: we can also use the existing Semigroup class from the semigroups package and then, in the Monoid class declaration, explain how to derive the methods of the superclass if no superclass instance is present. This, according to the linked proposal, would look like:

    class Semigroup m => Monoid m where
      mempty  :: m
      mappend :: m -> m -> m
    
      instance Semigroup m where
        (.++.) = mappend
    

    The example above is slightly simpler, which is why I relegated this version to a footnote.

  2. This isn't a concrete proposal, so maybe the actual syntax or semantics of these things should be changed! I want to focus on the feature and not the instantiation.
  3. For this example, I added Zero and One classes so that a given type might implement an additive and multiplicative unit while not necessarily having a sensible FromInteger implementation. For example, it might not make sense to implement fromInteger for a complex number, but complex numbers clearly have both an additive and multiplicative unit:

    data Complex = Complex Float Float deriving (Eq, Show)
    {- ... -}
    instance Zero Complex where zero = Complex 0.0 0.0
    instance One Complex  where one  = Complex 1.0 0.0
    

    This means that the Sum and Product monoids could be rewritten like:

    newtype Product a = Product { getProduct :: a }
      deriving (Eq, Show)
    
    instance (One a, Mul a) => Monoid (Product a) where
      mempty        = Product one
      x `mappend` y = Product (getProduct x * getProduct y)
    

    Notice that I added Zero and One in such a way that existing Num instance declarations needn't be changed!

  4. I have on numerous occasions had reason to use the Arrow abstraction, but haven't had a sensible implementation of arr. To use a slightly contrived example, I could define a GADT that can describe the structure of boolean circuits in a way that resembles an Arrow, but has no way of expressing arr:

    data Circ a b where
      BoolFunc :: (Bool -> Bool) -> Circ Bool Bool
      -- primitive gates used by the examples below
      Not      :: Circ Bool Bool
      And      :: Circ (Bool, Bool) Bool
      Or       :: Circ (Bool, Bool) Bool
      Ident    :: Circ a a
      Compose  :: Circ a b -> Circ b c -> Circ a c
      First    :: Circ a b -> Circ (a, d) (b, d)
      Second   :: Circ a b -> Circ (d, a) (d, b)
      Parallel :: Circ a b -> Circ a' b' -> Circ (a, a') (b, b')
      Fanout   :: Circ a b -> Circ a  b' -> Circ a (b, b')
    
    instance Category Circ where
      id  = Ident
      (.) = flip Compose
    
    instance Arrow Circ where
      first  = First
      second = Second
      (***)  = Parallel
      (&&&)  = Fanout
      arr    = error "Nothing sensible to put here!"
    
    -- for example, using this definition, we can write xor:
    xor :: Circ (Bool, Bool) Bool
    xor = ((first Not >>> And) &&& (second Not >>> And)) >>> Or
    
    -- ...and using xor, write a half-adder:
    halfAdder :: Circ (Bool, Bool) (Bool, Bool)
    halfAdder = xor &&& And
    

    This is not an unreasonable definition—-it would be nice to abstract over such a definition using existing tools—-but the structure of the Arrow typeclass makes it difficult!

I basically always use some program in the daemontools family on my computers. My home laptop and desktop are booted with an init system (runit) based on daemontools, while many of the systems I set up elsewhere boot a vanilla distribution but immediately set up a daemontools service directory as a secondary service management tool. Quite frankly, it's one of the best examples of good Unix design and at this point I wouldn't want to go without it.

This is a high-level introduction to the idea of daemontools rather than a full tutorial: to learn how to set it up in practice, djb's own site as well as a handful1 of others are better references.

What is Daemontools?

The core of daemontools is just two programs: svscan and supervise. They're very straightforward: svscan takes a single optional argument, and supervise takes a single mandatory one.

svscan watches a directory (if none is specified, then it will watch the current working directory) and checks to see if new directories have been added. Any time a new directory is added, it starts an instance of supervise pointing at that new directory2.

And that's all that svscan does.

supervise switches to the supplied directory and runs a script there called ./run. If ./run stops running for any reason, it will be started again (after a short pause, to avoid hammering the system.) It will also not start the ./run script if a file called ./down exists in the same directory. Extra data about the running process gets stored in a subdirectory called ./supervise, and a few other tools can be used to prod and modify that data—-for example, to send certain signals to kill the running program, to temporarily stop it, or to see how long it has been running.

And that's almost all that supervise does.

One extra minor wrinkle is that if supervise is pointed at a directory that also contains a subdirectory called ./log, and ./log/run also exists, then it will monitor that executable as well and point the stdout of ./run to the stdin of ./log/run. This allows you to build a custom logging solution for your services if you'd like. The ./log directory is optional.

So, how does this run a system? Well, you point svscan at a directory that contains a subdirectory for each service you want to run. Those services are generally small shell scripts that call the appropriate daemon in such a way that it will stay in the foreground. For example, a script to run sshd might look like:

#!/bin/sh

# redirecting stderr to stdout
exec 2>&1

# the -D option keeps sshd in the foreground
# and the -e option writes log information to stderr
exec /usr/sbin/sshd -D -e

And your directory structure might look like

    - service/
      |- ngetty/
      |  |- run
      |  |- log/
      |     |- run
      |- sshd/
      |  |- run
      |  |- log/
      |     |- run
      |- crond/
      |  |- run
      |  |- log/
      |     |- run

Once you point svscan at this, you end up having a process tree where svscan is managing multiple service instances which in turn manage their respective services and logging services:

    -svscan-+-service-+-ngetty
            |         `-log-service
            +-service-+-sshd
            |         `-log-service
            +-service-+-crond
            |         `-log-service

This design has some pretty amazing practical advantages, many of which are attributable to the fact that daemontools is written in terms of Unix idioms. The “Unix way” gets a fair amount of derision—-some well-deserved, some not—-but daemontools is a good example of how embracing the idioms of your system can produce better, more flexible software. Consider the following problems and their daemontools solutions:

Testing a Service Before You Start It

The ./run script is a plain executable. If it runs and stays in the foreground, doing what it should do, it's correct. If it doesn't, then there's a problem. That's also the only code path, which is a sharp contrast to the infamously difficult-to-write sysvinit scripts, where start and stop and status and so forth must all be tested in various system states3.

Starting and Stopping a Service

All you do is create or delete a service directory. The most common way of doing this is to create the service directory elsewhere, and then symlink it into the directory that svscan watches in order to start the service. This lets you delete the symlink without deleting the main directory, and furthermore ensures that the 'creation' of the directory is atomic.

Another tool, svc, lets you send signals to the running processes (e.g. svc -p sends a STOP signal, and svc -d sends a TERM signal as well as telling supervise to hold off on restarting the service otherwise.)

Express Service Dependencies

The daemontools design allows for various helper tools. One of them is svok, which finds out whether a given service is running. This is just another Unix program that will exit with either 0 if the process is running, or 100 if it is not. That means we can write

#!/bin/sh
svok postgres || { echo "waiting for postgres..."; exit 1; }
exec 2>&1
exec python2 some-web-app.py

and the script will die (prompting supervise to wait a moment and then restart it) unless postgres is already running.

Express Resource Limits

daemontools has several other applications that can enforce various resource limits or permissions. These are not part of the service mechanism—-instead, they simply modify the current process and then exec some other command. That means that you can easily incorporate them into a service script

#!/bin/sh
exec 2>&1
# change to the user 'sample', and then limit the stack segment
# to 2048 bytes, the number of open file descriptors to 3, and
# the number of processes to 1:
exec setuidgid sample \
     softlimit -n 2048 -o 3 -p 1 \
     some-small-daemon -n

These aren't actually special, and don't have anything to do with the daemontools service mechanism. Any shell script can incorporate setuidgid or softlimit, even if those scripts have nothing to do with service management!

Allow User-Level Services

If I want a given user to have their own services that are run as that user, all I need to do is have another svscan running as that user and pointing at another directory, which I can run as another top-level service:

#!/bin/sh
exec 2>&1
exec setuidgid user \
     /usr/sbin/svscan /home/user/service

Variations

What I described above was vanilla daemontools. Other systems are designed for booting entire systems with this kind of service management. Variations on this basic design add various features:

  • The runit package extends supervise with the ability to execute a ./finish script if the ./run script fails, to do various kinds of cleanup. (runit renames svscan and supervise to runsvdir and runsv, respectively.)
  • The s6 package adds even more options to both core programs (which are here named s6-svscan and s6-supervise) to e.g. limit the maximum number of services or modify how often scanning is done. It additionally allows control of an s6-supervise instance through a directory of FIFOs called ./event.
  • The daemontools-encore package adds even more optional scripts: a ./start script which is run before the main ./run script and a ./stop script after the service is disabled, a ./notify script which is invoked when the service changes, and a few others.
  • The nosh package is designed as a drop-in replacement for systemd on platforms where systemd cannot run (i.e. any Unix that is not a modern Linux) and so has a lot of utilities that superficially emulate systemd as well as tools which can convert systemd units into nosh service directories. nosh is the most radically divergent of the bunch, but is clearly a daemontools descendant (and incorporates most of the changes from daemontools-encore, as well.)

Additionally, all these (except for daemontools-encore) have other capabilities used to set up a Unix system before starting the service-management portion. They also generally include other tools for running services (e.g. runit includes the swiss-army-knife chpst for modifying a process's state; s6 includes a plethora of other service helpers and tools for doing things like file-system locking or socket activation) while keeping the same guiding principles of daemontools intact.

The Takeaway

The whole daemontools family has two properties which I really appreciate:

  1. A strong commitment to never parsing anything.
  2. A strong commitment to using Unix as a raw material.

Why avoid parsing?

Parsing is a surprisingly difficult thing to get right. Techniques for writing parsers vary wildly in terms of how difficult they are, and parsing bugs are a common source of weird machines in computer security. Various techniques can make parsing easier and less bug-prone, but it's a dangerous thing to rely on.

One way to get around this is to just skip parsing altogether. This is difficult in Unix, where most tools consume and emit plain text (or plain binary.) In other systems, such as in individual programming environments or systems like Windows PowerShell, the everything-is-plain-text requirement is relaxed, allowing tools to exchange structured data without reserializing and reparsing.

The way to avoid parsing in Unix is to use various kinds of structure to your advantage. Take the file system: it can, used correctly, emulate a tree-like structure or a key-value store. For example, one supplementary daemontools utility is envdir, which reads in environment variables not by parsing a string of name=value pairs, but by looking at a directory and turning the filename-to-file-contents mapping into a variable-name-to-variable-content mapping.

You might argue that this is silly—-after all, parsing an environment variable declaration is as easy as name=value! Could a system really introduce a security bug in parsing something as simple as that? As it happens, the answer is yes.

So daemontools avoids parsing by using directories as an organizing principle, rather than using configuration files.4 This makes an entire class of bugs and vulnerabilities impossible, which is always a good design choice.

What is “Unix as a raw material”?

The building blocks of daemontools are the parts of Unix which are common to every modern Unix variant: directories and executables and Unix processes and (in some of its descendants) FIFOs. This means you have a universe of actions you can perform outside of the daemontools universe:

  • Your scripts can be written in anything you'd like, not just a shell language. You could even drop a compiled executable in, at the cost of later maintainability.
  • Similarly, daemontools services are trivially testable, because they're just plain ol' executables.
  • Lots of details get moved out of service management because they can be expressed in terms of other building blocks of the system. There's no need for a 'which user do I run as' configuration flag, because that can get moved into a script. (Although that script can also consult an external configuration for that, if you'd like!)
  • Your directories can be arranged in various ways, being split up or put back together however you'd like.5

In contrast, service management with upstart or systemd requires special configuration files and uses various other RPC mechanisms, which means that interacting with them requires using the existing tools and... isn't really otherwise possible. Testing a service with upstart or systemd requires some kind of special testing tool in order to parse the service description and set up the environment it requests. Dependency-management must be built in, and couldn't have been added in afterwards. The same goes for resource limits or process isolation. ...and so forth.

“Unix design” has sometimes been used to justify some very poor design choices. On the other hand, it's possible to embrace Unix design and still build elegant systems. A well-built Unix system has some aspects in common with a well-built functional program: small components with well-defined semantics and scope and a clear delineation of the side-effects of any given part, all of which can easily be used together or apart. The daemontools family of programs is a perfect example of Unix design done well.


  1. This one is about runit, not daemontools, but they are similar enough in principle.
  2. It does this not using inotify or some other mechanism, but rather just by waking up every five seconds and doing a quick traversal of everything in the directory. This is less efficient, but also makes fewer assumptions about the platform it's running on, which means daemontools can run just about anywhere.
  3. Of course, your daemon might still rely on state—-but that's the fault of your daemon, and no longer inherent in the service mechanism. Contrast this to sysvinit-style scripts, where the only possible API is a stateful one in which the script does different things depending on the process state.
  4. One might argue that this is a little bit disingenuous: after all, you're still invoking shell scripts! If one part of your system avoids parsing, but then you call out to a piece of software as infamously complicated and buggy as bash, all that other security is for naught. But there's no reason that you have to write your scripts in bash, and in fact, the creator of s6 has built a new shell replacement for that purpose: namely, execline, which is designed around both security and performance concerns. If you wanted, you could replace all those shell scripts with something else, perhaps something more like the shill language. Luckily, the daemontools way is agnostic as to what it's executing, so it is easy to adopt these tools as well!
  5. I personally tend to have a system-level /etc/sv for some services and a user-level /home/gdritter/sv for other services, regardless of whether those services are run in my user-level service tree in /home/gdritter/service or the root-level tree in /service.