Lifetime Quantification and Higher-Ranked Trait Bounds

July 17, 2017

Over the weekend, a friend of mine who is currently learning Rust asked me a question. He was using serde for serialization, and he wanted a single function that took a file path as argument, opened the file, deserialized it as a particular type, and returned that value, where the return type could be inferred from context. (For this example, I'm going to ignore error-handling and just use .unwrap() to bail out on failures.) I hadn't used serde much myself, so I briefly assumed the function in question would just look like this:

fn load_from_file<T>(path: String) -> T
where
    T: serde::Deserialize
{
    let mut file = File::open(path).unwrap();
    serde_json::from_reader(&mut file).unwrap()
}

But it wasn't quite so easy, and, as with most of the trickier parts of Rust, the trouble comes from lifetimes. The serde::Deserialize trait takes a single parameter: a lifetime which corresponds to the lifetime of the input source. In the above function, the input source is the file from which we're reading, but it might also be a str (using the from_str function) or a slice of bytes (using the from_slice function). In any case, the deserializer needs to know how long its input source lives, so we can make sure that the input source doesn't get suddenly closed or freed while it's in the middle of parsing.

Okay, that means we need a lifetime parameter to give to serde::Deserialize. Let's try the obvious thing and add a lifetime parameter to the signature of the function:

fn load_from_file<'de, T>(path: String) -> T
where
    T: serde::Deserialize<'de>
{
    let mut file = File::open(path).unwrap();
    serde_json::from_reader(&mut file).unwrap()
}

This looks nice at a glance, but it doesn't compile! You can plug this code snippet into rustc and it will (with its characteristic helpfulness) suggest exactly the correct code to write. But I'm less interested here in what to write, and more in why we're writing it: why does this not work, and why does the suggested fix work?

Let's step back to very basic Rust: when you have a generic parameter to (say) a function, what you're saying is that you want that particular implementation detail to be supplied by the use of the function. Here's an incredibly trivial example: I can write a polymorphic identity function, a function that simply returns the value given it, by giving it a type parameter like this:

fn identity<T>(x: T) -> T { x }

When we use the identity function with a value of a particular type (say, by calling identity(22u32)), we're also implicitly providing a concrete choice for the type T, which the compiler can infer from the type of the argument. Rust also allows us to pass this type parameter explicitly, with the slightly unwieldy syntax identity::<u32>(22). Either way, the use site of identity provides the information necessary to choose a concrete type T.
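To make the two ways of choosing T concrete, here is a small sketch that calls identity with an inferred type, with an explicit type, and with a different type altogether. The variable names are only illustrative.

```rust
// The polymorphic identity function from the text: the caller picks T.
fn identity<T>(x: T) -> T {
    x
}

fn main() {
    // T is inferred as u32 from the type of the argument.
    let inferred = identity(22u32);

    // T is supplied explicitly at the call site.
    let explicit = identity::<u32>(22);

    // A different call site is free to choose a completely different T.
    let text = identity("hello");

    assert_eq!(inferred, explicit);
    println!("{} {} {}", inferred, explicit, text);
}
```

In every case, the choice of T is made where identity is used, never inside its body.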

The same goes for lifetime parameters: when we write a function like

fn ref_identity<'a, T>(x: &'a T) -> &'a T { x }

what we're saying is that the reference we take as an argument lives for some amount of time, that this amount of time is determined at the call site, and that the compiler will infer it from each individual use of ref_identity. When we call ref_identity(&foo), we're saying that the lifetime parameter 'a is going to be however long foo lives in that context.
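As a rough illustration, the sketch below calls ref_identity from two different scopes. The variable names are made up for the example. The point is only that each call site supplies its own lifetime for 'a, tied to the value being borrowed there.

```rust
// The lifetime-generic identity function from the text.
fn ref_identity<'a, T>(x: &'a T) -> &'a T {
    x
}

fn main() {
    let outer = String::from("outer");
    // At this call site, 'a is tied to the borrow of `outer`.
    let from_outer = ref_identity(&outer);

    {
        let inner = 5;
        // At this call site, 'a is tied to the much shorter-lived `inner`.
        let from_inner = ref_identity(&inner);
        println!("{} {}", from_outer, from_inner);
    }

    // `from_outer` is still usable here because `outer` is still alive.
    println!("{}", from_outer);
}
```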

Okay, so with that, let's look at our first pass at writing load_from_file:

fn load_from_file<'de, T>(path: String) -> T
where
    T: serde::Deserialize<'de>
{
    let mut file = File::open(path).unwrap();
    serde_json::from_reader(&mut file).unwrap()
}

What's the problem here? Well, our type doesn't actually reflect what the body of our function does. Remember that the lifetime parameter we give to Deserialize corresponds to the lifetime of the input source to the deserializer, and we already know the lifetime of our source: it's the lifetime of our file variable. That lifetime isn't something we need to get from the caller. For that matter, it's not a piece of information that the caller would even have access to! Expressing that as a parameter in this instance is blatantly incorrect.

...but then we've got a problem, because we need something to give Deserialize as the lifetime parameter, and Rust doesn't have a way of expressing, “The lifetime of this variable here in the scope I am currently defining.” We're caught: we need some lifetime parameter there, but it can't be an argument, and we can't name the specific individual lifetime that we know it should be.

So instead, one way of writing this is by saying, “This works for any lifetime we might care to give it.” The difference here is subtle but important: we aren't expressing the notion of “any lifetime you, the caller, want to give me”, we are expressing the notion of “any lifetime at all.”

It turns out that Rust has a syntax for this, which uses the for keyword:

use std::fs::File;

fn load_from_file<T>(path: String) -> T
where
    T: for<'de> serde::Deserialize<'de>
{
    let mut file = File::open(path).unwrap();
    serde_json::from_reader(&mut file).unwrap()
}

If we try this, this function now works. And notice that the type parameter list now includes only one thing: the T type that we're deserializing to, which is exactly what we want!

The key here is the for<'a> T syntax: the for acts as a binder for one or more fresh lifetime parameters, which are then used in the following type. Here, we're introducing a new 'de lifetime and then passing it to Deserialize. Importantly, this lifetime is now quantified over all possible lifetimes, not merely a lifetime that the calling context might supply. And of course, “all possible lifetimes” includes the lifetime of the file variable inside the function!
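The same shape can be reproduced without serde. The sketch below uses a hypothetical FromInput<'src> trait as a stand-in for Deserialize<'de>, and a hypothetical load_from_local function as a stand-in for load_from_file. Neither is part of serde, and the local String merely plays the role of the file.

```rust
// Hypothetical stand-in for serde::Deserialize<'de>: a value that can be
// built from input borrowed for the lifetime 'src.
trait FromInput<'src>: Sized {
    fn from_input(input: &'src str) -> Self;
}

// Owned types can be built from input of any lifetime, because they copy
// whatever they need out of the input.
impl<'src> FromInput<'src> for u32 {
    fn from_input(input: &'src str) -> Self {
        input.trim().parse().unwrap()
    }
}

impl<'src> FromInput<'src> for String {
    fn from_input(input: &'src str) -> Self {
        input.trim().to_string()
    }
}

// A borrowed type can only be built from input that lives as long as the
// borrow itself, so it implements the trait for one lifetime, not for all.
impl<'src> FromInput<'src> for &'src str {
    fn from_input(input: &'src str) -> Self {
        input.trim()
    }
}

// The input is a local variable, so the caller cannot name its lifetime.
// The bound says T must be constructible from input of any lifetime.
fn load_from_local<T>(raw: String) -> T
where
    T: for<'src> FromInput<'src>,
{
    let buffer = raw; // plays the role of `file` in the text
    T::from_input(&buffer)
}

fn main() {
    let n: u32 = load_from_local(String::from(" 42 "));
    let s: String = load_from_local(String::from(" hello "));
    println!("{} {}", n, s);

    // The borrowed impl is fine when the caller owns the input...
    let t = <&str as FromInput>::from_input("  borrowed  ");
    println!("{}", t);

    // ...but this would not compile, since &str does not satisfy the
    // higher-ranked bound:
    // let bad: &str = load_from_local(String::from("x"));
}
```

The commented-out line is the interesting one: a type that borrows from its input cannot satisfy the for<'src> bound, which is what keeps the function from returning a reference into the buffer it is about to drop.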

The for<'a> T syntax is a feature called Higher-Ranked Trait Bounds. This feature was specifically necessary to support unboxed closures, but it ends up being useful in other situations, like this one!
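For a sense of the closure case, here is a minimal sketch. The function names apply_to_local, apply_elided, and trim_it are invented for the example. The callback is applied to a string that only exists inside the function, so the bound has to hold for every lifetime rather than one chosen by the caller.

```rust
// Explicit higher-ranked bound: `f` must work for every lifetime 'a,
// because it is called on a string that only exists inside this function.
fn apply_to_local<F>(f: F) -> usize
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    let local = String::from("  padded  ");
    f(&local).len()
}

// The same bound with the lifetimes elided. For the Fn traits, Rust expands
// this to the higher-ranked form above.
fn apply_elided<F>(f: F) -> usize
where
    F: Fn(&str) -> &str,
{
    let local = String::from("  padded  ");
    f(&local).len()
}

// A plain function that works for any input lifetime.
fn trim_it(s: &str) -> &str {
    s.trim()
}

fn main() {
    println!("{}", apply_to_local(trim_it));
    println!("{}", apply_elided(trim_it));
}
```

Because lifetime elision in Fn bounds already produces the higher-ranked form, most Rust code relies on this feature without ever spelling out the for keyword.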