Backpacks Full of Data Files: A Hypothetical Future

One current pain point for certain Haskell programs is Cabal's data-files feature. This feature is a little bit awkward and weird: it involves depending on a generated file but also filling in a default implementation for that file, and there are some obscure and unclear edge cases in its use. One difficult edge case is that, while most uses of data files are in applications, it's nonetheless possible for libraries to depend on data files, in which case it's very difficult to induce Cabal to produce final relocatable bundles that contain both executables and the data files on which they depend.

I was musing on a hypothetical future way of solving this problem: once the lightweight module system Backpack is fully implemented in Haskell, we could build a data-files-like mechanism in a more convenient way by modifying the current design to use module-level mixins. The rest of this post will sketch out what that design might look like.

I should stress that I'm describing a possible solution, and not the solution. I'm by no means indicating that this is the best way of solving present problems with data files, and I have absolutely no indication that anyone else would want to solve the problem like this. That said, I think this is an interesting and motivated design, and I'd be happy to discuss its strengths and weaknesses, as well as other possible design choices that address the same issues.

Right now, I'm using Edward Yang's blog post A Taste Of Cabalized Backpack as my primary guide to Backpack-in-practice. I don't have a Backpack-enabled GHC and Cabal on hand, and so I haven't actually run any of this: this should right now be treated effectively as pseudocode. I also assume familiarity with Cabal's data files support; if you're in need of an introduction or a refresher, you should read the post Adding Data Files Using Cabal.

An Abstract Signature for Data Files

In our hypothetical Backpack-enabled-data-files-support future, we start by creating a signature that corresponds to the generated Paths_whatever module. To this end, we can create an .hsig file with a declaration like this:

module Dist.DataFiles (getDataFileName) where
  getDataFileName :: FilePath -> IO FilePath

This defines an abstract module called Dist.DataFiles that exposes a single function, getDataFileName, with no actual implementation. We can expose this signature by creating a package, data-files-sig, that exposes only this signature:

    name:               data-files-sig
    version:            1.0
    indefinite:         True
    build-depends:      base
    exposed-signatures: Dist.DataFiles

This would be a standard package—maybe even part of base—that can be consistently and universally relied on by libraries that require some kind of data file support.

Creating A Library With Data Files

Now, let's create a library that needs a data file. In this case, the library will do nothing but read and return the contents of that data file:

module Sample.Library (getSampleFile) where

import Dist.DataFiles (getDataFileName)

getSampleFile :: IO String
getSampleFile = getDataFileName "sample-file" >>= readFile

Now we need to create a corresponding .cabal file for this library. Because we're using Dist.DataFiles, we need to import that signature from the data-files-sig module. Importantly, we still don't have an implementation for getDataFileName. Because of that, our package is still abstract, or in Backpack-speak, indefinite:

    name:            sample-library
    indefinite:      True
    build-depends:   base, data-files-sig
    exposed-modules: Sample.Library

Depending On A Library With Data Files

In order to write an application that uses sample-library, we need to give it a module that's a concrete implementation of the Dist.DataFiles signature. In this case, let's create an implementation manually as part of our application.

First, let's write a small application that uses sample-library:

module Main where

import Sample.Library (getSampleFile)

main :: IO ()
main = getSampleFile >>= putStrLn

We still don't have that concrete implementation for getDataFileName, though, so let's write a simple module that exports the same name with the same type:

module MyDataFilesImpl (getDataFileName) where

import System.FilePath ((</>))

getDataFileName :: FilePath -> IO FilePath
getDataFileName path = pure
  ("/opt/sample-application" </> path)

Now, when we write our .cabal file for this application, we also need to specify we want to use MyDataFilesImpl as the concrete implementation of Dist.DataFiles for sample-library. That means our .cabal file will look like this:

    name: sample-application
    build-depends:
      base,
      filepath,
      sample-library (MyDataFilesImpl as Dist.DataFiles)

Now, all our abstract signatures are filled in, so this application is no longer indefinite, and we as developers have a convenient way of telling sample-library where we want it to look for its data files. In fact, one advantage of this system for data files is that we could import two libraries that both depend on the Dist.DataFiles signature but tell them to look in two different places for their data files, like this:

    name: other-application
    build-depends:
      base,
      lib-one (OneDataFilesImpl as Dist.DataFiles),
      lib-two (AnotherDataFilesImpl as Dist.DataFiles)

If there are reasonable default implementations for Dist.DataFiles, we could also put those on Hackage and reuse them in much the same way.

A Final Sprinkling Of Magic

In this case, I'm still missing a major part of Cabal's data-files support: namely, we want to shunt the responsibility from the developer to Cabal, so that we have support for things like relocatable builds. So in a final bit of handwaving, let's stipulate that our tooling in this hypothetical future has a special case to deal with applications that expose an indefinite Dist.DataFiles signature: Cabal could notice this situation, and fill those signagutres in with sensible implementations based on the commands and configurations we're using.

For example, if my .cabal file for sample-application above didn't supply a concrete implementation for Dist.DataFiles, then a default one could be chosen for development that's equivalent to:

-- as automatically generated by cabal
module Dist.DataFiles (getDataFileName) where

getDataFileName :: FilePath -> IO FilePath
getDataFileName = pure

That is, the application will just look for the file in the current directory.

If the developer started preparing the package for release, and changed the configuration appropriately, then the automatically generated getDataFileName could be modified to reflect that, replacing the automatically generated code with something more like

-- as automatically generated by cabal
module Dist.DataFiles (getDataFileName) where

import System.FilePath ((</>))

getDataFileName :: FilePath -> IO FilePath
getDataFileName path =
  pure ("/usr/share/sample-application" </> path)

This would be admittedly a little bit “magical”, but it would be a small and easy-to-explain bit of magic, and it would have the advantage of affording a kind of flexibility that the current approach to data files lacks.

Is This How It's Actually Gonna Work?

Probably not! Backpack is still a ways out, and this would require opt-in from many parts of the Haskell ecosystem, and the problem it solves could probably also be solved in numerous other ways I haven't considered. But this post describes a point in the design space that I think is at least worth weighing!