Backpacks Full of Data Files: A Hypothetical Future
One current pain point for certain Haskell programs is Cabal's
data-files feature. This feature is a little bit awkward and weird: it involves depending on a generated file but also filling in a default implementation for that file, and there are some obscure and unclear edge cases in its use. One difficult edge case is that, while most uses of data files are in applications, it's nonetheless possible for libraries to depend on data files, in which case it's very difficult to induce Cabal to produce final relocatable bundles that contain both executables and the data files on which they depend.
I was musing on a hypothetical future way of solving this problem: once the lightweight module system Backpack is fully implemented in Haskell, we could build a
data-files-like mechanism in a more convenient way by modifying the current design to use module-level mixins. The rest of this post will sketch out what that design might look like.
I should stress that I'm describing a possible solution, and not the solution. I'm by no means indicating that this is the best way of solving present problems with data files, and I have absolutely no indication that anyone else would want to solve the problem like this. That said, I think this is an interesting and motivated design, and I'd be happy to discuss its strengths and weaknesses, as well as other possible design choices that address the same issues.
Right now, I'm using Edward Yang's blog post "A Taste Of Cabalized Backpack" as my primary guide to Backpack-in-practice. I don't have a Backpack-enabled GHC and Cabal on hand, so I haven't actually run any of this: for now, everything here should be treated as pseudocode. I also assume familiarity with Cabal's data files support; if you need an introduction or a refresher, you should read the post "Adding Data Files Using Cabal".
An Abstract Signature for Data Files
In our hypothetical Backpack-enabled-data-files-support future, we start by creating a signature that corresponds to the generated
Paths_whatever module. To this end, we can create an
.hsig file with a declaration like this:
    module Dist.DataFiles (getDataFileName) where

    getDataFileName :: FilePath -> IO FilePath
This defines an abstract module called
Dist.DataFiles that exposes a single function,
getDataFileName, with no actual implementation. We can expose this signature by creating a package,
data-files-sig, that exposes only this signature:
    name: data-files-sig
    version: 1.0
    indefinite: True
    build-depends: base
    exposed-signatures: Dist.DataFiles
This would be a standard package—maybe even part of
base—that can be consistently and universally relied on by libraries that require some kind of data file support.
Creating A Library With Data Files
Now, let's create a library that needs a data file. In this case, the library will do nothing but read and return the contents of that data file:
    module Sample.Library (getSampleFile) where

    import Dist.DataFiles (getDataFileName)

    getSampleFile :: IO String
    getSampleFile = getDataFileName "sample-file" >>= readFile
Now we need to create a corresponding
.cabal file for this library. Because we're using
Dist.DataFiles, we need to import that signature from the
data-files-sig package. Importantly, we still don't have an implementation for
getDataFileName. Because of that, our package is still abstract, or in Backpack-speak, indefinite:

    name: sample-library
    indefinite: True
    build-depends: base, data-files-sig
    exposed-modules: Sample.Library
Depending On A Library With Data Files
In order to write an application that uses
sample-library, we need to give it a module that's a concrete implementation of the
Dist.DataFiles signature. In this case, let's create an implementation manually as part of our application.
First, let's write a small application that uses sample-library:

    module Main where

    import Sample.Library (getSampleFile)

    main :: IO ()
    main = getSampleFile >>= putStrLn
We still don't have that concrete implementation for
getDataFileName, though, so let's write a simple module that exports the same name with the same type:
    module MyDataFilesImpl (getDataFileName) where

    import System.FilePath ((</>))

    getDataFileName :: FilePath -> IO FilePath
    getDataFileName path = pure ("/opt/sample-application" </> path)
Now, when we write our
.cabal file for this application, we also need to specify we want to use
MyDataFilesImpl as the concrete implementation of the Dist.DataFiles signature required by
sample-library. That means our
.cabal file will look like this:
    name: sample-application
    build-depends:
      base,
      filepath,
      sample-library (MyDataFilesImpl as Dist.DataFiles)
Now, all our abstract signatures are filled in, so this application is no longer
indefinite, and we as developers have a convenient way of telling
sample-library where we want it to look for its data files. In fact, one advantage of this system for data files is that we could import two libraries that both depend on the
Dist.DataFiles signature but tell them to look in two different places for their data files, like this:
    name: other-application
    build-depends:
      base,
      lib-one (OneDataFilesImpl as Dist.DataFiles),
      lib-two (AnotherDataFilesImpl as Dist.DataFiles)
If there are reasonable default implementations for
Dist.DataFiles, we could also put those on Hackage and reuse them in much the same way.
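For readers without a Backpack-enabled toolchain, the "same signature, two implementations" idea can be approximated in ordinary Haskell by passing the implementation around as a value. To be clear, this is an analogy rather than Backpack itself: the record `DataFiles` stands in for the `Dist.DataFiles` signature, and the names `sampleFilePath`, `oneDataFilesImpl`, `anotherDataFilesImpl`, and the `/opt/...` directories are all hypothetical.

```haskell
import System.FilePath ((</>))

-- Stand-in for the Dist.DataFiles signature: an interface that says what
-- must be provided without saying how. (Hypothetical; not Backpack syntax.)
newtype DataFiles = DataFiles
  { getDataFileName :: FilePath -> IO FilePath }

-- Two hypothetical implementations that look in different directories.
oneDataFilesImpl :: DataFiles
oneDataFilesImpl = DataFiles (\path -> pure ("/opt/lib-one" </> path))

anotherDataFilesImpl :: DataFiles
anotherDataFilesImpl = DataFiles (\path -> pure ("/opt/lib-two" </> path))

-- "Library" code: it only knows the interface, so the caller decides
-- where its data file actually lives.
sampleFilePath :: DataFiles -> IO FilePath
sampleFilePath impl = getDataFileName impl "sample-file"

main :: IO ()
main = do
  sampleFilePath oneDataFilesImpl >>= putStrLn
  sampleFilePath anotherDataFilesImpl >>= putStrLn
```

The difference is where the choice gets made. Here every library function has to take the implementation as an argument, whereas in the Backpack design the choice would be made once, in the .cabal file, and the library code would just import Dist.DataFiles.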
A Final Sprinkling Of Magic
In this case, I'm still missing a major part of Cabal's
data-files support: namely, we want to shunt the responsibility from the developer to Cabal, so that we have support for things like relocatable builds. So in a final bit of handwaving, let's stipulate that our tooling in this hypothetical future has a special case to deal with applications that expose an indefinite
Dist.DataFiles signature: Cabal could notice this situation and fill those signatures in with sensible implementations based on the commands and configurations we're using.
For example, if my
.cabal file for
sample-application above didn't supply a concrete implementation for
Dist.DataFiles, then a default one could be chosen for development that's equivalent to:
    -- as automatically generated by cabal
    module Dist.DataFiles (getDataFileName) where

    getDataFileName :: FilePath -> IO FilePath
    getDataFileName = pure
That is, the application will just look for the file in the current directory.
If the developer started preparing the package for release, and changed the configuration appropriately, then the automatically generated
getDataFileName could be modified to reflect that, replacing the automatically generated code with something more like
    -- as automatically generated by cabal
    module Dist.DataFiles (getDataFileName) where

    import System.FilePath ((</>))

    getDataFileName :: FilePath -> IO FilePath
    getDataFileName path = pure ("/usr/share/sample-application" </> path)
This would be admittedly a little bit “magical”, but it would be a small and easy-to-explain bit of magic, and it would have the advantage of affording a kind of flexibility that the current approach to data files lacks.
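For relocatable builds specifically, a generated implementation would need to avoid hardcoding an absolute path. The following is a minimal sketch of what such an implementation could do, not something Cabal generates: it resolves data files relative to the running executable and allows an override through an environment variable, similar in spirit to the datadir override that today's generated Paths_ modules honor. The variable name `SAMPLE_APPLICATION_DATADIR`, the helper `dataDirFor`, and the `<prefix>/share/sample-application` layout are assumptions made for illustration.

```haskell
import System.Environment (getExecutablePath, lookupEnv)
import System.FilePath (takeDirectory, (</>))

-- Pure core: choose the data directory from an optional override and the
-- path of the running executable. Assumes a hypothetical layout in which
-- the executable lives in <prefix>/bin and its data files live in
-- <prefix>/share/sample-application.
dataDirFor :: Maybe FilePath -> FilePath -> FilePath
dataDirFor (Just override) _ = override
dataDirFor Nothing exePath =
  takeDirectory (takeDirectory exePath) </> "share" </> "sample-application"

-- Same name and type as the Dist.DataFiles signature requires.
getDataFileName :: FilePath -> IO FilePath
getDataFileName path = do
  -- Hypothetical environment variable name, chosen for this sketch.
  override <- lookupEnv "SAMPLE_APPLICATION_DATADIR"
  exePath <- getExecutablePath
  pure (dataDirFor override exePath </> path)

main :: IO ()
main = getDataFileName "sample-file" >>= putStrLn
```

Keeping the path logic in the pure `dataDirFor` means the layout rule can be checked without touching the environment, and moving the whole `<prefix>` directory elsewhere leaves the lookup intact.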
Is This How It's Actually Gonna Work?
Probably not! Backpack is still a ways out, and this would require opt-in from many parts of the Haskell ecosystem, and the problem it solves could probably also be solved in numerous other ways I haven't considered. But this post describes a point in the design space that I think is at least worth weighing!