Backpacks Full of Data Files: A Hypothetical Future
One current pain point for certain Haskell programs is Cabal's data-files feature. This feature is a little bit awkward and weird: it involves depending on a generated file but also filling in a default implementation for that file, and there are some obscure and unclear edge cases in its use. One difficult edge case is that, while most uses of data files are in applications, it's nonetheless possible for libraries to depend on data files, in which case it's very difficult to induce Cabal to produce final relocatable bundles that contain both executables and the data files on which they depend.
I was musing on a hypothetical future way of solving this problem: once the lightweight module system Backpack is fully implemented in Haskell, we could build a data-files-like mechanism in a more convenient way by modifying the current design to use module-level mixins. The rest of this post will sketch out what that design might look like.
I should stress that I'm describing a possible solution, and not the solution. I'm by no means indicating that this is the best way of solving present problems with data files, and I have absolutely no indication that anyone else would want to solve the problem like this. That said, I think this is an interesting and motivated design, and I'd be happy to discuss its strengths and weaknesses, as well as other possible design choices that address the same issues.
Right now, I'm using Edward Yang's blog post A Taste Of Cabalized Backpack as my primary guide to Backpack-in-practice. I don't have a Backpack-enabled GHC and Cabal on hand, and so I haven't actually run any of this: this should right now be treated effectively as pseudocode. I also assume familiarity with Cabal's data files support; if you're in need of an introduction or a refresher, you should read the post Adding Data Files Using Cabal.
An Abstract Signature for Data Files
In our hypothetical Backpack-enabled-data-files-support future, we start by creating a signature that corresponds to the generated Paths_whatever module. To this end, we can create an .hsig file with a declaration like this:
module Dist.DataFiles (getDataFileName) where
getDataFileName :: FilePath -> IO FilePath
This defines an abstract module called Dist.DataFiles that exposes a single function, getDataFileName, with no actual implementation. We can expose this signature by creating a package, data-files-sig, that exposes only this signature:
name: data-files-sig
version: 1.0
indefinite: True
build-depends: base
exposed-signatures: Dist.DataFiles
This would be a standard package (maybe even part of base) that can be consistently and universally relied on by libraries that require some kind of data file support.
Creating A Library With Data Files
Now, let's create a library that needs a data file. In this case, the library will do nothing but read and return the contents of that data file:
module Sample.Library (getSampleFile) where
import Dist.DataFiles (getDataFileName)
getSampleFile :: IO String
getSampleFile = getDataFileName "sample-file" >>= readFile
Now we need to create a corresponding .cabal file for this library. Because we're using Dist.DataFiles, we need to import that signature from the data-files-sig package. Importantly, we still don't have an implementation for getDataFileName. Because of that, our package is still abstract, or in Backpack-speak, indefinite:
name: sample-library
indefinite: True
build-depends: base, data-files-sig
exposed-modules: Sample.Library
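Backpack aside, one way to build intuition for what "indefinite" means here is to model the missing implementation as an ordinary function argument. The sketch below is plain Haskell rather than Backpack, and the names GetDataFileName and getSampleFileWith are hypothetical ones I've made up for illustration:

```haskell
-- Plain-Haskell analogue of an indefinite package (this is NOT Backpack syntax).
-- The hole left by the Dist.DataFiles signature becomes an explicit argument.

-- | The type of the one function that the signature asks for.
type GetDataFileName = FilePath -> IO FilePath

-- | Hypothetical counterpart of getSampleFile: it cannot run until the
-- caller hands it some implementation of getDataFileName.
getSampleFileWith :: GetDataFileName -> IO String
getSampleFileWith getDataFileName =
  getDataFileName "sample-file" >>= readFile
```

Passing pure as the argument makes the library read sample-file relative to the current directory. The Backpack version would do this same plumbing at the package level, once, instead of threading an argument through every call.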
Depending On A Library With Data Files
In order to write an application that uses sample-library, we need to give it a module that's a concrete implementation of the Dist.DataFiles signature. In this case, let's create an implementation manually as part of our application.
First, let's write a small application that uses sample-library:
module Main where
import Sample.Library (getSampleFile)
main :: IO ()
main = getSampleFile >>= putStrLn
We still don't have that concrete implementation for getDataFileName, though, so let's write a simple module that exports the same name with the same type:
module MyDataFilesImpl (getDataFileName) where
import System.FilePath ((</>))
getDataFileName :: FilePath -> IO FilePath
getDataFileName path =
  pure ("/opt/sample-application" </> path)
Now, when we write our .cabal file for this application, we also need to specify that we want to use MyDataFilesImpl as the concrete implementation of Dist.DataFiles for sample-library. That means our .cabal file will look like this:
name: sample-application
build-depends:
  base,
  filepath,
  sample-library (MyDataFilesImpl as Dist.DataFiles)
Now, all our abstract signatures are filled in, so this application is no longer indefinite, and we as developers have a convenient way of telling sample-library where we want it to look for its data files. In fact, one advantage of this system for data files is that we could import two libraries that both depend on the Dist.DataFiles signature but tell them to look in two different places for their data files, like this:
name: other-application
build-depends:
  base,
  lib-one (OneDataFilesImpl as Dist.DataFiles),
  lib-two (AnotherDataFilesImpl as Dist.DataFiles)
If there are reasonable default implementations for Dist.DataFiles, we could also put those on Hackage and reuse them in much the same way.
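To make the "two libraries, two locations" idea concrete without a Backpack-enabled toolchain, here's a rough plain-Haskell model in which an implementation of the signature is just a value. Everything named here (the DataFiles record, rootedAt, and the two stand-in library functions) is hypothetical and only meant to illustrate the shape of the idea:

```haskell
import System.FilePath ((</>))

-- | Hypothetical stand-in for "a module matching the Dist.DataFiles signature".
newtype DataFiles = DataFiles
  { getDataFileName :: FilePath -> IO FilePath }

-- | A reusable default implementation of the kind that could live on
-- Hackage: resolve every data file name under one fixed root directory.
rootedAt :: FilePath -> DataFiles
rootedAt root = DataFiles (\path -> pure (root </> path))

-- | Hypothetical stand-ins for lib-one and lib-two. Each one only knows
-- that it was given *some* implementation, not where that one points.
libOneConfigPath :: DataFiles -> IO FilePath
libOneConfigPath impl = getDataFileName impl "one.conf"

libTwoConfigPath :: DataFiles -> IO FilePath
libTwoConfigPath impl = getDataFileName impl "two.conf"
```

An application could then call libOneConfigPath (rootedAt "/opt/one") and libTwoConfigPath (rootedAt "/srv/two"), and each library would resolve its files under its own root. In the Backpack design, that choice would be made once in the .cabal file rather than at every call site.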
A Final Sprinkling Of Magic
In this case, I'm still missing a major part of Cabal's data-files support: namely, we want to shunt the responsibility from the developer to Cabal, so that we have support for things like relocatable builds. So in a final bit of handwaving, let's stipulate that our tooling in this hypothetical future has a special case to deal with applications that expose an indefinite Dist.DataFiles signature: Cabal could notice this situation, and fill those signatures in with sensible implementations based on the commands and configurations we're using.
For example, if my .cabal file for sample-application above didn't supply a concrete implementation for Dist.DataFiles, then a default one could be chosen for development that's equivalent to:
-- as automatically generated by cabal
module Dist.DataFiles (getDataFileName) where
getDataFileName :: FilePath -> IO FilePath
getDataFileName = pure
That is, the application will just look for the file in the current directory.
If the developer started preparing the package for release, and changed the configuration appropriately, then the automatically generated getDataFileName could be modified to reflect that, replacing the automatically generated code with something more like
-- as automatically generated by cabal
module Dist.DataFiles (getDataFileName) where
import System.FilePath ((</>))
getDataFileName :: FilePath -> IO FilePath
getDataFileName path =
  pure ("/usr/share/sample-application" </> path)
This would admittedly be a little bit “magical”, but it would be a small and easy-to-explain bit of magic, and it would have the advantage of affording a kind of flexibility that the current approach to data files lacks.
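As a further bit of speculation, a generated implementation could also leave room for a runtime override, similar in spirit to the environment-variable override that Cabal's generated Paths_ modules offer today. The following is only a sketch of what such generated code might contain: the variable name SAMPLE_APPLICATION_DATADIR and the helper names defaultDataDir and resolveDataDir are my own assumptions, not anything Cabal produces.

```haskell
import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)
import System.FilePath ((</>))

-- | Install-time default, matching the release example above.
defaultDataDir :: FilePath
defaultDataDir = "/usr/share/sample-application"

-- | Pure core of the lookup: prefer an override when one is given,
-- otherwise fall back to the install-time default.
resolveDataDir :: Maybe FilePath -> FilePath
resolveDataDir = fromMaybe defaultDataDir

-- | SAMPLE_APPLICATION_DATADIR is a hypothetical variable name.
getDataFileName :: FilePath -> IO FilePath
getDataFileName path = do
  override <- lookupEnv "SAMPLE_APPLICATION_DATADIR"
  pure (resolveDataDir override </> path)
```

Keeping the decision in a pure function like resolveDataDir means the fallback logic can be checked without touching the process environment, and the installed bundle can be moved as long as the variable is set to the new location.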
Is This How It's Actually Gonna Work?
Probably not! Backpack is still a ways out, and this would require opt-in from many parts of the Haskell ecosystem, and the problem it solves could probably also be solved in numerous other ways I haven't considered. But this post describes a point in the design space that I think is at least worth weighing!