Typeclass Refactoring and Default Superclass Instances

As part of recent type-refactoring efforts in Haskell, a discussion about adding Semigroup as a parent class of Monoid has been bouncing around the mailing list. From a theoretical point of view, this is a great idea: it is more flexible than the current approach that would allow for greater expressibility.

From a practical point of view, however, I am inclined to oppose it. Not because it is in itself a bad change—it's a very reasonable change that has advantages for new code—but because I have, in the past, had to update large systems written in Haskell after GHC updates, and therefore I know that this kind of change has a cost. The Applicative-Monad changes seem to have made way for the Foldable-Traversable Prelude, but those have in turn inspired a number of suggestions for modifications to the Haskell standard library, each one of which, taken on its own, is reasonable, but taken as a mass, mean much more work for maintainers. This is especially true if we continue on this path of making minor refactorings at each release: each year a project will need changes, or it will quickly bit-rot beyond utility.

Default Superclass Instances

There is, however, an alternative I would like to discuss—one which has also been brought up on the mailing list, but which I'd like to provide more concrete motivation for. Some of these changes—especially the Semigroup/Monoid split—seem to involve taking the functionality of a class and splitting it into multiple smaller classes. For example, we can think of the Semigroup/Monoid change as converting

class Monoid m where
  mempty  :: m
  mappend :: m -> m -> m


class Semigroup m where
  mappend :: m -> m -> m

class Semigroup m => Monoid m where
  mempty :: m

Something that has been proposed before (in a few different forms) and which I suggest be more actively considered if changes like these are to become common is to allow superclass instances to be declared within a subclass declaration. This would allow you to write a single instance declaration for a class, and in that body also include implementations for methods belong to a superclass of that class. As an example, perhaps we could be able to write2:

newtype MyId a = MyId a

instance Monad MyId where
  -- Functor method
  fmap f (MyId x) = MyId (f x)

  -- Applicative methods
  pure = MyId
  MyId f <*> MyId x = MyId (f x)

  -- Monad methods
  return = MyId
  MyId x >>= f = f x

For the Monoid/Semigroup proposal, this would mean that any Monoid instances that exist would continue to work unchanged, but new instances could (optionally) split apart their declarations. Under this proposal, either of these would be acceptable:

class Semigroup m where mappend :: m -> m -> m
class Semigroup m => Monoid m where mempty :: m

-- combined `instance` declarations:
instance Monoid [a] where
  mempty = []
  mappend = (++)

or, equivalently,

class Semigroup m where mappend :: m -> m -> m
class Semigroup m => Monoid m where mempty :: m

-- split apart `instance` declarations
instance Semigroup [a] where
  mappend = (++)

instance Monoid [a] where
  mempty = []

And because the Monoid declaration for [] is already written like the former, we can make the Semigroup/Monoid split without having to rewrite the instance declarations!

Exciting Additional Opportunities

Because this lowers the cost of updating for new versions, various other useful changes might be considered that would otherwise involve far too much breakage. For example, we could consider splitting Num apart into small constituent parts. With Default Superclass Instances, we could refactor Num into the following without changing any instance declarations:3

class Add a where (+) :: a -> a -> a
class Sub a where (-) :: a -> a -> a
class Add a => Zero a where zero :: a

class Mul a where (*) :: a -> a -> a
class Mul a => One a where one :: a

class (Zero a, One a) => FromInteger a where
  fromInteger :: Integer -> a

  instance Zero a where zero = fromInteger 0
  instance One a where one = fromInteger 1

class Signed a where
  negate :: a -> a
  abs    :: a -> a
  signum :: a -> a

class ( Eq a, Show a, Add a, Sub a
      , Mul a, FromInteger a, Signed a) => Num a where

which would allow certain numeric types to only implement a subset of the relevant operations:

data Nat = Zero | Succ Nat

instance Add Nat where
  Z   + y = y
  S x + y = S (x + y)

{- et cetera --- but no implementations for e.g. Signed,
 - which is not meaningful for `Nat`! -}

and also allow current Num functions to have a looser set of constraints than they do at present:

sum :: (Zero a, Add a) => [a] -> a
sum (x:xs) = x + sum xs
sum []     = zero

prod :: (One a, Mul a) => [a] -> a
prod (x:xs) = x * prod xs
prod []     = one

We could also consider splitting Arrow4 into distinct components, again without having to change any instance declarations:

class Category a => Pairwise a where
  first  :: a b c -> a (b, d) (c, d)
  second :: a b c -> a (d, b) (d, c)
  (***) :: a b c -> a b' c' -> a (b, b') (c, c')
  (&&&) :: a b c -> a b c' -> a b (c, c')

class Pairwise a => Arrow a where
  arr :: (b -> c) -> a b c

or even (dare I dream) splitting Bits into something that is not a 22-method monstrosity!

Potential Drawbacks

On the other hand, this proposal does have some down-sides:

Grepping for Instance Declarations

Right now, I can often find an instance declaration for a type T by grepping for instance C T (modulo some line noise) whereas with this change, it's possible that there is no declaration for instance C T, because all of C's methods are declared by a subclass C' instead. The compiler ought to be able to deal with this without problem, which means that tools like Haddock documentation should somewhat alleviate this problem, but a user might be confused.

Introduces New Possible Errors

The declarations below are of course nonsensical, and would be rejected by the compiler—-but the fact that this change would introduce new failure conditions at all is a drawback of the proposal.

instance Semigroup Int where
  mappend = (+)

instance Monoid Int where
  -- this conflicts with an existing declaration
  mappend = (*)
  mempty  = 1

A Pragma-Less Extension

In order to be really useful, we'd want to use this without a LANGUAGE pragma. After all, we're arguing for it on the basis of preserving backwards-compatibility, but that argument is much less compelling if we still have to change the source files to make use of it! On the other hand, that means we'd have included a GHC extension that takes effect despite not being enabled, which is also a worrying precedent!

It still might be a useful extension even if it had to be enabled by a LANGUAGE pragma, as it is easier to add said pragma to a source file than to manually break apart dozens of instance declarations, but it makes this feature less compelling in general.

In Conclusion

As I said before, my primary objection to typeclass changes like those above—-even if they are good changes for newly written code—-is that they break existing code. I don't want to have to modify a handful of miscellaneous instance declarations on a yearly basis as people discover new levels of abstraction or become dissatisfied with current choices, especially as those changes will grow more extensive as I build more projects in Haskell! But with an extension like this, we could grow the typeclass ecosystem gradually and fix what we see as past warts while maintaining backwards-compatibility, which would be a very powerful tool to have.

  1. This is perhaps more simplistic than we want: we can also use the existing Semigroup class from the semigroup package and then, in the Monoid class declaration, explain how to derive the methods of the superclass if no superclass instance is present. This, according to the linked proposal, would look like:

    class Semigroup m => Monoid m where
      mempty  :: m
      mappend :: m -> m -> m
      instance Semigroup m where
        (.++.) = mappend

    The example above is slightly simpler, which is why I relegated this version to a footnote.

  2. This isn't a concrete proposal, so maybe the actual syntax or semantics of these things should be changed! I want to focus on the feature and not the instantiation.
  3. For this example, I added Zero and One classes so that a given type might implement an additive and multiplicative unit while not necessarily having a sensible FromInteger implementation. For example, it might not make sense to implement fromInteger for a complex number, but complex numbers clearly have both an additive and multiplicative unit:

    data Complex = Complex Float Float deriving (Eq, Show)
    {- ... -}
    instance Zero Complex where zero = Complex 0.0 0.0
    instance One Complex  where one  = Complex 1.0 0.0

    This means that the Sum and Product monoids could be rewritten like:

    newtype Product a = Product { getProduct :: a }
      deriving (Eq, Show)
    instance (One a, Mul a) => Monoid (Product a) where
      mempty        = Product one
      x `mappend` y = Product (getProduct x * getProduct y)

    Notice that I added Zero and One in such a way that an existing Num instance declarations needn't be changed!

  4. I have on numerous occasions had reason to use the Arrow abstraction, but haven't had a sensible implementation of arr. To use a slightly contrived example, I could define a GADT that can describe the structure of boolean circuits in a way that resembles an Arrow, but has no way of expressing arr:

    data Circ a b where
      BoolFunc :: (Bool -> Bool) -> Circ Bool Bool
      Ident    :: Circ a a
      Compose  :: Circ a b -> Circ b c -> Circ a c
      First    :: Circ a b -> Circ (a, d) (b, d)
      Second   :: Circ a b -> Circ (d, a) (d, b)
      Parallel :: Circ a b -> Circ a' b' -> Circ (a, a') (b, b')
      Fanout   :: Circ a b -> Circ a  b' -> Circ a (b, b')
    instance Category Circ where
      id  = Ident
      (.) = flip Compose
    instance Arrow Circ where
      first  = First
      second = Second
      (***)  = Parallel
      (&&&)  = Fanout
      arr    = error "Nothing sensible to put here!"
    -- for example, using this definition, we can write xor:
    xor :: BoolCircuit (Bool, Bool) Bool
    xor = ((first Not >>> And) &&& (second Not >>> And)) >>> Or
    -- ...and using xor, write a half-adder:
    halfAdder :: BoolCircuit (Bool, Bool) (Bool, Bool)
    halfAdder = xor &&& And

    This is not an unreasonable definition—-it would be nice to abstract over such a definition using existing tools—-but the structure of the Arrow typeclass makes it difficult!