Infinite Negative Utility

Apparently, I've got a theme going of Weird Syntax. Let's run with it.

Konrad Zuse was an early pioneer in computer science, although his name is perhaps somewhat less well-known than others. Zuse holds the honor of having built the first programmable computer—-the Z3—-back in the 40's, as well as several other computing firsts1. Of particular interest to this blog post is his early unimplemented programming language, Plankalkül.

Plankalkül was, like the Z3, in many respects ahead of its time. Zuse's explicit goal was to be able to describe programs at a high level, which meant he included control structures and datatype definitions2 and other high-level constructs that were often missing in languages of the early years of computing. Zuse was working on Plankalkül at a time when his machines were not useable, which meant that his language work was more theoretical than it was technical, and consequently he allowed features that he wasn't entirely sure how to program. Despite his notes on it having been written in the mid-40's, they were not published until the 70's, and it was not implemented until the year 2000.

One thing that struck me, as I read programs in this notation that had been set down on a typewriter3, is that certain kinds of grouping were handled by explicit indication of scope: not via matched delimiters as in ALGOL-style languages, or via indentation in languages such as Python and Haskell, but by formatting the code so that a line bordered on the left of the scoped parts of the code:

This is meant to capture the way grouping works in the hand-written or typeset notation, with brackets spanning multiple lines:

I think this is notationally interesting: it's like Python's significant whitespace, but not, uh, whitespace. It would be incredibly tedious to type out, but still entirely compatible with current programming notation:

class Tet
 | @staticmethod
 | def new_tet()
 |  | n = randint(0, len(Tet.Tets) - 1)
 |  | for p in Tet.Tets[n]
 |  |  | if p in Board.permanent
 |  |  |  | Game.lose()
 |  | Game.current = Tet(Tet.Tets[n], Tet.TetColors[n])
 |
 | def __init__(self, points, color)
 |  | self.points = points
 |  | self.color = color

and would be entirely amenable to beautifying via judicious application of Unicode:

class Tet
 ┃ @staticmethod
 ┃ def new_tet()
 ┃  ┃ n = randint(0, len(Tet.Tets) - 1)
 ┃  ┃ for p in Tet.Tets[n]
 ┃  ┃  ┃ if p in Board.permanent
 ┃  ┃  ┗  ┗ Game.lose()
 ┃  ┗ Game.current = Tet(Tet.Tets[n], Tet.TetColors[n])
 ┃
 ┃ def __init__(self, points, color)
 ┃  ┃ self.points = points
 ┃  ┗ self.color = color

Looking at this notation, however, an interesting possibility struck me: a programmer could explicit annotate information about the kind of scope involved in a given line. In this Python-like example, I could, for example, distinguish class scope using double lines, function scope with thick lines, and control structure scope with thin lines:

class Tet
 ║ @staticmethod
 ║ def new_tet()
 ║  ┃ n = randint(0, len(Tet.Tets) - 1)
 ║  ┃ for p in Tet.Tets[n]
 ║  ┃  │ if p in Board.permanent
 ║  ┃  └  └ Game.lose()
 ║  ┗ Game.current = Tet(Tet.Tets[n], Tet.TetColors[n])
 ║
 ║ def __init__(self, points, color)
 ║  ┃ self.points = points
 ║  ┗ self.color = color

One advantage of this scheme is that a handful of lines, viewed in isolation, still give you a clear view of what surrounds them. For example, I can view these two lines in isolation and still tell that they are within a control structure used within a function declared within a class:

 ║  ┃  │ if p in Board.permanent
 ║  ┃  └  └ Game.lose()

You could also imagine a hypothetical language in which choice of scope delimiter is important. In Python, for and if do not form a new lexical scope. What if instead we could stipulate the kind of scope they form by this notational convention?

def okay()
 ┃ if True
 ┃  └ n = 5   # n is declared in function scope
 ┗ return n   # n leaks out of the if-scope

def not_okay()
 ┃ if True
 ┃  ┗ n = 5   # n is declared in the if's scope
 ┗ return n   # error: no n in scope here

That being said, there are a number of reasons that this notation is in inferior to existing notations:

  • It makes refactoring code much more difficult.
  • It requires that the programmer pay attention to the sequence of enclosing scopes on a line-by-line basis, which is generally too pedantic and not particularly useful for a programmer.
  • The ability to select “which kind of scope” is by no means only expressible by this notation, as other syntactic features such as keywords and delimiters could express the same thing.
  • There are only so many line-like characters which can serve as a scope marker, so this scheme is not very extensible.
  • It complicates parsing (especially by introducing an entirely new class of parse errors in which adjacent lines feature incompatible sequences of delimiting lines), and so it also...
  • Complicates parse error messages, which are an important part of a language's UI and should be considered seriously.

So, as in my previous post on grammatical case in programming languages, I urge readers not to use this notation as the concrete syntax for a programming language. This is merely an entertaining peek through the looking glass at a curious notational convention which was never adopted.

That said: this makes a very nice notation for viewing code, where the programmer does not have to explicitly draw ASCII art around their code; indeed, it bears more than a passing similarity to the graphical interface used in Scratch, and Sean McDirmid's Experiments in Code Typography features this very convention as an interactive ornament on code in a Python-like language.


  1. He even wrote A New Kind Of Science half a century before Stephen Wolfram published it. In spirit, anyway—-if you strip away Wolfram's self-aggrandizement and the pretty pictures, much that remains of the book resembles Zuse's 1969 book Rechnender Raum.
  2. Datatype definitions wouldn't be found in other programming languages until the late 50's. Zuse's treatment of them was quite sophisticated: Plankalkül had no notion of what we would now call sums, but did have products in the form of tuples. The primitive type was S0, which represented a single bit and had the values + and -, so a two-bit value might be represented as (S0,S0). Arrays were included of the form m×t, which stood for m repetitions of t, and variable-length arrays were encoded as □×t. Integers were included as a primitive, and floats were defined as (3×S0,7×S0,22×S0), which stood for three sign bits (which could also indicate whether the number was real or imaginary or zero), a seven-bit exponent, and a twenty-two-bit significand.
  3. The image given is from Knuth and Pardo's The Early History of Programming Languages, which surveys early programming languages—-implemented and not implemented—-and gives the same basic program implemented in each one.

Comparing a computer language to a human language is like comparing an operating system kernel to a popcorn kernel.

—kryptkpr

In general, human languages and formal languages—-of which programming languages are one variety—-don't have a whole lot in common. Many natural-language features don't have formal-language analogues1, and many formal-language properties don't carry over well to natural languages2.

On the other hand, it is sometimes fun to derive inspiration for one from the other. One idle idea I had involved applying grammatical case to programming languages and seeing what resulted.

Grammatical Case

In linguistics, case is a grammatical category applied to nouns based on their function in a sentence. Consider these English sentences:

I see the cat.

The cat sees me.

My house is large.

Each sentence contains a word which refers to the speaker (I, me, my) but the choice of which word to use depends on the purpose of that word in the sentence. Grammarians would say that I—which serves as the subject—is the nominative form of the first-person pronoun, that me—the object of verbs and prepositions—is the oblique form, and that my is the possessive form. In English, only pronouns are inflected this way, but in some other languages, all nouns behave this way. For example, the Russian translations of the first two sentences above:

ja viž-u  košk-u
I  see-1s cat-ACC

košk-a  vid-et menja
cat-NOM see-3s me

feature the words koška and košku, which are two different forms of the word for 'cat'—the former serving as the subject of the sentence, and the latter the object. Russian has far more than three cases, as well:

košk-a   nominative (subject)
košk-y   genitive ('of the cat')
košk-je  dative ('to the cat')
košk-u   accusative (object)
košk-oj  instrumental ('with the cat')
košk-je  prepositional

It's important to note that the case of a noun is determined by its grammatical role in the sentence, not by its semantic role (or what linguists would call a thematic role.) Consider these two sentences:

The cat ate the fish. The fish was eaten by the cat.

In both sentences, the cat is the agent, the one who performs the action, and the fish is the patient, the one to whom the action is being done. But in the first sentence the cat is the subject, while in the second the fish is is the subject, and would be inflected accordingly:

košk-a  jel-a   ryb-u
cat-NOM ate-FEM fish-ACC

ryb-a    byl-a   sjedena    košk-oj
fish-NOM was-FEM eat.PP.FEM cat-INSTR

Not all languages have grammatical case: Chinese, for example, doesn't even inflect pronouns based on their role in a sentence

wǒ kàn māo
I  see cat

māo kàn  wǒ
cat sees me

Some languages with case have relatively few cases—-Modern Greek, for example, has four: a nominative, an accusative, a genitive, and a vocative—-whereas others may have quite a few—-Finnish, as an extreme case, has fifteen.

Grammatical case, among other things, allows for freer word order. In English, for example, we would normally say

The cat ate the fish.

and we could probably rearrange that sentence a little bit while still making sense

The fish, the cat ate.

but there are obvious limits to what is allowed. We would not say

The fish ate the cat.

and yet mean that the cat ate the fish. Word order is still conventionally fixed in languages with case—-especially in conversation and formal, non-poetic writing—-but we can be freer with word order and still produce a comprehensible sentence. A Russian-speaker might find it odd, but the sentence

ryb-u    jel-a   košk-a
fish-ACC ate-FEM cat

is clearly communicating that it was the cat that ate the fish, despite the words being in the opposite order.

Applying Case To Programming Languages

As it turns out, this has already been done—-in a Perl extension called Lingua::Romana::Perligata which was exploring this very question. In Perligata (as I will call it for short), three cases are distinguished: an accusative, a dative, and a genitive. The accusative is used as the arguments to functions or the left-hand side of an assignment; the dative is used as the left-hand side of an assignment or as the 'mutable' argument of a function like push, and the genitive is used for indexing. So, the expression

pippo = pluto[paperino]

becomes

pippo plutorum paperini da

But I'd argue that Perligata, while interesting, is too tightly tied to Perl semantics to be useful to people outside of Perl. So, I'm going to start from scratch and not tie this to any particular language, but rather to some manner of imperative pseudocode and add various features as I go.

Basic Form

Assume we have nouns and we have verbs. Verbs correspond to functions or subroutines, and nouns correspond to names given to values, or to literal values themselves. The only kind of classification given to nouns here is case; we're going to omit other linguistic classifications like grammatical gender or number. Individual statements will be semicolon-terminated for clarity.

In constrast to spoken language or to Perligata, our fake language is going to use prefixed punctuation marks to indicate the case of a variable. It wouldn't be hard to mimic natural language more closely by tokenizing our values based on some ending

    subject := /[a-z_]+us/
    object  := /[a-z_]+um/
    index   := /[a-z_]+i/

but I won't do that here.

Functions

Let's start with functions that take a single argument and don't return anything interesting; say, some kind of print function. We can give it an argument in the accusative, which we'll mark with an exclamation point:

    print !x; // understood as print(x)

Because we're using indicating the grammatical relationships of our tokens with case, we wouldn't create ambiguity if we were to to modify the order of the tokens here:

    !x print; // still understood as print(x)

We could do a few different things for functions of greater arity. One of the simplest possibilities—-especially for functions like sum in which the ordering of the arguments don't matter—-is to simply list all the arguments in the accusative.

    !a !b !c sum; // sum(a, b, c)
    sum !a !b !c; // sum(a, b, c)
    !a !b sum !c; // sum(a, b, c)

Assignment

We'll introduce a dative sigil now, using the at-sign, to stand for taking the result of a function. Let's assume we have an incr function which adds one to its argument:

    // all of these are x = incr(y)
    @x incr !y;
    incr !y @x;
    !y incr @x;

If we want to assign one variable to another, we can introduce another nominative form which has no sigil.

    @x y; // x = y
    y @x; // x = y

Objects

So far we've assumed that we have functions in a generic namespace, but there are advantages to scoping functions within a namespace or object of some kind. Let's introduce another case to indicate a namespace in which a verb is being looked up, indicated by an ampersand:

    // object.method()
    &object method;
    method &object;

    // x = point.add(otherPoint)
    add &point !otherPoint @x;
    @x !otherPoint &point add;

Prepositions

In natural languages, we have a generally fixed set of prepositions—-a linguist would say that the class of prepositions is closed. I'm going to play fast-and-loose and instead include in this language an open set of prepositions, which will correspond to keyword arguments in languages like Python.

Previously, when supplying arguments to a function, we listed them in accusative form. That's fine for commutative functions like addition, but for non-commutative functions, or even moreso for functions of heterogeneous arguments, we want something that lets us move arguments around. Additionally, some functions may have arbitrarily large numbers of arguments. This is where our “prepositions” might come in handy.

Our “prepositions” will be of the form preposition:name, with optional whitespace, so we can arguably think of the colon as the prepositional sigil here.3

    // p = myPoint.move(x=myX, y=myY)
    @p &myPoint move x:myX y:myY;
    @p &myPoint move y:myY x:myX;
    y:myY x:myX move @p &myPoint;

There's no reason why functions couldn't have prepositions as well as an accusative argument. For example, a map function might use a func preposition to indicate its function argument:

    // newSeq = map(mySeq, func=toString)
    @newSeq map func:toString !mySeq;
    !mySeq map @newSeq func:toString;

Grouping

Lots of extant programming languages have the ability to avoid repeating certain common elements. For example, in JavaScript, one can use the with keyword to avoid naming the same element over and over:

    with (foo) { a(5); b(6); }
    // is equivalent to
    foo.a(5); foo.b(6);

We could mimic this pretty easily in our hypothetical language:

    &foo {
        a !5;
        b !6;
    }

But a structure like this could allow us to factor out repetitions of any given case; for example, if we assign multiple times to the same value:

    @x {
        add !x !1;
        mul !x !2;
        sub !x by:3;
        mul !x by:4;
    }
    // corresponds to:
    // x = add(x, 1);
    // x = mul(x, 1);
    // x = sub(x, 1);
    // x = div(x, 1);

And even that had a repeated element, so let's factor that out, too:

    @x !x {
        add !1;
        mul !2;
        sub by:3;
        div by:4;
    }

We can use this to factor out repeated object accesses:

    &console {
        writeline !"What is your name?";
        readline @name;
        writeline "Hello, {name}!";
    }

or even repeated prepositional arguments:

    x:5 width:10 height:10 init {
        @box1 y:10;
        @box2 y:20;
        @box3 y:30;
    }
    // box1 = init(x=5, y=10, width=10, height=10);
    // box2 = init(x=5, y=20, width=10, height=10);
    // box3 = init(x=5, y=30, width=10, height=10);

Would this be useful?

Probably not. I suspect a general-purpose language with a syntax like this would be rather tedious. Possible! You could even use this syntax with some kind of static type sytem:

writeline: { &Handle, !String } –> () div: { !Float, by:Float } –> Float init: { x: Int, y: Int, width: Int, height: Int} –> Rectangle

but it'd also be quite verbose, and the flexibility afforded by this scheme is probably the flexibility that anybody needs.

One place I can imagine this being useful, however, is in a shell-like system. Commands tend to have 'prepositions' already in the form of optional arguments, input-output redirection, &c, so it's not a far cry from what already exists:

    >logfile {
      echo !"Setting up system";
      port:22 user:alex {
        scp {
          host:alpha !some_file;
          host:beta !other_file;
        }
        host:gamma command:"./run.sh" ssh;
      }
      echo !"Script completed";
    }

In general, though, I suspect that—with the exception of keyword arguments, which have already been proven to be very useful—the ideas here are little more than a quaint curiosity.

As a final aside: I'd love a strongly-typed functional language which included keyword arguments as a design choice from the beginning. OCaml and Haskell both have optional keyword arguments, and in both languages, they don't quite behave like you'd expect, especially with respect to currying. And frankly—they don't have to be optional to be useful! A language that included the ability to name and reorder all the obligatory arguments to a function would still be a huge win for usability.


  1. There is nothing that says a formal language needs to have grammatical categories, or a well-defined phonology, or any of a number of other features of natural language.
  2. A good example of this is the common belief that natural language which does not mirror propositional logic is somehow “illogical”, e.g. that a double negative expressed in natural language follows the conventions of classical logic and cancels itself out. Natural language of course follows rules, but they are not the same rules as formal languages.
  3. This keyword system is very similar to SmallTalk and related languages, which use keywords for all function calls. The difference is that SmallTalk et al. consider the keywords to be an ordered sequence, and not a set, so if I define a constructor that is called like this:

    p := Person name: 'John' age: 22
    

    then the constructor corresponds to the method name name:age: and cannot be called with those keywords in any other order.