Blocks Courtesy of Konrad Zuse

September 28, 2014

Apparently, I've got a theme going of Weird Syntax. Let's run with it.

Konrad Zuse was an early pioneer in computer science, although his name is perhaps somewhat less well-known than others. Zuse holds the honor of having built the first programmable computer—-the Z3—-back in the 40's, as well as several other computing firsts¹. Of particular interest to this blog post is his early unimplemented programming language, Plankalkül.

Plankalkül was, like the Z3, in many respects ahead of its time. Zuse's explicit goal was to be able to describe programs at a high level, which meant he included control structures and datatype definitions² and other high-level constructs that were often missing in languages of the early years of computing. Zuse was working on Plankalkül at a time when his machines were not useable, which meant that his language work was more theoretical than it was technical, and consequently he allowed features that he wasn't entirely sure how to program. Despite his notes on it having been written in the mid-40's, they were not published until the 70's, and it was not implemented until the year 2000.

One thing that struck me, as I read programs in this notation that had been set down on a typewriter³, is that certain kinds of grouping were handled by explicit indication of scope: not via matched delimiters as in ALGOL-style languages, or via indentation in languages such as Python and Haskell, but by formatting the code so that a line bordered on the left of the scoped parts of the code:

This is meant to capture the way grouping works in the hand-written or typeset notation, with brackets spanning multiple lines:

I think this is notationally interesting: it's like Python's significant whitespace, but not, uh, whitespace. It would be incredibly tedious to type out, but still entirely compatible with current programming notation:

class Tet
 | @staticmethod
 | def new_tet()
 |  | n = randint(0, len(Tet.Tets) - 1)
 |  | for p in Tet.Tets[n]
 |  |  | if p in Board.permanent
 |  |  |  | Game.lose()
 |  | Game.current = Tet(Tet.Tets[n], Tet.TetColors[n])
 |
 | def __init__(self, points, color)
 |  | self.points = points
 |  | self.color = color

and would be entirely amenable to beautifying via judicious application of Unicode:

class Tet
 ┃ @staticmethod
 ┃ def new_tet()
 ┃  ┃ n = randint(0, len(Tet.Tets) - 1)
 ┃  ┃ for p in Tet.Tets[n]
 ┃  ┃  ┃ if p in Board.permanent
 ┃  ┃  ┗  ┗ Game.lose()
 ┃  ┗ Game.current = Tet(Tet.Tets[n], Tet.TetColors[n])
 ┃
 ┃ def __init__(self, points, color)
 ┃  ┃ self.points = points
 ┃  ┗ self.color = color

Looking at this notation, however, an interesting possibility struck me: a programmer could explicit annotate information about the kind of scope involved in a given line. In this Python-like example, I could, for example, distinguish class scope using double lines, function scope with thick lines, and control structure scope with thin lines:

class Tet
 ║ @staticmethod
 ║ def new_tet()
 ║  ┃ n = randint(0, len(Tet.Tets) - 1)
 ║  ┃ for p in Tet.Tets[n]
 ║  ┃  │ if p in Board.permanent
 ║  ┃  └  └ Game.lose()
 ║  ┗ Game.current = Tet(Tet.Tets[n], Tet.TetColors[n])
 ║
 ║ def __init__(self, points, color)
 ║  ┃ self.points = points
 ║  ┗ self.color = color

One advantage of this scheme is that a handful of lines, viewed in isolation, still give you a clear view of what surrounds them. For example, I can view these two lines in isolation and still tell that they are within a control structure used within a function declared within a class:

 ║  ┃  │ if p in Board.permanent
 ║  ┃  └  └ Game.lose()

You could also imagine a hypothetical language in which choice of scope delimiter is important. In Python, for and if do not form a new lexical scope. What if instead we could stipulate the kind of scope they form by this notational convention?

def okay()
 ┃ if True
 ┃  └ n = 5   # n is declared in function scope
 ┗ return n   # n leaks out of the if-scope

def not_okay()
 ┃ if True
 ┃  ┗ n = 5   # n is declared in the if's scope
 ┗ return n   # error: no n in scope here

That being said, there are a number of reasons that this notation is in inferior to existing notations:

It makes refactoring code much more difficult.
It requires that the programmer pay attention to the sequence of enclosing scopes on a line-by-line basis, which is generally too pedantic and not particularly useful for a programmer.
The ability to select “which kind of scope” is by no means only expressible by this notation, as other syntactic features such as keywords and delimiters could express the same thing.
There are only so many line-like characters which can serve as a scope marker, so this scheme is not very extensible.
It complicates parsing (especially by introducing an entirely new class of parse errors in which adjacent lines feature incompatible sequences of delimiting lines), and so it also...
Complicates parse error messages, which are an important part of a language's UI and should be considered seriously.

So, as in my previous post on grammatical case in programming languages, I urge readers not to use this notation as the concrete syntax for a programming language. This is merely an entertaining peek through the looking glass at a curious notational convention which was never adopted.

That said: this makes a very nice notation for viewing code, where the programmer does not have to explicitly draw ASCII art around their code; indeed, it bears more than a passing similarity to the graphical interface used in Scratch, and Sean McDirmid's Experiments in Code Typography features this very convention as an interactive ornament on code in a Python-like language.

He even wrote A New Kind Of Science half a century before Stephen Wolfram published it. In spirit, anyway—-if you strip away Wolfram's self-aggrandizement and the pretty pictures, much that remains of the book resembles Zuse's 1969 book Rechnender Raum.
Datatype definitions wouldn't be found in other programming languages until the late 50's. Zuse's treatment of them was quite sophisticated: Plankalkül had no notion of what we would now call sums, but did have products in the form of tuples. The primitive type was S0, which represented a single bit and had the values + and -, so a two-bit value might be represented as (S0,S0). Arrays were included of the form m×t, which stood for m repetitions of t, and variable-length arrays were encoded as □×t. Integers were included as a primitive, and floats were defined as (3×S0,7×S0,22×S0), which stood for three sign bits (which could also indicate whether the number was real or imaginary or zero), a seven-bit exponent, and a twenty-two-bit significand.
The image given is from Knuth and Pardo's The Early History of Programming Languages, which surveys early programming languages—-implemented and not implemented—-and gives the same basic program implemented in each one.