Noun Case

August 3, 2014

Comparing a computer language to a human language is like comparing an operating system kernel to a popcorn kernel.

—kryptkpr

In general, human languages and formal languages (of which programming languages are one variety) don't have a whole lot in common. Many natural-language features don't have formal-language analogues¹, and many formal-language properties don't carry over well to natural languages².

On the other hand, it is sometimes fun to derive inspiration for one from the other. One idle idea I had involved applying grammatical case to programming languages and seeing what resulted.

Grammatical Case

In linguistics, case is a grammatical category applied to nouns based on their function in a sentence. Consider these English sentences:

I see the cat.

The cat sees me.

My house is large.

Each sentence contains a word which refers to the speaker (I, me, my) but the choice of which word to use depends on the purpose of that word in the sentence. Grammarians would say that I—which serves as the subject—is the nominative form of the first-person pronoun, that me—the object of verbs and prepositions—is the oblique form, and that my is the possessive form. In English, only pronouns are inflected this way, but in some other languages, all nouns behave this way. For example, the Russian translations of the first two sentences above:

ja viž-u  košk-u
I  see-1s cat-ACC

košk-a  vid-et menja
cat-NOM see-3s me

feature the words koška and košku, which are two different forms of the word for 'cat'—the former serving as the subject of the sentence, and the latter the object. Russian has far more than three cases, as well:

košk-a   nominative (subject)
košk-i   genitive ('of the cat')
košk-je  dative ('to the cat')
košk-u   accusative (object)
košk-oj  instrumental ('with the cat')
košk-je  prepositional ('about the cat')

It's important to note that the case of a noun is determined by its grammatical role in the sentence, not by its semantic role (or what linguists would call a thematic role). Consider these two sentences:

The cat ate the fish.

The fish was eaten by the cat.

In both sentences, the cat is the agent, the one who performs the action, and the fish is the patient, the one to whom the action is being done. But in the first sentence the cat is the subject, while in the second the fish is the subject, and would be inflected accordingly:

košk-a  jel-a   ryb-u
cat-NOM ate-FEM fish-ACC

ryb-a    byl-a   sjedena    košk-oj
fish-NOM was-FEM eat.PP.FEM cat-INSTR

Not all languages have grammatical case: Chinese, for example, doesn't even inflect pronouns based on their role in a sentence:

wǒ kàn māo
I  see cat

māo kàn  wǒ
cat see  me

Some languages with case have relatively few cases (Modern Greek, for example, has four: a nominative, an accusative, a genitive, and a vocative), whereas others may have quite a few: Finnish, as an extreme case, has fifteen.

Grammatical case, among other things, allows for freer word order. In English, for example, we would normally say

The cat ate the fish.

and we could probably rearrange that sentence a little bit while still making sense

The fish, the cat ate.

but there are obvious limits to what is allowed. We would not say

The fish ate the cat.

and yet mean that the cat ate the fish. Word order is still conventionally fixed in languages with case, especially in conversation and in formal, non-poetic writing, but we can be freer with word order and still produce a comprehensible sentence. A Russian speaker might find it odd, but the sentence

ryb-u    jel-a   košk-a
fish-ACC ate-FEM cat-NOM

is clearly communicating that it was the cat that ate the fish, despite the words being in the opposite order.

Applying Case To Programming Languages

As it turns out, this has already been done, in a Perl module called Lingua::Romana::Perligata that explores this very question. In Perligata (as I will call it for short), three cases are distinguished: an accusative, a dative, and a genitive. The accusative is used for the arguments to functions and for the right-hand side of an assignment; the dative is used for the left-hand side of an assignment or for the 'mutable' argument of a function like push; and the genitive is used for indexing. So the expression

    pippo = pluto[paperino]

becomes

    pippo plutorum paperini da

But I'd argue that Perligata, while interesting, is too tightly tied to Perl semantics to be useful to people outside of Perl. So I'm going to start from scratch: rather than tying this to any particular language, I'll use a generic imperative pseudocode and add features as I go.

Basic Form

Assume we have nouns and we have verbs. Verbs correspond to functions or subroutines, and nouns correspond to names given to values, or to literal values themselves. The only kind of classification given to nouns here is case; we're going to omit other linguistic classifications like grammatical gender or number. Individual statements will be semicolon-terminated for clarity.

In contrast to spoken language or to Perligata, our fake language is going to use prefixed punctuation marks (sigils) to indicate the case of a variable. It wouldn't be hard to mimic natural language more closely by tokenizing our values based on their endings:

    subject := /[a-z_]+us/
    object  := /[a-z_]+um/
    index   := /[a-z_]+i/

but I won't do that here.
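Still, for the curious, here is a minimal Python sketch of the ending-based approach. It tags each word with a case by matching the three hypothetical endings above (`-us`, `-um`, `-i`) and treats any word without a recognized ending as the verb; `classify` and `parse` are made-up names for illustration.

```python
import re

# Case endings from the regexes above; each pattern captures the stem.
CASE_ENDINGS = [
    ("subject", re.compile(r"([a-z_]+)us")),
    ("object", re.compile(r"([a-z_]+)um")),
    ("index", re.compile(r"([a-z_]+)i")),
]


def classify(word: str) -> tuple[str, str]:
    """Return (case, stem) for a word; unmarked words are treated as verbs."""
    for case, pattern in CASE_ENDINGS:
        match = pattern.fullmatch(word)
        if match:
            return case, match.group(1)
    return "verb", word


def parse(sentence: str) -> dict[str, list[str]]:
    """Group the words of a sentence by case, ignoring word order."""
    roles: dict[str, list[str]] = {}
    for word in sentence.split():
        case, stem = classify(word)
        roles.setdefault(case, []).append(stem)
    return roles


print(parse("print messageum"))  # {'verb': ['print'], 'object': ['message']}
print(parse("messageum print"))  # {'object': ['message'], 'verb': ['print']}
```

Both word orders produce the same roles. The sketch also exposes a weakness of endings: a verb that happens to end in a case suffix, such as `sum`, is misread as the object `s`, which is one argument for explicit sigils instead.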

Functions

Let's start with functions that take a single argument and don't return anything interesting; say, some kind of print function. We can give it an argument in the accusative, which we'll mark with an exclamation point:

    print !x; // understood as print(x)

Because we're indicating the grammatical relationships of our tokens with case, we wouldn't create any ambiguity if we were to change the order of the tokens here:

    !x print; // still understood as print(x)

We could do a few different things for functions of greater arity. One of the simplest possibilities, especially for functions like sum in which the order of the arguments doesn't matter, is to simply list all the arguments in the accusative:

    !a !b !c sum; // sum(a, b, c)
    sum !a !b !c; // sum(a, b, c)
    !a !b sum !c; // sum(a, b, c)
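Here is a minimal Python sketch of how an interpreter might treat these statements, assuming `!` marks the accusative and the single unmarked token is the verb. `VERBS`, `run`, and `env` are hypothetical names, and accusatives are looked up as variables in a dictionary.

```python
# Hypothetical verb table for this sketch: verb name -> Python callable.
VERBS = {
    "sum": lambda *args: sum(args),
}


def run(statement: str, env: dict) -> object:
    """Evaluate a statement built from one bare verb and '!'-marked accusatives."""
    verb = None
    args = []
    for token in statement.rstrip(";").split():
        if token.startswith("!"):
            # Accusative: look the variable up and pass it positionally.
            args.append(env[token[1:]])
        elif verb is None:
            # The single unmarked token is the verb, wherever it appears.
            verb = VERBS[token]
        else:
            raise SyntaxError(f"more than one verb in {statement!r}")
    if verb is None:
        raise SyntaxError(f"no verb in {statement!r}")
    return verb(*args)


env = {"a": 1, "b": 2, "c": 3}
for statement in ["!a !b !c sum;", "sum !a !b !c;", "!a !b sum !c;"]:
    print(run(statement, env))  # 6 each time
```

The verb can float anywhere, but the accusatives keep their relative order, so for a non-commutative verb such as subtraction `!a !b` and `!b !a` would still mean different things.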

Assignment

We'll introduce a dative sigil now, using the at-sign, to mark the variable that receives the result of a function. Let's assume we have an incr function which adds one to its argument:

    // all of these are x = incr(y)
    @x incr !y;
    incr !y @x;
    !y incr @x;

If we want to assign one variable to another, we can introduce a nominative form, which has no sigil:

    @x y; // x = y
    y @x; // x = y
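Since both the verb and the nominative are unmarked, an interpreter needs some way to tell them apart. This Python sketch assumes (the text doesn't say) that an unmarked token is the verb if it names a known function and a nominative otherwise; `VERBS` and `run` are hypothetical.

```python
# Hypothetical verb table: incr adds one to its argument.
VERBS = {"incr": lambda n: n + 1}


def run(statement: str, env: dict) -> object:
    """Evaluate a statement with dative (@), accusative (!) and nominative tokens.

    Assumption: an unmarked token is a verb if it names an entry in VERBS,
    and a nominative (a plain variable reference) otherwise.
    """
    verb, target, args, value = None, None, [], None
    for token in statement.rstrip(";").split():
        if token.startswith("@"):
            target = token[1:]            # dative: receives the result
        elif token.startswith("!"):
            args.append(env[token[1:]])   # accusative: a function argument
        elif token in VERBS:
            verb = VERBS[token]           # unmarked and known: the verb
        else:
            value = env[token]            # unmarked variable: the nominative
    if verb is not None:
        value = verb(*args)
    if target is not None:
        env[target] = value
    return value


env = {"y": 41}
for statement in ["@x incr !y;", "incr !y @x;", "!y incr @x;"]:
    run(statement, env)
    print(env["x"])  # 42 each time
run("@x y;", env)
print(env["x"])  # 41
```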

Objects

So far we've assumed that all functions live in a single global namespace, but there are advantages to scoping functions within a namespace or object of some kind. Let's introduce another case, marked with an ampersand, for the namespace in which a verb is looked up:

    // object.method()
    &object method;
    method &object;

    // x = point.add(otherPoint)
    add &point !otherPoint @x;
    @x !otherPoint &point add;
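In Python terms, the ampersand case behaves like attribute lookup. The sketch below uses a hypothetical `Point` class and a toy `run` loop that resolves the verb with `getattr` on the `&`-marked object, so the namespace, the verb, and the arguments can appear in any order.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float

    def add(self, other: "Point") -> "Point":
        return Point(self.x + other.x, self.y + other.y)


def run(statement: str, env: dict) -> object:
    """Evaluate a statement whose verb is looked up in an '&'-marked namespace."""
    namespace, verb_name, target, args = None, None, None, []
    for token in statement.rstrip(";").split():
        if token.startswith("&"):
            namespace = env[token[1:]]    # namespace case: where to find the verb
        elif token.startswith("@"):
            target = token[1:]            # dative: receives the result
        elif token.startswith("!"):
            args.append(env[token[1:]])   # accusative: an argument
        else:
            verb_name = token             # the unmarked verb
    # With a namespace the verb is a bound method; otherwise a plain function.
    verb = getattr(namespace, verb_name) if namespace is not None else env[verb_name]
    result = verb(*args)
    if target is not None:
        env[target] = result
    return result


env = {"point": Point(1, 2), "otherPoint": Point(10, 20)}
run("add &point !otherPoint @a;", env)
run("@b !otherPoint &point add;", env)
print(env["a"], env["b"])  # Point(x=11, y=22) Point(x=11, y=22)
```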

Prepositions

In natural languages, we have a generally fixed set of prepositions (a linguist would say that the class of prepositions is closed). I'm going to play fast and loose and instead include in this language an open set of prepositions, which will correspond to keyword arguments in languages like Python.

Previously, when supplying arguments to a function, we listed them in accusative form. That's fine for commutative functions like addition, but for non-commutative functions, and even more so for functions with heterogeneous arguments, we want something that lets us move arguments around without changing their meaning. Additionally, some functions may take a large number of arguments, which are hard to keep track of by position alone. This is where our “prepositions” might come in handy.

Our “prepositions” will be of the form preposition:name, with optional whitespace, so we can arguably think of the colon as the prepositional sigil here.³

    // p = myPoint.move(x=myX, y=myY)
    @p &myPoint move x:myX y:myY;
    @p &myPoint move y:myY x:myX;
    y:myY x:myX move @p &myPoint;

There's no reason why functions couldn't have prepositions as well as an accusative argument. For example, a map function might use a func preposition to indicate its function argument:

    // newSeq = map(mySeq, func=toString)
    @newSeq map func:toString !mySeq;
    !mySeq map @newSeq func:toString;
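Prepositions map naturally onto Python keyword arguments. This sketch extends the same toy loop so that any token of the form `preposition:name` becomes `preposition=env[name]`, letting `func:toString` sit anywhere in the statement. `seq_map`, `VERBS`, and `run` are hypothetical, and unlike the text, the sketch allows no whitespace around the colon and no literal values such as `by:3`.

```python
def seq_map(seq: list, *, func) -> list:
    """Hypothetical 'map' verb: the sequence is accusative, 'func' a preposition."""
    return [func(item) for item in seq]


VERBS = {"map": seq_map}


def run(statement: str, env: dict) -> object:
    """Evaluate a statement, turning 'preposition:name' tokens into keyword arguments."""
    verb, target, args, kwargs = None, None, [], {}
    for token in statement.rstrip(";").split():
        if token.startswith("@"):
            target = token[1:]                # dative: receives the result
        elif token.startswith("!"):
            args.append(env[token[1:]])       # accusative: positional argument
        elif ":" in token:
            # Preposition: assumes no whitespace around the colon.
            preposition, name = token.split(":", 1)
            kwargs[preposition] = env[name]
        else:
            verb = VERBS[token]               # the unmarked verb
    result = verb(*args, **kwargs)
    if target is not None:
        env[target] = result
    return result


env = {"mySeq": [1, 2, 3], "toString": str}
run("@newSeq map func:toString !mySeq;", env)
run("!mySeq map @sameSeq func:toString;", env)
print(env["newSeq"], env["sameSeq"])  # ['1', '2', '3'] ['1', '2', '3']
```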

Grouping

Lots of extant programming languages provide ways to avoid repeating certain common elements. For example, in JavaScript, one can use the with keyword to avoid naming the same object over and over:

    with (foo) { a(5); b(6); }
    // is equivalent to
    foo.a(5); foo.b(6);

We could mimic this pretty easily in our hypothetical language:

    &foo {
        a !5;
        b !6;
    }

But a structure like this could allow us to factor out repetitions of any given case; for example, if we assign multiple times to the same variable:

    @x {
        add !x !1;
        mul !x !2;
        sub !x by:3;
        div !x by:4;
    }
    // corresponds to:
    // x = add(x, 1);
    // x = mul(x, 2);
    // x = sub(x, by=3);
    // x = div(x, by=4);

And even that had a repeated element, so let's factor that out, too:

    @x !x {
        add !1;
        mul !2;
        sub by:3;
        div by:4;
    }
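One way to implement grouping is as a purely syntactic rewrite that copies the tokens before the opening brace into every statement inside it. Here is a minimal Python sketch, assuming a single, non-nested group with no string literals (a literal containing braces or semicolons would confuse it); `expand_group` is a hypothetical name.

```python
def expand_group(source: str) -> list[str]:
    """Desugar 'SHARED { stmt; stmt; }' into flat statements.

    Sketch only: handles one non-nested group without string literals.
    """
    head, _, rest = source.partition("{")
    body, _, _ = rest.partition("}")
    shared = head.split()
    statements = []
    for stmt in body.split(";"):
        tokens = stmt.split()
        if tokens:
            # Case marking makes position irrelevant, so prepending is enough.
            statements.append(" ".join(shared + tokens) + ";")
    return statements


group = """
@x !x {
    add !1;
    mul !2;
    sub by:3;
    div by:4;
}
"""
for line in expand_group(group):
    print(line)
# @x !x add !1;
# @x !x mul !2;
# @x !x sub by:3;
# @x !x div by:4;
```

Because case marking makes word order irrelevant, it doesn't matter where the shared tokens are spliced in; prepending them is simply the easiest choice. Nested groups would need a recursive version.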

We can use this to factor out repeated object accesses:

    &console {
        writeline !"What is your name?";
        readline @name;
        writeline !"Hello, {name}!";
    }

or even repeated prepositional arguments:

    x:5 width:10 height:10 init {
        @box1 y:10;
        @box2 y:20;
        @box3 y:30;
    }
    // box1 = init(x=5, y=10, width=10, height=10);
    // box2 = init(x=5, y=20, width=10, height=10);
    // box3 = init(x=5, y=30, width=10, height=10);
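Factoring out prepositions this way behaves a lot like partially applying keyword arguments. In Python that can be sketched with `functools.partial`; `Rectangle` and `init` are hypothetical stand-ins for whatever the `init` verb would construct.

```python
from dataclasses import dataclass
from functools import partial


@dataclass
class Rectangle:
    x: int
    y: int
    width: int
    height: int


def init(*, x: int, y: int, width: int, height: int) -> Rectangle:
    """Hypothetical 'init' verb that takes only prepositional (keyword) arguments."""
    return Rectangle(x=x, y=y, width=width, height=height)


# The prepositions factored out into the group header.
init_box = partial(init, x=5, width=10, height=10)

box1 = init_box(y=10)
box2 = init_box(y=20)
box3 = init_box(y=30)
print(box2)  # Rectangle(x=5, y=20, width=10, height=10)
```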

Would this be useful?

Probably not. I suspect a general-purpose language with a syntax like this would be rather tedious. Possible! You could even use this syntax with some kind of static type system:

    writeline: { &Handle, !String } -> ()
    div:       { !Float, by:Float } -> Float
    init:      { x: Int, y: Int, width: Int, height: Int } -> Rectangle

but it'd also be quite verbose, and the flexibility afforded by this scheme is probably more flexibility than anybody needs.

One place I can imagine this being useful, however, is in a shell-like system. Commands tend to have 'prepositions' already in the form of optional arguments, input-output redirection, &c, so it's not a far cry from what already exists:

    >logfile {
      echo !"Setting up system";
      port:22 user:alex {
        scp {
          host:alpha !some_file;
          host:beta !other_file;
        }
        host:gamma command:"./run.sh" ssh;
      }
      echo !"Script completed";
    }

In general, though, I suspect that—with the exception of keyword arguments, which have already been proven to be very useful—the ideas here are little more than a quaint curiosity.

As a final aside: I'd love a strongly-typed functional language which included keyword arguments as a design choice from the beginning. OCaml and Haskell both have optional keyword arguments, and in both languages, they don't quite behave like you'd expect, especially with respect to currying. And frankly—they don't have to be optional to be useful! A language that included the ability to name and reorder all the obligatory arguments to a function would still be a huge win for usability.
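Python is neither statically typed nor a functional language, but its keyword-only parameters give a small taste of what the paragraph above asks for: arguments that are obligatory, named, and reorderable at the call site. This sketch transcribes two of the hypothetical signatures from earlier into Python parameter kinds (positional-only for the accusative, keyword-only for prepositions); it says nothing about how such arguments would interact with currying.

```python
def div(value: float, /, *, by: float) -> float:
    """Mirrors the hypothetical 'div: { !Float, by:Float } -> Float'.

    'value' is positional-only (the accusative); 'by' is keyword-only and
    required (an obligatory preposition).
    """
    return value / by


def init(*, x: int, y: int, width: int, height: int) -> tuple[int, int, int, int]:
    """Every argument is obligatory and named, so callers may list them in any order."""
    return (x, y, width, height)


print(div(12.0, by=4.0))                     # 3.0
print(init(height=10, y=20, width=10, x=5))  # (5, 20, 10, 10)
# div(12.0, 4.0) raises TypeError because 'by' must be passed by name.
```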

¹ There is nothing that says a formal language needs to have grammatical categories, or a well-defined phonology, or any of a number of other features of natural language.
² A good example of this is the common belief that natural language which does not mirror propositional logic is somehow “illogical”, e.g. that a double negative expressed in natural language should follow the conventions of classical logic and cancel itself out. Natural language of course follows rules, but they are not the same rules as formal languages.
³ This keyword system is very similar to Smalltalk and related languages, which use keywords for function calls. The difference is that Smalltalk et al. consider the keywords to be an ordered sequence, and not a set, so if I define a constructor that is called like this:
```
p := Person name: 'John' age: 22
```
then the constructor corresponds to the method name name:age: and cannot be called with those keywords in any other order.