May 18, 2007

Programming Languages Part I: Syntactic Similarity

I like languages. When I was younger, I remember reading The Hobbit and spending more time on the first two preface pages about the Moon Runes, a gussied-up version of the Futhark, than actually reading the book. For a good while after that, I wanted to be a linguist, not a mathematician.

But alas, over time my interests went from natural language to formal language, from formal language to abstract language, and from there to the wide world of algebra and logic. A good transition, I think. Nevertheless, I still love languages, and that love has now turned specifically to programming languages. I like these languages because they are first and foremost utilitarian. They are designed from the start to do one thing: get a point across to an idiot. Let's face it, programmers and computers alike are pretty damn dumb. The language is the smart thing. A programmer has no way to tell a computer what to do without the help of a good language, and a computer can't do anything it isn't told to do. So the most fundamental piece of computer technology is the language, the layer that binds programmer with programmee.

I love languages, but I often hate them too. Take, for instance, Java. Java is an exceptionally pretty language, but it's also ugly as your Aunt Anita Maykovar (May - ko - var). Java effectively boils down to one syntactic structure, the class. Every Java file revolves around it, and in some ways this is really good. The commonality this structure brings makes Java code easier to learn to read. Your brain is good at processing things that are similar; the pathways it has to draw are all about the same, so it's easier to optimize up there. The issue I have with Java is that sometimes it's too good at looking the same, to the point where I forget where I am. I get lost in my own neural pathways while I try to figure out whether I'm looking at an abstract class or an interface, or whether I'm looking at a dirty hack of a data-bearing class or something more legitimate. C/C++ is great at making this distinction, but it's also, IMO, ugly as shit, uglier even. I often like C for its ability to compartmentalize things, but I think it takes it too far: nothing looks alike, even if it should. One of my peeves with C vs. Java is that they take extreme views on something that should be easily decided. I'd like to sum it up as a fundamental rule I want to see in all programming languages (though that will probably never happen). Here it is:

Syntactically similar things should be semantically similar things, and vice versa, in proportion to how similar they are.
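To put something concrete next to the rule, here's a rough sketch of two semantically related things in Java as it stands: an interface describing list-like behavior and a class that implements it. Every name in it (`ListDemo`, `Listable`, `List`, `cons`, `nil`) is made up for illustration, and the immutable linked list is just one assumed way to fill in the class.

```java
class ListDemo {
    // The contract: anything Listable must offer list-like operations.
    // (Hypothetical interface, not part of any real library.)
    interface Listable<T> {
        boolean isEmpty();
        T head();
        Listable<T> tail();
    }

    // The implementation: an immutable singly linked list.
    // (Hypothetical class; it does not use java.util.List.)
    static final class List<T> implements Listable<T> {
        private final T first;        // element stored in this cell
        private final List<T> rest;   // remaining cells, null for the empty list
        private final boolean empty;  // true only for the empty list

        private List(T first, List<T> rest, boolean empty) {
            this.first = first;
            this.rest = rest;
            this.empty = empty;
        }

        // Builds the empty list.
        static <T> List<T> nil() {
            return new List<>(null, null, true);
        }

        // Prepends one element to an existing list.
        static <T> List<T> cons(T first, List<T> rest) {
            return new List<>(first, rest, false);
        }

        public boolean isEmpty() { return empty; }
        public T head() { return first; }
        public Listable<T> tail() { return rest; }
    }

    public static void main(String[] args) {
        Listable<Integer> xs = List.cons(1, List.cons(2, List.<Integer>nil()));
        System.out.println(xs.head());           // prints 1
        System.out.println(xs.tail().head());    // prints 2
        System.out.println(xs.tail().tail().isEmpty()); // prints true
    }
}
```

Apart from the keyword and the missing bodies, the three declarations in the interface have exactly the same shape as the three methods in the class, which is the kind of near-identical structure the rule is concerned with.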

That is, if I create a class called "List" that implements a linked list, and then an interface (which is really just a type class, I've come to realize, but that's a story for another day) called "Listable" that forces any implementing object to have list-like properties, these two things should have some similar structure. However, this is not to say we should copy Java. Java takes this too far, I think, in that it follows the rule: "If things are semantically similar, they are structurally almost identical." This is bad. Interfaces should look different from classes, but only in a minor way. I'd like Java interfaces, heck, Java in general, better if I could specify type signatures à la Haskell. I think Haskell has got it right when it comes to how types should work syntactically. The brilliance of this shows when you write a Java method with a Haskell-style type signature. Here's Fibonacci:

fib :: public, static :: int -> int

fib(nth) {
    // fib(1) and fib(2) are both 1, so the recursion stops at nth <= 2
    return (nth <= 2) ? 1 : fib(nth - 1) + fib(nth - 2);
}

(I think that'll work, but it's been a while, so I might have it wrong.)
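For comparison, here's a sketch of the same method in standard Java, where the modifiers, return type, name, and parameter type all share one declaration line. The wrapper class `Fib` and the `main` method are my own assumptions, added only so the snippet runs. The base case covers both 1 and 2 so the recursion always terminates for positive inputs.

```java
class Fib {
    // Modifiers, return type, name, and parameter type are all packed
    // into this single line, unlike the two-line signature style above.
    public static int fib(int nth) {
        // fib(1) and fib(2) are both 1; every later term is the sum
        // of the two terms before it.
        return (nth <= 2) ? 1 : fib(nth - 1) + fib(nth - 2);
    }

    public static void main(String[] args) {
        System.out.println(fib(10)); // prints 55
    }
}
```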

Granted, there are issues with this. Notably, Java has things like side effects, but these could be built into the type signatures. I think the ultimate benefit of this kind of type signature is a syntactic separation of concerns: the name, modifiers, and types live on one line, and the body lives somewhere else. Overall, this would make the language a lot cleaner, since interfaces would no longer have stubs like:

public static int fib(int nth);

which, though nice, can't carry as much information as something like:

fib :: public, static :: int -> int

Syntactically, the latter structure is more extensible. It could incorporate things like side-effect tracking or thread signatures, which might look like:

fib :: public, static, threaded :: int -> (int, int, ...) -> int

which says, in effect, that fib is a method that takes an int to an unspecified number of threads, and from there to an int.

I'm really just spitballing at the end here, with some neat ideas that I think a small change in Java's syntactic structure could bring.

Just my thoughts.

PS: I don't know if the Syntactic/Semantic Similarity rule is well known, but either way, it's a damn good idea.
