July 08, 2007

Peano's Axioms Part I: Haskell and Type Theory, and Object Oriented Programming

Well, the title's a little misleading: this has very little to do with type signatures. I decided to start a little series on what I think is the most powerful aspect of Haskell. Of course, I think the most powerful part is the math, specifically in the realm of Type Classes. I'll give you a little background here.

I'm primarily familiar with Java's class system. Notably, after reading about and toying with other class inheritance systems, Java's stood out as my favorite. It feels the most algebraic. One thing to note is that typically the only "real" difference between these systems (I'm generalizing wildly) is in how they handle multiple inheritance. Java, in particular, uses interfaces, which I intend to show are a kind of type class. In fact, an interface is about one step away from being a real live type class.

Now, let's start talking about inheritance.

Assume we have an Object Joe. Joe is a "Reified" (to borrow some language from type theory) object; that is, he's a finite, usable object, with real methods that do real things. Effectively, he's material: you can "poke" him, as it were.

Let's now say Joe has some Children, Jack and Emily. These are also Reified objects. However, they have a special property: they have a "Parent" object, namely Joe. This "special property" allows them to inherit properties from Joe (say... devilishly good looks, and powerful intellect?). By now, the female readers are probably moaning, "Those have to come from the mother." And that's a good point: how do we get the Mom in on this? We need Multiple Inheritance. Before we get there though, let's talk about a different kind of object, an Abstract object.

An Abstract Object describes some underlying property of an object, but it does so through this Parent/Child relationship. An Abstract Object in our setting might be something like "Ancestor" or even "Parent". The notable thing about Abstract objects, as I'm using the term here, is that they don't contain any "Reified" methods, that is, methods containing code. They contain stubs which describe the signature of a method, but not the method itself. So we might have an Abstract class "Parent" which contains a method "getChildren" with signature void -> List. Its purpose is to return a list of the extending class's (like Joe's) Children. The important thing is that it doesn't force that to be the result; it only enforces the type of the content, not the content itself.

But what about Multiple Inheritance? Assume we have another object called "Sarah" which also extends the Parent class, and also assume that Jack and Emily are Sarah's children too. Let's assume further that Sarah implements getChildren differently than Joe does. How then do Jack and Emily access their respective parents' getChildren methods? If they are fundamentally different implementations, then how can they choose which to use? Further, what if we want to have Parents for Joe, but we also want Joe to extend the Ancestor class? Now we have a mess...

How do we fix it? Well, the answer is to create a super version of Abstract classes. Let's call them interfaces.

What does an Interface do? Well, in Java, no class is allowed to inherit from more than one parent class, Abstract or Reified. Interfaces exist outside of the usual inheritance tree, and can therefore give information about the tree as a whole. For instance, we can now create our inheritance tree by saying that each ancestor class implements the "hasChildren" interface, which forces the ancestor to have the void -> List method. We can then say that the Children found in that list are instances of the Child class, rather than the class itself. This gives us a solution to the problem by forcing us to access Jack through the Sarah object.
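Since this post goes on to argue that an interface is nearly a type class, here is a rough Haskell rendering of the "hasChildren" idea. It's only a sketch: the names Child, Joe, Sarah, HasChildren, and countChildren are all made up for illustration, and Joe and Sarah are modeled as two unrelated types rather than as members of a class hierarchy.

```haskell
-- Hypothetical names throughout; none of these come from a real library.
newtype Child = Child String
  deriving (Eq, Show)

-- Two unrelated parent types, each storing its children differently.
newtype Joe = Joe [Child]
data Sarah = Sarah Child Child

-- The "interface": it fixes the signature of getChildren, not its body.
class HasChildren p where
  getChildren :: p -> [Child]

instance HasChildren Joe where
  getChildren (Joe kids) = kids

instance HasChildren Sarah where
  getChildren (Sarah first second) = [first, second]

-- Works for any parent type that implements the interface.
countChildren :: HasChildren p => p -> Int
countChildren = length . getChildren
```

Each parent answers getChildren its own way, and code like countChildren only relies on the signature, which is roughly the role the interface plays in the Java version.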

Could we have done this with normal Abstract/Reified classes? Sure. Would it work as well? Probably not.

So why are interfaces cool? Well, you can think of an Interface as a way to describe a set of classes: classes which meet certain properties. Indeed, the Java standard library uses interfaces like this all the time. For instance, Java has an interface called "Comparable" which, as the name suggests, means that any class that implements Comparable can compare instances of itself in a meaningful way. Comparable doesn't say how to do the actual comparison, it just says that there must be a method compareTo which has type (using Haskell-esque syntax):

compareTo :: Comparable a => a -> a -> Int  -- negative, zero, or positive

The Javadoc strongly recommends, but does not require, that a result of 0 coincide with x.equals(y). The interface only defines the kind of interaction you can have.

Why is this cool? Well, consider the obvious example: what if I wanted to write quicksort over a generic list of elements? How do I ensure that I have a sortable list? Let's say we have an Object Foo, which has no ordering, and another Object Bar, which has one. I have a magic method qsort which will actually do the sorting, but how do I write the method such that I know that the list is sortable? That is, if I write tests:

assertFail(qsort(List<Foo> = {foo1, foo2, ...}));
assertPass(qsort(List<Bar> = {bar1, bar2, ...}));

How can I ensure those both succeed (with the assumption, of course, that those testing constructs exist)? Well, the easy answer is to define qsort as having the method signature:

<T extends Comparable<T>> List<T> qsort(List<T> list);

(I think that's the right syntax, it's been a while since I wrote Java.)

This will prevent (at compile time, no less) the use of qsort on List<Foo>, because Foo doesn't implement Comparable. Running it on List<Bar> is fine, because Bar is Comparable, like we said before.
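For comparison with the Haskell side of this post, the same compile-time guarantee can be sketched with an Ord constraint standing in for the Comparable bound. This is a deliberately naive quicksort, and the Foo and Bar types here are stand-ins invented for the example.

```haskell
-- A deliberately naive quicksort; the Ord constraint plays the role of
-- the Comparable bound discussed above.
qsort :: Ord a => [a] -> [a]
qsort [] = []
qsort (pivot : rest) =
  qsort [x | x <- rest, x < pivot] ++ [pivot] ++ qsort [x | x <- rest, x >= pivot]

-- Bar has an ordering (derived here for brevity); Foo does not.
data Bar = Bar Int deriving (Eq, Ord, Show)
data Foo = Foo Int deriving (Show)

sortedBars :: [Bar]
sortedBars = qsort [Bar 3, Bar 1, Bar 2]

-- Uncommenting the next two lines should be rejected by the compiler,
-- because there is no Ord instance for Foo:
-- sortedFoos :: [Foo]
-- sortedFoos = qsort [Foo 3, Foo 1, Foo 2]
```

The "assertFail" case never needs to run at all: the program containing it simply doesn't type check.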

So now, how is this different from Type Classes? Well, principally, there is one difference. A Type Class can not only specify methods the Type must have defined on it, it can also define other methods derived from those methods. It's a seemingly trivial extension to Interfaces. However, that one simple change gives you enormous power in creating data abstractions. For instance, take the Haskell equivalent of Comparable, which is the pair of classes Eq and Ord (for Equality and Orderability, respectively). Together they require only 2 definitions total: == or /= (not equal) for Eq, and compare or <= for Ord. From those we get, for free, about 20-30 functions, including all your comparison and equality operators, the ability to put items implementing the class in a set, the ability to use them as a key in a finite map, sorting of implementing items, some extra list functions, and more. All that, from 2 definitions, in two type classes, along with the Prelude and Haskell's own polymorphism capabilities.
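To make the "2 definitions" claim concrete, here is a small sketch. The Priority type and the names sorted, highest, asSet, and asMap are invented for the example; sort, maximum, Data.Set, and Data.Map are the usual ones from base and the containers package.

```haskell
import Data.List (sort)
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical example type; note that Eq and Ord are NOT derived.
data Priority = Low | Medium | High
  deriving (Show)

-- Definition 1 of 2: (==). The class default supplies (/=).
instance Eq Priority where
  Low == Low = True
  Medium == Medium = True
  High == High = True
  _ == _ = False

-- Definition 2 of 2: (<=). The class defaults supply compare, (<), (>),
-- (>=), max, and min.
instance Ord Priority where
  Low <= _ = True
  Medium <= Low = False
  Medium <= _ = True
  High <= High = True
  High <= _ = False

-- Everything below only needs the two instances above.
sorted :: [Priority]
sorted = sort [High, Low, Medium]

highest :: Priority
highest = maximum [Medium, High, Low]

asSet :: Set.Set Priority
asSet = Set.fromList [High, Low, High]

asMap :: Map.Map Priority String
asMap = Map.fromList [(Low, "later"), (High, "now")]
```

Nothing in sort, Set, or Map knows about Priority; they only ask for Ord, and the class fills in whatever the instance left out.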

How does that work? Well, let's dissect some code. Here's the Ord class:

class (Eq a) => Ord a where
    compare :: a -> a -> Ordering
    (<), (<=), (>=), (>) :: a -> a -> Bool
    max, min :: a -> a -> a

    -- minimal complete definition:
    --   (<=) or compare
    -- using compare can be more efficient for complex types.
    compare x y
        | x == y    = EQ
        | x <= y    = LT
        | otherwise = GT

    x <= y = compare x y /= GT
    x <  y = compare x y == LT
    x >= y = compare x y /= LT
    x >  y = compare x y == GT

    -- note that (min x y, max x y) = (x,y) or (y,x)
    max x y
        | x <= y    = y
        | otherwise = x
    min x y
        | x <= y    = x
        | otherwise = y

So, let's dissect.

First, we declare what the class implements, that is, what methods are required to exist if the class is implemented. However, this is not the same as saying that all those methods must be defined over the structure before we can say it implements the class. We instead only need the "MRD/MCD", or "Minimal Requisite/Complete Definition", that is, the minimal set of functions needed to define all the other functions the class provides. How do we know what the MRD is? Well, that's up to you, the class author, to decide. We do that by defining each function of the class in terms of other functions in the class, even cyclically if necessary. I like to call this set of cyclic definitions the "internal definitions" of the class. These internal definitions provide the real power of the type class. Also note how, in the class definition, we have an inheritance bit, namely Eq a => Ord a. This allows us to assume that we have the basic equality functions provided by Eq.
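The cyclic "internal definitions" idea can be seen in miniature with a toy copy of Eq. This is a sketch only: MyEq, eq, neq, and Light are made-up names, and the MINIMAL pragma is a GHC-specific way of writing the minimal complete definition down so the compiler can warn when an instance leaves the cycle unbroken.

```haskell
-- A toy copy of Eq, renamed to avoid clashing with the Prelude.
class MyEq a where
  eq, neq :: a -> a -> Bool
  -- Each method defaults to the negation of the other, so an instance
  -- has to break the cycle by defining at least one of them.
  eq x y = not (neq x y)
  neq x y = not (eq x y)
  {-# MINIMAL eq | neq #-}

data Light = Red | Green

-- Define only eq; neq comes from the default.
instance MyEq Light where
  eq Red Red = True
  eq Green Green = True
  eq _ _ = False

-- Define only neq; eq comes from the default.
instance MyEq Bool where
  neq x y = x /= y
```

An instance that defined neither method would still compile (with a warning under GHC), but calling either method would loop forever, which is why the minimal definition matters.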

In the next few posts in this series, I'll be implementing Peano Arithmetic in Haskell, principally by pushing one simple data type:

data Nat = Z | S Nat

into as many type classes as I can. As of this writing, I have two more posts planned, and have pushed Nat into Eq, Ord, Num, Integral, Enum, and Real. I hope to get it into Rational and Fractional, which will turn it into Nat-QQ (the Natural Rationals, that is, all the positive rationals), and maybe then refactor Nat into ZZ (the integers) and get the regular rationals out of that.
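As a taste of what "pushing Nat into a type class" looks like, here is one possible shape for the first two, Eq and Ord. Treat it as a sketch rather than a preview of the later posts, which may do it differently; fromInt is a hypothetical helper added only to make the example easy to poke at.

```haskell
data Nat = Z | S Nat

-- Eq: the minimal definition chosen here is (==).
instance Eq Nat where
  Z == Z = True
  S m == S n = m == n
  _ == _ = False

-- Ord: the minimal definition chosen here is compare; (<), (<=), max,
-- min, and friends all come from the class defaults.
instance Ord Nat where
  compare Z Z = EQ
  compare Z (S _) = LT
  compare (S _) Z = GT
  compare (S m) (S n) = compare m n

-- Hypothetical helper: build a Nat from an Int, clamping negatives to Z.
fromInt :: Int -> Nat
fromInt n
  | n <= 0 = Z
  | otherwise = S (fromInt (n - 1))
```

With just these two instances, expressions like fromInt 2 < fromInt 3 or max (fromInt 2) (fromInt 5) already work, without any of those operators having been written for Nat.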


olegus said...

I've been watching FP related Google talks (in particular "Faith, Evolution, and Programming Languages"), then looking through different kinds of polymorphism and, finally, got what "type classes" are in Haskell. My first idea was that they are much like interfaces in Java, so that's how I found your blog.

After reading your entry I understand that they are something more, which resembles mixins: in a sense, they provide most of the functionality and require only a few methods to be implemented. Of course mixins are used in dynamically typed languages, so this is much different.

I feel like I'm starting to grasp the different kinds of techniques that arise from the clash of static/dynamic typing and different kinds of polymorphism.

Jake said...

I take it the post helped, then. I'm glad.

I've never used Mixin based OO before, but if it's significantly like Type Classes, I'll probably love it to death. The real brilliance is that they give you the ability to really make polymorphism work hard for you. With a minimal amount of work on your part (namely, a couple of method declarations), and some hard work on someone else's part, we get not only the ability to create a stable infrastructure for our programs to exist in, but an infrastructure that is about as extendable as legos. Just snap in a new structure, define some operations, and bam! We get all the associated functions for free.

I'll spin off of Greenspun's Law and say:
Any sufficiently complex object-oriented system will asymptotically approach being a Haskell-esque Type Class/Heavy Polymorphism system.

Maybe I'm awestruck, I dunno, but Type Classes are awesome.