Corecursion and Codata
Four weeks ago, I embarked on a self-guided journey to learn Clojure. Despite not knowing the syntax, I boldly decided to build an Amazon web spider and solve Project Euler problems, hoping to learn as I went along.
So far, the journey has been both rewarding and challenging. I'm often tempted to procrastinate by dealing with issues from my job at Lifebooker. Balancing this learning experience with attending all the hackNY events and hackathons, and making the most of my time in NYC (which I admit I'm not doing a great job at), leaves little room for experimentation. Tonight I decided to take a break from my Amazon project, since I felt I wouldn't accomplish much. Instead, I'll write a blog post about corecursion and learn by explaining it.
To start, let's define recursion.
According to Wikipedia, recursion is the process of repeating items in a self-similar way. While accurate, for the sake of clarity let's define recursion as the expression of a function in terms of itself. This function could be a mathematical function $f(x)$ or a function in a programming language, function(x).
In mathematics, a function defined in terms of itself might look like $f(x) = f(x-1) + 2$. A simple function defined in terms of itself in Java-like pseudocode could look like this:
recursive_function(argument1) {
    return recursive_function(argument1)
}
Recursive functions typically have at least one base case, so that at some point they stop calling themselves. A simple example that comes to mind is a function that counts up to 10 and returns it, provided that it's called with an integer argument of at most 10 (with a larger argument, it would never reach the base case).
recursive_function(argument1) {
    if (argument1 == 10) { return argument1 }
    return recursive_function(argument1 + 1)
}
Now that we've refreshed our CS101 knowledge, let's dive into the main topic. Corecursion is the dual of recursion: instead of consuming a data structure down to a base case, a corecursive definition produces a (possibly infinite) data structure outward from a starting value. If I had to phrase the problem, it would go something like this:
- Given a data structure, define a computation on that structure.
- Execute it, then repeat the process on the result of this computation, and so on.
- It is similar to recursion, but it produces a sequence of data, with each calculation taking place on demand, when that piece of data is actually inspected.
- It's data-oriented, not operation-oriented.
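The bullet points above can be sketched in a couple of lines of Haskell, whose laziness gives us the on-demand evaluation for free (a minimal sketch; the name nats is mine):

```haskell
-- nats is defined in terms of its own output: element n+1 is
-- computed from element n, but only when someone demands it.
nats :: [Integer]
nats = 0 : map (+ 1) nats

-- ghci> take 5 nats
-- [0,1,2,3,4]
```

Nothing is computed when nats is defined; forcing the first five elements performs exactly five steps of the computation.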
Codata
To comprehend corecursion, it's crucial to understand codata. If you're really interested, I recommend reading this paper that covers the basics of F-algebras, F-coalgebras, and duality (with F representing a functor in this context). For simplicity's sake, let's use two basic data types based on natural numbers, Nat and CoNat.
Nat (the natural numbers) is defined as:
O
S O
S (S O)
S (S (S O))
S (S (S (S O)))
...
Ultimately, they're all built from S and O. If you read O as zero and S as "successor" (plus one), this data type can represent every natural number.
CoNat is almost identical to Nat but with an extra element.
cO
cS cO
cS (cS cO)
cS (cS (cS cO))
...
cS (cS (cS (...)))
An interesting consequence of this is that the last element of the CoNat set doesn't contain cO anywhere. It's truly infinite, as opposed to any Nat, which is always just S applied to O a finite number of times. If we were to define Nat, we could say something like:
type nat = Zero | Successor(nat)
But when it comes to infinity, we cannot write infinity = Successor(infinity): Nat is inductively defined, meaning it is the smallest set containing Zero and closed under Successor. Thus infinity ($\omega$) is NOT a member of this set. To see why, suppose it were, and consider the set N that has all the same elements as Nat except $\omega$. Clearly Zero is in N, and if y is in N, then Successor(y) is in N (the successor of a finite numeral is never $\omega$), so N is also closed under the constructors, yet N is smaller than Nat, which is a contradiction. Therefore $\omega$ is not in Nat. Data is always finite. Codata can be infinite.
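To make the contrast concrete, here is that definition transcribed into Haskell, with one caveat: Haskell is lazy, so its data declarations really give you codata, and the "illegal" infinity below is a perfectly good Haskell value, exactly the value a total language would rule out for Nat.

```haskell
data Nat = Zero | Successor Nat

-- count the Successors; total only on genuinely finite naturals
natToInt :: Nat -> Int
natToInt Zero          = 0
natToInt (Successor n) = 1 + natToInt n

-- legal in lazy Haskell, impossible for inductive data in a
-- total language: natToInt omega would loop forever
omega :: Nat
omega = Successor omega
```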
Taken word for word from reddit:
All types in functional programming have two "tools" for using them:
- an introduction principle, and
- an elimination principle
The introduction principle lets us turn objects of some other type into objects of this type. The elimination principle lets us turn objects of this type into objects of some other type.
This should be your personal mantra. Examples of this are:
- Function types - Lambda abstraction & Function application
- Sum types - inl and inr & case expressions
- Product types - the pairing function (,) & fst and snd
- Unit - the unit constructor () & case () of () ->
- Bool - True and False & if-then-else statements
- Empty - (no introduction rule) & reductio ad absurdum, the ability to promote $\bot$ to any other type.
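To make the mantra concrete, here is a small sketch of the sum-type row in Haskell, where inl and inr are spelled Left and Right (small and describe are made-up names for this illustration):

```haskell
-- introduction: the constructors Left and Right (inl and inr)
-- turn an Int or a String into an Either Int String
small :: Either Int String
small = Left 3

-- elimination: a case expression turns the sum back into
-- some other type, here a String
describe :: Either Int String -> String
describe e = case e of
  Left n  -> "number: " ++ show n
  Right s -> "string: " ++ s
```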
For an introduction principle to count as such for Nat, it has to be able to turn some other type into Nat. We do have an introduction principle for Nat: it is just our two old friends O and S (the constructors).
CoNat does NOT have an introduction principle as simple as its constructors. Its introduction principle is:
coNatIntro : (a -> Either () a) -> a -> CoNat
coNatIntro f a = case f a of
    Left ()  -> cO
    Right a' -> cS (coNatIntro f a')
So essentially, coNatIntro f is a function a -> CoNat. And that's enough to "turn some type into CoNat", so it's an introduction principle.
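Filling in the missing pieces makes coNatIntro runnable. The constructors are capitalized here (CO, CS) only because Haskell requires it, and fromInt/observe are hypothetical helper names for the demo:

```haskell
data CoNat = CO | CS CoNat

coNatIntro :: (a -> Either () a) -> a -> CoNat
coNatIntro f a = case f a of
    Left ()  -> CO
    Right a' -> CS (coNatIntro f a')

-- build a finite CoNat from an Int by counting the seed down
fromInt :: Int -> CoNat
fromInt = coNatIntro (\n -> if n == 0 then Left () else Right (n - 1))

-- the same principle also builds infinity: the seed never runs out
infinity :: CoNat
infinity = coNatIntro (\() -> Right ()) ()

-- a finite observation, safe only on finite CoNats
observe :: CoNat -> Int
observe CO     = 0
observe (CS n) = 1 + observe n
```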
Elimination principles work just the opposite way: they turn elements of a type into elements of some other type, so in our example they would have the forms Nat -> something and CoNat -> something else.
The elimination principle for our simple Nat type is:
natElim : a -> (a -> a) -> Nat -> a
natElim z s O = z
natElim z s (S n) = s (natElim z s n)
Wait. Isn't Nat -> a the opposite of a -> CoNat? Yup, which makes it even clearer that they are an elimination principle and an introduction principle, respectively. This shows that CoNat is codata: its values can be infinite, yet any terminating computation can only ever observe a finite part of one. That has some consequences which can help in understanding corecursion and codata.
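For completeness, natElim can be exercised too (natToInt is a made-up name for this demo; the only change from the post's version is :: in place of :):

```haskell
data Nat = O | S Nat

natElim :: a -> (a -> a) -> Nat -> a
natElim z s O     = z
natElim z s (S n) = s (natElim z s n)

-- eliminate Nat into Int: replace O with 0 and each S with (+ 1)
natToInt :: Nat -> Int
natToInt = natElim 0 (+ 1)
```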
- You can't define total subtraction. (infinity - infinity will always loop).
- You can't write total equality. (it loops when comparing infinity == infinity).
- You can't write a total elem function that works on a CoList (it loops when the list is infinitely long and the element isn't in the list).
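The elem point can be seen in plain Haskell: membership on an infinite list is only semi-decidable, so the example below is safe only because the element really does occur.

```haskell
-- [1 ..] is an infinite (corecursive) list; elem finds 100
-- after finitely many steps and returns True.
found :: Bool
found = 100 `elem` [1 ..]

-- (-1) `elem` [1 ..] would search forever: there is no total
-- membership test on codata.
```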
Corecursive data structures are defined in terms of themselves; corecursive algorithms are defined in terms of their output rather than their input.
Combined with lazy evaluation, this allows "infinitely" large objects without allocating all that memory up front and, obviously, running out of it prematurely.
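The classic demonstration is the Fibonacci stream: the whole infinite sequence is described at once, but memory is only spent on the prefix you actually force (a sketch in Haskell):

```haskell
-- each element is defined in terms of the stream's own earlier output
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- ghci> take 8 fibs
-- [0,1,1,2,3,5,8,13]
```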
Corecursive binary trees
A more complex yet easy-to-understand corecursive structure is the binary tree. I won't go into the details, but you can see how a tree would be defined in terms of a corecursive structure. Here is an example of how to take advantage of corecursion to traverse the tree.
(defstruct tree :val :left :right)
(def my-tree (struct tree 1 (struct tree 2) (struct tree 3 (struct tree 4) (struct tree 5))))
; lazily walk the tree in order, producing each value only on demand
(defn walk [t] (lazy-seq (when t (concat (walk (:left t)) [(:val t)] (walk (:right t))))))
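The Clojure tree above is finite; corecursion really shines when the tree isn't. As a sketch in Haskell (full and leftSpine are made-up names for this illustration), here is an infinite binary tree that costs nothing until you look at it:

```haskell
data Tree = Node Integer Tree Tree

-- infinite tree: the node labelled n has children 2n and 2n+1
full :: Integer -> Tree
full n = Node n (full (2 * n)) (full (2 * n + 1))

-- observe a finite part: the first k labels down the left spine
leftSpine :: Int -> Tree -> [Integer]
leftSpine 0 _            = []
leftSpine k (Node v l _) = v : leftSpine (k - 1) l
```

Only the four nodes demanded by leftSpine 4 (full 1) are ever constructed; the rest of the tree remains an unevaluated description.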