This is the missing piece from my mental toolbox. I wish I'd known about it years ago.
Co-recursion is a principled way to turn a top-down algorithm into a bottom-up one. For instance, in Haskell, it's easy to write the fibonacci series from its definition:
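fibonacci 0 = 0
fibonacci 1 = 1
fibonacci n = fibonacci (n-1) + fibonacci (n-2)   -- the usual naive definition, straight from the recurrence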
But this is ridiculously inefficient. Until now, the only "principled" way I knew of to transform this into something sensible was through memoization, but that's still very wasteful.
With co-recursion in my toolkit, I can transform this into a form that generates the whole series as a stream:
fibonacci_series = 0 : 1 : more fibonacci_series
  where
    more (f1 : rest@(f2 : _)) = f1 + f2 : more rest
And thanks to Haskell's lazy, garbage collected semantics, taking the nth element of the series (remembering to drop data I don't need) turns out to be equivalent to:
fibonacci n = fib n 0 1
  where
    fib 0 a b = a
    fib n a b = fib (n-1) b (a+b)
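(By "taking the nth element" I mean, roughly, fibonacci_series !! n.)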
Corecursion is useful for taking a recursive algorithm and transforming it to a stream-like output pattern.
> But this is ridiculously inefficient.
Well...you might be surprised in the general case. Because of lazy evaluation, Haskell won't necessarily implode on non-TCO recursive functions (example: http://stackoverflow.com/questions/13042353/does-haskell-hav...), and will actually sometimes cause a stack overflow on "optimized" functions.
> Until now, the only "principled" way I knew of to transform this into something sensible was through memoization, but that's still very wasteful.
I think "real" Haskell code usually favors operations of lists over recursive functions. That said, the standard way to transform your recursive structure into something "sensible" is to use a tail-recursive function. In basically any other functional language, you'd go with that approach.
To get the same "benefit" in Haskell, you'd have to force strict evaluation inside of a tail-recursive function. This prevents a thunk from causing problems. That said, Haskell doesn't always build up a normal stack on a regular recursive call.
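For example, a minimal sketch of my own (not quoted from the parent), using bang patterns to keep the accumulators evaluated:

{-# LANGUAGE BangPatterns #-}

fibStrict :: Int -> Integer
fibStrict n = go n 0 1
  where
    go 0 !a _  = a
    go k !a !b = go (k - 1) b (a + b)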
Otherwise, you'd just use a list structure.
(Someone correct me if I've said something stupid.)
My remarks were addressed to people who don't need tail recursion or the benefits & drawbacks of lazy evaluation explained to them, so I think they went a little over your head.
I presented a recursive definition, a stream based definition, and a tail-call definition of the fibonacci function. In that toy example, it's easy to get between the three different forms, but in many cases the connection is far less obvious. We need principles that unite the different forms, and allow us to move between them. Co-recursion is one of those principles.
I understand lazy evaluation and tail recursion fine. I interpreted your comment as presenting corecursion as the only logical alternative to naive recursive algorithms with or without memoization.
You've tacked on the part where you say the latter algorithm is equivalent (due to Haskell's evaluation) - I get that. I'm still not understanding what you mean by only knowing inefficient, naive recursion in contrast to corecursion. In practice, I have rarely seen corecursion or naive recursion used, but maybe we read different code.
> In that toy example, it's easy to get between the three different forms, but in many cases the connection is far less obvious. We need principles that unite the different forms, and allow us to move between them. Co-recursion is one of those principles.
I once read a textbook introducing the theory of algorithms that started with the Fibonacci series. The author gave a formal definition, and translated this directly into a naive, recursive algorithm. Then he showed how to improve the algorithm with memoization. This was a real revelation to me: using this one simple, generally applicable transformation, all kinds of algorithms can be improved.
Then the author introduced the iterative Fibonacci algorithm, and formally proved that the iterative and recursive algorithms were equivalent. But the author failed to explain where that iterative algorithm came from: there was no general purpose tool like memoization that could be used to derive the iterative form from the recursive form. I still remember the intellectual disappointment.
Now, with co-recursion and a couple of other techniques, that gap is finally bridged. Using co-recursion, I can derive the stream algorithm directly from the recursive definition. And then I can apply deforestation techniques and tail-call optimization to derive the iterative algorithm from the stream algorithm. That's a pretty powerful intellectual toolset, don't you think?
For the benefit of anyone following along at home: The unproven-but-accepted Church-Turing thesis states that any algorithm that can be run on a Turing machine (i.e., in a Turing-complete language) can be expressed as a general recursive function or as a term in the lambda calculus, and vice versa.
> And then I can apply deforestation techniques and tail-call optimization to derive the iterative algorithm from the stream algorithm.
How do you use deforestation and tail-call optimization to derive an iterative function from a stream in the general case?
You're also pulling a false comparison: you've jumped from a 'general' algorithm to "what a corecursive function is evaluated as under Haskell's semantics".
Corecursion builds a data structure. A TCO function, for example, won't produce that kind of output. The corecursive function could only be directly equivalent to the linear time, constant memory TCO function in a lazily-evaluated runtime (if that's true - tail recursive functions in Haskell can actually blow up the stack due to lazy evaluation).
> How do you apply deforestation ... in the general case?
How do you cut an iron bar in half with a pair of scissors?
Deforestation is a tool, not a universal truth.
> You're also pulling a false comparison ... The corecursive function could only be directly equivalent to the linear time, constant memory TCO function in a lazily-evaluated runtime
But I'm using a lazily evaluated runtime. My code is Haskell. And in any case, I used deforestation in the last step to remove the lazily evaluated data structure.
If I understand you correctly, you're telling me that high-level mathematical formalisms work better in Haskell. Yes, they do.
Your version is certainly pretty, and it's based on an insight into the nature of the fibonacci series
But the value of co-recursion is that you don't need that insight. When you expand out the definitions of `tail`, `!!` and `zipWith`, your code turns out to be the same as mine[1], so I consider that a big win for co-recursion.
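(For anyone following along, the version being referred to is presumably the well-known one-liner, or something very close to it:

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
fibonacci n = fibs !! n
)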
Not corecursive, but here's the SICP O(ln(n)) algorithm:
fibonacci = fib 1 0 0 1
  where
    fib a b p q n
      | n == 0    = b
      | even n    = fib a b p' q' (n `div` 2)
      | otherwise = fib a' b' p q (n - 1)
      where
        p' = p*p + q*q
        q' = 2*p*q + q*q
        a' = a*q + b*q + a*p
        b' = b*p + a*q
Note that in Python you also have the issue that you still need some cleverness about how long things are stored and what gets stored. Many example memoize implementations just store everything (e.g.: https://wiki.python.org/moin/PythonDecoratorLibrary#Memoize), but of course that is not a solution in practice.
I don't think the methods mentioned in the Haskell wiki are quite equivalent to the Python decorator, as they require you to modify the data structure itself to support memoization.
It should be possible to use unsafePerformIO to create a generic memoize function (at least for inputs that implement Eq or Ord), but that is almost definitely overkill and asking for space leaks.
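For reference, the list-based approach reads roughly like this (my transcription of the idea, so treat it as a sketch):

memoFibs :: [Integer]
memoFibs = map fib [0 ..]
  where
    fib 0 = 0
    fib 1 = 1
    fib n = memoFibs !! (n - 1) + memoFibs !! (n - 2)

memoFib :: Int -> Integer
memoFib = (memoFibs !!)

The recursion indexes back into the very list being built, so the data structure itself is the memo table.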
Minimax algorithm: https://en.wikipedia.org/wiki/Minimax (I actually generally use negamax, the non-corecursive version, but you should understand minimax first)
"Tying the knot" in Haskell
That is, it's possible to create circular lists in Haskell like so:
let x = 1:y   -- x = 1 cons y
    y = 0:x   -- y = 0 cons x
in  x         -- return x
Which will return an infinite list of alternating 1's and 0's which only take up 2 ints (and pointers) worth of memory. Calling the millionth index merely runs in circles around the list until it's followed a million pointers.
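Concretely, something like

(let x = 1:y
     y = 0:x
 in x) !! 1000000

chases pointers around the two-cell cycle a million times and comes back with 1.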
EDIT: Minimax is mutual recursion, not corecursion, whoops.
> Minimax algorithm: https://en.wikipedia.org/wiki/Minimax (I actually generally use negamax, the non-corecursive version, but you should understand minimax first)
That's mutual recursion, not corecursion. Check the article out, corecursion has a much more... "mathematical" definition, relating to the types of initial data and final data.
By my quick reading, your second example might be considered corecursion (but I think it's also just regular recursion). It might not be considered recursion at all, in this particular sense, as it's only one piece of infinite data, not a function that takes an input. I'm not the best algebraist though.
The second example is corecursion, I believe. The article specifically mentions
> Corecursion can produce both finite and infinite data structures as result, and may employ self-referential data structures. Corecursion is often used in conjunction with lazy evaluation
Analogies are lossy, but one way to think of corecursion/recursion is that corecursion turns the seed into a maple tree, while recursion eats the tree's branches and spits out a table leg or something. Corecursion is generative, while (structural) recursion consumes and aggregates. You corecursively generate a possibly infinite stream, while you recursively turn a finite list of numbers into their sum.
Other places you've probably already seen corecursion: power series in calculus, automata in computer science, and its close cousin, the so-called map in MapReduce(TM).
Programming with just primitive (but higher-order) recursion and corecursion is the basis of Total Functional Programming, which, while not Turing complete, is still surprisingly powerful.
What are the restrictions on combining primitive recursive and co-recursive functions in total functional programming? Clearly, "repeat" is primitively co-recursive, and "length" is primitively recursive, but "length . repeat" is non-terminating.
length . repeat won't typecheck either. Briefly, recursion and corecursion have the following types
cata : (f a -> a) -> mu f -> a
ana : (a -> f a) -> a -> nu f
where `mu` is the "least fixed point" of a base functor f and `nu` the "greatest fixed point". In non-total, non-strict languages `mu f == nu f`, but in total languages they don't coincide. We call types of the form `mu f` data and types of the form `nu f` codata. Data is guaranteed to be finite while codata is not.
You can think of it like a generator. Above, I wrote something like
ana : Step f seed -> seed -> Nu f
but we might define `Nu f` using an existential (writing `Step f seed` for `seed -> f seed`) to just be
data Nu f = forall seed . Nu (Step f seed, seed)
and then you can write an eliminator for this type which just handles each step. So, `repeat` might look like
data ListF a x = Nil | Cons a x
repeat :: a -> Nu (ListF a)
repeat a = Nu (\x -> Cons x x, a)
and then `take` will be a parametric catamorphism on Nat which just repeatedly creates new layers and consumes them. Together these form a thing called a hylomorphism.
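To make that concrete, here's a rough consumer (my code, recursing on a plain Int rather than a proper parametric Nat) for the `Nu (ListF a)` above:

takeNu :: Int -> Nu (ListF a) -> [a]
takeNu 0 _ = []
takeNu n (Nu (step, seed)) =
  case step seed of
    Nil       -> []
    Cons a s' -> a : takeNu (n - 1) (Nu (step, s'))

-- e.g. takeNu 3 (repeat 'x') == "xxx"

The Int is data, so it drives the computation; the Nu just sits there and gets unfolded one layer per step.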
My understanding is that there's nothing wrong with using codata - the problem is with recursing on codata ("problem" in that you're stuck generating codata). Something like "take n" can produce data when reading codata because the recursion happens partly on data (in this case, n).
I'm sure tel will let me know if I've misunderstood something.
That's pretty much exactly it! The other thing to note is that the reduction semantics of codata are to... just sit there until it's consumed.
If you use Haskell you get the idea that codata just expands and expands because of the behavior of the repl and laziness. Sometimes, a better intuition is that it's more like a Python generator which must be yielded explicitly (by a recursive algorithm recurring on some data).
Another way to think about it is that only recursion "drives" computation. Corecursion sets up a potentially infinite number of computation steps each of which can be chosen to be churned through via recursion.
Am I misunderstanding this? Reading all the examples, the difference between recursion and corecursion seems to be equivalent to the difference between tail recursion and stack recursion. Is that right?
Can anyone post an example of corecursion that is stack recursion? Or is there something else that needs to happen to qualify something as corecursion?
Those are just at different levels of meaning. I don't think there's a direct relationship between stack/tail recursion and recursion/corecursion.
To get a brief feel for the difference, consider the "type function":
T x = 1 + x
where 1 is the type with only one value and the type (X + Y) is the tagged union of types X and Y. We might decide to talk about the type "iterated T" as the type

IT = T (T (T (T ...)))

i.e. the fixed point IT = T IT.
If we tag "left" values with Inl and "right" values with Inr and take * to be the only value of type 1, then values of IT look like
Inl *
Inr (Inl *)
Inr (Inr (Inr (Inr (Inl *))))
which looks a lot like the Peano Naturals. Now, a recursive definition on IT looks like
cata : forall x . (T x -> x) -> IT -> x
and a corecursive definition looks like
ana : forall x . (x -> T x) -> x -> IT
In other words, recursion occurs by constantly pulling "layers" of type T off of values of IT while corecursion occurs by adding "layers" of type T on to seed values to build an IT.
We can convert IT to natural numbers by using cata
cata (\tx -> case tx of { Inl * -> 0; Inr n -> 1 + n })
and we can invert that conversion with ana
ana (\x -> case (x > 0) of { True -> Inr (x - 1); False -> Inl * })
and we can build the "infinite" structure omega with ana

omega = ana (\x -> Inr x) ()
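If it helps, here is the same picture as a self-contained Haskell sketch (my own names: T x = 1 + x becomes Maybe, and IT becomes the type below):

data IT = Zero | Succ IT            -- Zero ~ Inl *, Succ ~ Inr

cataIT :: (Maybe a -> a) -> IT -> a
cataIT alg Zero      = alg Nothing
cataIT alg (Succ it) = alg (Just (cataIT alg it))

anaIT :: (a -> Maybe a) -> a -> IT
anaIT step seed = case step seed of
  Nothing -> Zero
  Just s' -> Succ (anaIT step s')

toInt :: IT -> Int
toInt = cataIT (maybe 0 (+ 1))      -- peel layers off

fromInt :: Int -> IT
fromInt = anaIT (\n -> if n > 0 then Just (n - 1) else Nothing)   -- add layers

omega :: IT
omega = anaIT Just ()               -- never bottoms out; fine under laziness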
They are different animals. Stack-based recursion, heap-based recursion and tail recursion are mechanisms for implementing recursion. Recursion and co-recursion are ways of using recursion that differ in how the input data is managed.
Recursion tends to just process the input data one by one until there's no more. Co-recursion would add more to the input data along the way.
Let's say I have a list of numbers. In each step of the recursion, a number is removed. If the number is odd, it's dropped. If the number is even, a randomly generated set of numbers is added to the input list, adding more to it. The recursion would take longer. Hopefully it will end once there are no more even numbers.
Can anyone help me understand how this is different from calculating something iteratively ?
I was reading SICP the other day where they introduced the idea that a process is fundamentally iterative if you just need a few variables to always keep track of the state of the computation.
So tail recursion is iterative and this looks the same.
So then why is it called recursion ?
With recursion, we can define something in terms of itself. For example, this is recursive:
f(n) = f(n - 1) ++ f(n - 1)
Here I'm using "++" to denote list concatenation. Unfortunately this 'diverges', ie. trying to calculate f(x) for any x will keep going forever. There are two ways around this; the first is generally just called "recursion", and it involves fixing a particular "base case", which cuts off the calculation before it becomes infinite, hence causing our function to terminate.
We can do this as follows:
f(0) = [NULL]
Where "[NULL]" is a list containing NULL. Now we can calculate f(x) for any positive x:
And so on. This kind of code can consume a hell of a lot of resources though, so we may prefer to only write a special kind of recursive function called tail-recursive. Here's a tail-recursive function:
g(n) = g(n-1)
g(0) = [0]
This "g" function uses recursion in the same way as "f", but notice that we can switch from calculating g(n) straight to g(n-1), ie. we only transform "n", not g(...). This means we can calculate g in constant space, since we just transform n again and again. Compare this to "f", where we need to calculate f(n-1) separately, then combine the results. Also notice that we don't know anything about the value of "g(x)" until we've finished going down the chain and hit "g(0)".
The other way of preventing divergence is corecursion, which the linked article describes. Here, we must generate a little bit of output before we're allowed to recurse. For example:
h(n) = cons(n, h(n+1))
This will not terminate, but it will give us some output; specifically:
h(0) = cons(0, cons(1, cons(2, cons(3, ...))))
We can't "finish" calculating this list, but we don't need to in order to use its elements. We say this function "coterminates", and that it generates "codata".
One interesting fact about co-recursion is that we can turn any diverging function into a coterminating one:
i(n) = Later(i(n + 1))
This will give an infinite codata structure "Later(Later(Later(Later(...))))". We're effectively breaking down our calculation into steps; after each step, if we're finished we can just give back the result. If we're not finished, we give back a "Later" containing the result of the rest of the calculation.
This forms a really nice structure called the Partial Monad :)
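Roughly, in Haskell (a sketch of the idea; the names are mine):

data Partial a = Now a | Later (Partial a)

instance Functor Partial where
  fmap f (Now a)   = Now (f a)
  fmap f (Later p) = Later (fmap f p)

instance Applicative Partial where
  pure = Now
  Now f    <*> pa = fmap f pa
  Later pf <*> pa = Later (pf <*> pa)

instance Monad Partial where
  Now a   >>= f = f a
  Later p >>= f = Later (p >>= f)

-- i(n) from above: a computation that is never finished
never :: Partial a
never = Later never

-- run a Partial computation for at most k steps
runFor :: Int -> Partial a -> Maybe a
runFor _ (Now a)   = Just a
runFor 0 _         = Nothing
runFor k (Later p) = runFor (k - 1) p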
Can anyone say if the difference between recursion and corecursion is the same as the difference between recursive process and iterative process as described in SICP? They seem the same to me. It also seems to be related to the bottom up solving method of dynamic programming.
To me, it seems the same as the distinction made in SICP.
As for DP, I think it also comes very close to being the same as the above two. The one difference I can think of is that with DP you often have access to the entire built-up data structure you are creating, rather than only the latest (previous) level you have with an iterative process.
> Isn't this just the same as Dynamic Programming?
The two concepts are very related. I would say that corecursion is more low-level and more general. It's also true that DP is not really a computing term. DP is an optimization technique which, when performed using a computer, is often coded using corecursion.
See the comment by chriswarbo[1] for both an excellent explanation and some examples that I don't think we would call DP.