This is the missing piece from my mental toolbox. I wish I'd known about it years ago.
Co-recursion is a principled way to turn a top-down algorithm into a bottom-up one. For instance, in Haskell, it's easy to write the fibonacci series from its definition:
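fibonacci 0 = 0
fibonacci 1 = 1
fibonacci n = fibonacci (n-1) + fibonacci (n-2)   -- the usual naive definition, straight from the recurrence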
But this is ridiculously inefficient. Until now, the only "principled" way I knew of to transform this into something sensible was through memoization, but that's still very wasteful.
With co-recursion in my toolkit, I can transform this into a form that generates the whole series as a stream:
fibonacci_series = 0 : 1 : more fibonacci_series
  where
    more (f1 : rest@(f2 : _)) = f1 + f2 : more rest
And thanks to Haskell's lazy, garbage collected semantics, taking the nth element of the series (remembering to drop data I don't need) turns out to be equivalent to:
fibonacci n = fib n 0 1
  where
    fib 0 a b = a
    fib n a b = fib (n-1) b (a+b)
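(By "taking the nth element" I mean, roughly, fibonacci_series !! n.)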
Corecursion is useful for taking a recursive algorithm and transforming it to a stream-like output pattern.
> But this is ridiculously inefficient.
Well...you might be surprised in the general case. Because of lazy evaluation, Haskell won't necessarily implode on non-TCO recursive functions (example: http://stackoverflow.com/questions/13042353/does-haskell-hav...), and will actually sometimes cause a stack overflow on "optimized" functions.
> Until now, the only "principled" way I knew of to transform this into something sensible was through memoization, but that's still very wasteful.
I think "real" Haskell code usually favors operations of lists over recursive functions. That said, the standard way to transform your recursive structure into something "sensible" is to use a tail-recursive function. In basically any other functional language, you'd go with that approach.
To get the same "benefit" in Haskell, you'd have to force strict evaluation inside of a tail-recursive function. This prevents a thunk from causing problems. That said, Haskell doesn't always build up a normal stack on a regular recursive call.
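For example, a minimal sketch of my own (not quoted from the parent), using bang patterns to keep the accumulators evaluated:

{-# LANGUAGE BangPatterns #-}

fibStrict :: Int -> Integer
fibStrict n = go n 0 1
  where
    go 0 !a _  = a
    go k !a !b = go (k - 1) b (a + b)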
Otherwise, you'd just use a list structure.
(Someone correct me if I've said something stupid.)
My remarks were addressed to people who don't need tail recursion or the benefits & drawbacks of lazy evaluation explained to them, so I think they went a little over your head.
I presented a recursive definition, a stream based definition, and a tail-call definition of the fibonacci function. In that toy example, it's easy to get between the three different forms, but in many cases the connection is far less obvious. We need principles that unite the different forms, and allow us to move between them. Co-recursion is one of those principles.
I understand lazy evaluation and tail recursion fine. I interpreted your comment as presenting corecursion as the only logical alternative to naive recursive algorithms with or without memoization.
You've tacked on the part where you say the latter algorithm is equivalent (due to Haskell's evaluation) - I get that. I'm still not understanding what you mean by only knowing inefficient, naive recursion in contrast to corecursion. In practice, I have rarely seen corecursion or naive recursion used, but maybe we read different code.
> In that toy example, it's easy to get between the three different forms, but in many cases the connection is far less obvious. We need principles that unite the different forms, and allow us to move between them. Co-recursion is one of those principles.
I once read a textbook introducing the theory of algorithms that started with the Fibonacci series. The author gave a formal definition, and translated this directly into a naive, recursive algorithm. Then he showed how to improve the algorithm with memoization. This was a real revelation to me: using this one simple, generally applicable transformation, all kinds of algorithms can be improved.
Then the author introduced the iterative Fibonacci algorithm, and formally proved that the iterative and recursive algorithms were equivalent. But the author failed to explain where that iterative algorithm came from: there was no general purpose tool like memoization that could be used to derive the iterative form from the recursive form. I still remember the intellectual disappointment.
Now, with co-recursion and a couple of other techniques, that gap is finally bridged. Using co-recursion, I can derive the stream algorithm directly from the recursive definition. And then I can apply deforestation techniques and tail-call optimization to derive the iterative algorithm from the stream algorithm. That's a pretty powerful intellectual toolset, don't you think?
For the benefit of anyone following along at home: The unproven-but-accepted Church-Turing thesis states that any algorithm that can be run on a Turing machine (i.e., in a Turing-complete language) can be expressed as a general recursive function or as a term in the lambda calculus, and vice versa.
> And then I can apply deforestation techniques and tail-call optimization to derive the iterative algorithm from the stream algorithm.
How do you use deforestation and tail-call optimization to derive an iterative function from a stream in the general case?
You're also pulling a false comparison: you've jumped from a 'general' algorithm to "what a corecursive function is evaluated as under Haskell's semantics".
Corecursion builds a data structure. A TCO function, for example, won't produce that kind of output. The corecursive function could only be directly equivalent to the linear time, constant memory TCO function in a lazily-evaluated runtime (if that's true - tail recursive functions in Haskell can actually blow up the stack due to lazy evaluation).
> How do you apply deforestation ... in the general case?
How do you cut an iron bar in half with a pair of scissors?
Deforestation is a tool, not a universal truth.
> You're also pulling a false comparison ... The corecursive function could only be directly equivalent to the linear time, constant memory TCO function in a lazily-evaluated runtime
But I'm using a lazily evaluated runtime. My code is Haskell. And in any case, I used deforestation in the last step to remove the lazily evaluated data structure.
If I understand you correctly, you're telling me that high-level mathematical formalisms work better in Haskell. Yes, they do.
Your version is certainly pretty, and it's based on an insight into the nature of the fibonacci series
But the value of co-recursion is that you don't need that insight. When you expand out the definitions of `tail`, `!!` and `zipWith`, your code turns out to be the same as mine[1], so I consider that a big win for co-recursion.
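(For anyone following along, the version being referred to is presumably the well-known one-liner, or something very close to it:

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
fibonacci n = fibs !! n
)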
Not corecursive, but here's the SICP O(ln(n)) algorithm:
fibonacci = fib 1 0 0 1
  where
    fib a b p q n
      | n == 0    = b
      | even n    = fib a b p' q' (n `div` 2)
      | otherwise = fib a' b' p q (n - 1)
      where
        p' = p*p + q*q
        q' = 2*p*q + q*q
        a' = a*q + b*q + a*p
        b' = b*p + a*q
Note that in Python you also have the issue that you still need some cleverness about how long things are stored and what gets stored. Many example memoize implementations just store everything (e.g.: https://wiki.python.org/moin/PythonDecoratorLibrary#Memoize), but of course that is not a solution in practice.
I don't think the methods mentioned in the Haskell wiki are quite equivalent to the Python decorator, as they require you to modify the data structure itself to support memoization.
It should be possible to use unsafePerformIO to create a generic memoize function (at least for inputs that implement Eq or Ord), but that is almost definitely overkill and asking for space leaks.
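For reference, the list-based approach reads roughly like this (my transcription of the idea, so treat it as a sketch):

memoFibs :: [Integer]
memoFibs = map fib [0 ..]
  where
    fib 0 = 0
    fib 1 = 1
    fib n = memoFibs !! (n - 1) + memoFibs !! (n - 2)

memoFib :: Int -> Integer
memoFib = (memoFibs !!)

The recursion indexes back into the very list being built, so the data structure itself is the memo table.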
Minimax algorithm: https://en.wikipedia.org/wiki/Minimax (I actually generally use negamax, the non-corecursive version, but you should understand minimax first)
"Tying the knot" in Haskell
That is, it's possible to create circular lists in Haskell like so:
let x = 1:y   -- x = 1 cons y
    y = 0:x   -- y = 0 cons x
in  x         -- return x
Which will return an infinite list of alternating 1's and 0's which only take up 2 ints (and pointers) worth of memory. Calling the millionth index merely runs in circles around the list until it's followed a million pointers.
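Concretely, something like

(let x = 1:y
     y = 0:x
 in x) !! 1000000

chases pointers around the two-cell cycle a million times and comes back with 1.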
EDIT: Minimax is mutual recursion, not corecursion, whoops.
> Minimax algorithm: https://en.wikipedia.org/wiki/Minimax (I actually generally use negamax, the non-corecursive version, but you should understand minimax first)
That's mutual recursion, not corecursion. Check the article out, corecursion has a much more... "mathematical" definition, relating to the types of initial data and final data.
By my quick reading, your second example might be considered corecursion (but I think it's also just regular recursion). It might not be considered recursion at all, in this particular sense, as it's only one piece of infinite data, not a function that takes an input. I'm not the best algebraist though.
The second example is corecursion, I believe. The article specifically mentions
> Corecursion can produce both finite and infinite data structures as result, and may employ self-referential data structures. Corecursion is often used in conjunction with lazy evaluation
Analogies are lossy, but one way to think of corecursion/recursion is that corecursion turns the seed into a maple tree, while recursion eats the tree's branches and spits out a table leg or something. Corecursion is generative, while (structural) recursion consumes and aggregates. You corecursively generate a possibly infinite stream, while you recursively turn a finite list of numbers into their sum.
Other places you've probably already seen corecursion: power series in calculus, automata in computer science, and its close cousin, the so-called map in MapReduce(TM).
Programming with just primitive (but higher-order) recursion and corecursion is the basis of Total Functional Programming, which, while not Turing complete, is still surprisingly powerful.
What are the restrictions on combining primitive recursive and co-recursive functions in total functional programming? Clearly, "repeat" is primitively co-recursive, and "length" is primitively recursive, but "length . repeat" is non-terminating.
length . repeat won't typecheck either. Briefly, recursion and corecursion have the following types
cata : (f a -> a) -> mu f -> a
ana : (a -> f a) -> a -> nu f
where `mu` is the "least fixed point" of a base functor f and `nu` the "greatest fixed point". In non-total, non-strict languages `mu f == nu f`, but in total languages they don't coincide. We call types of the form `mu f` data and types of the form `nu f` codata. Data is guaranteed to be finite while codata is not.
You can think of it like a generator. Above, I wrote something like
ana : Step f seed -> seed -> Nu f
but we might define `Nu f` using an existential (writing `Step f seed` for `seed -> f seed`) to just be
data Nu f = forall seed . Nu (Step f seed, seed)
and then you can write an eliminator for this type which just handles each step. So, `repeat` might look like
data ListF a x = Nil | Cons a x
repeat :: a -> Nu (ListF a)
repeat a = Nu (\x -> Cons x x, a)
and then `take` will be a parametric catamorphism on Nat which just repeatedly creates new layers and consumes them. Together these form a thing called a hylomorphism.
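To make that concrete, here's a rough consumer (my code, recursing on a plain Int rather than a proper parametric Nat) for the `Nu (ListF a)` above:

takeNu :: Int -> Nu (ListF a) -> [a]
takeNu 0 _ = []
takeNu n (Nu (step, seed)) =
  case step seed of
    Nil       -> []
    Cons a s' -> a : takeNu (n - 1) (Nu (step, s'))

-- e.g. takeNu 3 (repeat 'x') == "xxx"

The Int is data, so it drives the computation; the Nu just sits there and gets unfolded one layer per step.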
My understanding is that there's nothing wrong with using codata - the problem is with recursing on codata ("problem" in that you're stuck generating codata). Something like "take n" can produce data when reading codata because the recursion happens partly on data (in this case, n).
I'm sure tel will let me know if I've misunderstood something.
That's pretty much exactly it! The other thing to note is that the reduction semantics of codata are to... just sit there until it's consumed.
If you use Haskell you get the idea that codata just expands and expands because of the behavior of the repl and laziness. Sometimes, a better intuition is that it's more like a Python generator which must be yielded explicitly (by a recursive algorithm recurring on some data).
Another way to think about it is that only recursion "drives" computation. Corecursion sets up a potentially infinite number of computation steps each of which can be chosen to be churned through via recursion.
Am I misunderstanding this? Reading all the examples, the difference between recursion and corecursion seems to be equivalent to the difference between tail recursion and stack recursion. Is that right?
Can anyone post an example of corecursion that is stack recursion? Or is there something else that needs to happen to qualify something as corecursion?
Those are just at different levels of meaning. I don't think there's a direct relationship between stack/tail recursion and recursion/corecursion.
To get a brief feel for the difference, consider the "type function":
T x = 1 + x
where 1 is the type with only one value and the type (X + Y) is the tagged union of types X and Y. We might decide to talk about the type "iterated T" as the type

IT = T (T (T (T ...)))

i.e. the fixed point IT = T IT.
If we tag "left" values with Inl and "right" values with Inr and take * to be the only value of type 1, then values of IT look like
Inl *
Inr (Inl *)
Inr (Inr (Inr (Inr (Inl *))))
which looks a lot like the Peano Naturals. Now, a recursive definition on IT looks like
cata : forall x . (T x -> x) -> IT -> x
and a corecursive definition looks like
ana : forall x . (x -> T x) -> x -> IT
In other words, recursion occurs by constantly pulling "layers" of type T off of values of IT while corecursion occurs by adding "layers" of type T on to seed values to build an IT.
We can convert IT to natural numbers by using cata
cata (\tx -> case tx of { Inl * -> 0; Inr n -> 1 + n })
and we can invert that conversion with ana
ana (\x -> case (x > 0) of { True -> Inr (x - 1); False -> Inl * })
and we can build the "infinite" structure omega with ana

omega = ana (\x -> Inr x) ()
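If it helps, here is the same picture as a self-contained Haskell sketch (my own names: T x = 1 + x becomes Maybe, and IT becomes the type below):

data IT = Zero | Succ IT            -- Zero ~ Inl *, Succ ~ Inr

cataIT :: (Maybe a -> a) -> IT -> a
cataIT alg Zero      = alg Nothing
cataIT alg (Succ it) = alg (Just (cataIT alg it))

anaIT :: (a -> Maybe a) -> a -> IT
anaIT step seed = case step seed of
  Nothing -> Zero
  Just s' -> Succ (anaIT step s')

toInt :: IT -> Int
toInt = cataIT (maybe 0 (+ 1))      -- peel layers off

fromInt :: Int -> IT
fromInt = anaIT (\n -> if n > 0 then Just (n - 1) else Nothing)   -- add layers

omega :: IT
omega = anaIT Just ()               -- never bottoms out; fine under laziness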
They are different animals. Stack-based recursion, heap-based recursion and tail recursion are mechanisms for implementing recursion. Recursion and co-recursion are ways of using recursion that differ in how the input data is managed.
Recursion tends to just process the input data one by one until there's no more. Co-recursion would add more to the input data along the way.
Let's say I have a list of numbers. In each step of the recursion, a number is removed. If the number is odd, it's dropped. If the number is even, a randomly generated set of numbers is added to the input list, adding more to it. The recursion would take longer. Hopefully it will end once there are no more even numbers.
Can anyone help me understand how this is different from calculating something iteratively ?
I was reading SICP the other day where they introduced the idea that a process is fundamentally iterative if you just need a few variables to always keep track of the state of the computation.
So tail recursion is iterative and this looks the same.
So then why is it called recursion ?
With recursion, we can define something in terms of itself. For example, this is recursive:
f(n) = f(n - 1) ++ f(n - 1)
Here I'm using "++" to denote list concatenation. Unfortunately this 'diverges', ie. trying to calculate f(x) for any x will keep going forever. There are two ways around this; the first is generally just called "recursion", and it involves fixing a particular "base case", which cuts off the calculation before it becomes infinite, hence causing our function to terminate.
We can do this as follows:
f(0) = [NULL]
Where "[NULL]" is a list containing NULL. Now we can calculate f(x) for any positive x:
And so on. This kind of code can consume a hell of a lot of resources though, so we may prefer to only write a special kind of recursive function called tail-recursive. Here's a tail-recursive function:
g(n) = g(n-1)
g(0) = [0]
This "g" function uses recursion in the same way as "f", but notice that we can switch from calculating g(n) straight to g(n-1), ie. we only transform "n", not g(...). This means we can calculate g in constant space, since we just transform n again and again. Compare this to "f", where we need to calculate f(n-1) separately, then combine the results. Also notice that we don't know anything about the value of "g(x)" until we've finished going down the chain and hit "g(0)".
The other way of preventing divergence is corecursion, which the linked article describes. Here, we must generate a little bit of output before we're allowed to recurse. For example:
h(n) = cons(n, h(n+1))
This will not terminate, but it will give us some output; specifically:
h(0) = cons(0, cons(1, cons(2, cons(3, ...))))
We can't "finish" calculating this list, but we don't need to in order to use its elements. We say this function "coterminates", and that it generates "codata".
One interesting fact about co-recursion is that we can turn any diverging function into a coterminating one:
i(n) = Later(i(n + 1))
This will give an infinite codata structure "Later(Later(Later(Later(...))))". We're effectively breaking down our calculation into steps; after each step, if we're finished we can just give back the result. If we're not finished, we give back a "Later" containing the result of the rest of the calculation.
This forms a really nice structure called the Partial Monad :)
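Roughly, in Haskell (a sketch of the idea; the names are mine):

data Partial a = Now a | Later (Partial a)

instance Functor Partial where
  fmap f (Now a)   = Now (f a)
  fmap f (Later p) = Later (fmap f p)

instance Applicative Partial where
  pure = Now
  Now f    <*> pa = fmap f pa
  Later pf <*> pa = Later (pf <*> pa)

instance Monad Partial where
  Now a   >>= f = f a
  Later p >>= f = Later (p >>= f)

-- i(n) from above: a computation that is never finished
never :: Partial a
never = Later never

-- run a Partial computation for at most k steps
runFor :: Int -> Partial a -> Maybe a
runFor _ (Now a)   = Just a
runFor 0 _         = Nothing
runFor k (Later p) = runFor (k - 1) p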
Can anyone say if the difference between recursion and corecursion is the same as the difference between recursive process and iterative process as described in SICP? They seem the same to me. It also seems to be related to the bottom up solving method of dynamic programming.
To me, it seems the same as the distinction made in SICP.
As for DP, I think it also comes very close to being the same as the above two. The one difference I can think of is that with DP you often have access to the entire built-up data structure you are creating, rather than only the latest (previous) level you have with an iterative process.
> Isn't this just the same as Dynamic Programming?
The two concepts are very related. I would say that corecursion is more low-level and more general. It's also true that DP is not really a computing term. DP is an optimization technique which, when performed using a computer, is often coded using corecursion.
See the comment by chriswarbo[1] for both an excellent explanation and some examples that I don't think we would call DP.