While it is true that one can simulate painting by search (in some sense), the problem is that (naive) search doesn't always work (we have an example in the paper). Moreover, training an agent has the benefit of fast test-time inference (i.e., you give it an image, and it paints it almost instantly). Of course, we haven't achieved the ultimate goal yet, but it's a step in that direction.
Thanks for the reply. Where in the paper is the example given? I skimmed it but didn't see it. (Edit: never mind, I see it now.)
Fast painting is a benefit, I guess. My search/painting program is very computationally intensive.
Edit: I think I see the point of the paper now. Unguided search is going to be difficult in high-dimensional search spaces like this. So the NNs become a hopefully-effective heuristic guiding the search.
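To make that concrete, here's a rough sketch (my own illustration, not from the paper) of what I mean by NN-guided search: a trained proposal network narrows the stroke candidates, and a plain reconstruction loss picks among them. `propose` and `render` are hypothetical callables standing in for the learned network and a stroke renderer.

    import numpy as np

    def reconstruction_loss(canvas, target):
        # simple pixel-wise error between the current canvas and the target image
        return np.mean((canvas - target) ** 2)

    def guided_stroke_step(canvas, target, propose, render, n_candidates=16):
        # propose(canvas, target, n) -> n candidate stroke parameter vectors,
        # assumed to come from a trained network (the learned heuristic);
        # render(canvas, stroke) -> new canvas with that stroke applied
        candidates = propose(canvas, target, n_candidates)
        scored = [(reconstruction_loss(render(canvas, s), target), s)
                  for s in candidates]
        best_loss, best_stroke = min(scored, key=lambda t: t[0])
        return render(canvas, best_stroke)

The network does the heavy lifting of pruning the high-dimensional stroke space; the search part only has to compare a handful of proposals.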
The (semi-)obvious next step is to do object/digit recognition with a Bayesian probability calculation, with probabilities based on this image reconstruction process. In other words, we classify e.g. digits by how likely each one is to have been drawn to produce the target image (rough sketch of the scoring below).
I have experimented a little with this idea, but with no successful results so far (plain old NNs still beat it).
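For concreteness, the Bayesian scoring I have in mind looks roughly like this. It's a minimal sketch under the assumption of Gaussian pixel noise; `reconstruct(image, digit)` is a hypothetical stand-in for running the drawing/search process conditioned on a class and returning its best reconstruction.

    import numpy as np

    def classify(image, reconstruct, priors, noise_var=0.1):
        # p(digit | image) is proportional to p(image | digit) * p(digit),
        # where p(image | digit) is approximated by how well the
        # class-conditioned drawing process reconstructs the target image
        # (Gaussian pixel-noise likelihood)
        log_posteriors = []
        for digit in range(10):
            recon = reconstruct(image, digit)
            log_lik = -np.sum((image - recon) ** 2) / (2 * noise_var)
            log_posteriors.append(log_lik + np.log(priors[digit]))
        return int(np.argmax(log_posteriors))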