This is really cool tech and impressive work by DeepMind.
But man does this scare me. I remember a quote: “You furnish the pictures and I’ll furnish the war.” – William Randolph Hearst, January 25, 1898. This was in the lead-up to the Spanish-American War.
Can you imagine tech like this, or tech like "deepfakes", being used today? Fake news that was text alone has done and is doing damage in elections around the world. Imagine that armed with pictures?!
In a dueling-NN architecture, many say the discriminator will be able to detect the fake images. I wonder whether there is a threshold where a produced image is just too damn close to a real picture for even an equally good discriminating NN to differentiate. In the end, both real and fake images are just pixel values... what would we do then?
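One relevant data point: the standard GAN analysis (Goodfellow et al.) already answers the limiting case. For a fixed generator, the best possible discriminator is

    D*(x) = p_data(x) / (p_data(x) + p_g(x))

so if the generated distribution p_g ever exactly matched the real one, even a perfect discriminator would output 1/2 for every image, i.e. a coin flip. Past that point, detection would have to rely on something outside the pixels (provenance, metadata, context).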
It will be interesting if it ever reaches the point where it can be automated and scaled. I predict a modernized repeat of the War of the Worlds tipping point, followed by ratcheted skepticism toward all kinds of temporal simulacra.
Then 3D imaging will enter widespread consumer use and prove very difficult for neural networks to reproduce convincingly, until it isn't. Trust will be restored in some kind of media until it's broken. Rinse and repeat.
Here's another scary GAN proof of concept [0]. In this case, researchers transferred someone's facial expressions and mouth movements onto footage of public figures in real time. Combined with DeepMind's new tech that seems to be able to produce human speech with believable cadence and inflection [1], you could make some very convincing fake footage.
This isn't remotely near the state of the art for raw image generation; that would be something like ProGAN or PixelCNN, which don't involve any reinforcement learning or paintbrushes and already do photorealistic synthesis. That horse has left the barn.
The point of the OP is that you're learning to generate little images in a much more difficult setup: you have to control some complex black-box system (like a paintbrush robot) to try to generate an image, with only crude success/failure feedback at the end of the sequence of actions.

The hope is that by going through this intermediate environment, instead of generating an entire image in a single shot via convolutions, it'll be learning more abstract structure about what makes up a face etc., so hypothetically it could do things like rotate faces in 3D (whereas something like ProGAN only sees faces as 2D blobs, so it can do things like add/subtract sunglasses or change hair color, but 3D transformations are beyond it).

And with this more abstract, deeper understanding, it should be able to speed up learning in settings like robotics (instead of paintings, imagine videos of humans controlling pick-and-place robot arms); you can see this as one way of approaching unsupervised learning and providing primitives which a higher-level agent can learn from faster (somewhat like how the GAIL architecture uses GANs for semi-supervised learning).
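Just to make the setup concrete, here's a toy sketch of that loop. This is my own made-up PaintEnv and a plain REINFORCE update, not the paper's actual architecture:

```python
# Minimal sketch of the setup described above, not DeepMind's actual code:
# the agent acts in a black-box "painting" environment for T steps, and the only
# reward is a discriminator's score of the finished canvas. PaintEnv and its
# dot-stamping "brush" are made up here purely for illustration; in the paper
# the environment is a real, non-differentiable rendering program.
import torch
import torch.nn as nn

H = W = 28   # canvas size (MNIST-like)
T = 10       # actions (brush strokes) per episode

class PaintEnv:
    """Toy stand-in for the renderer: each action picks a pixel and we stamp
    a small 3x3 dot there. The real environment exposes strokes, pressure, etc."""
    def reset(self):
        self.canvas = torch.zeros(H, W)
        return self.canvas.clone()

    def step(self, row, col):
        self.canvas[max(row - 1, 0):row + 2, max(col - 1, 0):col + 2] = 1.0
        return self.canvas.clone()

policy = nn.Sequential(  # maps current canvas -> logits over where to stamp next
    nn.Flatten(), nn.Linear(H * W, 128), nn.ReLU(), nn.Linear(128, H * W))
discriminator = nn.Sequential(  # "how real does this finished canvas look?"
    nn.Flatten(), nn.Linear(H * W, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def run_episode(env):
    state = env.reset()
    log_probs = []
    for _ in range(T):
        logits = policy(state.unsqueeze(0))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state = env.step(action.item() // W, action.item() % W)
    # Sparse feedback: only the *finished* image is judged, REINFORCE-style.
    reward = torch.sigmoid(discriminator(state.unsqueeze(0))).detach()
    loss = -(reward * torch.stack(log_probs)).sum()
    opt.zero_grad(); loss.backward(); opt.step()
    return reward.item()

# (Training the discriminator against real images, and the actor-critic
# machinery the paper actually uses, are omitted here.)
```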
It seems to me like many of the computer-generated MNIST digits involved retracing the same contours multiple times.
Is it possible to (a) filter out these duplicate strokes, (b) convert them to heavier-weight single strokes, or (c) change the training regime to not produce duplicate strokes?
I can see that being useful for, e.g., a real robot with a limited amount of ink or lead (or time to draw each character). A rough sketch of option (a) is below.
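Purely as post-processing, it could look something like this. The `render()` helper that rasterizes a single stroke into a pixel mask is hypothetical, and the novelty threshold is arbitrary:

```python
# Rough sketch of (a): walk the ordered stroke list and drop any stroke whose
# pixels mostly retrace what's already on the canvas. `render(stroke)` is a
# hypothetical helper returning a boolean H x W mask for that stroke.
import numpy as np

def dedupe_strokes(strokes, render, h=28, w=28, novelty_threshold=0.3):
    covered = np.zeros((h, w), dtype=bool)
    kept = []
    for stroke in strokes:
        mask = render(stroke)
        total = mask.sum()
        if total == 0:
            continue
        new_pixels = np.logical_and(mask, ~covered).sum()
        if new_pixels / total >= novelty_threshold:  # keep only strokes adding enough new ink
            kept.append(stroke)
            covered |= mask
    return kept
```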
The reason for that is that I chose a particular brush from the set of available brushes ('dry brush'). Since MNIST digits are quite sharp and opaque, the agent tries to achieve this by retracing the contours. I guess the remedy is to pick an appropriate brush style or make the agent choose it.
Depends on whether you see the glass half-full or half-empty. Is it a DRL actor-critic where the reward & critic happen to be half of a GAN, or is it a GAN where the generator happens to receive a RL-style loss instead of the normal discriminator loss? Actor-critic and GANs have always been hard to tell apart: https://arxiv.org/pdf/1610.01945.pdf
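To make the analogy concrete with standard textbook forms (nothing specific to this paper): a policy-gradient actor-critic updates the actor with roughly

    ∇_θ J ≈ E[ ∇_θ log π_θ(a|s) · Q_w(s, a) ]

while the non-saturating GAN generator update is roughly

    ∇_θ J ≈ E_z[ ∇_θ log D_w(G_θ(z)) ]

In both cases a separately trained network (critic / discriminator) provides the only learning signal for the network producing the outputs (actor / generator), which is why the two framings are so hard to tell apart.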
Yeah, ok, so you basically reinvented inverse-graphics analysis-by-synthesis, stuck DEEP NEURAL and the GOOGLE DEEPMIND(TM) brand on it, and now you're acting like it's the bee's knees.
I'm starting to understand how Juergen Schmidhuber feels.
Yes, I saw that they are, but at that point, I'd have to ask where the novelty is besides transforming them from probabilistic problems to plain neural-network problems.
I would call "inverse graphics" a task. One can solve that task following different strategies. We demonstrate one way that uses RL and GANs and gives reasonably good results. For Omniglot, for example, there are works by Lake et al. that employ probabilistic perspective but the amount of hand-engineering involved makes their approach hard to apply to other tasks
"When trained to paint celebrity faces, the agent is capable of capturing the main traits of the face, such as shape, tone and hair style, much like a street artist would when painting a portrait with a limited number of brush strokes:"
That's interesting. Do we know how artists draw? Is it as "algorithmic" as the article lays it out? I don't draw, so I always assumed it was more intuitive and personal than a "step by step" process.
It is eye-opening, even among fellow manga artists, to see how different their processes sometimes are.
Some may start with a definite sketch, others may go straight to ink with only the barest suggestion of a layout. Sometimes they struggle with expressions and may whiteout and re-ink (up to seven times in one of the videos).
Some artists start inking with the eyes, some may start with an outline of the face. And so on.
There are a variety of methods. Some people will teach you formulaic approaches to drawing people/faces, and instruct you to always lay out the 'proper' measurements that most people fit, then just add detail. More traditional methods teach you to draw what you see, but focusing on the structural lines and forms of the person, while merging it with knowledge of anatomy, perspective, and lighting. And other methods are purely 'draw what you see', without additional context, trusting accurate copying to paper to look correct.
What any specific artist uses will vary greatly. But it usually falls into one of those three camps.
I think it's more that if you want to depict something with the minimal number of strokes, you really have no option but to look for the key, defining traits of the object. In that fashion, both the AI and the artist must operate similarly, simply due to the limitations implied by the task.
But depicting those traits is another matter. You can render a chin meeting the hair in all sorts of ways, but your choices are limited by your aesthetic preferences and your ability to draw that form.
Drawing is a highly mechanical process; choosing what/how to draw is a curated one.
I think there is a bit of both. If an artist learned on their own, it may be more intuition (or, step-by-step, but they don't realize it because they don't think about the individual steps). But if you take a drawing class you will learn a lot of steps that you can reproduce.
For a good, striking example, do a video search for "two point perspective drawing" and look at some of the tutorials/demonstrations that come up.
Intuitive and methodical drawing aren't mutually exclusive. You can derive a method from intuition. Skilled draughtsmanship is partly technical, but it will always lack something when confined to purely methodical rendering. Most people who draw cannot exactly explain how they do it. And yet it is a completely learnable skill, albeit somewhat difficult to teach.
While it is true that one can simulate painting by search (in some sense), the problem is that (naive) search doesn't always work (we have an example in the paper). Moreover, training an agent has the benefit of fast test-time inference (i.e., you give it an image, and it paints it almost instantly). Of course, we haven't achieved the ultimate goal yet, but it's a step in that direction.
Thanks for the reply. Where in the paper is the example given? Skimmed it but didn't see it. (Edit: nevermind, see it now)
Fast painting is a benefit I guess. My search/painting program is very computationally intensive.
Edit: I think I see the point of the paper now. Unguided search is going to be difficult in high-dimensional search spaces like this. So the NNs become a hopefully-effective heuristic guiding the search.
The (semi-)obvious next step is to do object/digit recognition with a Bayesian probability calculation, with probabilities based on this image-reconstruction process. In other words, we choose e.g. digits based on how likely they are to have been drawn to give the target image; a rough sketch is below.
I have experimented a little with this idea, but with no successful results so far (plain old NNs still beat it).
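In code, the idea is roughly something like this. Here `reconstruct(image, digit)` stands in for a class-conditioned drawing/reconstruction model, and the Gaussian pixel likelihood and `beta` are arbitrary illustrative choices:

```python
# Sketch of the idea: score each digit class by how well a class-conditioned
# drawing model reconstructs the target image, then pick the class with the
# highest posterior. `reconstruct(image, digit)` is hypothetical here.
import numpy as np

def classify(image, reconstruct, priors=None, beta=50.0):
    digits = range(10)
    if priors is None:
        priors = {d: 0.1 for d in digits}      # uniform prior over classes
    log_post = {}
    for d in digits:
        recon = reconstruct(image, d)
        err = np.mean((image - recon) ** 2)    # pixel-wise reconstruction error
        # crude Gaussian likelihood: p(image | class) ~ exp(-beta * err)
        log_post[d] = -beta * err + np.log(priors[d])
    return max(log_post, key=log_post.get)
```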
The main difference is that for search you need to specify an image to emulate, while the RL method aims to generate new ones after observing some examples.
Cool tech, scary possibilities.