Adversarial Reprogramming of Neural Networks

cs702 · on July 3, 2018

A trained deep neural net can be viewed as -- indeed, it is -- a program that accepts some input data and produces some output.

The idea here, at a very high level, is to use other neural nets that (a) accept different input data and learn to 'adversarially embed it' into the input data accepted by the first neural net, and (b) extract from the output of the first neural net the actual output desired by the attacker... without ever touching the first neural net.

The authors demonstrate adversarial neural nets that target a deep convnet trained to classify ImageNet data. They are able alter this convnet's function from ImageNet classification to counting squares in an image, classifying MNIST digits, and classifying CIFAR-10 images... without ever touching the convnet.

Great work.

Erlich_Bachman · on July 3, 2018

What are some of the deep/philosophical reasons for why it is so easy to construct an adversarial example for a contemporary vanilla deep CNN network? Why do individual pixels get so much power about what the network decides to classify its output as?

During training, each input sample is basically driving each weight to be adjusted to classify itself correctly, while also classifying other inputs correctly. The way the network should thus see its world, is that it can only be one of those samples. That the world cannot be anything else. Thus, it should be the easiest way, computationally-wise, for the network to learn the things that differ the most between the classes. In an adversarial example, those couple of different pixels cannot possibly be what is even mathematically much more different for that class, compared to other classes.

How does this happen? It it is easy to understand why it would be easy to fool a network that looks for a leopard couch by an image of a real leopard, because leopard colors and texture is what the network actually was looking for during training. The patterns of the fooling picture were in the input. Given that such a network is only a gross simplification of a real brain, it is easy to see that it can be fooled. But just some pixels? The network was not looking for those pixels during training. It was not optimized to look for them. Why would it ever treat them as having that much information? Does it optimize for random things more highly than for the actual classification result that affects its weights. Does it have so many pixels that it asserts random importantness to them, so that out of millions, there is always 1 or 2 that happen to decide so much about the overall result?

Is it because the network looks for the combination of certain parameters, and treats the exactness of a combination as the most important factor, more important than its global context? So that the combination of the adversarially-modified pixels look like having the most exact ratios between each other, even though their ratios compared to the rest of the pixels is not on par at all - and the network decides that the most exact combination has the most information? Then, why isn't this easily combated by regularization and stuff like dropout?

nerdponx · on July 3, 2018

leopard colors and texture is what the network actually was looking for during training

Eh. The network is finding regular patterns. It doesn't know the difference between actual leopard parts and arbitrary noise in the image. The only reason we believe it's learning something about leopards is that we have fed the network lots of different kinds of leopards in lots of different contexts and positions.

But there are still some soft invariants that apply to images of leopards. These networks are complicated dynamical systems. Adding visual noise that is dissimilar to any kind of visual noise found naturally in leopard photos amounts to an input outside the normal operating range of the system. The effect of something like that, in general, is unpredictable. The fact that these examples exist shouldn't be that surprising. The fact that the output is quantized to one prediction or another only exacerbates the issue.

Is it because the network looks for the combination of certain parameters, and treats the exactness of a combination as the most important factor, more important than it's global context?

Probably. A CNN is an ensemble of many local representations.

So that the combination of the adversarially-modified pixels look like having the most exact ratios between each other, even though their ratios compared to the rest of the pixels is not on par at all - and the network decides that the most exact combination has the most information? Then, why isn't this easily combated by regularization and stuff like dropout?

Someone more knowledgeable than me can probably answer this. I'm not an expert in CNNs. But hopefully the rest is helpful.

stochastic_monk · on July 3, 2018

In light of [0], there is both theoretical and experimental reason to believe that CNNs are mostly picking up fourier decomposition.

[0] https://arxiv.org/pdf/1711.11561.pdf

azernik · on July 3, 2018

Generally the answers I've seen boil down to: our ML models is still pretty primitive. Models make their decisions in ways that aren't super generalizable, have decision frontiers that don't reflect the actual problem space well, etc. An example of such an analysis, see the "why is it hard to defend" section of https://blog.openai.com/adversarial-example-research/, or for a more academic approach which should be a good hook into the broader literature, see https://openreview.net/forum?id=rk6H0ZbRb

red75prime · on July 3, 2018

In the light of [0], it seems that standard (not adversarially trained) NNs learn each and every feature which happens to correlate with a class label in a training set (including noise). So moving along adversarial gradient looks like noise.

Take my interpretation with a (huge) grain of salt.

[0] https://arxiv.org/abs/1805.12152

panarky · on July 3, 2018

> Why do individual pixels get so much power about what the network decides to classify its output as?

This paper isn't about tricking a NN to classify incorrectly. It's about getting the NN to perform a completely different task.

We introduce adversarial attacks that instead reprogram the target model to perform a task chosen by the attacker

Erlich_Bachman · on July 4, 2018

The question applies to this field, not limited to this specific paper. It's already clear from the abstract that this is paper is doing it's own thing. But it is still based on the phenomenon of adversarial examples.

cozzyd · on July 3, 2018

Can't wait to adversarially modify pedestrian crossing signs to stop signs.

cshimmin · on July 3, 2018

I appreciate that your comment was tongue-in-cheek, but note that what you propose is actually a case of a standard targeted adversarial attack (changing a label prediction from "ped x-ing" to "stop"). This is fundamentally different from what's being proposed in the paper, which is to repurpose an existing NN (and its output labels) to perform another task altogether.

I'm still trying to figure how it could possibly be useful... the authors suggest it could be used to "steal resources", but it seems a bit contrived.

cozzyd · on July 3, 2018

Well, if you could transform your problem into the "stop" vs. "ped x-ing" problem, you could let the future smart self-driving cars solve it for you. Maybe they have more computational resources than you can muster (but that sounds very contrived, even if you could somehow harness the computational power of a freeway full of future cars). Or you could imagine another very contrived case where some self-driving car company controls some important algorithm you need to solve your problem via a patent and it would be illegal to solve your problem otherwise (also very contrived...).

king07828 · on July 3, 2018

> how it could possibly be useful

1 create a GPU friendly cryptocurrency where the payload includes a two dimensional data set (or image) that is processed with a standardized neural network model

2 let programmers pay a mining fee to expedite execution

3 instead of GPUs doing nothing but redundant calculations and getting replaced by ASICs, mining GPUs can dual purpose mine and execute arbitrary code

Will it be efficient, probably not. It does kill two birds with one stone

nerdponx · on July 3, 2018

It would be useful to anyone looking to cause havoc, terror, and/or harm.

seaucre · on July 3, 2018

It could be the beachhead of a larger attack. You could use the model's authority to steal its data or gain access to other resources.

squidbot · on July 3, 2018

Is this effectively brainwashing?

ggggtez · on July 3, 2018

Don't ask yes or no questions. No it's not brain washing. Neural networks are not brains.