
This problem was observed 20+ years ago with linear models used for protein structure prediction. For any given model of what constituted a properly folded protein, one could find conformations of the same protein that the model rated as even better folded than the correct conformation (I called them doppelgangers, but the name "decoy" is what caught on).

The statistical naivete of the field led to all sorts of inadvertent mixing of training and test set data, which generated a lot of spurious claims of having solved the problem. Those claims held up only until someone went looking for decoys, and decoys were always found. This led to the creation of the CASP competition to weed this out, and the field finally moved forward.

http://en.wikipedia.org/wiki/CASP

The key similarity to what I described above is that the adversarial search is done after the training of the deep neural network. That makes all the difference in the world, IMO. These adversaries may just be strange bad neighborhoods in image space that are otherwise hard to reach without a roadmap. Or they may be an unavoidable consequence of the curse of dimensionality.

http://en.wikipedia.org/wiki/Curse_of_dimensionality

But given that neural networks have a gradient, it doesn't shock me that the gradient can serve as a roadmap for locating a set of correlated but seemingly minor changes to an example that flip its classification. Doing so is simply back-propagation with the weights held constant, propagating the gradient all the way down to the input data itself - literally a couple lines of code.
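
To make "a couple lines of code" concrete, here is a minimal numpy sketch of the idea (mine, not the paper's code; the tiny two-layer net, the random "trained" weights, and the 0.1 step size are all placeholders for illustration). The weights are held fixed, the loss gradient is back-propagated to the input, and the input is nudged along the sign of that gradient:

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(32, 64)), np.zeros(32)     # pretend these are trained weights
    W2, b2 = rng.normal(size=(10, 32)), np.zeros(10)
    x = rng.normal(size=64)                              # the example to perturb
    y = 3                                                # its true class

    def forward(x):
        h = np.maximum(0.0, W1 @ x + b1)                 # ReLU hidden layer
        logits = W2 @ h + b2
        p = np.exp(logits - logits.max()); p /= p.sum()  # softmax
        return h, p

    h, p = forward(x)
    dlogits = p.copy(); dlogits[y] -= 1.0                # grad of cross-entropy w.r.t. logits
    dh = W2.T @ dlogits                                  # back through the output weights
    dh[h <= 0] = 0.0                                     # back through the ReLU
    dx = W1.T @ dh                                       # gradient w.r.t. the *input*
    x_adv = x + 0.1 * np.sign(dx)                        # small step that raises the loss

    print("true-class prob before:", p[y], "after:", forward(x_adv)[1][y])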

IMO there are two interesting experiments to do next (not that anyone will take this seriously I expect, but ya know, hear me now, believe me later):

1. Characterize the statistical nature of the changes to the input images and then use those summary statistics as the basis of an image-altering algorithm, to see if that alone (with no access to the network's gradient) can flip the classification of any image. If it can, be afraid: your driverless car may have blind spots. If not, then this is probably just a narrower form of overfitting.

2. If it's likely overfitting, attempt an expectation maximization-like fix to the problem: train the network, generate adversaries, add them to the training set, train again, and then lather, rinse, repeat until either the network can't be trained or the problem goes away (a rough sketch of this loop follows below).

Expensive? Yes. But you're Google/Facebook/Microsoft and you have lots of GPUs. No excuses...

Failing that, the above is on my todo list so I'm throwing it out there to see if anyone can poke holes in the approach.
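
To be concrete about the loop in point 2, here's a toy, self-contained sketch (a simple logistic-regression "network" on synthetic data stands in for a deep net; the data, step sizes, and five rounds are all made up, and the only point is the train / generate adversaries / augment / retrain structure):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 20))
    y = (X[:, :5].sum(axis=1) > 0).astype(float)        # synthetic binary labels
    w = np.zeros(20)

    def train(X, y, w, steps=500, lr=0.1):
        for _ in range(steps):
            p = 1.0 / (1.0 + np.exp(-(X @ w)))
            w = w - lr * X.T @ (p - y) / len(y)          # logistic-loss gradient step
        return w

    def adversaries(X, y, w, eps=0.25):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = np.outer(p - y, w)                        # d(loss)/d(input), per example
        return X + eps * np.sign(grad)                   # small worst-case nudges

    for rnd in range(5):
        w = train(X, y, w)
        X_adv = adversaries(X, y, w)
        acc = np.mean((X_adv @ w > 0) == y)
        print(f"round {rnd}: accuracy on fresh adversaries = {acc:.2f}")
        X = np.vstack([X, X_adv])                        # lather...
        y = np.concatenate([y, y])                       # ...rinse, repeat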



Thanks for saying this. I can't really comment on your experiments (I'm not qualified), but you can be assured that some people working in machine learning today came to it specifically having learnt the lessons of pre- and post-CASP. I don't know that I agree CASP was founded specifically because people found decoys, but...

it was a particular shock when I learned about ensemble methods (I think they were just called "combined servers" at the time) at CASP and saw that all our hard work (manual alignments, lots of expert analysis of models, etc.) wasn't really any better (far worse, in fact) than a few simply trained ensemble systems that memorized what they were bad at and qualified their predictions with the appropriate probabilities.

See also:

http://www.nature.com/nchem/journal/v6/n1/nchem.1821/metrics...

http://googleresearch.blogspot.com/2012/12/millions-of-core-... (note: 4 of the 6 projects awarded specifically involved physical modelling of proteins, and the fifth was a drug-protein binding job)

http://research.google.com/archive/large_deep_networks_nips2...

None of the above is coincidental: the first two links exist specifically because I went to Google to use those GPUs and CPUs for protein folding, design, and drug discovery. The third project is something I am now experimenting with.


> don't know that I agree CASP was founded specifically because people found decoys, but...

Here's an example of what drove my work back then:

Look at the energies and RMSDs (a measure of distance from the native structure) of melittin in these two papers:

Table 2 in http://onlinelibrary.wiley.com/doi/10.1002/pro.5560020508/pd...

and

Table 1 in http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1260499/pdf/biop...

In the first paper, the energy is higher, but the RMSD is lower. In the second paper, the RMSD is higher, but the energy is lower. How did this happen?

Well, in the first paper, phi/psi angles are set directly from a library of sequentially homologous dipeptides to pentapeptides that INCLUDES MELITTIN. So, by the time you get to tripeptides, you're nearly guaranteed to just be outputting the native conformation phi/psi angles over and over again. And this paper is just one of many to make basic mistakes like this.

As a young turk back then, I got into a rather long and vigorous online argument with one of the founders of CASP, who insisted the first paper was a partial solution to the protein folding problem. And I suspect that argument influenced the subsequent creation of CASP.

Anyway, it's been nice rehashing my post-doc glory days(tm), but we no longer have any excuses here. We have the tools, we have the technology...


> If it's likely overfitting, attempt an expectation maximization-like fix to the problem: train the network, generate adversaries, add them to the training set, train again, and then lather, rinse, repeat until either the network can't be trained or the problem goes away.

As I quoted in my other comment, the paper suggests doing exactly that.


I'm knowingly being a pedant and I apologize for that, but they don't quite say that; rather, they come awfully close to doing so. And I'm being a pedant because of all the low-information sorts claiming this is a refutation of deep neural networks (it's not, or at least not yet).

"The above observations suggest that adversarial examples are somewhat universal and not just the results of overfitting to a particular model or to the specific selection of the training set. They also suggest that back-feeding adversarial examples to training might improve generalization of the resulting models."

20 years ago I did this for linear models of protein energetics (also known as knowledge-based potentials or force fields), adding the decoys and then refitting the parameters ad nauseam. What I eventually arrived at was the invalidation of every single energy model and force field in use for protein energetics (yes, I really reverse-engineered just about everyone, from Michael Levitt to George Rose to AMBER, CHARMM, and ECEPP). This was an unpublishable result according to my post-doc adviser at the time, so it never got written up.
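
For what it's worth, the refitting loop looked structurally like this (a toy sketch only: the random "feature vectors" stand in for real energy terms, and the hinge-style refit and random conformational search are simplifications, not anyone's actual force field or sampler):

    import numpy as np

    rng = np.random.default_rng(1)
    f_native = rng.normal(size=10)                       # feature vector of the native structure
    decoys = [rng.normal(size=10) for _ in range(20)]    # initial decoy feature vectors
    w = np.zeros(10)                                     # linear energy model: E(conf) = w . f(conf)

    def refit(w, f_native, decoys, passes=200, lr=0.01):
        # Push E(native) below E(decoy) by a margin of 1 for every known decoy.
        for _ in range(passes):
            for f_d in decoys:
                if w @ f_native + 1.0 > w @ f_d:
                    w = w - lr * (f_native - f_d)
        return w

    def find_decoys(w, f_native, tries=5000):
        # Stand-in for a conformational search: random candidates that score below the native.
        cands = rng.normal(size=(tries, 10))
        return [c for c in cands if c @ w < f_native @ w]

    for it in range(5):
        w = refit(w, f_native, decoys)
        new = find_decoys(w, f_native)
        print(f"iteration {it}: {len(new)} new decoys beat the native")
        if not new:
            break                                        # no decoys found: the model survived
        decoys.extend(new[:50])                          # otherwise add some and refit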

In retrospect, he was utterly wrong. So I am really curious what would happen here if this were attempted with these much more complex models.


So, the other interesting paper, which I failed to cite, was this one: http://www.ncbi.nlm.nih.gov/pubmed/24265211 in which we showed that Rosetta needed to include bond-angle terms to accurately model some proteins.

That said, I'm a bit surprised you found what you did about AMBER (and other force fields), or rather, that you didn't publish. The Cornell et al. force field was later acknowledged to have serious problems with protein folding, but a number of improvements have been made since then.

Anyway, I would have happily published that result with you (I worked with Kollman, have worked with Baker and Pande, and desperately want to see the force fields improve using machine learning). There was a guy at BMS working on this with ML back in the day ('99-2000), and the AMBER folks trashed him because they believed the force field's transferability from small molecules to proteins was valid (in many ways it was, but it got some key details wrong).

If you think there is a straightforward machine-learning-for-force-fields problem that can dramatically improve ab initio folding with DistBelief and Exacycle, let me know. It shouldn't be hard to figure out my email address if you look at the papers I cited and do some basic set operations :-)


I talked to one of the authors of this paper at ICLR and he said that it wasn't really worth the time to compute the adversaries and train, though it did improve results. He said that in the time it took to generate adversaries and then train on them, the net was better off just training on more image data, since there is a near infinite supply of it. Perhaps if you didn't have an infinite dataset, this wouldn't apply.

Also, the really interesting thing was that adversaries generated for one network topology/data set were still adversarial even for other network topologies/data sets, which might imply that the nature of the adversaries is universal rather than highly specific to the exact network that was trained.


Can you point to any literature on 2?



