
This problem was observed 20+ years ago with linear models used for protein structure prediction. For any given model of what constituted a properly folded protein, one could find conformations of the same protein that the model rated as even better folded than the correct conformation (I called them doppelgangers, but the name "decoy" is what caught on).

The statistical naivete of the field led to all sorts of inadvertent mixing of training and test set data, which generated a lot of spurious claims of having solved the problem. Those claims held up only until someone went looking for decoys, and decoys were always found. This led to the creation of the CASP competition to weed this out, and the field finally moved forward.

http://en.wikipedia.org/wiki/CASP

The key similarity to what I described above is that the adversarial search is done after the training of the deep neural network. That makes all the difference in the world, IMO. These adversaries may just be strange bad neighborhoods in image space that are otherwise hard to reach without a roadmap. Or they may be an unavoidable consequence of the curse of dimensionality.

http://en.wikipedia.org/wiki/Curse_of_dimensionality

But given that neural networks have a gradient, it doesn't shock me that the gradient can serve as a roadmap for locating a set of correlated but seemingly minor changes to an example that flip its classification. Doing so is simply back-propagation with the weights held constant, propagating the gradient all the way down to the input data itself - literally a couple lines of code.
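
To make "a couple lines of code" concrete, here is a minimal numpy sketch of the idea (mine, not the paper's code; the tiny two-layer net, the random "trained" weights, and the 0.1 step size are all placeholders for illustration). The weights are held fixed, the loss gradient is back-propagated to the input, and the input is nudged along the sign of that gradient:

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(32, 64)), np.zeros(32)     # pretend these are trained weights
    W2, b2 = rng.normal(size=(10, 32)), np.zeros(10)
    x = rng.normal(size=64)                              # the example to perturb
    y = 3                                                # its true class

    def forward(x):
        h = np.maximum(0.0, W1 @ x + b1)                 # ReLU hidden layer
        logits = W2 @ h + b2
        p = np.exp(logits - logits.max()); p /= p.sum()  # softmax
        return h, p

    h, p = forward(x)
    dlogits = p.copy(); dlogits[y] -= 1.0                # grad of cross-entropy w.r.t. logits
    dh = W2.T @ dlogits                                  # back through the output weights
    dh[h <= 0] = 0.0                                     # back through the ReLU
    dx = W1.T @ dh                                       # gradient w.r.t. the *input*
    x_adv = x + 0.1 * np.sign(dx)                        # small step that raises the loss

    print("true-class prob before:", p[y], "after:", forward(x_adv)[1][y])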

IMO there are two interesting experiments to do next (not that anyone will take this seriously I expect, but ya know, hear me now, believe me later):

1. Characterize the statistical nature of the changes to the input images and then use those summary statistics as the basis of an image-altering algorithm, to see if that alone (with no access to the network's gradient) can flip the classification of any image. If it can, be afraid: your driverless car may have blind spots. If not, then this is probably just a narrower form of overfitting.

2. If it's likely overfitting, attempt an expectation maximization-like fix to the problem: train the network, generate adversaries, add them to the training set, train again, and then lather, rinse, repeat until either the network can't be trained or the problem goes away (a rough sketch of this loop follows below).

Expensive? Yes. But you're Google/Facebook/Microsoft and you have lots of GPUs. No excuses...

Failing that, the above is on my todo list so I'm throwing it out there to see if anyone can poke holes in the approach.
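
To be concrete about the loop in point 2, here's a toy, self-contained sketch (a simple logistic-regression "network" on synthetic data stands in for a deep net; the data, step sizes, and five rounds are all made up, and the only point is the train / generate adversaries / augment / retrain structure):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 20))
    y = (X[:, :5].sum(axis=1) > 0).astype(float)        # synthetic binary labels
    w = np.zeros(20)

    def train(X, y, w, steps=500, lr=0.1):
        for _ in range(steps):
            p = 1.0 / (1.0 + np.exp(-(X @ w)))
            w = w - lr * X.T @ (p - y) / len(y)          # logistic-loss gradient step
        return w

    def adversaries(X, y, w, eps=0.25):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = np.outer(p - y, w)                        # d(loss)/d(input), per example
        return X + eps * np.sign(grad)                   # small worst-case nudges

    for rnd in range(5):
        w = train(X, y, w)
        X_adv = adversaries(X, y, w)
        acc = np.mean((X_adv @ w > 0) == y)
        print(f"round {rnd}: accuracy on fresh adversaries = {acc:.2f}")
        X = np.vstack([X, X_adv])                        # lather...
        y = np.concatenate([y, y])                       # ...rinse, repeat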



Thanks for saying this. I can't really comment on your experiments (I'm not qualified), but you can be assured that some people working in machine learning today came to it specifically having learnt the lessons of pre- and post-CASP. I don't know that I agree CASP was founded specifically because people found decoys, but...

it was a particular shock when I learned about ensemble methods (I think they were just called "combined servers" at the time) at CASP and saw that all our hard work (manual alignments, lots of expert analysis of models, etc.) wasn't really any better (far worse, in fact) than a few simply trained ensemble systems that memorized what they were bad at and qualified their predictions with the appropriate probabilities.

See also:

http://www.nature.com/nchem/journal/v6/n1/nchem.1821/metrics...

http://googleresearch.blogspot.com/2012/12/millions-of-core-... (note: 4 of the 6 projects awarded specifically involved physical modelling of proteins, and the fifth was a drug-protein binding job)

http://research.google.com/archive/large_deep_networks_nips2...

None of the above is coincidental: the first two links exist specifically because I went to Google to use those GPUs and CPUs for protein folding, design, and drug discovery. The third project is something I am now experimenting with.


> don't know that I agree CASP was founded specifically because people found decoys, but...

Here's an example of what drove my work back then:

Look at the energies and RMSDs (a measure of distance from the native structure) of melittin in these two papers:

Table 2 in http://onlinelibrary.wiley.com/doi/10.1002/pro.5560020508/pd...

and

Table 1 in http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1260499/pdf/biop...

In the first paper, the energy is higher, but the RMSD is lower. In the second paper, the RMSD is higher, but the energy is lower. How did this happen?

Well, in the first paper, phi/psi angles are set directly from a library of sequentially homologous dipeptides to pentapeptides that INCLUDES MELITTIN. So, by the time you get to tripeptides, you're nearly guaranteed to just be outputting the native conformation phi/psi angles over and over again. And this paper is just one of many to make basic mistakes like this.

As a young turk back then, I got into a rather long and vigorous online argument with one of the founders of CASP, who insisted the first paper was a partial solution to the protein folding problem. And I suspect that argument influenced the subsequent creation of CASP.

Anyway, it's been nice rehashing my post-doc glory days(tm), but we no longer have any excuses here. We have the tools, we have the technology...


> If it's likely overfitting, attempt an expectation maximization-like fix to the problem: train the network, generate adversaries, add them to the training set, train again, and then lather, rinse, repeat until either the network can't be trained or the problem goes away.

As I quoted in my other comment, the paper suggests doing exactly that.


I'm knowingly being a pedant and I apologize for that, but they don't quite say that; rather, they come awfully close to doing so. And I'm being a pedant because of all the low-information sorts claiming this is a refutation of deep neural networks (it's not, or at least not yet).

"The above observations suggest that adversarial examples are somewhat universal and not just the results of overfitting to a particular model or to the specific selection of the training set. They also suggest that back-feeding adversarial examples to training might improve generalization of the resulting models."

20 years ago I did this for linear models of protein energetics (also known as knowledge-based potentials or force fields), adding the decoys and then refitting the parameters ad nauseam. What I eventually arrived at was the invalidation of every single energy model and force field in use for protein energetics (yes, I really reverse-engineered just about everyone, from Michael Levitt to George Rose to AMBER, CHARMM, and ECEPP). This was an unpublishable result according to my post-doc adviser at the time, so it never got written up.
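
For what it's worth, the refitting loop looked structurally like this (a toy sketch only: the random "feature vectors" stand in for real energy terms, and the hinge-style refit and random conformational search are simplifications, not anyone's actual force field or sampler):

    import numpy as np

    rng = np.random.default_rng(1)
    f_native = rng.normal(size=10)                       # feature vector of the native structure
    decoys = [rng.normal(size=10) for _ in range(20)]    # initial decoy feature vectors
    w = np.zeros(10)                                     # linear energy model: E(conf) = w . f(conf)

    def refit(w, f_native, decoys, passes=200, lr=0.01):
        # Push E(native) below E(decoy) by a margin of 1 for every known decoy.
        for _ in range(passes):
            for f_d in decoys:
                if w @ f_native + 1.0 > w @ f_d:
                    w = w - lr * (f_native - f_d)
        return w

    def find_decoys(w, f_native, tries=5000):
        # Stand-in for a conformational search: random candidates that score below the native.
        cands = rng.normal(size=(tries, 10))
        return [c for c in cands if c @ w < f_native @ w]

    for it in range(5):
        w = refit(w, f_native, decoys)
        new = find_decoys(w, f_native)
        print(f"iteration {it}: {len(new)} new decoys beat the native")
        if not new:
            break                                        # no decoys found: the model survived
        decoys.extend(new[:50])                          # otherwise add some and refit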

In retrospect, he was utterly wrong. So I am really curious what would happen here if this were attempted with these much more complex models.


So, the other interesting paper, which I failed to cite, was this one: http://www.ncbi.nlm.nih.gov/pubmed/24265211 in which we showed that Rosetta needed to include bond-angle terms to accurately model some proteins.

That said, I'm a bit surprised you found what you did about AMBER (and other force fields), or rather, that you didn't publish. The Cornell et al. force field was later acknowledged to have serious problems with protein folding, but a number of improvements have been made since then.

Anyway, I would have happily published that result with you (I worked with Kollman, have worked with Baker and Pande, and desperately want to see the force fields improve using machine learning). There was a guy at BMS working on this with ML back in the day ('99-2000), and the AMBER folks trashed him because they believed the force field's transferability from small molecules to proteins was valid (in many ways it was, but it got some key details wrong).

If you think there is a straightforward machine-learning-for-force-fields problem that can dramatically improve ab initio folding with DistBelief and Exacycle, let me know. It shouldn't be hard to figure out my email address if you look at the papers I cited and do some basic set operations :-)


I talked to one of the authors of this paper at ICLR and he said that it wasn't really worth the time to compute the adversaries and train, though it did improve results. He said that in the time it took to generate adversaries and then train on them, the net was better off just training on more image data, since there is a near infinite supply of it. Perhaps if you didn't have an infinite dataset, this wouldn't apply.

Also, the really interesting thing was that adversaries generated for one network topology/data set were still adversarial even for other network topologies/data sets, which might imply that the nature of the adversaries is universal rather than highly specific to the exact network that was trained.


Can you point to any literature on 2?



