What's wrong with convolutional nets? [video] (techtv.mit.edu)
113 points by drewvolpe on Dec 22, 2014 | 18 comments


Short, massively simplified summary:

Vast improvement in unsupervised learning.

Walks through biases humans have in object recognition. Explains new structures to add on top of neural networks.

Basic backprop on MNIST with 30,000 labeled examples gives a 1.7% error rate. Hinton's approach gets the same result with 25 examples.
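
For reference, roughly the kind of plain-backprop baseline that 1.7% figure describes. This is a minimal sketch of mine using scikit-learn, not anything from the talk; the layer size and learning rate are arbitrary choices:

    from sklearn.datasets import fetch_openml
    from sklearn.neural_network import MLPClassifier

    # MNIST: 70,000 28x28 digit images; keep 30,000 labeled training examples
    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    X = X / 255.0
    X_train, y_train = X[:30000], y[:30000]
    X_test, y_test = X[60000:], y[60000:]

    # one hidden layer trained with plain backprop / SGD
    clf = MLPClassifier(hidden_layer_sizes=(300,), solver="sgd",
                        learning_rate_init=0.1, max_iter=50)
    clf.fit(X_train, y_train)
    print("test error:", 1 - clf.score(X_test, y_test))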


With 25 examples for which the system asks for the labels (plus an unknown number of unlabeled examples).

Liked the progress on the reverse computer graphics approach. A bit of personal perspective here: from the data that I saw (and labeled!) while working on a hand gesture recognition project, reverse computer graphics felt like the only approach that could bring human-level gesture recognition from 2D data. And I was really excited at some point when Nvidia started a computer vision / gesture recognition group. It felt really right that a company making GPUs for photographic-level computer graphics would try to build a reverse computer graphics processor. Too bad it didn't go anywhere.


One of the few academic speakers I've seen about whom I said to myself, "This guy is a time traveler. I need to really pay attention to what he's saying."

That was in 2001. A decade+ later and we're still trying to catch up.


This precisely describes the feeling I got when watching the video. He consistently provides explanations and examples that make complete sense in hindsight but are pretty hard to come up with in the first place. I feel like this video kind of deserves a "spoiler alert" of sorts, because he just lays so many of these things out and they end up looking obvious afterwards. Not obvious in a "yeah, the brain certainly works like that" way, because we don't know that yet, but obvious in an "I really wouldn't be surprised if it worked like that" way.


And this is why I <3 the MIT CoCoSci group et al. Serious nerd-squee levels here. The whole damn clique of labs have these simple, obvious, clever ideas that are miles ahead of everyone else trying to nudge their error rates a few points lower.


Video buffers every few seconds for me, are there any mirrors?

Edit: found download link after getting some caffeine http://techtv.mit.edu/videos/30698-what-s-wrong-with-convolu...

Edit 2: Also, there is a class on Coursera where he shares a lot of knowledge and intuition about neural networks. I took it myself and it was excellent. Unfortunately there are no repeat sessions, but even the archived material is a goldmine. https://class.coursera.org/neuralnets-2012-001


I just pointed youtube-dl at the original URL and it detected and downloaded the embedded .mp4 just fine.


If I'm understanding some of the criticism, it is basically that we don't think "that's how our brain does it" (when it comes to convnets, pooling, etc.). But this raises the question: do we _have_ to duplicate what goes on in the brain? For example: we really don't know how the brain does multiplication, and if I were to guess, I'd say it has nothing to do with binary arithmetic and shift registers. Yet we've been able to teach a computer how to do it, using an algorithm that has (in all probability) no relation to how our brains work. So maybe convnets, pooling and other hacks work, just not how the brain works?
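
To make the "binary arithmetic and shift registers" bit concrete, here's the classic shift-and-add multiply as a toy Python sketch (obviously not a claim about how any brain, or even any particular ALU, is wired):

    def shift_add_multiply(a: int, b: int) -> int:
        # multiply non-negative ints the way simple hardware does:
        # scan b one bit at a time, adding a shifted copy of a for
        # every set bit
        result = 0
        while b:
            if b & 1:       # low bit of b is set: add current shift of a
                result += a
            a <<= 1         # shift a left (multiply by 2)
            b >>= 1         # move to the next bit of b
        return result

    assert shift_add_multiply(13, 11) == 143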


A lot of these sound like ideas Jeff Hawkins has been pushing with his Cortical Learning Algorithm.


I'm starting to think that Hawkins is all talk. Anybody can claim esoteric breakthroughs; the proof is whether you can implement something that will, say, beat the others on ImageNet. Heck, pick a problem, any problem, and show that your approach is the best by _some_ criterion. Then we start making progress.


It's interesting since Hinton's approach seems to be more inspired by the computational geometry aspects of vision (reverse computer graphics, invariant representations) and working backwards towards the neural superstructures.

Hawkins's approach seems to be inspired by the biology itself, working from the "common computational substrate" hypothesis up to the cortical units required.

They both seem to meet in the middle at the need to figure out invariant representations for the intermediate features presented to each level of the hierarchy.

I do wonder what an argument between those two would look like concerning the applicability of back-propagation, which I remember Hawkins deriding as totally artificial compared to the feedback structure of the actual neocortex.

Anyone more up to date on the state of the argument regarding that?


Backpropagation is just calculating the gradient of a multilayer perceptron. You don't even have to derive it by hand; you can use autodiff: http://users.cecs.anu.edu.au/~jdomke/courses/sml2010/07autod... (pdf).
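
For example, with the HIPS autograd package (just one autodiff option, my choice here, not something from the linked notes), you get exactly the gradients backprop would give without deriving anything:

    import autograd.numpy as np
    from autograd import grad

    def mlp_loss(params, x, y):
        W1, b1, W2, b2 = params
        h = np.tanh(np.dot(x, W1) + b1)   # hidden layer
        pred = np.dot(h, W2) + b2         # linear output
        return np.mean((pred - y) ** 2)   # squared error

    rng = np.random.RandomState(0)
    params = [rng.randn(4, 8), np.zeros(8), rng.randn(8, 1), np.zeros(1)]
    x, y = rng.randn(16, 4), rng.randn(16, 1)

    loss_grad = grad(mlp_loss)        # autodiff: gradient w.r.t. params
    grads = loss_grad(params, x, y)   # same numbers hand-written backprop gives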

They both seem to neglect time (internal dynamics). I haven't seen either of them come up with a model like Izhikevich's polychronization network: http://www.izhikevich.org/publications/spnet.htm. If we were able to make one of those ideas computationally useful, things would become really interesting.
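
For a flavor of what that network is built from, here is a toy simulation of a single Izhikevich neuron (my own sketch using his published "regular spiking" parameters; the polychronization itself comes from conduction delays between many such neurons, which this doesn't show):

    a, b, c, d = 0.02, 0.2, -65.0, 8.0   # "regular spiking" cortical cell
    v, u = -65.0, b * -65.0              # membrane potential and recovery
    spikes = []
    for t in range(1000):                # 1000 ms, crude 1 ms Euler steps
        I = 10.0                         # constant input current
        v += 0.04 * v * v + 5 * v + 140 - u + I
        u += a * (b * v - u)
        if v >= 30.0:                    # spike threshold: reset the state
            spikes.append(t)
            v, u = c, u + d
    print(len(spikes), "spikes")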


Interesting, thanks for the links!

I've done your typical ANN 101 training in the past, so I have a good mental model of back-propagation. Modelling the actual nonlinear dynamics of "realistic" neural networks seems like an obvious path of research, but I know how daunting it is. It seems like every tiny bit we push forward our tools for understanding complex nonlinear systems should pay large dividends across so many computational fields (fluid dynamics, QFT, economics, ..., everything?)

I'll have to read Izhikevich's paper, seems like a unique line of research.


His work is absolutely worth checking out. He recently founded a startup, Brain Corporation (http://www.braincorporation.com/).

Coincidence detection, where delays play a functional role, is one of the things I find interesting (as is the fact that there are more polychronous groups than neurons). The other thing is the emergence of gamma waves. I would be surprised if these did not also have some functional role (although it might just as well be the humming of our biological processor). :-)

I wish I were brave enough to start experimenting with different neural networks. For now I am on a roll, "Bayesianifying" everything I encounter. Even the Hough transform that Hinton is so fond of in this talk. :-)
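
For concreteness, the vanilla (non-Bayesian) Hough voting I mean, in toy line-detection form; a sketch of my own:

    import numpy as np

    def hough_lines(points, n_theta=180, n_rho=200, rho_max=100.0):
        # each edge point votes for every (theta, rho) line through it;
        # peaks in the accumulator are the detected lines
        thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
        acc = np.zeros((n_theta, n_rho), dtype=int)
        for x, y in points:
            for i, th in enumerate(thetas):
                rho = x * np.cos(th) + y * np.sin(th)
                j = int((rho + rho_max) / (2 * rho_max) * n_rho)
                if 0 <= j < n_rho:
                    acc[i, j] += 1
        return acc, thetas

    # points on the line y = x pile their votes at theta = 3*pi/4, rho = 0
    acc, thetas = hough_lines([(i, i) for i in range(20)])
    i, j = np.unravel_index(acc.argmax(), acc.shape)
    print("theta ~", thetas[i])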


If I'm following this presentation correctly, it seems like it would be relatively simple to add time to these models by using the previous frame's guesses as priors for the current frame.

He offhandedly mentioned something once that may have been this, but there wasn't enough context to be sure that's what he meant. Still, given the general idea of reliable high-level affine invariance, it doesn't seem hard to imagine how to make the approach temporally aware, at least at a basic level.
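
Something like this toy recursive Bayes filter is what I have in mind (my own sketch of the idea, not anything from the talk): the posterior over object classes at frame t becomes the prior at frame t+1, mixed slightly toward uniform so the scene is allowed to change:

    import numpy as np

    def update(prior, likelihood):
        # one Bayes step: posterior proportional to prior * likelihood
        post = prior * likelihood
        return post / post.sum()

    def smooth(belief, stay=0.9):
        # allow the scene to change between frames: mix with uniform
        return stay * belief + (1 - stay) / len(belief)

    belief = np.ones(3) / 3   # uniform prior over 3 object classes
    per_frame_likelihoods = [np.array([0.5, 0.3, 0.2]),
                             np.array([0.6, 0.3, 0.1]),
                             np.array([0.4, 0.5, 0.1])]
    for lik in per_frame_likelihoods:
        belief = update(smooth(belief), lik)
    print(belief)             # sharpened by evidence accumulated over frames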


At the end of his talk he answers a question about audio. A student of his has worked on it, and he expressed admiration for that student. He didn't come across as if he had solved that problem as well. :-)

Time is a tricky bastard. It is one thing to incorporate some Markovian dynamics into an artificial network; it is another thing to cope with dynamics as we know it. You might be interested in, for example, what Ralf Der is studying: http://www.informatik.uni-leipzig.de/~der/ (Tishby is working on this as well: https://www.cs.purdue.edu/homes/spa/venice08/docs/Venice-Tih... (pdf, slides)).

Two things I learned from his talk:

* If you have (human-like) part-based representations, you need linear relationships to build the whole back up (toy sketch at the end of this comment).

* Routing is key.

With dynamics involved, say movie scenes, or your hand moving in front of your own eyes, we might postulate that we will first discover linear relationships as well. A car disappearing temporarily behind a fence on the right we expect to reappear on the other side, at the left.

Routing in the brain that corresponds to dynamic behavior is probably something other than the time slicing / windowing we are all so familiar with in machine learning. It also goes beyond a simple central pattern generator for locomotion, which is not a routing problem at all (just a clone of the outside frequency pattern, with dynamics between neurons). "Real" dynamics is about being able to put items into slots, to learn a (visual) grammar. The thought that routing is key also seems to be shared by Schmidhuber, who created LSTM precisely to route errors through a network in a more sophisticated way.
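
And the promised toy sketch of the part-whole voting point (my own simplification, not Hinton's actual model): each part carries a pose matrix, a fixed linear transform maps it to a predicted pose for the whole, and agreement between the parts' predictions is what signals the whole's presence:

    import numpy as np

    def agreement(votes):
        # crude score: negative mean distance to the mean prediction
        mean = np.mean(votes, axis=0)
        return -np.mean([np.linalg.norm(v - mean) for v in votes])

    rng = np.random.RandomState(0)
    T_nose, T_mouth = rng.randn(3, 3), rng.randn(3, 3)  # learned, in reality

    face_pose = rng.randn(3, 3)                     # one consistent face
    nose_pose = face_pose @ np.linalg.inv(T_nose)   # part poses implied by it
    mouth_pose = face_pose @ np.linalg.inv(T_mouth)

    # linear relationship: predicted whole pose = part pose @ transform
    votes = [nose_pose @ T_nose, mouth_pose @ T_mouth]
    print("consistent parts:", agreement(votes))    # ~0: strong agreement

    random_votes = [rng.randn(3, 3), rng.randn(3, 3)]
    print("unrelated parts: ", agreement(random_votes))  # much lower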


The problem is that Hawkins hasn't really produced a comparable alternative. He's trying to throw away decades of progress in mathematics and reinvent the learning algorithms from scratch, which is great, but he hasn't succeeded (yet).


Does anyone have a link to the presentation itself (e.g. ppt/pdf)?



