Machine learning gives performance without theoretical understanding. Norvig discusses Chomsky on machine learning in linguistics: http://norvig.com/chomsky.html
> I mean actually you could do physics this way, instead of studying things like balls rolling down frictionless planes, which can't happen in nature, if you took a ton of video tapes of what's happening outside my office window, let's say, you know, leaves flying and various things, and you did an extensive analysis of them, you would get some kind of prediction of what's likely to happen next, certainly way better than anybody in the physics department could do. Well that's a notion of success which is I think novel, I don't know of anything like it in the history of science. [from the linked transcript]
> Machine learning gives performance without theoretical understanding.
That is good experimental data, and the role of theorists here is to look at how that performance is achieved and why. For example, one can reasonably suspect there is a good reason why the kernels in a well-trained image-recognition deep learning net look like the receptive fields of neurons in the visual cortex. I'm pretty sure there is some kind of statistical optimality in that, something similar to how the normal distribution is the maximum-entropy distribution for a given standard deviation. In the same way, I'd guess the Gabor shape of a neuron's receptive field is something like a maximum-entropy solution over the set of all possible edges, or something along those lines. The point here is that the great success of deep learning generates a lot of very good data for theorists to consume. You can do only so much theory without good experimental data, and in the decades before computing power became available (and deep learning succeeded as a result), there weren't that many computer vision theory advances to speak of, really.
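To make the Gabor comparison concrete, here's a minimal numpy sketch of what such a kernel looks like: a sinusoidal grating under a Gaussian envelope. The parameter values are just illustrative, not anything claimed to be optimal:

```python
import numpy as np

def gabor_kernel(size=31, wavelength=8.0, theta=0.0, sigma=4.0, phase=0.0):
    """A 2-D Gabor kernel: a cosine grating windowed by a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates to the filter's orientation
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r**2 + y_r**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_r / wavelength + phase)
    return envelope * carrier

# A small bank of orientations, roughly like the oriented receptive fields in V1
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```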
Here's a problem: you can argue that Gabor filters arise because we design neural nets to encourage them. Gabor filters mostly arise in CNNs or in things otherwise regularized to be like CNNs. Convolutional layers are a form of regularization that restricts the space of models a network can conform to. The Gabor filters are learned, but none of this is evidence that they are globally "optimal", given that a human manually decided whether or not to include convolutions in the first place.
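For what it's worth, the "they show up in CNNs" observation is easy to eyeball yourself. Here is a sketch of dumping the first-layer kernels of an ImageNet-pretrained network, assuming torchvision and matplotlib are installed; the choice of ResNet-18 is arbitrary, any ImageNet-trained CNN would do:

```python
import matplotlib.pyplot as plt
import torchvision.models as models

# Load an ImageNet-pretrained ResNet; its first conv layer has 64 kernels of shape 3x7x7
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
kernels = model.conv1.weight.detach()

# Rescale to [0, 1] for display only
kernels = (kernels - kernels.min()) / (kernels.max() - kernels.min())

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, k in zip(axes.flat, kernels):
    ax.imshow(k.permute(1, 2, 0).numpy())  # channels-last for imshow
    ax.axis("off")
plt.show()
# Many of the kernels look like oriented, bandpass, Gabor-like patches (plus some color blobs)
```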
It also goes without saying that the phrase "statistically optimal" is meaningless in this specific context. You can claim the filters are part of minimizing the cost function, but, again, you have to be very careful about the chicken-and-egg problem, because humans are the ones who manually craft the cost function.
You might find this 1996 Nature paper by Olshausen & Field interesting. In it, they describe how a coding strategy that maximizes sparseness when representing natural scenes is enough to produce a family of localized, oriented, bandpass receptive fields, like those found in the early visual system of humans.
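A rough modern re-creation of that experiment, as a sketch rather than the paper's exact setup: learn a sparse dictionary for natural-image patches and look at the learned atoms. This assumes a recent scikit-learn; the sample image, patch size, and sparsity penalty are placeholders:

```python
import numpy as np
from sklearn.datasets import load_sample_image
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning

# Natural-image patches (grayscale, zero-mean) as training data
img = load_sample_image("china.jpg").mean(axis=2) / 255.0
patches = extract_patches_2d(img, (12, 12), max_patches=20000, random_state=0)
patches = patches.reshape(len(patches), -1)
patches -= patches.mean(axis=1, keepdims=True)

# Sparse coding: reconstruct each patch from a few active dictionary atoms.
# With enough data, the atoms tend to become localized, oriented, and bandpass
# (Gabor-like), which is the Olshausen & Field (1996) result in miniature.
dico = MiniBatchDictionaryLearning(n_components=100, alpha=1.0,
                                   max_iter=500, random_state=0)
dico.fit(patches)
atoms = dico.components_.reshape(-1, 12, 12)  # each entry is one learned basis function
```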
This fits with my point: they imposed restrictions on the model space, namely maximizing sparseness, and they also made several linearity assumptions.
Those papers should be, if not 101, then at least 201 material for a neural net track, as they would help establish a common framework and basis for thinking about, talking about, and analyzing neural net machinery.
> you can argue that Gabor filters arise because we design neural nets to encourage them. Gabor filters mostly arise in CNNs or in things otherwise regularized to be like CNNs.
And do we know why? Usually that would mean some optimality. It should be relatively simple math (back in the day at our university it would have been given to a student as a thesis project, and a couple of months later we'd have it), and it would give us two things: insight into the biological visual cortex (which we suppose also follows some optimality, and now we would have a very good candidate for what that is), and the ability to start the primary convolutional layers from the (optimal set of) Gabors instead of learning them from scratch. Actually, some of the best results I saw 15-20 years ago were produced by simulating the visual cortex through exactly such a construction. And now image-trained deep learning nets converge to the same thing.
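As a sketch of that "start the first layer from Gabors" idea, assuming PyTorch: build a small Gabor bank by hand and copy it into a conv layer's weights. The orientations, wavelengths, and kernel size below are arbitrary placeholders, not an "optimal" set; choosing that set well is exactly the open question:

```python
import numpy as np
import torch
import torch.nn as nn

def gabor(size, wavelength, theta, sigma):
    """Gabor kernel: cosine grating at angle theta under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)

# First conv layer: 16 filters, initialized from a hand-built Gabor bank
# (4 wavelengths x 4 orientations = 16 kernels of size 11x11)
conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=11, padding=5, bias=False)
bank = [gabor(11, wl, th, sigma=3.0)
        for wl in (4.0, 6.0, 8.0, 12.0)
        for th in np.linspace(0, np.pi, 4, endpoint=False)]
with torch.no_grad():
    conv.weight.copy_(torch.tensor(np.stack(bank), dtype=torch.float32).unsqueeze(1))
# The layer can then be frozen (conv.weight.requires_grad_(False)) or fine-tuned as usual.
```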
No, we don't know why. You also have little basis to claim the biological visual cortex is optimal. It certainly works, which is enough for people who draw inspiration from it.
That is my point: we have a very good suspicion, plus experimental data (successful deep learning CNNs as well as the visual cortex), that it is optimal, at least within some very wide class if not globally, and that warrants looking for a proof of it. Proven optimality would be very telling, especially for the biological visual cortex. And even if optimality doesn't hold for the CNN Gabors, the reasons why not might be discovered, and that would help in constructing an even better, perhaps optimal, approach.
Suspicion and empirical evidence cannot prove something is optimal. You cannot exhaustively empirically search an infinite space of models. You are seriously misunderstanding the definition of "optimal."
I am not seeing the chicken-and-egg problem. Isn't it always the case that when we consider the optimality of something, it depends on some definitions?
The chicken-and-egg problem is that we design neural nets in a way that encourages Gabor filters to show up. If you chose a different architecture, they wouldn't show up. So taking the presence of Gabor filters as an indication of optimality is sort of begging the question.
In practical terms, performance without understanding can lead to highly surprising/counter-intuitive results when algorithms are applied to real-life problems. This doesn't matter much if you're doing movie suggestions or something like that, but it does matter in many other areas that could benefit from AI/ML.
But this is what the human brain is doing. A large, complex statistical analysis of video that produced accurate predictions would contain an understanding of physics, just as our brain does. What you are essentially doing here is creating a new researcher out of thin air, asking them to work out how to understand video, and then, when they succeed, not bothering to ask how it was done. Then tossing the knowledge in their head away because you didn't come up with it.