
Your terminology is inaccurate.

Threshold neurons such as McCulloch-Pitts neurons are rarely used in artificial neural network architectures. They can't learn.

ANNs use nonlinear activation functions that are differentiable (at least in practice). That small change makes a huge difference.
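A quick way to see the difference (a minimal sketch in plain NumPy; the setup is made up for illustration): away from the jump a hard threshold has zero derivative, while a sigmoid has a non-zero derivative everywhere, so backprop has a signal to push through.

    import numpy as np

    # Compare a hard threshold with a smooth, differentiable activation.
    x = np.array([-2.0, -0.5, 0.5, 2.0])
    step = lambda z: (z > 0).astype(float)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    h = 1e-6  # central difference quotient as a stand-in for the derivative
    print((step(x + h) - step(x - h)) / (2 * h))        # [0. 0. 0. 0.]
    print((sigmoid(x + h) - sigmoid(x - h)) / (2 * h))  # ~[0.105 0.235 0.235 0.105]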



That is not true: one can easily regularise the derivative, which is a delta distribution, in some appropriate way, the easiest being a 'triangle' centered at zero. That way one can in fact train networks of McCulloch-Pitts neurons quite easily.
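For concreteness, a minimal sketch of that idea (assuming PyTorch; the class name and the half-width w are made up for illustration): the forward pass is the hard threshold, and the backward pass substitutes a unit-area triangle for the delta.

    import torch

    class StepWithTriangleSurrogate(torch.autograd.Function):
        # Forward: hard 0/1 threshold, McCulloch-Pitts style.
        # Backward: replace the true derivative (a delta at 0) with a
        # unit-area triangle of half-width w centered at 0.
        @staticmethod
        def forward(ctx, x, w=1.0):
            ctx.save_for_backward(x)
            ctx.w = w
            return (x > 0).float()

        @staticmethod
        def backward(ctx, grad_output):
            (x,) = ctx.saved_tensors
            surrogate = torch.clamp(1 - x.abs() / ctx.w, min=0) / ctx.w
            return grad_output * surrogate, None  # no gradient for w

Using StepWithTriangleSurrogate.apply in place of a ReLU or sigmoid then lets the usual optimizers run unchanged.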


> They can't learn.

The strongest statement that might fit here is that they can't learn efficiently. Zero gradients just make learning slower (though I do agree that differentiability is something to strive for).

As a bit of an aside, in practice stochastic versions of algorithms that assume differentiability work on wide ranges of functions, and compositions of poorly behaved functions can be quite nicely behaved.

For a couple [0] concrete examples:

(1) Throw the absolute value function into your favorite gradient or Newton minimization routine. Blindly using differentiable techniques often works when a sub-gradient technique would work.

(2) Consider minimizing the magnitude of the smallest eigenvalue of the Jacobian matrix of your favorite function. Many of the intermediate components (e.g., trying to differentiate the eigenvalue with respect to the matrix entries) are poorly defined, undefined, or have cusps and other nasty features. The composition is (under mild constraints) differentiable with non-zero gradients.

(3) Consider minimizing the absolute value of a step function. By using a wide difference quotient as an approximation of the derivative and feeding that into your optimizer, you'll still find the minimum near zero (see (1); it works similarly). A sketch after this list illustrates (1) and (3).
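A minimal sketch of (1) and (3) in plain NumPy (the step sizes and the staircase function are arbitrary choices for illustration):

    import numpy as np

    # (1) Gradient descent on |x|, blindly using sign(x) as the "derivative".
    # It is wrong only at 0, yet the iterates end up within one step of 0.
    x = 3.7
    for _ in range(100):
        x -= 0.1 * np.sign(x)
    print(x)  # oscillates within 0.1 of the minimizer at 0

    # (3) abs(round(t)) has zero derivative almost everywhere, but a *wide*
    # central difference quotient still points toward the flat minimum.
    f = lambda t: abs(round(t))
    h = 2.0  # deliberately much wider than the steps
    x = 7.3
    for _ in range(100):
        x -= 0.5 * (f(x + h) - f(x - h)) / (2 * h)
    print(x)  # lands in the flat region around 0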

If the composite output is constant only on small regions of the input space (which holds if those neurons are modeling anything non-trivial), you can rig together something close enough to backprop to still learn efficiently.

[0] https://xkcd.com/1070/


I stand corrected. But the point is that artificial neurons are computationally simple.


> They can't learn

Is there a good (i.e. theoretical) reason for this?


No, because it's wrong. Threshold neurons are still differentiable almost everywhere, no different from ReLUs, which are ubiquitous. They may not be very good activation functions, but they don't prevent a network from learning.


Sorry, but I think I disagree...

If I understand the McCulloch-Pitts neuron, its outputs are boolean. This means there's no backprop signal, despite (almost-everywhere) differentiability, since the outputs are constant (thus, gradient zero) in any neighborhood, which in turn zeros out any learning signal you would want to backprop through them. So you need a learning strategy other than backprop to use them.

(see also: the 'dead neuron' problem/phenomenon with ReLU activations.)
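You can see this directly (a minimal sketch, assuming PyTorch; torch.round is used here just to get hard 0/1 outputs with a defined backward):

    import torch

    x = torch.randn(5, requires_grad=True)
    y = torch.round(torch.sigmoid(x)).sum()  # hard 0/1 outputs
    y.backward()
    print(x.grad)  # all zeros: the piecewise-constant outputs pass no signal back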


One can regularise the derivative (a delta distribution) in several ways (e.g. a triangle at zero) and that is good enough (even from a theoretical perspective) to find an approximate gradient. Experimentally it is then possible to train deep neural networks with such non-linearities.


Non-differentiability, obviously




