Using sin(x) or the other input features like x^2 goes back to making it too easy, though. So far the best I can do is 7 layers of 7 neurons each, which gets a loss of 0.02. 3x7 almost cracks the Swiss Roll but can't quite finish it off and gets stuck at 0.05: https://imgur.com/Z3f2ECc ... Surprisingly, 2x8 can do it, as long as I have noise or regularization on, but 8/7 then seriously struggles. Is 16 total neurons a critical limit here?
I managed to get to 0.01 loss from only x1/x2, using 3 hidden layers, L1 regularization, a bit of added noise, and some patience: http://i.imgur.com/Y3zKpJF.png
Yes, noise & regularization seem to be key here. I've gotten a 2-layer with 7/8 neurons down to 0.06 and dropping but only with noise & l1: http://playground.tensorflow.org/#activation=relu®ulariza... Final loss of 0.051. Interestingly, increasing noise from 10 to 15 destroys performance, loss of 0.47.
Is it really "making it too easy" if you're applying your knowledge of the structure of the problem space to make it easier for the computer to solve? Certainly this isn't easy to do with every problem, but it seems like a better idea in general to start with input features you suspect are relevant.
In the "swiss cake roll" the circular nature of the classes suggests using a sin or cos function, and the fact that they spiral out suggests also inputting magnitude information. Sure, you can just add more neurons that will end up computing the same thing, but we might as well give the computer a head start when we can.