Hacker News

Recommendation: Reach out to my colleague James Bergstra, and build out automatic hyperparameter selection. This will make your offering work off-the-shelf, which is what is necessary for it to see wider adoption.

Why? The real pain in the ass when training a deep network is hyperparameter selection.

What is your learning rate? What is your noise level? What is your regularization parameter?

Choosing these values is a far bigger pain than almost everything else combined.

Doing a grid search is intractable; random hyperparameter search is better. You can also use a more sophisticated strategy, like the one Bergstra et al. have proposed.
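For what it's worth, the random-search baseline is only a few lines. Here's a minimal sketch — `train_and_score` is a made-up stand-in (in reality it would train the network and return validation accuracy), and the search ranges are just plausible defaults:

```python
import random

# Hypothetical objective: in practice this trains a deep net with the given
# hyperparameters and returns validation accuracy. Toy stand-in with a known
# optimum near lr=0.01, noise=0.1, l2=1e-4.
def train_and_score(lr, noise, l2):
    return -((lr - 0.01) ** 2 + (noise - 0.1) ** 2 + (l2 - 1e-4) ** 2)

def random_search(n_trials=100, seed=0):
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        # sample learning rate and L2 penalty on a log scale, noise uniformly
        params = {
            "lr": 10 ** rng.uniform(-4, 0),
            "noise": rng.uniform(0.0, 0.5),
            "l2": 10 ** rng.uniform(-6, -1),
        }
        score = train_and_score(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

best_params, best_score = random_search()
print(best_params, best_score)
```

The log-scale sampling matters: learning rates and regularization strengths span orders of magnitude, so sampling them uniformly in linear space wastes most trials.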



I agree that hyper-parameter selection is a huge pain. Personally, though, I am more familiar with the work of Snoek et al. [1] from NIPS in December last year. He even distributes a neat Python package that performs Bayesian optimisation combined with MCMC [2], so that even people like me who are not yet familiar with Gaussian processes can deploy it easily.
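The package wraps all of this up for you, but the core loop is small enough to sketch. Below is a toy, numpy-only illustration of GP-based Bayesian optimisation with expected improvement on a made-up 1-D objective — this is not Snoek et al.'s actual code; the RBF kernel, length scale, grid, and objective are all assumptions for illustration:

```python
import math
import numpy as np

def f(x):
    # Hypothetical expensive objective (think: validation accuracy as a
    # function of one hyperparameter); toy stand-in with optimum at x = 0.3.
    return -(x - 0.3) ** 2

def rbf(a, b, length=0.15):
    # squared-exponential kernel between two 1-D point sets
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # standard GP regression posterior mean and stddev at test points Xs
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ Kinv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI for maximization: (mu - best) * Phi(z) + sigma * phi(z)
    z = (mu - best) / sigma
    Phi = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2)) for v in z]))
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    return (mu - best) * Phi + sigma * phi

rng = np.random.default_rng(0)
X = rng.random(3)                 # three random initial evaluations
y = f(X)
grid = np.linspace(0, 1, 200)

for _ in range(10):               # ten Bayesian-optimisation steps
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.append(X, x_next)
    y = np.append(y, f(x_next))

best_x = X[np.argmax(y)]
print(best_x, y.max())
```

The point of the GP is that each new evaluation is chosen where the model predicts either a high mean or high uncertainty, so you spend far fewer expensive training runs than grid or random search would need.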

[1]: http://arxiv.org/pdf/1206.2944v2

[2]: http://www.cs.toronto.edu/~jasper/software.html



Yeah, it's a really good point.

I haven't played with automatic parameter selection much (but have been seeing more papers on it recently) so I hadn't really considered it all that closely.

While I'd like to give people a fair amount of control over model parameters if they want, it probably is very important that I make things as turnkey as I can. Shouldn't be too tough to hack something together and make it an option during training.

While I'm trying to start things off relatively simply, the overall goal really is to let people create models that act as parts of much larger systems, maybe larger neural nets themselves. A sort of genetic algorithm that spawns new neural networks with random parameters and random connections to previous networks could be kind of neat, and making the base elements of those types of architectures (a single fully connected deep net, for example) easily accessible is a first step towards that goal.
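The "spawn random nets, keep the good ones" idea can be sketched as a tiny evolutionary loop. Everything here is hypothetical: `fitness` stands in for actually training a candidate network and scoring it on validation data, and the config fields are invented for illustration:

```python
import random

def random_config(rng):
    # a randomly spawned network architecture
    return {"layers": rng.choice([2, 3, 4]), "hidden": rng.choice([64, 128, 256])}

def mutate(cfg, rng):
    # spawn a child with a slightly perturbed architecture
    return {
        "layers": min(5, max(1, cfg["layers"] + rng.choice([-1, 0, 1]))),
        "hidden": max(16, cfg["hidden"] + rng.choice([-32, 0, 32])),
    }

def fitness(cfg):
    # stand-in for validation accuracy; imagine it peaks at 3 layers / 128 units
    return -abs(cfg["layers"] - 3) - abs(cfg["hidden"] - 128) / 64

rng = random.Random(42)
population = [random_config(rng) for _ in range(6)]
history = []
for _ in range(10):
    children = [mutate(rng.choice(population), rng) for _ in range(6)]
    # elitist selection: keep the six fittest of parents + children
    population = sorted(population + children, key=fitness, reverse=True)[:6]
    history.append(fitness(population[0]))

print(population[0], history[-1])
```

Because selection is elitist, the best fitness in the population can only go up across generations — the real cost, of course, is that every call to `fitness` would be a full training run.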


Presumably, if you have a GPU-backed cloud DBN, hyperparameter selection is faster than one parameter per day. Also, how do you choose the parameters of the hyperparameter tuner itself? I am never convinced these things work, given the no-free-lunch theorem.



