Actually, there's already a better way to perform initialization, based on the so-called lottery ticket hypothesis [1]. I haven't gotten around to reading the article, so I'll just regurgitate the abstract, but basically: pruning a trained network frequently exposes subnetworks that perform on par with the full-size neural net at roughly 20% of the parameters, with substantially quicker training. It turns out that, with the algorithm described in the paper, one can recover the initializations of these "winning tickets" and drastically reduce network size and training time.

1. https://arxiv.org/abs/1803.03635
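
For the curious, here's a rough sketch of what iterative magnitude pruning with "rewinding" looks like in PyTorch. This is not the paper's exact procedure: `model` and the `train(model, masks)` helper are placeholders, and the per-layer handling and hyperparameters are made up for illustration.

    import copy
    import torch

    def find_winning_ticket(model, train, prune_fraction=0.8, rounds=5):
        # Remember the original initialization (theta_0 in the paper).
        init_state = copy.deepcopy(model.state_dict())
        # Prune only weight matrices / conv kernels, not biases.
        masks = {name: torch.ones_like(p)
                 for name, p in model.named_parameters() if p.dim() > 1}
        # Fraction to remove each round so `prune_fraction` is gone overall.
        per_round = 1 - (1 - prune_fraction) ** (1 / rounds)

        for _ in range(rounds):
            train(model, masks)  # full training run -- the expensive part

            # Magnitude pruning: drop the smallest surviving weights per layer.
            for name, p in model.named_parameters():
                if name in masks:
                    alive = p.data[masks[name].bool()].abs()
                    threshold = torch.quantile(alive, per_round)
                    masks[name] *= (p.data.abs() > threshold).float()

            # "Rewind": reset the surviving weights to their original values.
            model.load_state_dict(init_state)
            with torch.no_grad():
                for name, p in model.named_parameters():
                    if name in masks:
                        p.mul_(masks[name])

        return model, masks  # sparse mask + original init = the "winning ticket"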



As far as I understand, there is no quick magic algorithm to find them: you train the full architecture the long and hard way as usual, then you identify the right subnetwork, and only then can you retrain faster from the architecture and initialization of just that subnetwork.
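
To make that concrete, here's a hedged sketch of the masked training step assumed in the snippet above: an ordinary training loop in which pruned weights are zeroed after every optimizer step, so only the chosen subnetwork actually learns. `loader` and `loss_fn` are placeholders (in practice you'd bind them with a closure or functools.partial before passing `train` to the earlier sketch).

    import torch

    def train(model, masks, loader, loss_fn, epochs=1, lr=0.1):
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
                # Re-apply the mask so pruned weights stay exactly zero.
                with torch.no_grad():
                    for name, p in model.named_parameters():
                        if name in masks:
                            p.mul_(masks[name])
        return model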


Based on the results, you actually have to train the full architecture a number of times (one run per pruning round) to identify the right subnetwork.


This paper had trouble getting this to work with larger models, I believe.

https://arxiv.org/abs/1902.09574



No. From my link:

>Additionally, we provide strong counterexamples to two recently proposed theories that models learned through pruning techniques can be trained from scratch to the same test set performance of a model learned with sparsification as part of the optimization process. Our results highlight the need for large-scale benchmarks in sparsification and model compression.


It sounds very cool. This work also won the best paper award at ICLR 2019.



