Actually, there's already a better way to perform initialization, based on the so-called lottery ticket hypothesis [1]. I haven't gotten through the article yet, so I'll just regurgitate the abstract, but basically there frequently are subnetworks, exposed by pruning trained networks, that perform on par with the full-size neural nets while using roughly 20% of the parameters and training substantially faster. It turns out that, with some magic algorithm described in the paper, one can initialize weights so as to quickly find these "winning tickets" and drastically reduce network size and training time.
As far as I understand, there is no quick magic algorithm to find them: you train the full architecture the long and hard way as usual, then you identify the right subnetwork, and only then can you retrain faster from the architecture and initialization of just that subnetwork (roughly as sketched below).
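To make that loop concrete, here is a minimal sketch of one round of "train, prune by magnitude, rewind to the original init, retrain", in the spirit of the paper. The toy model, random data, training lengths, and the 20% keep-fraction are all placeholder assumptions, not the paper's actual setup.

```python
# One round of magnitude pruning with weight rewinding (lottery-ticket style).
# Everything here (model, data, hyperparameters) is illustrative only.
import copy
import torch
import torch.nn as nn

def train(model, data, targets, steps=200, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(data), targets).backward()
        opt.step()

# Toy setup: a small MLP on random data, just so the sketch runs end to end.
torch.manual_seed(0)
data = torch.randn(256, 20)
targets = torch.randint(0, 2, (256,))
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# 1. Remember the original initialization, then train the full network the long way.
init_state = copy.deepcopy(model.state_dict())
train(model, data, targets)

# 2. Identify the subnetwork: keep only the largest-magnitude weights (here 20%).
keep_fraction = 0.20
masks = {}
for name, param in model.named_parameters():
    if param.dim() > 1:  # prune weight matrices, leave biases dense
        threshold = torch.quantile(param.detach().abs(), 1 - keep_fraction)
        masks[name] = (param.detach().abs() >= threshold).float()

# 3. Rewind the surviving weights to their original initialization (the "winning ticket").
model.load_state_dict(init_state)
with torch.no_grad():
    for name, param in model.named_parameters():
        if name in masks:
            param.mul_(masks[name])

# 4. Retrain just the subnetwork, re-applying the mask so pruned weights stay zero.
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(200):
    opt.zero_grad()
    loss_fn(model(data), targets).backward()
    opt.step()
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
```

Note that step 1 is the expensive part: the full dense training run is still required before the subnetwork and its initialization can even be identified.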
>Additionally, we provide strong counterexamples to two recently proposed theories that models learned through pruning techniques can be trained from scratch to the same test set performance of a model learned with sparsification as part of the optimization process. Our results highlight the need for large-scale benchmarks in sparsification and model compression.
1. https://arxiv.org/abs/1803.03635