
From what I can tell, the architecture matters more anyway, and having more parameters at a lower precision each gives the model more chances to figure out the optimal architecture on its own.
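A rough back-of-the-envelope sketch of the "smaller but more" trade-off, assuming a fixed memory budget spent on weights only (that simplification is mine; it ignores activations, quantization scales, and packing overhead). The function name and the 1 GiB budget are hypothetical, picked just for illustration:

```python
def params_that_fit(budget_bytes, bits_per_param):
    """How many parameters of a given bit width fit in a byte budget."""
    return (budget_bytes * 8) // bits_per_param

# Hypothetical budget of 1 GiB for weights.
budget = 1024 ** 3
for bits in (16, 8, 4, 2):
    # Halving the bit width doubles the number of parameters that fit.
    print(bits, params_that_fit(budget, bits))
```

This only shows the counting side of the argument; whether the extra parameters actually make up for the lost precision is an empirical question.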


My best understanding is that the architecture is predetermined, and that in turn fixes the number of parameters up front?
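To make that concrete, here is a minimal sketch for the simplest case, a plain fully connected net. Once the layer widths are chosen, the parameter count follows directly. The function name and the widths are made up for illustration, not taken from any particular model:

```python
def mlp_param_count(layer_widths):
    """Count weights and biases of a fully connected net with these widths."""
    total = 0
    for fan_in, fan_out in zip(layer_widths, layer_widths[1:]):
        total += fan_in * fan_out  # weight matrix between adjacent layers
        total += fan_out           # one bias per output unit
    return total

# Hypothetical widths: input, one hidden layer, output.
widths = [784, 256, 10]
print(mlp_param_count(widths))
```

Changing the bit width of each parameter does not change this count; only changing the widths or the number of layers does.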

I do think, however, that moving to shallower bit depths over time will require slightly deeper networks to compensate. Sorta makes sense when you think about it a bit. :)



