
Empirically, yes. I can take a very deep fully-connected network, measure the gradients in each layer with and without skip connections, and compare them. I can repeat this across multiple seeds and run a statistical test on the deltas.
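A rough sketch of what that experiment could look like, using only the Python standard library. Everything specific here is my own assumption rather than anything stated in the thread: the tanh activation, depth 30, width 16, the 1/sqrt(width) Gaussian init, the squared-norm loss, and the choice of a paired t-statistic on the log10 first-layer gradient norm. The names `layer_grad_norms`, `first_layer_deltas`, and `paired_t_statistic` are made up for illustration.

```python
import math
import random
import statistics


def layer_grad_norms(seed, depth=30, width=16, skip=False):
    """Per-layer Frobenius norms of dLoss/dW for a deep tanh MLP (assumed setup)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(width)  # assumed init scale
    weights = [[[rng.gauss(0.0, scale) for _ in range(width)]
                for _ in range(width)] for _ in range(depth)]
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]

    # Forward pass, caching each layer's input and tanh output.
    inputs, acts = [], []
    for W in weights:
        a = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
        inputs.append(h)
        acts.append(a)
        h = [x + y for x, y in zip(h, a)] if skip else a

    # Assumed loss: 0.5 * ||h_final||^2, so dLoss/dh_final = h_final.
    g = h
    norms = [0.0] * depth
    for layer in reversed(range(depth)):
        W, a, x = weights[layer], acts[layer], inputs[layer]
        # Gradient w.r.t. the pre-activation: g * (1 - tanh^2).
        gz = [gi * (1.0 - ai * ai) for gi, ai in zip(g, a)]
        # dLoss/dW is the outer product gz x^T; its Frobenius norm is ||gz|| * ||x||.
        norms[layer] = (math.sqrt(sum(v * v for v in gz))
                        * math.sqrt(sum(v * v for v in x)))
        # Gradient w.r.t. the layer input: W^T gz, plus g itself on the skip path.
        back = [sum(W[i][j] * gz[i] for i in range(width)) for j in range(width)]
        g = [b + gi for b, gi in zip(back, g)] if skip else back
    return norms


def first_layer_deltas(seeds):
    """log10(first-layer grad norm with skip) minus the same without, per seed."""
    deltas = []
    for seed in seeds:
        plain = layer_grad_norms(seed, skip=False)[0]
        resid = layer_grad_norms(seed, skip=True)[0]
        deltas.append(math.log10(resid) - math.log10(plain))
    return deltas


def paired_t_statistic(deltas):
    """One-sample t-statistic of the paired deltas against a mean of zero."""
    n = len(deltas)
    return statistics.mean(deltas) / (statistics.stdev(deltas) / math.sqrt(n))


deltas = first_layer_deltas(range(10))
print("mean log10 delta:", statistics.mean(deltas))
print("paired t-statistic:", paired_t_statistic(deltas))
```

Reusing the same seed for both variants gives them identical weights and inputs, so the comparison is paired. Turning the t-statistic into a p-value would need a t-distribution CDF, which I've left out to keep this dependency-free.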


Empirical studies are only useful until the system is mathematically understood. For example, I can construct transformer circuits where the skip connection (provably) purely adds noise.

I can also prove that, in particular cases, the MLP's sole purpose is to remove the noise added by the skip connection.



