
I love to hate on U-net. It works, but it's just so inelegant. That it's not a true convolution and only works for particular 'patch' sizes bothers me to no end.

I am not super up to date with the field, but has anyone caught on to using 'wavenet'-like architectures yet? That is, dilated convolutions.

You have to be a little clever to get residual connections to work properly, but it's a true convolution that works for any patch size, is super parameter-efficient, and captures the same multi-scale features U-net was designed for.
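In case it helps, here's a rough sketch of what I mean (PyTorch; the names and hyperparameters are just illustrative, not the proprietary arch):

    import torch
    import torch.nn as nn

    class DilatedResBlock(nn.Module):
        # One residual block around a dilated 3x3 conv. With
        # padding == dilation, a 3x3 kernel keeps spatial size,
        # so any input size works -- no patch-size constraint.
        def __init__(self, channels, dilation):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels, kernel_size=3,
                                  padding=dilation, dilation=dilation)
            self.act = nn.ReLU()

        def forward(self, x):
            # The identity skip is the "little clever" part: keeping
            # the channel count fixed means the residual add just works.
            return x + self.act(self.conv(x))

    class DilatedNet(nn.Module):
        # Dilation doubles per block (1, 2, 4, ...), so the receptive
        # field grows exponentially with depth -- the multi-scale
        # coverage U-net gets from pooling, without down/upsampling.
        def __init__(self, in_ch, out_ch, width=64, depth=6):
            super().__init__()
            self.stem = nn.Conv2d(in_ch, width, kernel_size=3, padding=1)
            self.blocks = nn.Sequential(
                *[DilatedResBlock(width, dilation=2 ** i) for i in range(depth)])
            self.head = nn.Conv2d(width, out_ch, kernel_size=1)

        def forward(self, x):
            return self.head(self.blocks(self.stem(x)))

    net = DilatedNet(in_ch=3, out_ch=2)
    y = net(torch.randn(1, 3, 97, 113))  # odd sizes are fine
    print(y.shape)  # torch.Size([1, 2, 97, 113])

Swap Conv2d for Conv3d and the same thing works on volumes. Parameter-wise it's a handful of small convs, with no decoder half to mirror.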

Anecdotally, I used such an arch for some (unfortunately proprietary) 3D imaging work and achieved some nice results.



> "It works".

Well, that's sorta the point. Personally I'm not a huge fan of crafting a super-specific network architecture only to end up with a 2-3% difference in performance. Certainly do it where a particular configuration makes sense (an LSTM for time series, for example), but I think there needs to be a rethinking of the Grand Theory of Deep Learning Architecture TM.

And frankly, I think an unsaid reason why U-net is so popular is that it generalizes reasonably well with limited data, and in many fields the datasets are nowhere near as massive as COCO.

I realize it's sorta asking too much (I want a NN that works out of the box, is super easy to use, and doesn't require a TON of data), but I think that's where the current pain points are for really explosive growth in AI.


> I think there needs to be a rethinking of the Grand Theory of Deep Learning Architecture TM.

Strong agree. Although perhaps not so much a rethinking as a theory at all: there's a huge dearth of theory in the field. Daily practice involves regular use of black-magic intuition for architecture, problem posing, and debugging. Weird times.



