
I'm not clear why you are blaming SGD. Maybe I missed the point. In principle SGD might well find the global optimum. The problem is that this optimum is achieved only on the training data; it could still perform worse on the test set. Maybe you are referring to the entire training process? The general idea dates from the days of SVMs, where the optimization problem was convex.
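To illustrate the convex case mentioned above: SGD on the regularized hinge loss (the classic SVM objective) has no bad local minima, so it reliably drives the training objective toward the global optimum. This is a toy sketch with made-up data (the dataset, step size, and regularization constant are my own, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linearly separable data: the label is the sign of the first coordinate.
n, d = 200, 2
X = rng.normal(size=(n, d))
y = np.where(X[:, 0] > 0, 1.0, -1.0)

# SGD on the regularized hinge loss, a convex objective:
#   L(w) = (lam/2)||w||^2 + mean(max(0, 1 - y_i * <w, x_i>))
lam = 0.01
lr = 0.1
w = np.zeros(d)
for epoch in range(50):
    for i in rng.permutation(n):
        margin = y[i] * (X[i] @ w)
        # Subgradient: regularizer always contributes; hinge term only
        # contributes when the margin constraint is violated.
        grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
        w -= lr * grad

acc = np.mean(np.sign(X @ w) == y)
print(f"training accuracy: {acc:.2f}")
```

Because the objective is convex, any stationary point SGD settles near is the global one; the debate in the thread is that this guarantee says nothing about test performance.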

Though personally I do find a lot of this modern "experimental" research quite hokey. I don't think this is something academics should be getting research funding to pursue. This is engineers building intuition about how to tune their product.



In practice, how do you know if SGD has converged to a global optimum, and do you let it run long enough to reach convergence?


You don't know. The point was that even if you did get there, it could still be an overfit model you don't want, since it's based on a training data set, not the true statistics of the distribution the samples come from.
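This point can be made concrete with overparameterized least squares, where plain gradient descent provably reaches the global minimum of the training loss (the objective is convex and the system is underdetermined), yet the resulting model generalizes poorly. A minimal sketch, with an invented dataset of 50 features and only 20 training points:

```python
import numpy as np

rng = np.random.default_rng(0)

# More parameters than training samples: 50 features, 20 points.
d, n_train, n_test = 50, 20, 200
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ w_true + rng.normal(scale=0.5, size=n_train)
y_test = X_test @ w_true  # noiseless targets for evaluation

# Gradient descent on mean squared error. With n < d the global
# minimum of the TRAINING loss is exactly zero, and GD reaches it.
w = np.zeros(d)
lr = 0.01
for _ in range(5000):
    grad = 2 / n_train * X_train.T @ (X_train @ w - y_train)
    w -= lr * grad

train_mse = np.mean((X_train @ w - y_train) ** 2)
test_mse = np.mean((X_test @ w - y_test) ** 2)
print(f"train MSE: {train_mse:.6f}")
print(f"test MSE:  {test_mse:.2f}")
```

The training loss is driven essentially to zero (a genuine global optimum of the training objective), while the test error stays large: the fitted weights interpolate the noise and recover nothing about `w_true` outside the span of the 20 training points.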



