
Great question. My main point is less about upsampling the rare cases and more that the default loss function used in model training may not align with the final business metric (which is the metric practitioners should care most about). As a result, it's important to align the two. Some algorithms make it easy to plug in a different loss function; for others, it's harder. Over- or under-sampling is one fairly generally applicable way to tweak the effective loss.
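To make the connection concrete: duplicating (oversampling) a rare example k times is the same as giving it weight k in the loss. A minimal sketch with a hand-rolled weighted log-loss (toy numbers, nothing model-specific):

```python
import math

def log_loss(y_true, p_pred, weights=None):
    # Weighted negative log-likelihood, normalized by total weight.
    if weights is None:
        weights = [1.0] * len(y_true)
    total = sum(
        -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        for y, p, w in zip(y_true, p_pred, weights)
    )
    return total / sum(weights)

# Toy data: one rare positive, three common negatives.
y = [1, 0, 0, 0]
p = [0.6, 0.2, 0.1, 0.3]

# Up-weighting the rare positive 3x in the loss ...
weighted = log_loss(y, p, weights=[3, 1, 1, 1])

# ... matches duplicating it 3x in the training data.
y_dup = [1, 1, 1, 0, 0, 0]
p_dup = [0.6, 0.6, 0.6, 0.2, 0.1, 0.3]
duplicated = log_loss(y_dup, p_dup)

print(abs(weighted - duplicated) < 1e-12)  # True
```

So when an algorithm doesn't expose a custom loss, resampling gives you roughly the same lever.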

While I'm not an expert on the theory behind sampling, if you do find the need to tweak sampling to align the default loss function with the business metric, I would suggest doing a grid search first and then validating the result against business insight. For example, if the search concludes that getting the rare cases right is much more important than getting the common cases right, does that match what the business actually believes?
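One way to run that grid search is to sweep the rare-class weight (the in-loss equivalent of a resampling ratio) and score each setting with the business metric directly. A sketch using scikit-learn; the 10:1 false-negative-to-false-positive cost ratio and the candidate weights are illustrative assumptions, not recommendations:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

# Hypothetical business metric: a missed rare positive (FN) is
# assumed to cost 10x a false alarm (FP). Higher score is better.
def business_score(y_true, y_pred):
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return -(10 * fn + 1 * fp)

# Synthetic imbalanced data: ~5% positives.
X, y = make_classification(
    n_samples=2000, weights=[0.95, 0.05], random_state=0
)

# Grid over the rare-class weight instead of physically resampling.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"class_weight": [{0: 1, 1: w} for w in (1, 3, 10, 30)]},
    scoring=make_scorer(business_score),
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

The sanity check is then exactly the question above: if the search picks a large rare-class weight, confirm with stakeholders that rare cases really are worth that much more than common ones.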


