
If you hold out a small amount of data, or even retrain on a sample of your training data, then isotonic regression is a good way to solve many calibration problems.

https://scikit-learn.org/dev/modules/generated/sklearn.isoto...
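
To make the idea concrete, here is a minimal, dependency-free sketch of what isotonic regression does under the hood (the pool-adjacent-violators algorithm). It is not scikit-learn's implementation: the function names and the toy held-out data are made up for illustration, it assumes distinct scores, and it returns a step function where scikit-learn interpolates between points.

```python
from bisect import bisect_left

def fit_isotonic(scores, outcomes):
    """Pool-adjacent-violators: least-squares non-decreasing fit of outcomes on scores."""
    blocks = []  # each block: [sum of outcomes, count, largest score in the block]
    for score, outcome in sorted(zip(scores, outcomes)):
        blocks.append([outcome, 1, score])
        # Merge backwards while a block's mean exceeds the mean of the block after it.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, right_edge = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = right_edge
    edges = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return edges, values

def calibrate(score, edges, values):
    """Step-function lookup; scores beyond the fitted range are clipped."""
    i = min(bisect_left(edges, score), len(values) - 1)
    return values[i]

# Hypothetical held-out data: raw model predictions vs. the karma actually observed.
held_out_predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
held_out_actual = [1, 3, 2, 6, 8]

edges, values = fit_isotonic(held_out_predicted, held_out_actual)
print(values)                         # [1.0, 2.5, 6.0, 8.0]
print(calibrate(2.5, edges, values))  # 2.5
```

The out-of-order pair (3, 2) gets pooled into a single level of 2.5, which is the whole trick: the calibrated output is forced to be non-decreasing in the raw score.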

I also agree with your intuition that if your output is censored at 0, with a large mass there, it's good to create two models, one for likelihood of zero karma, and another expected karma, conditional on it being non-zero.
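
A rough sketch of the two-model idea, with both "models" reduced to per-bucket averages so the structure is easy to see. The bucket labels and data are hypothetical; in practice each part would be a real classifier and a real regressor.

```python
from collections import defaultdict
from statistics import mean

def fit_two_part(rows):
    """rows: (bucket, karma) pairs. Returns {bucket: (P(karma == 0), mean karma given karma > 0)}."""
    by_bucket = defaultdict(list)
    for bucket, karma in rows:
        by_bucket[bucket].append(karma)
    model = {}
    for bucket, karmas in by_bucket.items():
        nonzero = [k for k in karmas if k > 0]
        p_zero = 1 - len(nonzero) / len(karmas)           # model 1: likelihood of zero karma
        mean_nonzero = mean(nonzero) if nonzero else 0.0  # model 2: expected karma if non-zero
        model[bucket] = (p_zero, mean_nonzero)
    return model

def expected_karma(model, bucket):
    """Combine the two parts: E[karma] = P(karma > 0) * E[karma | karma > 0]."""
    p_zero, mean_nonzero = model[bucket]
    return (1 - p_zero) * mean_nonzero

# Hypothetical training rows: (story type, karma).
rows = [("show", 0), ("show", 0), ("show", 0), ("show", 8),
        ("ask", 0), ("ask", 2), ("ask", 4), ("ask", 6)]
model = fit_two_part(rows)
print(expected_karma(model, "show"))  # 2.0
print(expected_karma(model, "ask"))   # 3.0
```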



I hadn't heard of isotonic regression before but I like it!

> it's good to create two models, one for likelihood of zero karma, and another expected karma, conditional on it being non-zero.

Another way to do this is to keep a single model but have it predict two outputs: (1) likelihood of zero karma, and (2) expected karma if non-zero. This would require writing a custom loss function, which sounds intimidating but actually isn't too bad.
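
For illustration, one way such a two-output loss could look, written in plain Python so the arithmetic is visible. This is only a sketch under assumptions: the function name is made up, squared error is one choice among many for the second head, and in a real setup you would write the same thing in your autodiff framework of choice.

```python
import math

def two_head_loss(zero_logits, karma_preds, karma_true):
    """Average per-row loss for a model with two outputs.

    Head 1 (zero_logits): logit of P(karma == 0), scored with binary cross-entropy.
    Head 2 (karma_preds): expected karma if non-zero, scored with squared error,
    but only on rows where the true karma is non-zero.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for logit, pred, y in zip(zero_logits, karma_preds, karma_true):
        p_zero = 1.0 / (1.0 + math.exp(-logit))
        is_zero = 1.0 if y == 0 else 0.0
        total -= is_zero * math.log(p_zero + eps)
        total -= (1.0 - is_zero) * math.log(1.0 - p_zero + eps)
        if y > 0:
            # Rows with zero karma are masked out of the regression term.
            total += (pred - y) ** 2
    return total / len(karma_true)

# Hypothetical batch: true karma, then a good and a bad set of predictions.
karma_true = [0, 0, 5, 12]
good = two_head_loss([3.0, 3.0, -3.0, -3.0], [1.0, 1.0, 5.0, 12.0], karma_true)
bad = two_head_loss([-3.0, -3.0, 3.0, 3.0], [1.0, 1.0, 9.0, 2.0], karma_true)
print(good < bad)  # True
```

The masking is the important part: the second output is never penalized on zero-karma rows, so it is free to learn "expected karma, conditional on it being non-zero".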

If I were actually putting a model like this into production at HN I'd likely try modeling the problem in that way.


Did you dictate this? It looks like you typo'd/braino'd "centered" into "censored", but even allowing for phonetic mistakes (of which I make many) and predictive text flubs, I still can't understand how this happened.


I was thinking of censoring, maybe I should have said another word like floored.

The reason I think of this as censoring is that there are some classical statistical models that model a distribution with a large mass at a minimum threshold, e.g. "tobit" censored regression.

https://en.wikipedia.org/wiki/Censoring_(statistics)
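
As a sketch of how a tobit-style model treats that mass at the threshold: the log-likelihood below assumes a normal latent variable with a known lower limit. Observations at the limit contribute the probability of falling at or below it, and the rest contribute the usual normal density. The function names and the default limit of 0 are illustrative choices, not taken from any particular library.

```python
import math

def normal_logpdf(z):
    """Log density of a standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(y, mu, sigma, lower=0.0):
    """Log-likelihood of outcomes y left-censored at `lower`.

    mu holds the per-observation latent means and sigma is the shared latent
    standard deviation. Extreme inputs can underflow the CDF to 0, which this
    sketch does not guard against.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi <= lower:
            # Censored: all we know is that the latent value was at or below the limit.
            total += math.log(normal_cdf((lower - mi) / sigma))
        else:
            # Uncensored: ordinary normal log density.
            total += normal_logpdf((yi - mi) / sigma) - math.log(sigma)
    return total

# Hypothetical data: two observations at the floor, two above it.
print(tobit_loglik([0.0, 0.0, 2.0, 5.0], [0.5, -1.0, 2.5, 4.0], sigma=2.0))
```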


Thanks for the explanation. I never paid much attention in my stats lectures so I deserve to have missed out on that term-of-art. I think the physics lingo would be to call it "capped" or "bounded" or "constrained".


thanks, it's very understandable that you thought i was mistyping 'centred'.


I'm not the parent commenter, but Whisper-based dictation is getting pretty awesome nowadays. It's almost as good as sci-fi.

(Fully dictated, no edits except for this)


I also thought that the commenter spoke "centered" and the speech recognition model output "censored".



