If you withhold a small amount of data, or even retrain on a sample of your trai...

kcorbitt · on Oct 28, 2024

I hadn't heard of isotonicregression before but I like it!

> it's good to create two models, one for likelihood of zero karma, and another expected karma, conditional on it being non-zero.

Another way to do this is to keep a single model but have it predict two outputs: (1) likelihood of zero karma, and (2) expected karma if non-zero. This would require writing a custom loss function which sounds intimidating but actually isn't too bad.

If I were actually putting a model like this into production at HN I'd likely try modeling the problem in that way.

Y_Y · on Oct 28, 2024

Did you dictate this? It looks like you typo'd/brain I'd "centered" into "censored", but even allowing for phonetic mistakes (of which I make many) and predictive text flubs, I still can't understand how this happened.

oli5679 · on Oct 28, 2024

I was thinking of censoring, maybe I should have said another word like floored.

The reason I think of this as censoring is that there are are some classical statistical models that model a distribution with a large mass at a minimum threshold, e.g. "tobit" censored regression.

https://en.wikipedia.org/wiki/Censoring_(statistics)

Y_Y · on Oct 28, 2024

Thanks for the explanation. I never paid much attention in my stats lectures so I deserve to have missed out on that term-of-art. I think the physics lingo would be to call it "capped" or "bounded" or "constrained".

oli5679 · on Oct 28, 2024

thanks, it's very understandable that you thought i was mistyping 'centred'.

CaptainFever · on Oct 28, 2024

I'm not the parent commenter, but whisper based dictation is getting pretty awesome nowadays. It's almost as good as sci-fi.

(Fully dictated, no edits except for this)

1024core · on Oct 28, 2024

I also thought that the commenter spoke "centered" and the speech recognition model output "censored".