
From a stats perspective, the cutoff is included in the coefficients. If you use a design matrix (add a column of 1s to your variables), you get, in non-matrix notation, beta_0 + beta_1 X_1 + ... So the threshold can be considered beta_0.

In software, you can get classification models to output class probabilities instead of class labels. You can then use whatever threshold you like to transform those probabilities into labels.

You may see it referred to as the "discrimination threshold". Varying that threshold is how ROC curves are constructed.
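A minimal sketch of the thresholding step and the threshold sweep behind a ROC curve, in plain Python. The probabilities and labels are made-up numbers, and `labels_at` and `roc_points` are hypothetical helper names, not functions from any library.

```python
def labels_at(probs, threshold):
    """Turn class probabilities into 0/1 labels using a chosen cutoff."""
    return [1 if p >= threshold else 0 for p in probs]


def roc_points(y_true, probs):
    """Sweep the discrimination threshold and collect (FPR, TPR) pairs."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Start above every probability so the curve begins at (0, 0),
    # then lower the threshold through each distinct predicted value.
    thresholds = [float("inf")] + sorted(set(probs), reverse=True)
    points = []
    for t in thresholds:
        pred = labels_at(probs, t)
        tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
        points.append((fp / neg, tp / pos))
    return points


# Illustrative data only (assumed, not from a real model).
y_true = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]

default_labels = labels_at(probs, 0.5)  # the usual 0.5 cutoff
strict_labels = labels_at(probs, 0.9)   # a stricter cutoff flags nothing here
curve = roc_points(y_true, probs)
```

Each entry in `curve` is one point on the ROC curve; lowering the threshold can only move the point up or to the right.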



Would the threshold be beta_0 in every case, or only when you have subtracted the mean from your data?


You don't want to demean your dependent (response) binary variable. So you almost always want to keep beta_0 to control for any imbalance in your dependent variable.
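One way to see the imbalance point: in a logistic regression with only an intercept, the maximum-likelihood beta_0 is the log odds of the observed positive rate. A small sketch with made-up labels (`intercept_only_beta0` is a hypothetical helper name):

```python
import math


def sigmoid(z):
    """Inverse of the logit: maps log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def intercept_only_beta0(y):
    """MLE intercept of a logistic model with no predictors: logit of the base rate."""
    p = sum(y) / len(y)
    return math.log(p / (1.0 - p))


# Illustrative, imbalanced response: 1 positive out of 10 (assumed data).
y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
b0 = intercept_only_beta0(y)
base_rate = sigmoid(b0)  # recovers the 10% positive rate
```

With rare positives, beta_0 comes out negative; dropping it would force the model to predict a 50% base rate when all predictors are zero.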


I meant demeaning the independent variables. My understanding is that beta_0 will have the meaning curiousgal attaches to it only if you demean your independent variables.


I see. But I think after demeaning X, beta_0 just takes on a special meaning: the log odds of the average case. Nothing more.
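A rough sketch of that last point, using made-up coefficients and data rather than a fitted model. It relies on the identity that subtracting the column means from X leaves the slopes unchanged and shifts the intercept to beta_0 + sum_j beta_j * mean_j, which is the linear predictor (log odds) at the average X.

```python
# Hypothetical coefficients on the raw (uncentred) predictors.
beta0 = -1.5
beta = [0.8, -0.4]

# Illustrative data: three cases, two predictors (assumed values).
X = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
means = [sum(col) / len(col) for col in zip(*X)]


def linear(b0, b, x):
    """Linear predictor (log odds) for one case."""
    return b0 + sum(bi * xi for bi, xi in zip(b, x))


# Intercept of the equivalent model on centred predictors.
beta0_centered = beta0 + sum(b * m for b, m in zip(beta, means))

# Both parameterisations give the same log odds for every case.
raw_scores = [linear(beta0, beta, x) for x in X]
centered_scores = [
    linear(beta0_centered, beta, [xi - m for xi, m in zip(x, means)])
    for x in X
]

# The centred intercept is the log odds at the average X.
log_odds_at_mean = linear(beta0, beta, means)
```

The predictions do not change; only the point at which beta_0 is read off moves from X = 0 to X = mean.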



