
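I would apply an L1-regularized regression in which the predictors are simple 0/1 indicators for the presence of each gene. The L1 penalty helps you deal with the high dimensionality of the problem (potentially many genes relative to the number of samples) by shrinking most coefficients to exactly zero.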

https://en.wikipedia.org/wiki/Lasso_(statistics)
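Here is a minimal pure-Python sketch of that idea. It assumes a numeric age outcome `y` and a 0/1 matrix `X` of gene-presence indicators. Both are synthetic here, and the function names and the penalty `lam=0.3` are hypothetical choices for illustration. The sketch fits the lasso by cyclic coordinate descent with soft-thresholding and leaves the intercept unpenalized. For real work you would use an established implementation such as scikit-learn's `Lasso` or R's glmnet, and choose the penalty by cross-validation.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the proximal operator of the L1 norm)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - b0 - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows of 0/1 gene-presence indicators (hypothetical encoding);
    the intercept b0 is not penalised.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    b0 = sum(y) / n
    r = [yi - b0 for yi in y]  # residuals y - b0 - X b
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        # Unpenalised intercept: absorb the mean residual.
        shift = sum(r) / n
        b0 += shift
        r = [ri - shift for ri in r]
        for j in range(p):
            if col_sq[j] == 0:
                continue  # gene never present: coefficient stays at zero
            # Correlation of gene j with the residual, adding back its own contribution.
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
    return b0, b


# Hypothetical, noise-free example: 8 genes, only genes 0 and 1 affect the age.
rng = random.Random(0)
n, p = 200, 8
X = [[rng.randint(0, 1) for _ in range(p)] for _ in range(n)]
y = [30.0 + 10.0 * x[0] - 5.0 * x[1] for x in X]
b0, b = lasso_coordinate_descent(X, y, lam=0.3)
print(round(b0, 2), [round(bj, 2) for bj in b])
```

The irrelevant genes come out at exactly zero, while the two real effects survive, shrunk somewhat toward zero. That shrinkage is the price the L1 penalty charges for the sparsity.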

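Since these are ages, I wouldn't assume an underlying Gaussian distribution: ages are strictly positive and often skewed. Making that change, by picking a different response family within a generalized linear model, isn't as hard as you might think.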

https://en.wikipedia.org/wiki/Generalized_linear_model
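To illustrate what "making that change" can look like, here is a sketch that assumes a Gamma family with a log link. This is one common choice for a positive, right-skewed response, but whether it suits real age data is something to check, not a given. It fits the model by iteratively reweighted least squares (Fisher scoring) in plain Python on simulated data. The function names are hypothetical. Libraries such as statsmodels (`sm.GLM` with `sm.families.Gamma`) provide this fit directly.

```python
import math
import random


def solve(A, rhs):
    """Solve the small linear system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gamma_glm_log_link(X, y, max_iter=50, tol=1e-10):
    """Fit log E[y] = b0 + X b for positive y, assuming a Gamma family, via IRLS.

    With a log link and Gamma variance V(mu) = mu^2, every IRLS weight equals 1,
    so each step is ordinary least squares on z = eta + (y - mu) / mu.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    p = len(rows[0])
    eta = [math.log(yi) for yi in y]  # start from mu = y
    beta = [0.0] * p
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Normal equations (R^T R) beta = R^T z.
        RtR = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
        Rtz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(p)]
        new_beta = solve(RtR, Rtz)
        eta = [sum(bj * rj for bj, rj in zip(new_beta, r)) for r in rows]
        converged = max(abs(u - v) for u, v in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Simulated ages: gene 0 raises expected age by ~22%, gene 1 lowers it by ~10%, gene 2 does nothing.
rng = random.Random(1)
X = [[rng.randint(0, 1) for _ in range(3)] for _ in range(1000)]
true_beta = [3.5, 0.2, -0.1, 0.0]
y = []
for x in X:
    mu = math.exp(true_beta[0] + sum(t * xi for t, xi in zip(true_beta[1:], x)))
    y.append(mu * rng.gammavariate(20.0, 1.0 / 20.0))  # Gamma noise with mean 1
beta = gamma_glm_log_link(X, y)
print([round(v, 3) for v in beta])
```

With the log link, each coefficient reads as a multiplicative effect on expected age (exp(0.2) ≈ 1.22). An L1 penalty can be combined with such a family too, which is what glmnet-style solvers do.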

As always: consult your friendly neighborhood statistician.


