
Sharing the big GPU cluster with non-latency-critical workloads is one solution we also explored.

For this work, we are focusing more on the problem of smaller models running on SOTA GPUs. Distilled/fine-tuned small models have shown comparable performance on vertical tasks.


