You can fine-tune a ~66M-parameter discriminative (not generative) language model (e.g. DistilBERT) and it's one or two orders of magnitude more efficient for classification tasks like sentiment analysis, and probably just as accurate, if not more so.
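A minimal sketch of that kind of fine-tune with Hugging Face transformers (SST-2, the epoch count, and the batch size here are illustrative choices, not something from the thread):

    # Fine-tune DistilBERT (~66M params) with a classification head on SST-2.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)  # binary sentiment labels

    dataset = load_dataset("glue", "sst2")

    def tokenize(batch):
        # Pad to a fixed length so the default collator can batch the examples.
        return tokenizer(batch["sentence"], truncation=True,
                         padding="max_length", max_length=128)

    dataset = dataset.map(tokenize, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=32),
        train_dataset=dataset["train"],
        eval_dataset=dataset["validation"],
    )
    trainer.train()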


Yup, I'm not saying TinyLlama is minimal, efficient, etc. (indeed, that just shows you can go even smaller). And a whole lot of what we throw LLMs at isn't the right tool for the job, but it's expedient and it surprisingly works.


It seems that BERT can be run on the llama.cpp platform: https://github.com/ggerganov/llama.cpp/pull/5423

So presumably those models could benefit from the speed-ups described in the OP article when running on CPU.


llama.cpp only supports BERT architectures for embeddings, not with classification heads, although there is a feature request to add that.
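Until that lands, one workaround is to take the pooled embeddings llama.cpp does produce and train the head separately. A sketch using the llama-cpp-python bindings plus scikit-learn (the GGUF path and the toy data are hypothetical, and this assumes a BERT model converted per the PR above):

    # Frozen BERT embeddings from llama.cpp + an external classification head.
    from llama_cpp import Llama
    from sklearn.linear_model import LogisticRegression

    # Hypothetical path to a BERT-family GGUF converted for llama.cpp.
    llm = Llama(model_path="bert-base-uncased.gguf", embedding=True)

    texts = ["great movie", "terrible plot", "loved it", "fell asleep"]  # toy data
    labels = [1, 0, 1, 0]                                                # 1 = positive

    X = [llm.embed(t) for t in texts]  # one pooled vector per input text

    # The missing classification head is just a linear layer over the pooled
    # embedding, so logistic regression on frozen embeddings plays that role.
    head = LogisticRegression().fit(X, labels)
    print(head.predict([llm.embed("surprisingly good")]))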


Ah I see, thanks, I didn't read the PR closely!



