Hacker News

> You don't need a large computer to run a large language model

While running TinyLlama does indeed count as running a language model, I'm skeptical that its capabilities meet what most people would consider the baseline for being useful.

Running a 10-parameter model is also "technically" running an LM, and I can do that by hand with a piece of paper.

That doesn’t mean “you don’t need a computer to run an LM”…
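To make the "by hand on paper" point concrete, here's a toy sketch (made-up numbers, not any real model): a bigram table over a three-word vocabulary is a complete, if useless, language model with nine parameters.

```python
# A toy "language model": a bigram table of next-word probabilities over a
# 3-word vocabulary. Nine parameters, evaluable with pen and paper.
BIGRAM = {
    "the": {"the": 0.1, "cat": 0.8, "sat": 0.1},
    "cat": {"the": 0.1, "cat": 0.1, "sat": 0.8},
    "sat": {"the": 0.8, "cat": 0.1, "sat": 0.1},
}

def most_likely_next(word: str) -> str:
    """Greedy decoding: pick the argmax of the bigram row."""
    row = BIGRAM[word]
    return max(row, key=row.get)

print(most_likely_next("the"))  # cat
print(most_likely_next("cat"))  # sat
```

It "runs" anywhere, which is exactly why "it runs" is a uselessly low bar on its own.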

I’m not sure where an LM becomes an LLM, but… I personally think it’s more about capability than parameter count.

I don’t realllly believe you can do a lot of useful LLM work on a Pi.



TinyLlama isn't going to be doing what ChatGPT does, but it still beats the pants off what we had for completion or sentiment analysis 5 years ago. And now a Pi can run it decently fast.


You can fine-tune a 66M-parameter discriminative (not generative) language model (e.g. DistilBERT), and it's one or two orders of magnitude more efficient for classification tasks like sentiment analysis, and probably similarly if not more accurate.
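Part of why the discriminative setup is so much cheaper: classification is a single forward pass to class scores, not token-by-token generation. A minimal sketch of that shape, with made-up toy weights standing in for a real fine-tuned head:

```python
# Sketch of a discriminative classifier: one forward pass maps input text
# straight to a class probability, no autoregressive decoding needed.
# The bag-of-words weights below are hypothetical toy values, not DistilBERT.
import math

WEIGHTS = {"great": 2.0, "love": 1.5, "terrible": -2.0, "boring": -1.5}
BIAS = 0.0

def sentiment_prob(text: str) -> float:
    """P(positive) from a toy bag-of-words logistic regression."""
    score = BIAS + sum(WEIGHTS.get(tok, 0.0) for tok in text.lower().split())
    return 1.0 / (1.0 + math.exp(-score))

print(sentiment_prob("i love this great movie") > 0.5)  # True
print(sentiment_prob("terrible and boring") > 0.5)      # False
```

A real DistilBERT replaces the bag-of-words features with a 66M-parameter encoder, but the head on top is the same idea: one pass, one set of logits.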


Yup, I'm not saying TinyLlama is minimal, efficient, etc. (indeed, I'm just saying you can go even smaller). And for a whole lot of what we throw LLMs at, an LLM isn't the right tool for the job, but it's expedient and it surprisingly works.


It seems that BERT can be run on the llama.cpp platform: https://github.com/ggerganov/llama.cpp/pull/5423

So presumably those models could benefit from the speedups described in the OP article when running on CPU.


llama.cpp only supports BERT architectures for embeddings, not with classification heads, although there is a feature request to add that.
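That's a smaller gap than it sounds, since a classification head is just a linear layer over the embedding, which you can run outside the runtime. A sketch, with hypothetical toy weights (a real head's weights would come from fine-tuning):

```python
# If a runtime only exposes sentence embeddings, a classification head can
# still be applied on the outside: scores[c] = W[c] . embedding + b[c],
# then argmax. All numbers below are hypothetical placeholders.

def classify(embedding, head_weights, head_bias):
    """Apply a linear classification head and return the argmax class index."""
    scores = [
        sum(wi * ei for wi, ei in zip(w, embedding)) + b
        for w, b in zip(head_weights, head_bias)
    ]
    return scores.index(max(scores))

# Toy 4-dim embedding, 2 classes.
emb = [0.2, -0.1, 0.7, 0.0]
W = [[1.0, 0.0, 0.5, 0.0],    # class 0
     [-1.0, 0.0, -0.5, 0.0]]  # class 1
b = [0.0, 0.1]
print(classify(emb, W, b))  # 0
```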


ah I see, thanks, did not read the PR closely!


Some newer models have repeatedly been shown to perform comparably to larger ones. And the Mixture of Experts architecture makes it possible to train large models that selectively activate only the parts relevant to the current context, which drastically reduces compute demand. Smaller models can also level the playing field by being faster at processing content retrieved by RAG, and through the same kind of routing they could hand off tasks that exceed their capabilities to larger, more powerful models.
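The "selectively activate only the relevant parts" bit is top-k gating: a router scores all experts per input but only the k best actually run, so compute scales with k rather than with the total expert count. A toy sketch of that routing step (the gate scores and experts here are made up, not from any real model):

```python
# Sketch of Mixture-of-Experts routing: score all experts, run only the
# top-k, and mix their outputs by softmax-normalized gate weight.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    """Evaluate only the top-k experts; the rest cost nothing."""
    topk = sorted(range(len(experts)),
                  key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in topk])
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

# Four toy experts; with k=2 only two of them ever run per input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
out = moe_forward(3.0, experts, gate_scores=[0.1, 2.0, 1.5, -1.0], k=2)
print(out)  # a gated mix of expert outputs 6.0 and 9.0
```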


I've gotten some useful stuff out of 7B-param LLMs, and quantized, one of those should fit on a Pi.
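The back-of-the-envelope math checks out: weight memory is roughly params × bits per weight / 8 (ignoring small quantization overhead like per-block scales), and 4-bit brings 7B under an 8 GB Pi's RAM where fp16 doesn't come close.

```python
# Rough weight-memory footprint of a model: params * bits / 8 bytes,
# ignoring quantization bookkeeping (per-block scales etc.).
def model_gib(params: float, bits: float) -> float:
    return params * bits / 8 / 2**30

print(f"7B @ fp16 : {model_gib(7e9, 16):.1f} GiB")  # ~13.0 GiB, way over 8 GB
print(f"7B @ 4-bit: {model_gib(7e9, 4):.1f} GiB")   # ~3.3 GiB, fits in 8 GB
```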



