> You don't need a large computer to run a large language model
While running TinyLlama does indeed count as running a language model, I’m skeptical that its capabilities meet what most people would consider a baseline requirement for usefulness.
Running a 10-parameter model is also “technically” running an LM, and I can do it by hand with a piece of paper.
That doesn’t mean “you don’t need a computer to run an LM”…
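For scale, here’s what “by hand with a piece of paper” looks like: a toy bigram table over a three-word vocabulary is about nine “parameters”, and a prediction is one row lookup plus an argmax. The numbers below are made up for illustration, not a trained model:

```python
# A toy ~10-parameter "language model": a bigram transition table
# over a 3-word vocabulary (illustrative numbers, not trained).
probs = {
    "the": {"the": 0.05, "cat": 0.80, "sat": 0.15},
    "cat": {"the": 0.10, "cat": 0.05, "sat": 0.85},
    "sat": {"the": 0.70, "cat": 0.20, "sat": 0.10},
}

def next_word(word: str) -> str:
    """Predict the most likely next word: one row lookup, one argmax."""
    row = probs[word]
    return max(row, key=row.get)

print(next_word("the"))  # -> "cat"
```

You could evaluate this with pencil and paper in seconds, which is exactly the point: "runs a language model" on its own tells you nothing about usefulness.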
I’m not sure where LM becomes LLM, but… I personally think it’s more about capability than parameter count.
I don’t really believe you can do a lot of useful LLM work on a Pi.
Tinyllama isn't going to be doing what ChatGPT does, but it still beats the pants off what we had for completion or sentiment analysis 5 years ago. And now a Pi can run it decently fast.
You can fine-tune a 60M-parameter discriminative (not generative) language model (e.g. DistilBERT) and it's one or two orders of magnitude more efficient for classification tasks like sentiment analysis, with similar if not better accuracy.
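The efficiency gap comes from the shape of the task: a discriminative classifier does one forward pass and emits a score, while a generative model decodes token by token. A deliberately oversimplified sketch of the discriminative side (a keyword-weight scorer, nothing like DistilBERT's actual architecture; all weights are made up):

```python
# Illustrative only: a tiny discriminative sentiment scorer.
# Real DistilBERT fine-tuning uses learned embeddings and transformer
# layers; this just shows the shape of the task -- one forward pass
# producing a single score, no token-by-token generation.
weights = {"great": 1.5, "love": 1.2, "terrible": -1.8, "boring": -1.1}
bias = 0.0

def sentiment(text: str) -> str:
    # Sum per-token weights; unknown tokens contribute nothing.
    score = bias + sum(weights.get(tok, 0.0) for tok in text.lower().split())
    return "positive" if score >= 0 else "negative"

print(sentiment("great movie, I love it"))  # -> positive
print(sentiment("terrible and boring"))     # -> negative
```

One pass over the input versus generating a whole answer is where the order-of-magnitude difference lives.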
Yup, I'm not saying TinyLlama is minimal or efficient; indeed, that just shows you can take models even smaller. And for a whole lot of what we throw LLMs at, they're not the right tool for the job, but they're expedient and surprisingly often work.
Newer models have repeatedly been shown to perform comparably to much larger ones. And the Mixture of Experts architecture makes it possible to train large models that selectively activate only the parts relevant to the current context, which drastically reduces compute demand. Smaller models can also level the playing field by being faster at processing content retrieved via RAG. Through a similar routing mechanism, they could also hand off tasks that exceed their capabilities to larger, more powerful models.
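The selective-activation idea can be sketched in a few lines: a gate scores each expert for the input and only the top-scoring expert actually runs, so per-token compute stays small even when the total parameter count is large. Everything here (the keyword gate, the expert names) is a hypothetical stand-in for the learned routing network in a real MoE model:

```python
# Toy sketch of Mixture-of-Experts top-1 routing (not a real model).
# In practice the gate is a small learned network and the experts are
# feed-forward blocks; here they are placeholders to show the control flow.
experts = {
    "code": lambda x: f"code-expert({x})",
    "math": lambda x: f"math-expert({x})",
    "chat": lambda x: f"chat-expert({x})",
}

def gate(x: str) -> dict:
    """Hypothetical gate: score each expert by keyword overlap."""
    keywords = {"code": ["def", "bug"], "math": ["sum", "prime"], "chat": []}
    return {name: sum(k in x for k in kws) for name, kws in keywords.items()}

def moe_forward(x: str) -> str:
    scores = gate(x)
    best = max(scores, key=scores.get)  # pick the top-1 expert
    return experts[best](x)             # only that expert executes

print(moe_forward("fix this bug in def foo"))
```

Only one expert's parameters are touched per input, which is why total model size and inference cost can decouple.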