Hacker News

> You don't need a large computer to run a large language model

While running TinyLlama does indeed count as running a language model, I'm skeptical that its capabilities meet what most people would consider the baseline for being useful.

Running a 10-parameter model is also "technically" running an LM, and I can do that by hand with a piece of paper.

That doesn’t mean “you don’t need a computer to run an LM”…
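To make the "by hand on paper" point concrete, here's a toy sketch (made-up numbers, not any real model): a bigram table over a three-word vocabulary is a complete, if useless, language model with nine parameters.

```python
# A toy "language model": a bigram table of next-word probabilities over a
# 3-word vocabulary. Nine parameters, evaluable with pen and paper.
BIGRAM = {
    "the": {"the": 0.1, "cat": 0.8, "sat": 0.1},
    "cat": {"the": 0.1, "cat": 0.1, "sat": 0.8},
    "sat": {"the": 0.8, "cat": 0.1, "sat": 0.1},
}

def most_likely_next(word: str) -> str:
    """Greedy decoding: pick the argmax of the bigram row."""
    row = BIGRAM[word]
    return max(row, key=row.get)

print(most_likely_next("the"))  # cat
print(most_likely_next("cat"))  # sat
```

It "runs" anywhere, which is exactly why "it runs" is a uselessly low bar on its own.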

I’m not sure where an LM becomes an LLM, but… I personally think it’s more about capability than parameter count.

I don’t realllly believe you can do a lot of useful LLM work on a Pi.



TinyLlama isn't going to be doing what ChatGPT does, but it still beats the pants off what we had for completion or sentiment analysis 5 years ago. And now a Pi can run it decently fast.


You can fine-tune a 66M-parameter discriminative (not generative) language model (e.g. DistilBERT), and it's one or two orders of magnitude more efficient for classification tasks like sentiment analysis, and probably similarly if not more accurate.
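Part of why the discriminative setup is so much cheaper: classification is a single forward pass to class scores, not token-by-token generation. A minimal sketch of that shape, with made-up toy weights standing in for a real fine-tuned head:

```python
# Sketch of a discriminative classifier: one forward pass maps input text
# straight to a class probability, no autoregressive decoding needed.
# The bag-of-words weights below are hypothetical toy values, not DistilBERT.
import math

WEIGHTS = {"great": 2.0, "love": 1.5, "terrible": -2.0, "boring": -1.5}
BIAS = 0.0

def sentiment_prob(text: str) -> float:
    """P(positive) from a toy bag-of-words logistic regression."""
    score = BIAS + sum(WEIGHTS.get(tok, 0.0) for tok in text.lower().split())
    return 1.0 / (1.0 + math.exp(-score))

print(sentiment_prob("i love this great movie") > 0.5)  # True
print(sentiment_prob("terrible and boring") > 0.5)      # False
```

A real DistilBERT replaces the bag-of-words features with a 66M-parameter encoder, but the head on top is the same idea: one pass, one set of logits.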


Yup, I'm not saying TinyLlama is minimal, efficient, etc. (indeed, I'm just saying you can go even smaller). And for a whole lot of what we throw LLMs at, an LLM isn't the right tool for the job, but it's expedient and it surprisingly works.


It seems that BERT can be run on the llama.cpp platform: https://github.com/ggerganov/llama.cpp/pull/5423

So presumably those models could benefit from the speedups described in the OP article when running on CPU.


llama.cpp only supports BERT architectures for embeddings, not with classification heads, although there is a feature request to add that.
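That's a smaller gap than it sounds, since a classification head is just a linear layer over the embedding, which you can run outside the runtime. A sketch, with hypothetical toy weights (a real head's weights would come from fine-tuning):

```python
# If a runtime only exposes sentence embeddings, a classification head can
# still be applied on the outside: scores[c] = W[c] . embedding + b[c],
# then argmax. All numbers below are hypothetical placeholders.

def classify(embedding, head_weights, head_bias):
    """Apply a linear classification head and return the argmax class index."""
    scores = [
        sum(wi * ei for wi, ei in zip(w, embedding)) + b
        for w, b in zip(head_weights, head_bias)
    ]
    return scores.index(max(scores))

# Toy 4-dim embedding, 2 classes.
emb = [0.2, -0.1, 0.7, 0.0]
W = [[1.0, 0.0, 0.5, 0.0],    # class 0
     [-1.0, 0.0, -0.5, 0.0]]  # class 1
b = [0.0, 0.1]
print(classify(emb, W, b))  # 0
```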


ah I see, thanks, did not read the PR closely!


Some newer models have repeatedly been shown to perform comparably to larger ones. And the Mixture of Experts architecture makes it possible to train large models that selectively activate only the parts relevant to the current context, which drastically reduces compute demand. Smaller models can also level the playing field by being faster at processing content retrieved by RAG, and through the same kind of routing they could hand off tasks that exceed their capabilities to larger, more powerful models.
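The "selectively activate only the relevant parts" bit is top-k gating: a router scores all experts per input but only the k best actually run, so compute scales with k rather than with the total expert count. A toy sketch of that routing step (the gate scores and experts here are made up, not from any real model):

```python
# Sketch of Mixture-of-Experts routing: score all experts, run only the
# top-k, and mix their outputs by softmax-normalized gate weight.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    """Evaluate only the top-k experts; the rest cost nothing."""
    topk = sorted(range(len(experts)),
                  key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in topk])
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

# Four toy experts; with k=2 only two of them ever run per input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
out = moe_forward(3.0, experts, gate_scores=[0.1, 2.0, 1.5, -1.0], k=2)
print(out)  # a gated mix of expert outputs 6.0 and 9.0
```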


I've gotten some useful stuff out of 7B-param LLMs, and quantized, one of those should fit on a Pi.
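The back-of-the-envelope math checks out: weight memory is roughly params × bits per weight / 8 (ignoring small quantization overhead like per-block scales), and 4-bit brings 7B under an 8 GB Pi's RAM where fp16 doesn't come close.

```python
# Rough weight-memory footprint of a model: params * bits / 8 bytes,
# ignoring quantization bookkeeping (per-block scales etc.).
def model_gib(params: float, bits: float) -> float:
    return params * bits / 8 / 2**30

print(f"7B @ fp16 : {model_gib(7e9, 16):.1f} GiB")  # ~13.0 GiB, way over 8 GB
print(f"7B @ 4-bit: {model_gib(7e9, 4):.1f} GiB")   # ~3.3 GiB, fits in 8 GB
```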



