
Comparing the 13B model here https://huggingface.co/cerebras/Cerebras-GPT-13B to LLaMA-13B https://github.com/facebookresearch/llama/blob/main/MODEL_CA... you can see that Cerebras-GPT lags behind on all of the reasoning tasks. Is there any reason to use Cerebras instead of LLaMA? It doesn't seem like it.
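
If you want to poke at this yourself: here is a minimal sketch of how these reasoning benchmarks are typically scored, by ranking each answer choice by its log-likelihood under the model. The example prompt and choices are made up, and the tokenization shortcut at the context/choice boundary is approximate; real harnesses handle it more carefully.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Zero-shot multiple-choice scoring: rank each choice by its
    # log-likelihood under the model, as eval harnesses typically do.
    model_id = "cerebras/Cerebras-GPT-13B"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    def choice_logprob(context: str, choice: str) -> float:
        # Assumes tokenizing context+choice splits cleanly at the boundary.
        ids = tok(context + choice, return_tensors="pt").input_ids.to(model.device)
        ctx_len = len(tok(context).input_ids)
        with torch.no_grad():
            logits = model(ids).logits
        logprobs = torch.log_softmax(logits[0, :-1], dim=-1)  # predicts tokens 1..L-1
        targets = ids[0, 1:]
        # Sum log-probs over the choice tokens only.
        span = logprobs[ctx_len - 1:].gather(-1, targets[ctx_len - 1:, None])
        return span.sum().item()

    context = "The man grabbed his umbrella because "
    choices = ["it was raining.", "he was hungry."]
    print(max(choices, key=lambda c: choice_logprob(context, c)))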


Can the LLaMA weights be used for commercial products?


There are two aspects to it.

The first one is whether they would actually sue; the optics would be terrible. A similar situation occurred in the 1990s when the RC4 cipher's code was leaked. Everyone used the leaked code while pretending it was a new cipher called ARC4 ("Alleged RC4"), even though people who had licensed the cipher confirmed that its output was identical. Nobody was sued, and RSA Security never acknowledged it.

The second one is related to the terms. The LLaMA weights themselves are licensed under terms that exclude commercial use:[0]

> You will not […] use […] the Software Products (or any derivative works thereof, works incorporating the Software Products, or any data produced by the Software), […] for […] any commercial or production purposes.

But the definition of derivative works is a gray area. AFAIK, if LLaMA is distilled, there is an unsettled argument to be had that the end result is not a LLaMA derivative and cannot be considered copyright or license infringement, similar to how models trained on blog articles and tweets do not infringe those authors' copyrights or licenses. The people who make the new model may be in breach of the license if they agreed to it, but the people who merely use that new model may not be. Otherwise, ad absurdum, any model trained on the Internet after Feb 2023 would have LLaMA-generated content in its training set, and would therefore break the license.
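
To make the distillation point concrete, here is a minimal sketch of a single knowledge-distillation training step, assuming generic teacher/student causal LMs (`teacher`, `student`, and `batch` are hypothetical stand-ins): the student only ever sees the teacher's output distribution, never its weights, which is what fuels the "not a derivative work" argument.

    import torch
    import torch.nn.functional as F

    # One knowledge-distillation step: train the student to match the
    # teacher's output distribution. `teacher`, `student`, and `batch`
    # (a tensor of token ids) are hypothetical stand-ins.
    def distill_step(teacher, student, batch, optimizer, temperature=2.0):
        with torch.no_grad():
            teacher_logits = teacher(batch).logits  # teacher is frozen
        student_logits = student(batch).logits
        loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2  # standard distillation-loss scaling
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()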

IANAL, but ultimately, Meta wins more by benefiting from what the community contributes on top of their work (similar to what happened with React) than by suing developers who use derivatives of their open models.

[0]: https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z...


Unclear, and likely jurisdiction-dependent; almost certainly not if you need to operate worldwide.


LLaMA is licensed for non-commercial use only.


It lags behind because, according to their blog post, it was trained on <300B tokens. The LLaMA models, as far as I know, were trained on more than a trillion.


The LLaMA paper says 1 trillion tokens for the smaller models (7B, 13B) and 1.4 trillion for the larger models (33B, 65B).



