Hacker News

Feels like large language models sucked all the air out of the room because it was much easier to scale compute and data; after RoBERTa, no one was willing to keep exploring.


No, there are mathematical reasons LLMs are better. They are trained with a multi-objective loss (coding skills, translation skills, etc.), so they understand the world much better than an MLM. The original post discusses this, just with more words and points than necessary.


GPTs also get gradients from all tokens, while BERT gets them only from the ~15% of tokens that are masked. That makes GPT training more sample-efficient.
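A toy sketch of the gradient-coverage difference (my own illustration, not from the thread): a causal LM predicts every next token, so nearly every position contributes to the loss, while a BERT-style MLM only computes loss at the randomly masked ~15% of positions.

```python
import random

def clm_loss_positions(seq_len):
    # Causal LM: every position except the first is a prediction target,
    # so almost all tokens contribute a gradient signal each step.
    return list(range(1, seq_len))

def mlm_loss_positions(seq_len, mask_rate=0.15, seed=0):
    # Masked LM (BERT-style): only the randomly masked ~15% of positions
    # are predicted; unmasked positions produce no loss and no gradient.
    rng = random.Random(seed)
    return [i for i in range(seq_len) if rng.random() < mask_rate]

seq_len = 512
clm = clm_loss_positions(seq_len)
mlm = mlm_loss_positions(seq_len)
print(f"CLM loss positions: {len(clm)}/{seq_len}")  # nearly all positions
print(f"MLM loss positions: {len(mlm)}/{seq_len}")  # roughly 15% of positions
```

Per training step, the CLM extracts learning signal from ~100% of the sequence versus ~15% for the MLM, which is the efficiency gap the comment is pointing at.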


Call it CLM vs MLM, not LLM vs MLM. Soon large MLMs will exist, and they will be LLMs too...


T5 is an LLM, I think the first of them.



