Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The tokens don't have to be related to the task at all. (From an outside perspective. The connections are internal in the model. That might raise transparency concerns.) A single designated 'compute token' repeated over and over can perform as well as traditional 'chain of thought.' See for example, Let's Think Dot by Dot (https://arxiv.org/abs/2404.15758).


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: