The tokens don't have to be related to the task at all. (From an outside perspec...

The tokens don't have to be related to the task at all. (From an outside perspective. The connections are internal in the model. That might raise transparency concerns.) A single designated 'compute token' repeated over and over can perform as well as traditional 'chain of thought.' See for example, Let's Think Dot by Dot (https://arxiv.org/abs/2404.15758).