
It's because you need to mess with embeddings, or even train new heads on top of the network, to use it as a classifier. LLMs are just tokens-in, tokens-out: they don't classify with a softmax over a fixed set of classes, they softmax over vocabulary tokens. That makes LLMs more convenient.
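
To make the contrast concrete, here's a minimal sketch using Hugging Face transformers (the model names and the three-way label set are just illustrative assumptions, not anything the parent specified):

    # Route 1: classic classifier -- attach a new head and softmax over
    # N task-specific classes. The head is freshly initialized here, so
    # it would still need fine-tuning on labeled data before it's useful.
    import torch
    from transformers import (AutoModelForCausalLM,
                              AutoModelForSequenceClassification,
                              AutoTokenizer)

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    clf = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=3)   # e.g. neg / neutral / pos
    logits = clf(**tok("great movie", return_tensors="pt")).logits
    probs = torch.softmax(logits, dim=-1)    # shape (1, 3): over classes

    # Route 2: LLM -- tokens in, tokens out. The softmax is over the
    # whole vocabulary, and the "class" is just whichever label word
    # the model assigns more probability to as the next token.
    lm_tok = AutoTokenizer.from_pretrained("gpt2")
    lm = AutoModelForCausalLM.from_pretrained("gpt2")
    prompt = "Review: great movie\nSentiment (positive or negative):"
    ids = lm_tok(prompt, return_tensors="pt").input_ids
    next_logits = lm(ids).logits[0, -1]      # shape (vocab_size,)
    pos = lm_tok(" positive", add_special_tokens=False).input_ids[0]
    neg = lm_tok(" negative", add_special_tokens=False).input_ids[0]
    print("positive beats negative:", (next_logits[pos] > next_logits[neg]).item())

Route 1 only works after you've trained the new head; route 2 works out of the box because the label is expressed in the model's existing vocabulary.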

