LLMs don't ingest text a character at a time. The difficulty with analyzing individual letters simply reflects that they don't directly "see" letters in their tokenized input.
A direct comparison would be asking someone how many convex Bézier curves are in the spoken word "monopoly".
Or how many red pixels are in a visible icon.
We could work out answers to both. But those answers won't come one-shot, or accurately, without specific practice.
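To make the tokenization point concrete, here is a toy sketch: a greedy longest-match tokenizer over a tiny made-up subword vocabulary (real LLM tokenizers like BPE are more involved, and this vocabulary is invented for illustration). The model receives opaque token chunks, not letters, so a question like "how many o's are in monopoly?" can't be answered by direct perception of the input.

```python
# Toy illustration only: greedy longest-match subword tokenization
# over a hypothetical vocabulary (not a real LLM tokenizer).
VOCAB = {"mono", "poly", "straw", "berry"}

def tokenize(text, vocab):
    """Split text into the longest vocabulary matches, left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Fall back to a single character if nothing matches.
            tokens.append(text[i])
            i += 1
    return tokens

tokens = tokenize("monopoly", VOCAB)
print(tokens)  # ['mono', 'poly']
```

The word arrives as two chunks, so counting its three o's requires knowledge *about* the tokens' spellings rather than reading the letters off the input, much like counting Bézier curves in a spoken word.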