Why would that be annoying? It’s much easier to understand, predict and truncate appropriately than having to explain all of these different tokenization schemes to devs.
Yeah, everybody agrees on what a character is, right? It's just {an ASCII byte|a UTF-8 code unit|a UTF-16 code unit|a Unicode code point|a Unicode grapheme}.
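To make the disagreement concrete, here is a rough sketch that counts the same short string under three of those definitions. The helper name `unit_counts` and the sample string are made up for illustration. Grapheme counting is left out because the Python standard library has no grapheme segmentation; by eye the sample is two user-visible characters.

```python
def unit_counts(text: str) -> dict[str, int]:
    """Count one string under several definitions of 'character'."""
    return {
        # UTF-8 code units are single bytes.
        "utf8_code_units": len(text.encode("utf-8")),
        # UTF-16 code units are 2 bytes each; "-le" avoids a leading BOM.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # Python's len() on a str counts Unicode code points.
        "code_points": len(text),
    }


# 'e' + combining acute accent, followed by a thumbs-up emoji.
sample = "e\u0301\U0001F44D"
print(unit_counts(sample))
```

Three definitions, three different answers for a string most people would call two characters long.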
Bytes are understandable but make no sense from a business point of view. If you submit the same simple (ASCII-only) query encoded as UTF-8 and as UTF-32, the latter will cost 4x as much.
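A quick way to check the byte-pricing arithmetic, as a sketch only: `encoded_size` is a hypothetical helper and the query strings are invented. The 4x ratio holds for ASCII-only text; for other scripts the gap is smaller because UTF-8 already spends several bytes per code point.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Return the number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


ascii_query = "select name from users"  # hypothetical ASCII-only query
cjk_query = "日本語"                      # hypothetical non-ASCII query

for query in (ascii_query, cjk_query):
    utf8 = encoded_size(query, "utf-8")
    # "utf-32-le" is used so the 4-byte BOM is not counted.
    utf32 = encoded_size(query, "utf-32-le")
    print(f"{query!r}: utf-8={utf8} bytes, utf-32={utf32} bytes")
```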