Let me ask you: with roughly 10k commonly used characters, doesn't that lead to shorter texts? Kind of like how higher-base number systems can encode large values with fewer digits; in that case the longer per-character UTF-8 encoding could be made up for by using fewer characters overall. Or am I wrong about this assumption?
As an example, suppose there is a single character that denotes the word 'house'; if that character is encoded using five bytes, it takes the same amount of space as the five-byte English spelling.
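To put rough numbers on this, here is a quick Python sketch comparing UTF-8 byte counts. It uses '家' (a CJK character whose meaning is close to 'house/home') as an assumed stand-in for a single-character word; in UTF-8 most CJK characters take three bytes, not five:

```python
# Byte length of the English word 'house' in UTF-8 (ASCII: 1 byte per letter)
english = "house".encode("utf-8")

# Byte length of a single CJK character with a similar meaning
cjk = "家".encode("utf-8")

print(len(english))  # 5 bytes
print(len(cjk))      # 3 bytes
```

So in this particular example the single character actually comes out shorter than the English word, which supports the intuition that fewer characters can offset the longer per-character encoding.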