

adsr on Jan 16, 2014 | on: UTF-8 Everywhere

Let me ask you: with 10k commonly used characters, doesn't that lead to shorter texts? Kind of like how higher number bases can encode larger numbers with fewer digits. In that case, the longer encoding in UTF-8 could be made up for by using fewer characters. Or am I wrong about this assumption? As an example, suppose there is one character that denotes the word 'house'. If that single character is encoded using five bytes, it takes the same amount of space as the English encoding.

Crito on Jan 16, 2014

That seems more than plausible to me. While the character 象 is two bytes longer than the character "f", it is five bytes shorter than "elephant".

IIRC the average word length in English is around 5 characters.
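
The byte counts above are easy to check. This is a minimal sketch in Python, where `utf8_len` is a helper name made up for this illustration (it is not from the thread); it only measures the three strings mentioned and says nothing about average text length in either language.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# The three strings compared in the comment above.
sizes = {s: utf8_len(s) for s in ("f", "象", "elephant")}
print(sizes)  # {'f': 1, '象': 3, 'elephant': 8}
```

So 象 costs two bytes more than "f" and five bytes fewer than "elephant", as stated.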
