Hacker News

I admire and appreciate your concern for something that is misunderstood and ignored. However, this webpage took way too long to say what is so great about UTF-8.


Honestly, I think it is platform politics. *nix systems seem to prefer UTF-8, while UTF-16 is the default on Windows. Space and memory are cheap, so either encoding seems fine.

The bottom line is that UTF-8 is awkward to use on Windows, while UTF-16/wchar_t is awkward to use on Linux, simply because the core APIs make them so (there is no _wfopen function in glibc).
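To illustrate the awkwardness: since glibc has no _wfopen, a portable program that wants UTF-8 paths everywhere has to special-case Windows, where the narrow fopen goes through the legacy ANSI codepage rather than UTF-8. A minimal sketch (the helper name open_utf8 is made up for illustration):

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Open a file whose name is UTF-8 encoded, on either platform.
 * On POSIX, filenames are byte strings, so fopen takes the UTF-8
 * bytes directly. On Windows, the path must be widened to UTF-16
 * and passed to _wfopen, because fopen interprets narrow paths in
 * the ANSI codepage, not UTF-8. */
FILE *open_utf8(const char *utf8_path, const char *mode) {
#ifdef _WIN32
    wchar_t wpath[MAX_PATH], wmode[16];
    MultiByteToWideChar(CP_UTF8, 0, utf8_path, -1, wpath, MAX_PATH);
    MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16);
    return _wfopen(wpath, wmode);
#else
    return fopen(utf8_path, mode);
#endif
}
```

On Linux the #else branch is all you need; the boilerplate exists only to paper over the Windows API split.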


The other problem with UTF-16 is that it's much easier to pretend that 1 element = 1 character than with UTF-8.


It's not really politics. Microsoft made the choice for fixed-size chars back when it was thought that 16 bits was enough for everyone. MS was at the forefront of internationalizing things, and probably still is. (Multilanguage support in Windows and Office is quite top class.)

Unfortunately, we need more code points than 16 bits can hold, so with that insight 16-bit chars are a waste and a bad decision. It seems unlikely that a fresh platform with no legacy requirements would choose a 16-bit encoding. Think of all the XML in Java and .NET - nearly all of it ASCII, using up double the RAM for zero benefit. It sucks.

Was UTF-8 even around when Microsoft decided on 16-bit widechar?

Other platforms seem to have lucked out by not worrying as much about standardizing on a single charset, and UTF-8 came along and solved the problem.


"Was UTF-8 even around when Microsoft decided on 16-bit widechar?"

No. Thompson's placemat is from September 1992 and NT 3.1 from July 1993, but development of NT started in November 1989 (http://en.wikipedia.org/wiki/Windows_NT#Development).


This summary is excellent, and concise:

http://research.swtch.com/utf8



