
TLDR: node sucks


TLDR: The V8 engine can't (supposedly) represent Unicode code points that need more than 16 bits, because it uses the UCS-2 encoding.
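
For reference, UTF-16 represents a code point above U+FFFF as two 16-bit code units (a surrogate pair), which is the part a strict UCS-2 view cannot express. A minimal sketch of the arithmetic follows; `toSurrogatePair` is a hypothetical helper name used for illustration, not anything from V8:

```typescript
// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint: number): [number, number] {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("not a supplementary code point");
  }
  const offset = codePoint - 0x10000;    // 20 significant bits remain
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f600); // U+1F600
const asString = String.fromCharCode(high, low);
console.log(high.toString(16), low.toString(16), asString.length); // d83d de00 2
```

The resulting string holds one code point but reports a length of 2, because `length` counts 16-bit code units.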


TLDR: v8 "sucks" (and doesn't support Unicode code points outside of the lowest ~64k characters).

Edit: v8 in general is pretty cool, but not supporting Unicode outside UCS-2 is pretty bad.


Most apps seem to "support" surrogate pairs by simply not being aware of them at all.
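
To make "not being aware of them" concrete, here is a small sketch that compares the raw code-unit length with a surrogate-aware count. `countCodePoints` is a hypothetical helper, and counting a lone surrogate as one unit is an assumption made for simplicity:

```typescript
// Hypothetical helper: count code points by treating a high surrogate
// followed by a low surrogate as a single unit.
function countCodePoints(text: string): number {
  let count = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    if (isHigh && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low surrogate: the pair is one code point
      }
    }
    count++; // lone surrogates are counted as one unit (assumption)
  }
  return count;
}

const sample = "a\ud83d\ude00b"; // "a", U+1F600, "b"
console.log(sample.length);           // 4 UTF-16 code units
console.log(countCodePoints(sample)); // 3 code points
```

Code that slices or reverses by index without a check like this can cut a pair in half and leave an unpaired surrogate behind.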

Good on the V8 developers for recognizing these conditions that their code didn't fully handle and refusing to muddle on through with broken processing.


It's v8's fault, and v8 does not suck.


Unicode 2.0 added surrogate pairs in 1996. Unfortunately, the first versions of both Java and JavaScript predated this and got strings horribly wrong, and now any conforming implementation of either is required to suck. The Right Thing would be for almost everyone to work with only combining character sequences, except for a rare few who need to know how to dissect one into its codepoints and reassemble them correctly (just as people don't normally need to extract high or low bits from an ASCII character).


No. Combining characters and NF(K)C/D normalisation rules are a different problem entirely - consider the "heavy metal umlaut" (i.e. Spın̈al Tap), where no lossless conversion to a single code point is possible - only "n" followed by U+0308.
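
A quick way to see the difference, assuming a runtime that provides `String.prototype.normalize` (ES2015 or later): NFC composes "o" plus a combining circumflex into the single code point U+00F4, but leaves "n" plus U+0308 as two code points because no precomposed form exists. The variable names are illustrative only.

```typescript
const precomposable = "o\u0302";    // "o" + COMBINING CIRCUMFLEX ACCENT
const notPrecomposable = "n\u0308"; // "n" + COMBINING DIAERESIS

const composedO = precomposable.normalize("NFC");
const composedN = notPrecomposable.normalize("NFC");

console.log(composedO.length, composedO === "\u00f4");         // 1 true
console.log(composedN.length, composedN === notPrecomposable); // 2 true
```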


They're facets of the same problem. I shouldn't routinely be dealing with either surrogates or combining marks; unless I have a specific reason, it's only an opportunity to make a mistake that hardly anyone knows how to troubleshoot. "n̈" should be an indivisible string of length one until I need to ask how it would actually be encoded in UTF-16 or whatever.
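
As a rough illustration of treating "n̈" as one unit, the sketch below groups each base character with the combining marks that follow it. It is deliberately simplified: it assumes only U+0300..U+036F are combining marks and ignores surrogate pairs, whereas real segmentation follows the Unicode text segmentation rules (UAX #29). `isCombiningMark` and `toClusters` are hypothetical names.

```typescript
// Simplifying assumption: only U+0300..U+036F (Combining Diacritical Marks)
// count as combining; real grapheme segmentation covers far more.
function isCombiningMark(unit: number): boolean {
  return unit >= 0x0300 && unit <= 0x036f;
}

// Hypothetical helper: split text into base-plus-marks clusters.
function toClusters(text: string): string[] {
  const clusters: string[] = [];
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (clusters.length > 0 && isCombiningMark(text.charCodeAt(i))) {
      clusters[clusters.length - 1] += ch; // attach the mark to the previous base
    } else {
      clusters.push(ch);
    }
  }
  return clusters;
}

const word = "Spin\u0308al";
console.log(word.length);             // 7 code units
console.log(toClusters(word).length); // 6 clusters
```

Runtimes that ship `Intl.Segmenter` can do this grouping properly; the hand-rolled version is only meant to show the idea.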


But that's the point - there is no such character. Given the Unicode consortium have added codepoints for every other bloody thing under the sun, I'm amazed that there isn't one for n-diaeresis but there you are.

Add a small number of people who for artistic reasons decide that they want to make life hard (Rinôçérôse I'm looking at you) and you just have to accept that the length of your string might not equal the number of codepoints contained therein...


Damn you for being right. :)


Not even a little bit accurate.



