masklinn · on July 10, 2012

> This is a MySQL limitation that they have fixed in recent releases (as the OP notes)

TFA notes that this is not fixed: the `utf8` MySQL encoding still isn't UTF-8. And as TFA also notes, related technologies (aka drivers) may not be compatible with `utf8mb4` (the example he uses, mysql2 for Ruby, still hasn't had an official release supporting utf8mb4[0]).

> it's true that Unicode is (relatively speaking) very new for such a fundamental technology

That's becoming quite a hard argument to swallow when encountering astral plane issues in 2012, given that Unicode 2.0 was introduced in 1996.

[0] https://github.com/brianmario/mysql2/issues/249

mrj · on July 10, 2012

> > it's true that Unicode is (relatively speaking) very new for such a fundamental technology

> That's becoming quite a hard argument to swallow when encountering astral plane issues in 2012, given that Unicode 2.0 was introduced in 1996.

I don't get your argument. MySQL was also released around that time and we don't call it "cutting edge" because we found a bug. There are bugs in old stuff all the time but (most) people don't throw a fit.

pyre · on July 10, 2012

Unicode is being called 'cutting edge' because it's no longer 'old hat.' Lots of things claim support for Unicode, but few (or none) support it well. Unicode isn't a software project, it's a spec/idea. It's like calling a Star Trek tricorder "cutting edge" because no one has implemented a fully functional version. Sure, the idea has been around for a while, but at this point there are no acceptable manifestations of that idea.

masklinn · on July 11, 2012

> I don't get your argument. MySQL was also released around that time and we don't call it "cutting edge" because we found a bug.

Not supporting astral planes and saying you're supporting utf-8 is not a bug, it's a lie.

crazygringo · on July 11, 2012

Honestly it's probably better they don't change the behavior of an existing MySQL character set. Who knows what software out there depends on it breaking on 4-byte characters, or whatnot.

Creating a new character set `utf8mb4` was the right thing to do, as annoying as it is. Just clearly label the `utf8` character set as 'deprecated' in the docs or something.
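To make the "4-byte characters" point concrete, here is a minimal Python sketch, not anything from TFA or from MySQL itself. It relies on the fact that MySQL's `utf8` stores at most 3 bytes per character, which covers only the Basic Multilingual Plane (code points up to U+FFFF). The helper names `fits_in_three_byte_utf8` and `astral_chars` are made up for illustration.

```python
BMP_MAX = 0xFFFF  # highest code point that UTF-8 encodes in 3 bytes or fewer


def fits_in_three_byte_utf8(text: str) -> bool:
    """True if every character is in the BMP, i.e. needs at most 3 UTF-8 bytes."""
    return all(ord(ch) <= BMP_MAX for ch in text)


def astral_chars(text: str) -> list[str]:
    """Return the characters that need 4 UTF-8 bytes (astral planes)."""
    return [ch for ch in text if ord(ch) > BMP_MAX]


if __name__ == "__main__":
    samples = ["plain ascii", "caf\u00e9 \u20ac", "clef \U0001D11E", "poo \U0001F4A9"]
    for s in samples:
        widths = [len(ch.encode("utf-8")) for ch in s]
        print(repr(s), "max bytes per char:", max(widths),
              "fits 3-byte utf8:", fits_in_three_byte_utf8(s))
```

Something along these lines could be used to check input before it reaches a `utf8` column, assuming you can't migrate to `utf8mb4` yet.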

masklinn · on July 11, 2012

> Honestly it's probably better they don't change the behavior of an existing MySQL character set.

Or they could just have implemented it correctly to start with, considering unicode "support" was introduced in mysql 4.1.

In 2004.

> Who knows what software out there depends on it breaking on 4-byte characters, or whatnot.

Then again, mysql routinely drops and corrupts data anyway; I'm sure its "users" could have dealt with it corrupting data slightly less than before.