wc counts bytes; to make it count characters, use -m in the GNU version.

lifeisstillgood · on May 13, 2014

I think the point being made is that -m does not really count characters; it counts multibyte sequences as decoded under the current locale's encoding, or at least tries to. The same Unicode code point can be a very different string of bytes in UTF-8, UTF-16, and UTF-32, and there is no way to tell which you are looking at unless you know beforehand. Hence the BOM, but no one likes that.
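To make that concrete, here is a rough Python sketch (not anything wc itself does). The sample string and the helper name byte_lengths are just made up for illustration. It shows the code point count staying fixed while the byte count changes per encoding, and what happens if you guess the encoding wrong.

```python
# Sample text (an arbitrary choice for illustration): "naive" with a
# precomposed i-diaeresis, a space, and two CJK characters.
text = "na\u00efve \u65e5\u672c"

def byte_lengths(s):
    """Return {encoding: byte count} for a few Unicode encodings.

    Hypothetical helper. The "-le" codec names are used so that Python
    does not prepend a BOM to the encoded output.
    """
    return {enc: len(s.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# The number of code points is the same whatever the encoding.
print(len(text))            # 8

# The number of bytes is not.
print(byte_lengths(text))   # {'utf-8': 13, 'utf-16-le': 16, 'utf-32-le': 32}

# Guess the encoding wrong and the "character" count changes too:
# Latin-1 maps every byte to one character, so the UTF-16 bytes decode
# without error into 16 characters instead of 8.
misread = text.encode("utf-16-le").decode("latin-1")
print(len(misread))         # 16
```

Note this counts code points, not what a reader would call characters; combining sequences would push the numbers apart again.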

It's hard. And possibly we have to abandon tools like wc once we leave the Latin world.