Now, all the advice in the Windows section - don't do this, don't do that, only and always do the third - is lovely, but if you happen to care about your app's performance, you will have to carry wstrings around.
Take a simple example of an app that generates a bunch of logs that need to be displayed to the user. If you were to follow the article's recommendations, you'd have these logs generated and stored in UTF-8. Then, only when they are about to be displayed on screen, you'd convert them to UTF-16. Now, say, you have a custom control that renders log entries. Furthermore, let's imagine a user who sits there and hits PgUp, PgDown, PgUp, PgDown repeatedly.
On every keypress the app will run a bunch of strings through MultiByteToWideChar() to do the conversion (plus whatever other fluff comes with any boost/STL wrappers), feed the result to DrawText() and then discard the wstrings, triggering a bunch of heap operations along the way. And you'd better hope the latter doesn't cause the heap to wobble across a defrag threshold.
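In code, the path I'm describing is roughly this (a made-up sketch - the control and the line layout are invented, but the MultiByteToWideChar/DrawText round-trip is the point):

    #include <windows.h>
    #include <string>
    #include <vector>

    // Hypothetical sketch of the repaint path described above: the log lines
    // live in UTF-8, so every redraw converts them to UTF-16, draws them and
    // throws the temporary wstrings away again.
    void PaintVisibleLog(HDC dc, const std::vector<std::string>& visibleLines, RECT line)
    {
        for (const std::string& entry : visibleLines)
        {
            if (entry.empty())
                continue;
            int len = MultiByteToWideChar(CP_UTF8, 0, entry.data(), (int)entry.size(),
                                          nullptr, 0);
            std::wstring wide(len, L'\0');                 // heap allocation per line
            MultiByteToWideChar(CP_UTF8, 0, entry.data(), (int)entry.size(),
                                &wide[0], len);
            DrawTextW(dc, wide.c_str(), len, &line, DT_SINGLELINE | DT_NOPREFIX);
            OffsetRect(&line, 0, 16);                      // next line; wide is freed here
        }
    }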
Is your code as sublime as it gets? Check. Does it look like it's written by over-enlightened purists? You bet. Just look at this "advice" from the page -
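I don't have the exact snippet handy, but it boils down to this shape (a hypothetical reconstruction; hwnd and widen() stand in for whatever the page actually uses):

    // A compile-time constant goes through a runtime UTF-8 -> UTF-16
    // conversion, heap allocation included, purely to dodge an L"" literal.
    SetWindowTextW(hwnd, widen("Hello, world").c_str());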
This marvel passes a constant string to widen() to get another constant string to pass to an API call. Just because the code is more kosher without that goddamn awful L prefix. Extra CPU cycles? Bah. A couple of KB added to the .exe due to inlining? Who cares. But would you just look at how zen the code is.
--
tl;dr - keeping as much text as possible in UTF-8 in a Windows app is a good idea, just make sure not to take it to extremes.
That was an unfortunate example. The widen() in this case is absolutely unnecessary. The author even recommends using the L prefix for UTF-16 string literals inside of Windows API calls (but not on other platforms, where wchar_t isn't UTF-16):
> Do not use _T("") or L"" literals in any place other than parameters to APIs accepting UTF-16.
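So for this particular case the article's own rule already allows the simple form (same hypothetical hwnd as above):

    // A literal that only ever goes into a UTF-16 API parameter can just be
    // written as a UTF-16 literal - no runtime widen() round-trip needed.
    SetWindowTextW(hwnd, L"Hello, world");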
Except for that, you do make a good point. It's probably better to store some strings in memory according to the platform requirements, if the program can be shown to exhibit delays caused by string conversions.
"if you happen to care about app's performance, you will have to carry wstrings around"
If those strings are for the user to read, he's reading them a million times slower than you can handle even the most ornate re-encoding. Sounds like a premature optimization.
Not only that, but the time required to convert from UTF-8 to UTF-16 is negligible in relation to the time required to lay out the glyphs and draw them on screen. Premature optimisation indeed.
It's not a premature optimization. It's a manifestation of a different set of coding ethics which is just ... err ... less wasteful and generally more thoughtful.
Yup. I wish this ethic were more popular. I can understand that we "waste" countless cycles in order to support abstraction layers that help us code faster and with fewer bugs. But I think that our programs could still be an order of magnitude faster (and/or burn less coal) if people thought a little bit more and coded a little bit slower. The disregard people have for writing fast code is terrifying.
Or maybe it's just me who is weird. I grew up on gamedev, so I feel bad when writing something obviously slow, that could be sped up if one spent 15 minutes more of thinking/coding on it.
Yeah, I'll have to disagree with both of you. The "coding ethics" that wants to optimize for speed everywhere is the wasteful and thoughtless one.
Computers are fast, you don't have to coddle them. Never do any kind of optimization that reduces readability without concrete proof that it will actually make a difference.
15 minutes spent optimizing code that takes up 0.1% of a program's time are 15 wasted minutes that probably made your program worse.
Additionally: "Even good programmers are very good at constructing performance arguments that end up being wrong, so the best programmers prefer profilers and test cases to speculation." (Martin Fowler)
> Computers are fast, you don't have to coddle them
This mentality is exactly why Windows feels sluggish in comparison to Linux on the same hardware. Being careless with the code and unceremoniously relying on spare (and frequently assumed) hardware capacity is certainly a way to do things. I'm sure it makes a lot of business sense, but is it good engineering? It's not.
Neither is optimization for its own sake, it's just a different (and worse) form of carelessness and bad engineering.
Making code efficient is not a virtue in its own right. If you want performance, set measurable goals and optimize the parts of the code that actually help you achieve those goals. Compulsively optimizing everything will just waste a lot of time, lead to unmaintainable code and quite often not actually yield good performance, because bottlenecks can (and often do) hide in places where bytes-and-cycles OCD overlooks them.
I think we are talking about different optimizations here. I'm referring to the "think and use qsort over bubble sort" kind of thing, while you seem to be referring to hand-tuned inline assembly optimizations.
My point is that the "hardware can handle it" mantra is a tell-tale sign of a developer who is more concerned with his own comfort than anything else. It's someone who's content with not pushing himself, and that's just wrong.
--
(edit) While I'm here, do you know how to get the uptime on Linux?
cat /proc/uptime
Do you know how to get the uptime on Windows? WMI. It's just absolutely f#cking insane that I need to initialize COM, instantiate an object, grant it the required privileges, and set up proxy impersonation just to be allowed to send an RPC request to a system service (which may or may not be running, in which case it will take 3-5 seconds to start) that will, on my behalf, talk to something else in Windows' guts and then reply with a COM variant containing the answer. So that's several megs of memory, 3-4 non-trivial external dependencies and a second of run-time to get the uptime.
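For the record, the minimal version of that dance looks roughly like this (a condensed sketch of the standard IWbemLocator/IWbemServices boilerplate; error handling omitted, which real code can't afford to skip):

    #include <windows.h>
    #include <stdio.h>
    #include <comdef.h>
    #include <wbemidl.h>
    #pragma comment(lib, "wbemuuid.lib")

    // Condensed sketch: read Win32_OperatingSystem.LastBootUpTime via WMI.
    int main()
    {
        CoInitializeEx(nullptr, COINIT_MULTITHREADED);                    // 1. init COM
        CoInitializeSecurity(nullptr, -1, nullptr, nullptr,               // 2. process-wide security
            RPC_C_AUTHN_LEVEL_DEFAULT, RPC_C_IMP_LEVEL_IMPERSONATE,
            nullptr, EOAC_NONE, nullptr);

        IWbemLocator* locator = nullptr;                                  // 3. instantiate the locator
        CoCreateInstance(CLSID_WbemLocator, nullptr, CLSCTX_INPROC_SERVER,
            IID_IWbemLocator, (LPVOID*)&locator);

        IWbemServices* services = nullptr;                                // 4. connect to the WMI service
        locator->ConnectServer(_bstr_t(L"ROOT\\CIMV2"), nullptr, nullptr,
            nullptr, 0, nullptr, nullptr, &services);

        CoSetProxyBlanket(services, RPC_C_AUTHN_WINNT, RPC_C_AUTHZ_NONE,  // 5. proxy impersonation
            nullptr, RPC_C_AUTHN_LEVEL_CALL, RPC_C_IMP_LEVEL_IMPERSONATE,
            nullptr, EOAC_NONE);

        IEnumWbemClassObject* rows = nullptr;                             // 6. send the query over RPC
        services->ExecQuery(_bstr_t(L"WQL"),
            _bstr_t(L"SELECT LastBootUpTime FROM Win32_OperatingSystem"),
            WBEM_FLAG_FORWARD_ONLY | WBEM_FLAG_RETURN_IMMEDIATELY,
            nullptr, &rows);

        IWbemClassObject* row = nullptr;
        ULONG returned = 0;
        if (rows->Next(WBEM_INFINITE, 1, &row, &returned) == S_OK && returned)
        {
            VARIANT value;                                                // 7. the answer, as a COM variant
            row->Get(L"LastBootUpTime", 0, &value, nullptr, nullptr);
            wprintf(L"booted: %ls\n", V_BSTR(&value));
            VariantClear(&value);
            row->Release();
        }

        rows->Release();
        services->Release();
        locator->Release();
        CoUninitialize();
    }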
Can you guess why I bring this up?
Because that's exactly the kind of mess that spawns from the "oh, it's not a big overhead" assumption. Little by little, crap accumulates, solidifies, and you end up with this massive pile of shitty, negligent code that is impossible to improve or refactor. All because of that one little assumption.
I agree that optimization for its own sake is not a good thing (though a tempting one for some, including me), but there's a difference between prematurely optimizing and just careless cowboy-coding. Sometimes two minutes of thinking and a few different characters are enough to speed code up by an order of magnitude (e.g. by choosing the proper type or data structure).
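The kind of change I mean is often this trivial (a made-up before/after; the function is hypothetical):

    #include <string>

    // Before: the string argument is copied (often heap-allocated) on every call.
    void log_line(std::string line);

    // After: a few extra characters, and the copy is gone.
    void log_line(const std::string& line);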
Also, being aware of the different ways code can be slow (from things dependent on your programming language of choice to low-level stuff like page faults and cache misses) can make you produce faster code by default, because the optimized code is the intuitive one for you.
Still, I think there's a gap between "fast enough and doesn't suck" and "customers angry enough to warrant optimization". It's especially visible in the smartphone market, where the cheaper ones sometimes can't even handle their own operating system, not to mention the bloated apps. For me it's one of the problems with businesses: there's no good way to incentivize them to stop producing barely-good-enough crap and deliver something of decent quality.
For display purposes, UTF-8 vs. UTF-16 is going to be such a minuscule difference that it's not worth the potential portability bugs to try to optimize for speed. You're talking about at most 30,000 characters of text on screen at once. If that's entirely stored in UTF-8, and entirely rendered in UTF-16, and the conversion takes an insane 100 cycles per character on average, you're still using less than 0.1% of a single core of a modern desktop CPU.
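Spelling the arithmetic out with those same assumed numbers (and a ~3 GHz core):

    30,000 chars x 100 cycles/char            = 3,000,000 cycles per full reconversion
    3,000,000 cycles / 3,000,000,000 cycles/s = ~1 ms of one core
    ~1 reconversion per second                = ~0.1% of that core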
If you got into the 1%+ range, I could see justifying some attention to speed, but otherwise...
Less wasteful of computer time, but more wasteful of developer time. And, given that the comment is advocating a more complex strategy for using strings with different encodings rather than the simple one given in the story, probably more error-prone too.
The advocated strategy is simpler: UTF-8 strings are a lot easier to handle than UCS-2. What is complex is that the Windows API is inconsistent and more oriented toward UCS-2.
You do realise that drawing the characters to the screen is an order of magnitude (at least) slower than grabbing some heap memory, doing a string conversion and freeing the memory, right? You don't even need to allocate memory - you can have a constant small thread-local buffer of, say, 1kb that you reuse for these conversions.
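Concretely, something along these lines would do (a sketch; the names are made up, and very long lines would still need a heap fallback):

    #include <windows.h>
    #include <string>

    // Sketch: reuse one small thread-local buffer for the UTF-8 -> UTF-16 hop
    // right before drawing, so the hot path never touches the heap.
    static const wchar_t* WidenForDisplay(const std::string& utf8, int* outLen)
    {
        thread_local wchar_t buf[512];                     // ~1 KB per thread, reused
        *outLen = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(),
                                      buf, 512);           // 0 on failure or overflow
        return buf;
    }

    void DrawLogLine(HDC dc, const std::string& entry, RECT* line)
    {
        int len = 0;
        const wchar_t* wide = WidenForDisplay(entry, &len);
        if (len > 0)
            DrawTextW(dc, wide, len, line, DT_SINGLELINE | DT_NOPREFIX);
    }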