If you find yourself using formatted strings often, e.g. for logging, then it's also worth considering another approach: just dump your message and arguments to some lightly structured format, like JSON, instead. In my testing, rapidjson can produce structured JSON more quickly than even the good formatting libraries, like CppFormat, can produce a string.
The simple POSIX write/read works nicely enough for IO. Very versatile and damn simple. No "oop" necessary.
For string parsing, people use all kinds of things, from regex to deserializers. Sometimes we even use a simple string tokenisation function. No one ever uses scanf or cin.
The simple POSIX write/read works nicely enough for IO, where "simple" includes the loop that checks the return code and then errno for cases that should be retried (EINTR, EAGAIN, EWOULDBLOCK, depending on what you are doing and your OS), adjusts your arguments for how much data actually got sent or read, and then retries. Also be aware that, depending on your file descriptor, you might be making storms of tiny context switches, interrupts, and IOs.
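For anyone who hasn't written that loop before, here is a minimal sketch of the write side. The name write_all is made up for illustration, and waiting in poll() on EAGAIN is just one reasonable policy for a non-blocking descriptor; an event-loop program would return to its loop instead.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>     // POSIX poll()
#include <unistd.h>   // POSIX write()

// Hypothetical helper: writes all `len` bytes to `fd`.
// Returns true on success, false on a non-retryable error.
bool write_all(int fd, const char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    while (len > 0) {
        ssize_t n = ::write(fd, buf, len);
        if (n > 0) {
            // Partial (or full) write: advance past what was accepted.
            buf += n;
            len -= static_cast<std::size_t>(n);
            continue;
        }
        if (n < 0 && errno == EINTR) {
            continue;  // interrupted by a signal before writing: retry
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            // Non-blocking descriptor is not ready: wait until writable.
            struct pollfd p{fd, POLLOUT, 0};
            if (::poll(&p, 1, -1) < 0 && errno != EINTR) return false;
            continue;
        }
        return false;  // real error, or an unexpected zero-byte write
    }
    return true;
}
```

Note that every pass through the loop is a system call, which is where the "storms of tiny IOs" come from if the caller hands it a few bytes at a time.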
Old Unix programmers handle that with about as much thought as breathing, but only because they were young Unix programmers once.
Copying the bytes one more time isn't the worst thing that you will ever make a computer do. The stdio.h level is pretty nice from the standpoint of keeping your foot un-blown-off.
That's true - you never see scanf outside beginner exercises. stdio is definitely hard to use safely for input. In this library, I explore using operator() for input, but error handling is challenging.
> print("hello",who,"answer",answer,'\n'); -- This is cute, ... but what if you need an integer in hex, or a floating-point number to a particular precision, and so forth?
Then you can use a helper function to convert them to string objects first.
If your string class is smart enough to do small-string optimization (SSO), then this won't even need heap allocation, so the overhead is really very minimal.
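A rough sketch of what such helpers could look like, using std::string and snprintf. The names hex and fixed are invented here and are not part of the library under discussion; whether the result avoids the heap depends on your standard library's SSO capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats an unsigned integer as lowercase hex with a 0x prefix.
std::string hex(unsigned long long v) {
    char buf[2 + 16 + 1];  // "0x" + up to 16 hex digits + terminator
    std::snprintf(buf, sizeof buf, "0x%llx", v);
    return buf;
}

// Hypothetical helper: formats a double with a fixed number of decimals.
std::string fixed(double v, int precision) {
    assert(precision >= 0);
    // First call measures the length, second call fills the string.
    int n = std::snprintf(nullptr, 0, "%.*f", precision, v);
    if (n < 0) return {};
    std::string out(static_cast<std::size_t>(n), '\0');
    std::snprintf(&out[0], out.size() + 1, "%.*f", precision, v);
    return out;
}
```

With these, the quoted call would become something like print("hello", who, "answer", hex(answer), '\n'), leaving print() itself untouched.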
This is basically what I do for my string library. But I also have a special format mode for things like UI localization:
print(tr("{1} is the value of {0}\n"), format{name, value});
//tr returns "{0}の値は{1}です\n" for Japanese locale
This goes further than printf by allowing you to very easily reorder the arguments. (Omitting the indexes and using just {} makes them sequential.) [I've heard printf can do this too, but I don't know if that's standardized. I've never seen anyone do it in the wild.]
The format{} builds a list of values (I convert them to strings, but you could use an any type if you wanted), and when print() gets a const format& as a type, it applies the list to the current string before it. It would be a trivial exercise to extend the token syntax to support formatting rules, e.g. "{0,-4h}" or what not.
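To make the substitution step concrete, here is a minimal sketch of applying a list of already-stringified values to a pattern. This is not the actual library code: apply_format is a made-up name, and leaving unknown or out-of-range tokens untouched is an assumption about the desired behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: replaces {N} with args[N] and a bare {} with the next
// sequential argument. Anything else in braces is copied through unchanged.
std::string apply_format(const std::string& pattern,
                         const std::vector<std::string>& args) {
    std::string out;
    std::size_t next = 0;  // index consumed by the next bare {}
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] != '{') { out += pattern[i]; continue; }
        std::size_t close = pattern.find('}', i);
        if (close == std::string::npos) {   // unterminated token: copy the rest
            out.append(pattern, i, std::string::npos);
            break;
        }
        assert(close > i);  // the search started at the '{' itself
        std::size_t index = 0;
        bool digits_only = true;
        for (std::size_t j = i + 1; j < close; ++j) {
            if (pattern[j] < '0' || pattern[j] > '9') { digits_only = false; break; }
            index = index * 10 + static_cast<std::size_t>(pattern[j] - '0');
        }
        if (digits_only && close == i + 1) index = next++;  // bare {}
        if (digits_only && index < args.size()) {
            out += args[index];
        } else {
            out.append(pattern, i, close - i + 1);  // not ours: keep literally
        }
        i = close;  // resume after the closing brace
    }
    return out;
}
```

Supporting formatting rules would mean parsing whatever follows the index inside the braces, at which point storing typed values instead of strings starts to pay off.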
Further, adapters can be added outside the library. So for SQL, one can have bindings via:
database.execute("select * from user where name=? and age >= ?",
escape{name, minimumAge});
Still, I'll freely confess that this isn't in the spirit of the C++ standard library: you pay a tiny bit of performance overhead in return for niceness. But unless your application is I/O bound to string formatting, I strongly believe it's worth the (minimal) cost.
String interpolation would make C++ print statements even nicer, but that's unlikely to ever arrive.
"Conversions can be applied to the nth argument after the format in the argument list, rather than to the next unused argument. In this case, the conversion specifier character % (see below) is replaced by the sequence "%n$", where n is a decimal integer in the range [1,{NL_ARGMAX}], giving the position of the argument in the argument list. This feature provides for the definition of format strings that select arguments in an order appropriate to specific languages"
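In practice, the numbered form quoted above looks like the sketch below. Keep in mind this is a POSIX extension rather than ISO C or C++, so it works with glibc and the BSD/macOS libc, but it is not something to rely on everywhere (some compilers warn about it in pedantic mode). The function name value_first is invented for the example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical example: the arguments are passed as (name, value), but the
// format prints the value first by using POSIX numbered conversions.
std::string value_first(const char* name, int value) {
    assert(name != nullptr);
    char buf[128];  // long names are truncated to fit
    std::snprintf(buf, sizeof buf, "%2$d is the value of %1$s", name, value);
    return buf;
}
```

If one conversion in a format string is numbered, all of them have to be; mixing %1$s with a plain %d is undefined.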
This is yet another time I've seen translation functions index text by... the text itself, instead of some kind of numeric ID. Since we're talking about performance, I wonder why people write it this way.
Just a simple hash table, or if you prefer, a red-black tree. The index where the string was found would be the same between the native list and locale string file.
I like this style because a) if the locale file is deleted you don't end up with no text in your app, b) you don't need a gigantic pool of cryptic IDs, c) it's more intuitive what you're trying to print in the source code.
print(tr("Warning: changing the video driver requires a restart to take effect"));
//vs
print(tr(TR_WARN_VIDEO_DRIVER_CHANGE_RESTART));
You'll create the English locale file anyway for others to translate for you, and you can even use that. The two major downsides are a) sometimes the same sentence is used in two areas and has the same translation in one language, but different translations in another (very rare), and b) if you change the text wording later, you have to edit it in the source code instead of in a separate English locale file.
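For illustration, the lookup described above could be sketched roughly like this: a hash table maps each English string to its line index, and the same index selects the line in the locale list. The Translator class and its fallback rules are assumptions for the sketch, not the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical: `english` and `localized` are the two string lists in the
// same order, e.g. read line by line from the two locale files.
class Translator {
public:
    Translator(const std::vector<std::string>& english,
               std::vector<std::string> localized)
        : localized_(std::move(localized)) {
        assert(localized_.size() <= english.size());
        for (std::size_t i = 0; i < english.size(); ++i)
            index_.emplace(english[i], i);  // first occurrence wins on duplicates
    }

    // Returns the translation, or the English text itself when the string
    // is unknown or the locale list has no entry for it.
    std::string tr(const std::string& text) const {
        auto it = index_.find(text);
        if (it == index_.end()) return text;
        std::size_t line = it->second;
        if (line >= localized_.size() || localized_[line].empty()) return text;
        return localized_[line];
    }

private:
    std::unordered_map<std::string, std::size_t> index_;
    std::vector<std::string> localized_;
};
```

The fallback to the input text is what gives you point a): with the locale file missing, tr() simply hands back the English string.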
> Since we're talking about performance, I wonder why people write it this way.
You're doing something horrendously wrong if your application spends even 0.1% of its time looking up strings to put into GUI windows.
> a) sometimes the same sentence is used in two areas and has the same translation in one language
I suspect it's not that rare; it would explain at least some of the bad translations I've seen, which are the reason I never use software in my native language if I can avoid it.
> b) if you change the text wording later, you have to edit it in the source code instead of in a separate English locale file.
And, if I understand correctly, in every other locale file too. This tingles my DRY sense in a bad way.
> You're doing something horrendously wrong if your application spends even 0.1% of its time looking up strings to put into GUI windows.
True. Though you can use this reasoning for pretty much every single thing in the source code, and this way lies Java and XML. Premature optimization may be bad, but I still try to always pick the faster option when it doesn't require too much work :).
> And, if I understand correctly, in every other locale file too. This tingles my DRY sense in a bad way.
Not as long as your locale file is ordered the same for each region. Then the index match on the English string tells you which line to read from.
> Premature optimization may be bad, but I still try to always pick the faster option when it doesn't require too much work :)
Certainly. It's definitely a balancing act. But I think I give up maybe ~5% total performance for a ~500% improvement in code clarity. Moving to Java and things like that would increase the overhead by an order of magnitude while probably not helping readability at all (maybe just a little.)
Of course, if I were trying to parse a 20GiB text file, I'd probably be working on it via mmap() over a char* array :D
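Something along these lines, as a minimal POSIX-only sketch that counts newlines in a mapped file. The function name count_newlines is made up, and on a 32-bit system a file that large would have to be mapped in windows rather than in one piece.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical: returns the number of '\n' bytes in the file, or -1 on error.
long long count_newlines(const char* path) {
    assert(path != nullptr);
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (::fstat(fd, &st) != 0) { ::close(fd); return -1; }
    if (st.st_size == 0) { ::close(fd); return 0; }  // mmap rejects length 0
    std::size_t len = static_cast<std::size_t>(st.st_size);
    void* map = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (map == MAP_FAILED) return -1;

    const char* p = static_cast<const char*>(map);
    const char* end = p + len;
    long long lines = 0;
    while (p < end) {
        const void* hit = std::memchr(p, '\n', static_cast<std::size_t>(end - p));
        if (hit == nullptr) break;
        ++lines;
        p = static_cast<const char*>(hit) + 1;  // continue after the newline
    }
    ::munmap(map, len);
    return lines;
}
```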
I like it. There's definitely an opportunity here. `iostreams` is stateful, and using it for formatted I/O has a faintly masochistic flavor. `printf()` is fine, but recall that compiler writers had to teach their compilers to check the format specifiers against the remaining arguments for a reason. What you want is to smack things together into an output pattern that's good enough. The leverage of callability is clearly a point in favor.
This is a very nice library with very good and simple concepts.
One thing still bugs me. Chaining adds a space in between the tokens. This is not always desirable. Adding another function, out, would solve this (outs would then read as "out" plus "s(pace)").
Edit: I noticed while reading the test cases that it is possible to change the separator using outs.sep(' ');
I am now wondering how nicely it would play with multithreading.
GCC, at least, also had a major issue with streams and multithreading because of some global locale locks. I'm not sure whether it was fixed. More than once it was the reason for writing an alternative.