> There is a reason why "wc" is not multithreaded: it just can't. It must work sequentially, because in the general case the input of "wc" cannot be skipped over.
Eh what? Word counting is embarrassingly parallel, at least for files.
You start up k threads each working on its own chunk of the file, assuming that there is a word boundary at the start of its chunk. Then you sum up the total word count but inspect whether the chunk boundaries actually were word boundaries, and subtract 1 each time this was violated.
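The chunked scheme above can be sketched in C++. This is a minimal illustration, assuming the whole input is already in memory as a string; the names `count_chunk` and `parallel_wc` are mine, not anything from `wc` itself:

```cpp
#include <cctype>
#include <string>
#include <thread>
#include <vector>

// Count words in one chunk, assuming the chunk begins at a word boundary:
// a word starts wherever a non-space character follows a space (or the
// chunk start).
static long count_chunk(const std::string& text, size_t begin, size_t end) {
    long words = 0;
    bool in_word = false;
    for (size_t i = begin; i < end; ++i) {
        bool space = std::isspace(static_cast<unsigned char>(text[i]));
        if (!space && !in_word) ++words;
        in_word = !space;
    }
    return words;
}

// Parallel word count over k chunks, then correct for every split that
// landed mid-word: each such split double-counts exactly one word.
long parallel_wc(const std::string& text, int k) {
    std::vector<long> counts(k, 0);
    std::vector<std::thread> threads;
    const size_t n = text.size();
    for (int t = 0; t < k; ++t) {
        size_t begin = n * t / k, end = n * (t + 1) / k;
        threads.emplace_back([&, t, begin, end] {
            counts[t] = count_chunk(text, begin, end);
        });
    }
    for (auto& th : threads) th.join();

    long total = 0;
    for (long c : counts) total += c;
    // A split is "violated" when both sides of it are non-space, i.e.
    // the split fell inside a word that both chunks counted.
    for (int t = 1; t < k; ++t) {
        size_t split = n * t / k;
        if (split > 0 && split < n &&
            !std::isspace(static_cast<unsigned char>(text[split - 1])) &&
            !std::isspace(static_cast<unsigned char>(text[split]))) {
            --total;
        }
    }
    return total;
}
```

The boundary correction is the whole trick: each chunk over-counts by at most one word, and only when the split lands inside a word, so the fix-up after the join is O(k).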
> Eh what? Word counting is embarrassingly parallel, at least for files.
wc does not make the presumption that it is working on a file that can be seeked into (and, more importantly, re-wound). in c++ terms, it behaves as though the source it is reading from is an "input iterator"[1].
if you are piping into wc, then you'll never be able to parallelise it. word counting a file is the edge case here, not the general use.
You don't need to seek: Spin up workers, have them wait on a mutex, and read blocks. If the producer is slow, then you won't need to feed multiple workers. If it's fast, it's a matter of finding the tradeoff between block size vs. the processing time per block.
i expect you're being downvoted because it's likely that the cost of deciding which thread to assign work to is an order-of-magnitude more expensive than accumulating the required counters.
in any case: you're right, it is of course possible. what i should have said was:
> if you are piping into wc, then you'll never see a performance gain from any attempts at parallelisation.
Well, I'm back at 1 point now. In any case, the amount of work needed to assign a block to a thread is trivial. For a small number of threads, it takes at most a couple of mutex operations and a couple of loads/stores to distribute each block. Since the block size can be set arbitrarily high, the assignment overhead per byte can be made arbitrarily low. The biggest issue is that you won't know the input size, and below some cutoff multiple threads add cost rather than saving time. But for small inputs, performance won't be an issue anyway.
EDIT: Actually, the simplest way of assigning work is probably just to have each thread try to acquire a mutex, call read(), and then release the mutex and do their work.