
> There is a reason why "wc" is not multithreaded: it just can't. It must work sequentially, because in the general case the input of "wc" cannot be skipped over.

Eh what? Word counting is embarrassingly parallel, at least for files.

You start up k threads, each working on its own chunk of the file, assuming there is a word boundary at the start of its chunk. Then you sum up the per-chunk word counts, inspect whether the chunk boundaries actually were word boundaries, and subtract 1 each time that assumption was violated.
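A minimal sketch of that scheme (Python rather than wc's actual C, and computed sequentially here — each per-chunk count is the work one of the k threads would do independently):

```python
def chunked_word_count(text, k):
    """Count words by splitting text into k chunks, then fixing up boundaries."""
    n = len(text)
    bounds = [i * n // k for i in range(k + 1)]
    # each of these per-chunk counts could be computed by its own thread
    total = sum(len(text[bounds[i]:bounds[i + 1]].split())
                for i in range(k))
    # a chunk boundary that landed inside a word counted that word twice
    for b in bounds[1:-1]:
        if 0 < b < n and not text[b - 1].isspace() and not text[b].isspace():
            total -= 1
    return total
```

The correction works because the only way the per-chunk sum can disagree with the true count is when a boundary splits a word, and each such split inflates the sum by exactly 1.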



> Eh what? Word counting is embarrassingly parallel, at least for files.

wc does not presume that it is working on a file that can be seeked into (and, more importantly, rewound). in c++ terms, it behaves as though the source it is reading from is an "input iterator"[1].

if you are piping into wc, then you'll never be able to parallelise it. word counting a file is the edge case here, not the general use.

[1] - https://en.cppreference.com/w/cpp/named_req/InputIterator


You don't need to seek: spin up workers, have them wait on a mutex, and read blocks. If the producer is slow, then you won't need to feed multiple workers. If it's fast, it's a matter of finding the tradeoff between block size and per-block processing time.


i expect you're being downvoted because it's likely that the cost of deciding which thread to assign work to is an order of magnitude more expensive than accumulating the required counters.

in any case: you're right, it is of course possible. what i should have said was:

> if you are piping into wc, then you'll never see a performance gain from any attempts at parallelisation.


Well, I'm back at 1 point now. In any case, the amount of work needed to assign a block to a thread is trivial. For a small number of threads, it requires at most a couple of mutex operations and a couple of loads/stores to distribute each block. Since the block size can be set arbitrarily high, the work-assignment overhead per byte can be made arbitrarily low. The biggest issue is that you won't know the input size, and below some cutoff size multiple threads will only add cost. But for small inputs, performance won't be an issue anyway.

EDIT: Actually, the simplest way of assigning work is probably just to have each thread try to acquire a mutex, call read(), and then release the mutex and do their work.
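That scheme can be sketched roughly as follows (Python threads rather than wc's C, with a hypothetical function name; each worker takes the lock, reads one block and a sequence number, releases the lock, and counts concurrently — words split across block boundaries are patched up at the end, in the same spirit as the chunk-boundary fix upthread):

```python
import threading

def wc_words_parallel(stream, nthreads=4, blocksize=1 << 16):
    lock = threading.Lock()
    results = {}     # block index -> (words, starts mid-word?, ends mid-word?)
    next_idx = [0]

    def worker():
        while True:
            with lock:  # serialise read() and block-index assignment
                block = stream.read(blocksize)
                if block:
                    idx = next_idx[0]
                    next_idx[0] += 1
            if not block:
                return
            # counting happens outside the lock, concurrently with other workers
            results[idx] = (len(block.split()),
                            not block[:1].isspace(),
                            not block[-1:].isspace())

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = sum(words for words, _, _ in results.values())
    # a word straddling two consecutive blocks was counted once per block
    for i in range(len(results) - 1):
        if results[i][2] and results[i + 1][1]:
            total -= 1
    return total
```

Note that only the read() and the index bump sit inside the critical section, so the per-block assignment cost stays constant no matter how large the blocks are.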


> Word counting is embarrassingly parallel, at least for files.

True, but "wc" does not work only on files: it also works on character devices and on streams of data that cannot be randomly accessed.


I think your parent meant that you can't do that if you get your data from a pipe. You can't `lseek` on a pipe.


The sentence before your quote explains why that's not always an option.

(Of course wc could switch to a multithreaded algorithm while working on files, but the point did make sense.)


Pipes.

I've used Unix every day for 20 years, and I'd say 90% of wc use cases read from a pipe.



