Nice write-up. Haskell and GHC have definitely been enormously successful platforms for PL research in general, and research in parallelism in particular. However, one unfortunate side effect of all that research effort is that there are now at least half a dozen libraries for parallelism and concurrency (STM, Async, Control.Parallel.Strategies, etc), to the point that it is hard for a software engineer to grasp the tradeoffs presented by the various options. I feel like I succumb to decision paralysis when I attempt parallel programming in Haskell; from a purely pragmatic perspective, I much prefer the Go approach, where there is a single canonical concurrency mechanism which is baked right into the language and is extremely well-documented.
Been a while since I've done Haskell, but the essential breakdown as I understand it is this:
At the lowest level you have the Control.Concurrent primitives: forkIO and MVar. forkIO lets you spin up new Haskell threads (not OS threads), and MVars can be used to communicate between them. An MVar is either empty or full, and operations block when they can't complete immediately (writing to a full MVar, or reading from an empty one). MVars also have single-wakeup semantics: if multiple readers are blocked on an empty MVar, only one of them is woken when someone writes to it.
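A minimal sketch of those primitives, assuming GHC's base library: a worker thread computes a result and hands it back through an initially empty MVar.

```haskell
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)

main :: IO ()
main = do
  box <- newEmptyMVar              -- starts empty
  _ <- forkIO $ do                 -- lightweight Haskell thread
    let result = sum [1 .. 1000 :: Int]
    putMVar box result             -- fills the MVar (would block if full)
  r <- takeMVar box                -- blocks until the worker writes
  print r                          -- prints 500500
```

takeMVar here doubles as both the communication channel and the join point, which is why MVar alone already covers a lot of simple fork/join patterns.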
STM has transactional semantics. You can write to multiple STM variables atomically in a transaction, and your transaction will restart if any of your input variables have been written to while your transaction was running. Useful if you need to maintain consistency between several variables at all times.
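A hedged sketch of that, using the stm package: a transfer touching two TVars either happens atomically or retries. The `transfer` function and the balances are invented for this example.

```haskell
import Control.Concurrent.STM

transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to n = do
  balance <- readTVar from
  check (balance >= n)           -- block/retry until there are enough funds
  writeTVar from (balance - n)
  modifyTVar' to (+ n)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 30)   -- all-or-nothing; restarts on conflict
  (,) <$> readTVarIO a <*> readTVarIO b >>= print   -- prints (70,30)
```

No interleaving of other transactions can ever observe the money missing from both TVars, which is the "consistency between several variables at all times" point above.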
Async is a wrapper around forkIO and STM that provides battle-tested implementations of the functions you would otherwise write yourself to express common patterns (running actions concurrently, waiting on results, cancellation, and so on).
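For instance, the async package's `concurrently` runs two IO actions in parallel and waits for both, propagating exceptions and cancelling the sibling if one fails. A small sketch (the computations are placeholders):

```haskell
import Control.Concurrent.Async (concurrently)

main :: IO ()
main = do
  (x, y) <- concurrently
              (pure (sum     [1 .. 100 :: Int]))   -- stand-in workload 1
              (pure (product [1 .. 5   :: Int]))   -- stand-in workload 2
  print (x + y)   -- 5050 + 120 = 5170
```

Writing that correctly by hand with forkIO and MVars, including the exception handling, is exactly the boilerplate the library saves you.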
Strategies are about optimistically evaluating Haskell thunks in parallel. For example, if you map a function over a list, you now have a list of thunks, and you can use Strategies to evaluate those thunks in parallel. The key thing with Strategies is that removing them from your code (or using a non-threaded runtime) doesn't affect the semantics of your program; it just might make it slower due to the loss of parallel evaluation.
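A sketch of that with Control.Parallel.Strategies (the `expensive` function is a stand-in for real work); deleting the `` `using` `` line changes only the speed, never the result:

```haskell
import Control.Parallel.Strategies (parList, rdeepseq, using)

expensive :: Int -> Int
expensive n = sum [1 .. n]     -- placeholder for a real computation

main :: IO ()
main = do
  let xs = map expensive [10000, 20000, 30000]
             `using` parList rdeepseq   -- fully evaluate list elements in parallel
  print (sum xs)
```

Compile with `-threaded` and run with `+RTS -N` to actually get the parallelism; without that, the program still computes the same answer sequentially.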
The Par Monad isn't used much, but the key idea there is you build an explicit dependency graph of the values in your computation and the runtime tries to parallelize the nodes in the graph that it can. Similar to what make does if you run it with -j.
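A sketch of that explicit dependency graph with the monad-par package: each `spawn` forks a node, each `get` is an edge the final result depends on, and the runtime schedules independent nodes in parallel, much like make -j.

```haskell
import Control.Monad.Par (runPar, spawn, get)

main :: IO ()
main = print $ runPar $ do
  a <- spawn (return (sum     [1 .. 100 :: Int]))  -- two independent nodes
  b <- spawn (return (product [1 .. 5   :: Int]))
  x <- get a                                       -- the final node depends
  y <- get b                                       -- on both of them
  return (x + y)
```

Since `a` and `b` have no edge between them, the scheduler is free to run them on different cores.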
I don’t think this is an entirely fair comparison. STM provides essentially primitive operations; Async just builds a higher-level API on top of them for common patterns. Not unlike the relationship between errgroup and goroutines.
Strategies solve a different problem: they give you control over the evaluation order of pure values. STM is aimed more at traditional multithreading.
> I much prefer the Go approach, where there is a single canonical concurrency mechanism which is baked right into the language and is extremely well-documented.
Haskell doesn't have everything, but I'd prefer something other than channels for concurrency. Actually, I like the Rust ecosystem in that regard (though it's missing something like accelerate...)
Really nice paper, especially the bullet points near the end.
I applied for a job on Simon Marlow’s Haskell team at Facebook about three years ago. I really enjoyed talking with him on the phone, but it was obvious that I didn’t have sufficient Haskell skills.
There is a lot of good synergy between meeting FB’s platform requirements and research and development for the Haskell ecosystem.
Sadly, Multicore OCaml [1] is nowhere near [2] completion yet, nor is it moving fast. There are still a lot [3] of issues to solve and to sync with mainstream OCaml before it is ready for anything basic. I write mostly OCaml these days (along with Rust, C, Python, and a couple of less mainstream languages), but I often miss the ease of parallelization in Haskell, the code clarity, and so on.
Truthfully, I'm waiting for Multicore OCaml, and then I'll only use it, Rust, and Elixir.
OCaml's type system is in my eyes unbeatable. I'm still considering writing a transpiler from it to Rust and Elixir, but I'm grossly unqualified to write compilers. :(
Haskell, though, I couldn't grok. It seemed to introduce a lot of tension around very basic topics -- like strings -- and it lost me.
The big distinguishing difference is that OCaml doesn't have ad-hoc polymorphism or function overloading. You cannot have a function that operates on many different types, which is why there are two sets of arithmetic operators for integers and floats ("+" vs "+.", for example). In theory you can use modules for this, but it's awkward, and I don't think they're generally used this way. (OCaml is getting implicit modules, which may help the situation.)
In Haskell, typeclasses make this super easy. They let you define a family of functions/operators that together become what a Go or Java developer might call an interface. Code written against a particular typeclass will work with any implementation of that typeclass.
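To make the contrast concrete, here is a minimal sketch: a single Num instance lets the ordinary `+` work on a new 2-D vector type. (`V2` is a name invented for this example; the `*` is componentwise purely for illustration.)

```haskell
data V2 = V2 Double Double deriving (Eq, Show)

instance Num V2 where
  V2 a b + V2 c d = V2 (a + c) (b + d)   -- plain + now works on V2
  V2 a b * V2 c d = V2 (a * c) (b * d)   -- componentwise, for illustration
  negate (V2 a b) = V2 (negate a) (negate b)
  abs    (V2 a b) = V2 (abs a) (abs b)
  signum (V2 a b) = V2 (signum a) (signum b)
  fromInteger n   = V2 (fromInteger n) (fromInteger n)

main :: IO ()
main = print (V2 1 2 + V2 3 4)   -- prints V2 4.0 6.0
```

In OCaml you would instead export a `Vec.add` function (or open a module redefining `+`) because `+` is fixed to int.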
On the other hand, OCaml's module system is very powerful and has no counterpart in Haskell. It allows you to package a whole type implementation into a single module that has a public signature, and can be instantiated with generic type parameters.
I don’t know anything about OCaml so please forgive me if this question seems silly.
two sets of arithmetic operators for integers and floats
What if I want to create more numeric types? Say vectors in R^n, for example. Do I need to create more operators for vector addition and scalar multiplication? How do I make sure the user can’t add vectors of incompatible dimension?
Yes, you need to add new functions. You cannot redefine "+" to work on a custom vector type.
Similarly, you cannot have a "print" function that works for all sorts of types. OCaml has print_string, print_int, and so on.
OCaml does have parametric polymorphism ("generics"), so you can write functions with generic type parameters. And you can use modules as "functors" to package up types in a way somewhat similar to Haskell typeclasses (e.g. see [1]).
There are a couple of implementations of strings:
* String - this is the default; it is a list of characters, and it was a mistake.
* ByteString and LazyByteString - these represent arrays of 8-bit characters. Performant, but they do not handle encodings.
* Text and LazyText - this is what you want to use when you work with text, since it represents text as Unicode.
There is a mess in these types. Some libraries use different types, so you might have to convert between them, which can be confusing, and the conversion APIs are not unified. But if you manage to consistently use a single type in your project (say Text), then you'll actually have a great time working with strings. It's a problem if you want to quickly mash together a few libraries that don't use the same types, since it gets messy.
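A sketch of the conversions in question, assuming the text and bytestring packages: pick Text internally and convert at the edges.

```haskell
import qualified Data.Text          as T
import qualified Data.Text.Encoding as TE
import qualified Data.ByteString    as BS

main :: IO ()
main = do
  let s  = "héllo" :: String    -- String is just [Char]
      t  = T.pack s             -- String -> Text
      bs = TE.encodeUtf8 t      -- Text -> ByteString (explicit UTF-8 encoding)
  putStrLn (T.unpack t)         -- Text -> String, back again
  print (BS.length bs)          -- 6: 'é' is two bytes in UTF-8
```

Note that each hop uses a differently named function from a different module, which is the "not unified" complaint above.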
> ByteString and LazyByteString - this represents array of 8bit characters
Not exactly. ByteString has no relation at all with text or characters. It represents a sequence of bytes, AKA binary data.
I imagined I would hear complaints about too many text types when I asked that question, but it sounded like the GP had a different complaint.
The library mess is real, but it's improving quickly, and it's a bad reason to decide Haskell is a bad language for dealing with text. That's throwing out a lot of great stuff just because of a minor inconvenience. And by the way, one of the best things about Haskell is that you can define a lot more text types. For example, my current project has types for internal identifiers, HTML, HTML form values, JS, and SQL; they are all text, but I can always be sure things are properly encoded.
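One common way to get that guarantee is a newtype wrapper per kind of text, so an HTML fragment and an SQL fragment can't be mixed up by accident. A minimal sketch (`Html`, `Sql`, and `escapeHtml` are names invented for this example, not from the comment's actual project):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text (Text)
import qualified Data.Text as T

newtype Html = Html Text deriving Show   -- zero runtime cost over Text
newtype Sql  = Sql  Text deriving Show

-- If this is the only exported way to build an Html value,
-- every Html in the program is escaped by construction.
escapeHtml :: Text -> Html
escapeHtml = Html . T.replace "<" "&lt;" . T.replace "&" "&amp;"

main :: IO ()
main = print (escapeHtml "a < b & c")   -- prints Html "a &lt; b &amp; c"
```

The type checker then rejects passing an `Sql` where an `Html` is expected, even though both are "just text" underneath.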
"there’s a lot more to realising good parallel speedup than just choosing the right language"
from ye olde days - if you can't split the computation into nice isolated chunks the point is moot, so relatively fine granularity of your algorithm's steps is the key here; the lack (or strict control) of side effects means you don't get to make too many mistakes in the mechanics of running the computation.
> so relatively fine granularity of your algo stepping is the key here
Fine granularity requires pretty much zero-overhead synchronization, which green threads, or any shared-memory multithreading implementation, can't provide; a task needs to do a few thousand nanoseconds of useful work before the synchronization costs become even bearable.
> Fine granularity requires pretty much zero overhead synchronization
Yes, like a work-stealing scheduler. Many tiny tasks are fine as long as you keep them on one core. Other cores can steal a batch of them from the other end of your queue with minimal synchronisation every now and again.
That's not my experience. I think work stealing is low overhead as it amortises synchronisation, only does it at all when actively required to get more work, and reduces conflicts due to two ends of the queue. Why do you think it's high overhead?
Isn't work stealing excellent for locality? Jobs stay on the same core until there's a need to steal, and then the most likely non-resident jobs are taken. Is it even cache oblivious?