Tons of use cases! There are lots of concurrent algorithms that use locking and shared memory instead of message-passing. Matrix multiplication or tree search come to mind.
Parallel blocked n^3 matrix multiplication doesn't use any synchronization primitives (short of join). The processes read from the same input matrices, but write to non-overlapping regions of memory. This is easily accomplished using posix shared memory. Indeed, this is how it is done when the matrices to be multiplied are too large to be handled by single worker nodes.