what i always think about stuff like this: if stuff like mongita and sqlite woul...

ezrast · on April 21, 2021

There are generic distributed consensus algorithms out there. The most famous are Paxos and Raft. In theory, you can jam those on top of any system you like, as long as it has well-defined state transitions.
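
To make "well-defined state transitions" concrete, here is a minimal sketch, assuming a toy key-value store as the underlying system. `KVStateMachine` and the command tuples are hypothetical names for illustration, and the agreement step itself (the part Paxos or Raft would handle) is not shown: the log is simply assumed to be already agreed upon.

```python
from dataclasses import dataclass, field


@dataclass
class KVStateMachine:
    """Hypothetical deterministic state machine: a tiny key-value store."""
    data: dict = field(default_factory=dict)

    def apply(self, command):
        # Each command is a well-defined state transition: (op, key, value).
        op, key, value = command
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op}")


# Assumption: every replica has already agreed on this log and its order.
# Reaching that agreement is the consensus algorithm's job, not shown here.
log = [("set", "a", 1), ("set", "b", 2), ("delete", "a", None), ("set", "b", 3)]

replica_1, replica_2 = KVStateMachine(), KVStateMachine()
for command in log:
    replica_1.apply(command)
    replica_2.apply(command)
```

Because `apply` is deterministic, any replica that replays the same log in the same order ends up in the same state, which is the property the consensus layer relies on.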

Making it fast - or usable at all in the presence of heavy contention - is another story. Distributing a write-heavy workload over a cluster is useless if the cluster ends up rejecting most updates because they get preempted by some other write. Solving that problem usually means analyzing the underlying system to figure out which parts need to be truly atomic and which you can get away with doing in parallel. That job is a) really complex and b) filled with opportunities to make significant performance gains in exchange for weaker safety guarantees, like losing committed writes in a crash, or allowing individual nodes to reorder independent writes.
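
The preemption problem can be sketched with a single-process simulation, assuming an optimistic scheme where a write is accepted only if the version it read is still current. `VersionedRegister` and `contended_round` are hypothetical names, and no real network or consensus protocol is involved.

```python
class VersionedRegister:
    """Hypothetical register that accepts a write only if the version matches."""

    def __init__(self):
        self.version = 0
        self.value = 0

    def compare_and_set(self, expected_version, new_value):
        if expected_version != self.version:
            return False  # preempted: another write landed first
        self.value = new_value
        self.version += 1
        return True


def contended_round(register, writers):
    """All writers read the same snapshot, then all try to commit."""
    snapshot_version, snapshot_value = register.version, register.value
    accepted = 0
    for _ in range(writers):
        if register.compare_and_set(snapshot_version, snapshot_value + 1):
            accepted += 1
    return accepted


register = VersionedRegister()
attempts = accepted = 0
for _ in range(100):
    accepted += contended_round(register, writers=10)
    attempts += 10

rejection_rate = 1 - accepted / attempts
print(f"accepted {accepted} of {attempts} writes")
```

With ten writers racing on one value, only the first commit in each round wins, so nine out of ten attempts are rejected and must be retried. Adding nodes does not help in this model; it only adds more losers per round.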

You should check out http://jepsen.io/analyses if this stuff interests you.

deknos · on April 21, 2021

thank you!

scottrogowski · on April 21, 2021

As I understand it, your question is whether we can't just put a distributed layer on top of basic embedded databases.

This is really interesting and is something I came across while writing this. It turns out that concurrency is actually quite difficult: either you have global locks, which means only one process can write to the database/indices at a time and slows things down considerably, or you have to do a lot of clever things to avoid those locks.
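
A minimal sketch of that trade-off, assuming an in-memory store with a fixed set of keys and threads standing in for processes. `GlobalLockStore` and `PerKeyLockStore` are hypothetical classes and are not how Mongita or SQLite implement locking.

```python
import threading


class GlobalLockStore:
    """One lock for everything: writers to unrelated keys still queue up."""

    def __init__(self, keys):
        self._lock = threading.Lock()
        self._data = {key: 0 for key in keys}

    def increment(self, key):
        with self._lock:
            self._data[key] += 1

    def get(self, key):
        with self._lock:
            return self._data[key]


class PerKeyLockStore:
    """One lock per key: only writers to the same key wait on each other."""

    def __init__(self, keys):
        # Assumption: the key set is fixed up front, so the dicts never resize.
        self._locks = {key: threading.Lock() for key in keys}
        self._data = {key: 0 for key in keys}

    def increment(self, key):
        with self._locks[key]:
            self._data[key] += 1

    def get(self, key):
        with self._locks[key]:
            return self._data[key]


def hammer(store, keys, threads_per_key=2, per_thread=1000):
    """Run several writer threads per key and wait for them to finish."""
    def worker(key):
        for _ in range(per_thread):
            store.increment(key)

    threads = [threading.Thread(target=worker, args=(key,))
               for key in keys for _ in range(threads_per_key)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


keys = ["users", "orders", "events"]
global_store, per_key_store = GlobalLockStore(keys), PerKeyLockStore(keys)
hammer(global_store, keys)
hammer(per_key_store, keys)
```

Both stores end up with correct counts; the difference is who has to wait. The per-key version is already the simplest of the "clever things": once an operation touches several keys (for example, a document plus its indices), lock ordering and deadlock avoidance come into play. Within a single CPython process the interpreter lock limits real parallelism, so this shows lock scope rather than a speedup.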

deknos · on April 21, 2021

> As I understand it, your question is whether we can't just put a distributed layer on top of basic embedded databases.

Exactly!

> This is really interesting and is something I came across while writing this. It turns out that concurrency is actually quite difficult: either you have global locks, which means only one process can write to the database/indices at a time and slows things down considerably, or you have to do a lot of clever things to avoid those locks.

Well, that would also be the case with traditional DB services. The question is whether they can offer more granular locking mechanisms than embedded databases. Or perhaps they can only offer less granular locking?