Interesting. The separate compute and storage tiers make this another system going in that direction, which I think is becoming almost the standard at this point, especially for "cloud-native" systems designed to run on k8s. From what I can tell (the post isn't very explicit on this point), they are avoiding distributed consensus at the storage layer and instead relying on a single-writer/multiple-reader model, with the single writer enforced by the assignment of tablets in the compute tier, and with each tablet responsible for writing to multiple storage nodes for durability. (But I might be wrong.)

Assuming that's right, this approach is, I think, underutilized, and it's pretty similar to how Apache Pulsar works (my day job), but I'm not sure how many distributed RDBMSs have tried it; it will be cool to see how it evolves! It isn't clear how they ensure that a tablet is assigned to only a single compute node, but I think that's an easier problem than distributed consensus at the storage tier.
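
To make the single-writer idea concrete, here's a minimal sketch (Go; all names invented, this is not YDB's actual API) of the usual fencing trick: whoever is assigned the tablet gets a monotonically increasing generation number, and storage nodes reject writes from any older generation:

    // Hypothetical sketch: a storage node accepts writes only from the
    // highest tablet generation it has seen, fencing out a stale leader.
    package main

    import (
        "errors"
        "fmt"
        "sync"
    )

    var ErrStaleWriter = errors.New("write rejected: stale generation")

    type StorageNode struct {
        mu      sync.Mutex
        maxGen  uint64
        entries []string
    }

    func (s *StorageNode) Write(gen uint64, data string) error {
        s.mu.Lock()
        defer s.mu.Unlock()
        if gen < s.maxGen {
            return ErrStaleWriter // a newer owner has already written
        }
        s.maxGen = gen
        s.entries = append(s.entries, data)
        return nil
    }

    func main() {
        node := &StorageNode{}
        _ = node.Write(2, "from the new owner")           // tablet reassigned, generation 2
        fmt.Println(node.Write(1, "from the old owner"))  // write rejected: stale generation
    }

If the tablet gets reassigned, the new owner comes up with a higher generation, so any in-flight writes from the old owner fail instead of racing with it.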



Each tablet gathers a quorum of answers from the members of a so-called BlobStorage group. A BlobStorage group is a set of so-called VDisks (virtual disks); all the VDisks in a group run on different nodes (even in different fail domains, like racks or AZs). A VDisk stores its data on a physical device, i.e. a PDisk.
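
As a toy illustration of the quorum part (Go; a simple majority is assumed for brevity, whereas real BlobStorage groups use configurable erasure/replication schemes that this glosses over):

    // Toy illustration: fan a write out to all VDisks in a group and
    // return once a majority has acknowledged it.
    package main

    import (
        "fmt"
        "math/rand"
        "time"
    )

    func writeQuorum(vdisks int, blob string) {
        acks := make(chan struct{}, vdisks)
        for i := 0; i < vdisks; i++ {
            go func() {
                // Simulate a VDisk persisting the blob to its PDisk;
                // in real life some replies are slow or never arrive.
                time.Sleep(time.Duration(rand.Intn(50)) * time.Millisecond)
                acks <- struct{}{}
            }()
        }
        for got := 0; got < vdisks/2+1; got++ {
            <-acks // wait for a majority of acknowledgements
        }
    }

    func main() {
        writeQuorum(5, "tablet log entry")
        fmt.Println("write acknowledged by a majority of VDisks")
    }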


From my past experience, Datomic uses the same approach, i.e. multiple reader nodes and a single transactor node. However, it's much more locked into AWS, as it uses DynamoDB and S3 for the backing storage (maybe others as well?).


Assigning leaders is trivial with something like ZooKeeper. But in this case it appears that the leader metadata is stored in a table of the database itself, which raises questions about operability if those tablets are unavailable.
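
For what it's worth, a leader-in-a-table design usually reduces to a conditional update (compare-and-swap) on a metadata row; a rough sketch of the shape of it (Go; the row layout is invented, not YDB's actual schema):

    // Hypothetical sketch: claiming tablet leadership with a compare-and-swap
    // on a metadata row, so two racing nodes cannot both win.
    package main

    import (
        "fmt"
        "sync"
    )

    type leaderRow struct {
        owner string
        epoch uint64
    }

    type metaTable struct {
        mu   sync.Mutex
        rows map[string]leaderRow // tablet id -> current leader row
    }

    // claim succeeds only if the stored epoch is still the one the caller read.
    func (t *metaTable) claim(tablet, node string, seenEpoch uint64) bool {
        t.mu.Lock()
        defer t.mu.Unlock()
        if t.rows[tablet].epoch != seenEpoch {
            return false
        }
        t.rows[tablet] = leaderRow{owner: node, epoch: seenEpoch + 1}
        return true
    }

    func main() {
        t := &metaTable{rows: map[string]leaderRow{}}
        fmt.Println(t.claim("tablet-7", "node-a", 0)) // true
        fmt.Println(t.claim("tablet-7", "node-b", 0)) // false: epoch already advanced
    }

Of course, that just pushes the question to the availability of the metadata table itself, which is exactly the operability concern above.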


YDB doesn't use ZooKeeper. The system is built out of tablets, and every tablet implements a distributed consensus algorithm. There are different types of tablets in the system; for example, SchemeShard is a tablet that stores metadata, such as table schemas, while DataShard stores the data of a table partition.


Which consensus algorithm? I don't see Raft in the codebase.


We have a consensus protocol over a shared log. You can find some details in a PowerPoint presentation on this page: https://2019.hydraconf.com/2019/talks/muxomfgqembsb3st7i3ci/
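
For readers unfamiliar with the idea: once entries have a single total order in a durable shared log, every replica that replays the log deterministically reaches the same state, which sidesteps per-entry voting. A toy model of just that property (Go; not the actual YDB protocol):

    // Toy model: agreement comes from the total order of a durable shared
    // log; every replica that replays it reaches the same state.
    package main

    import "fmt"

    type sharedLog struct {
        entries []string
    }

    func (l *sharedLog) append(cmd string) {
        l.entries = append(l.entries, cmd)
    }

    // A replica applies log entries strictly in order, tracking its position.
    type replica struct {
        applied int
        state   []string
    }

    func (r *replica) catchUp(l *sharedLog) {
        for r.applied < len(l.entries) {
            r.state = append(r.state, l.entries[r.applied])
            r.applied++
        }
    }

    func main() {
        l := &sharedLog{}
        l.append("set x=1")
        l.append("set x=2")
        a, b := &replica{}, &replica{}
        a.catchUp(l)
        b.catchUp(l)
        fmt.Println(a.state) // [set x=1 set x=2]
        fmt.Println(b.state) // identical
    }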



