Your users will still need to manually merge in a CRDT application; *because com...

Groxx · on June 25, 2024

It can't decide human intent flawlessly, but the point of a CRDT is that it does choose one, and all others choose the same one regardless of how they got there.

Git does not do this, so it is not a CRDT. The content-addressable-database portion of git sorta fits this though (as does any other content-addressable system).

kaba0 · on June 25, 2024

This is basically git automatically doing “accept theirs” or “yours” for any fork. You can see that it will not generally be what you want, so whether such a strategy could work is domain-dependent.

vlovich123 · on June 25, 2024

No, that’s not accurate. If I merge branch X and then branch Y and someone else merges branch Y and then branch X, with a CRDT the result should be the same, whereas with git it won’t be if your strategy is always “accept theirs” or “accept yours”. A CRDT is also order invariant: it doesn’t matter which ordering of edit operations you accept, the end result is consistent across all nodes.
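A minimal sketch of the order-dependence point, assuming a toy model in which each branch is a single tagged value. The names `accept_theirs`, `lww_merge`, and `fold` are hypothetical, and this is not how git or any particular CRDT library is implemented. It only shows that an "incoming always wins" rule gives different results for X-then-Y and Y-then-X, while a deterministic rule such as last-writer-wins with a node-id tie-break gives the same result either way.

```python
from itertools import permutations

def accept_theirs(mine, theirs):
    # Toy "accept theirs" rule: the incoming value always wins.
    return theirs

def lww_merge(a, b):
    # Toy last-writer-wins register. Each value is (timestamp, node_id, text);
    # comparing the tuples breaks timestamp ties by node_id, so the winner
    # does not depend on which side arrived first.
    return max(a, b)

def fold(merge, start, branches):
    # Merge the branches into the starting state one after another.
    state = start
    for branch in branches:
        state = merge(state, branch)
    return state

base = (0, "base", "jumped over")
x = (1, "node-x", "ran around")
y = (1, "node-y", "dug under")

# Collect the final state for every merge order of X and Y.
theirs_results = {fold(accept_theirs, base, order) for order in permutations([x, y])}
lww_results = {fold(lww_merge, base, order) for order in permutations([x, y])}

print(len(theirs_results))  # two distinct outcomes: order matters
print(len(lww_results))     # one outcome: order does not matter
```

The last-writer-wins rule still discards one edit, so it says nothing about whether the surviving value matches human intent. It only illustrates convergence.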

You may want to read up on the Wikipedia page rather than taking 1 thing I said and extrapolating. https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...

gwbas1c · on June 25, 2024

I wouldn't argue that CRDT "solves" this problem, either.

The git solution exposes the conflict to the user, who can then fix it. (Or leave it there if they choose to.)

The best a CRDT can do is leave some kind of conflict marker that the user can fix. (Remember, computers can't read minds. See my "quick brown fox" example.)

Git does this. It's predictable and lossless. Deciding if it's a CRDT probably is more of a discussion about semantics than fact, because "git merge" is lossless: It presents a consistent view that the user can accept or change.

And yes, I know what a CRDT is.

vlovich123 · on June 26, 2024

But a CRDT would. The end result at node 1 seeing merge X first and then merge Y next would necessarily have to be the same as node 2 seeing merge Y first and then merge X. That’s literally the core property of CRDTs - all nodes eventually converge to the same state regardless of the network partitioning. Git does not have this property and thus is not a CRDT (for edits - it’s a CRDT for mirroring).

Git is not a CRDT not because “git merge is lossless” but because the result is order dependent which is not partition tolerant.

You may want to read the original paper which defines CRDT [1]. Here’s some choice quotes to help you:

> System model: We consider a system of processes interconnected by an asynchronous network. The network can partition and recover

> Clearly, a sufficient condition for convergence of an op-based object is that all its concurrent operations commute. An object satisfying this condition is called a Commutative Replicated Data Type (CmRDT).

Git has some CRDT concepts but the core behavior of creating commits and sharing them does not generally meet the criteria of a CRDT. And no. Requiring a manual merge is also not a property of a CRDT as the whole point of it is to generate a “correct” merge result without human intervention. Otherwise the point of the paper would be almost irrelevant.

[1] https://pages.lip6.fr/Marc.Shapiro/papers/RR-7687.pdf
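The commutativity condition in the quote above can be illustrated with a small sketch. The operation names `add` and `set` and the helpers `apply_op` and `replay` are made up for illustration and are not taken from the paper. Replaying commuting operations in every possible delivery order yields one final state, while mixing in a non-commuting operation yields several.

```python
from itertools import permutations

def apply_op(state, op):
    # Apply one operation to an integer state.
    kind, amount = op
    if kind == "add":   # additions commute with each other
        return state + amount
    if kind == "set":   # an overwrite does not commute with additions
        return amount
    raise ValueError(f"unknown operation: {kind}")

def replay(ops, start=0):
    # Apply the operations in the given delivery order.
    state = start
    for op in ops:
        state = apply_op(state, op)
    return state

commuting = [("add", 5), ("add", -2), ("add", 10)]
non_commuting = [("add", 5), ("set", 1)]

# Final states over every possible delivery order.
commuting_results = {replay(order) for order in permutations(commuting)}
non_commuting_results = {replay(order) for order in permutations(non_commuting)}

print(commuting_results)      # a single final state
print(non_commuting_results)  # more than one final state
```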

gwbas1c · on June 26, 2024

You're now arguing semantics.

Let's change course for a bit: If git was a CRDT, what would happen when there is a merge conflict between two branches?

vlovich123 · on June 26, 2024

Whatever happens, the end result on two different nodes doing the same merge operations (or a commutative ordering of those merge operations) would be identical.

Think about a CRDT document: if two people edit the same line, regardless of what happens, once the documents synchronize, the final state of the document will be identical. That’s also the reason manually resolved merges don’t work because two different people might resolve the same conflict in different ways. But again, the conflict resolution being identical under any commutative ordering of simultaneous operations is the hardest requirement of CRDTs. The commutation requirement is what kills the “always theirs” or “always mine” strategy (there are other scenarios but that’s the easiest one to demonstrate).

gwbas1c · on June 26, 2024

Ahh, now you're missing some critical details: How can a CRDT perform a sane merge? (Remember my quick brown fox example.) IE, is it destructive (picks one) or does it output something like: "The quick brown fox !!!(ran around|||dug under)!!! the fence."

This is kind-of what git does: It leaves a sane conflict in your source code. (The result is always the same given the same inputs, too.) The merge conflict might not build; but how git handles merge conflicts will always result in a functioning git repository.
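On the "destructive or keep both" question, one possible answer is a register that keeps every concurrent value, in the spirit of a multi-value register. The sketch below is a simplified illustration under stated assumptions: the version-vector encoding, the function names `dominates`, `merge`, and `render`, and the `!!!(...|||...)!!!` rendering are all hypothetical. Both merge orders end in the same state, and that state contains both edits.

```python
def dominates(a, b):
    # True if version vector a has seen everything in b plus at least one more event.
    keys = set(a) | set(b)
    return (all(a.get(k, 0) >= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) > b.get(k, 0) for k in keys))

def merge(left, right):
    # Each state is a frozenset of (value, clock) pairs, where clock is a
    # tuple of (node, counter) items. Keep every pair that no other pair
    # supersedes; concurrent values all survive.
    candidates = left | right
    return frozenset(
        (value, clock) for value, clock in candidates
        if not any(dominates(dict(other), dict(clock)) for _, other in candidates)
    )

def render(state):
    # Show a single value as-is, or all concurrent values in sorted order.
    values = sorted(value for value, _ in state)
    return values[0] if len(values) == 1 else "!!!(" + "|||".join(values) + ")!!!"

base = frozenset({("jumped over", (("base", 1),))})
alice = frozenset({("ran around", (("alice", 1), ("base", 1)))})
bob = frozenset({("dug under", (("base", 1), ("bob", 1)))})

node1 = merge(merge(base, alice), bob)  # sees Alice's edit first
node2 = merge(merge(base, bob), alice)  # sees Bob's edit first

print(node1 == node2)
print("The quick brown fox " + render(node1) + " the fence.")
```

Resolving the conflict afterwards would be a new write whose clock dominates both, so that resolution would propagate like any other edit.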

Groxx · on June 26, 2024

tbh it's increasingly sounding like you're defining a CRDT as "something is decided and written down in all cases" and simply ignoring every single other quality they guarantee.

Those other qualities matter. So much so that they're literally the defining qualities.

vlovich123 · on June 26, 2024

Yeah, I'm done trying to help this person understand the differences between Git and CRDTs. They're being intentionally difficult by redefining CRDTs to "what Git does" rather than evaluating Git against the properties a CRDT is defined to have.

threatofrain · on June 25, 2024

Whether or not the user is manually involved at some point is a product decision. I think the trend of consumer companies is not to do that. Possibly damaging user data is simply a tradeoff in this way of thinking.

vlovich123 · on June 25, 2024

No, that’s literally the definition of CRDT. Requirement 2 out of the 3 listed on Wikipedia:

> An algorithm (itself part of the data type) automatically resolves any inconsistencies that might occur.

So no, human resolution vs automatic is not a product decision but a key definitional requirement to be a CRDT.

https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...