Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The challenge of course is that if your write depends on a previous read, offline/online sync can easily in the wrong sync state once you come back online (in the general case). Even CRDTs are not immune from this - for example your offline document copy says step 1 is delete line 3 so you delete it but then you sync and step 1 was corrected to be “delete line 4” your deletion of line 3 is now incorrect even though the merged result is “valid”. We see this everyday in code development - they’re called merge conflicts.


Under what circumstances would a past entry be corrected? Doesn't CRDT require an immutable stream of operations?


Original document has a list of instructions. The first instruction says “delete line 3”. User 1 offline deletes line 3. User 2 updates the instructions so that step 1 says “delete line 4”. Merged result when user 1 comes back online: step 1 says “delete line 4” and line 3 is deleted with the users none the wiser about what happened unless they double check the merged result. Physically the result is merged consistently but logically the end result of the document is not what was intended.

CRDT just guarantees that the document merges an edit stream in some sensible way without any need for manual conflict resolution. However, such resolution doesn’t (and can’t really) understand logically that the merged result is not what was intended to have happened. This is a contrived example but you can imagine similar scenarios and it’s concerning that people think CRDTs would solve the scenario I presented somehow when they solve a much more constrained technical problem. Imagine you have a CRDT algorithm for managing distributed state. And then you update the code about how the state itself is interpreted or managed. Now you have the problem of differing versions of code running mutating the document containing the state and the document is always in some valid state, but the server are not actually in agreement about how to apply the mutations and the state can diverge in unintended ways.


I get it that you can reach conflicting states, but not with this particular example.

Assuming updates are synced through a centralized server, if user 1 has seen the “delete line 3” instruction, it means that instruction was committed by user 2. At this point they can’t “edit” the first instruction. You can only move forward. Undoing the line 3 deletion is in itself a new instruction adding the line back; then line 4 gets deleted and both clients’ states have converged.


Reread what I wrote as what you wrote seems to be a totally different formulation. User 1 deletes line 3 while User 2 updates the instruction concurrently. The end result is the unintended line is deleted due to a race. The CRDT preserved the structure of the document but not it’s semantic contents.

If it helps, I drew out the document in https://news.ycombinator.com/threads?id=vlovich123#40793833


Ha, I thought by instructions you meant CRDT commands, not actual text instructions.

This is not a failure of CRDT but an inevitable consequence of trying to point to a specific line in a live document. It can happen even without concurrent updates, just two people editing a document in turns; as such it is impossible to resolve regardless of merge strategies, you have to use some kind of section marker that stays in place within the text as it moves (which is basically what every collaborative editor does internally).


This is exactly the kind of scenario CRDTs rectify.


CRDT's largely can't rectify merge conflicts... what CRDT's can do is make your merges convergent. If I auto-merged your Git code using a blunt policy I don't know if you'd reach for "correct" as your first description as opposed to "convergent".

The longer you're offline the more you have to explain to your users the circumstances where they might need to do manual data cleanup or expect creative merging policy. No matter what tech you're using.


This is true if you're working with documents edited by different devices, but many CRDTs do not and can indeed be not only "convergent" but also "correct".

How SSB works is a good example.


Can’t find what SSB is. Can you clarify?

But yes, it’s possible there are applications that can constrain the set of operations such that the merged result is also correct. I was just highlighting that CRDTs are trotted out as a catch all when a) it’s just a way of thinking about the problem rather than an actual algorithm (eg while there’s a lot of CRDT work for document editing, that work has to be done for scratch for any other problem domain) b) you have to modify your problem constrains and solution such that merge results are correct. B is a very very hard problem in a distributed system even if you ignore the challenge of a which itself is quite hard. I wouldn’t trust anything in this space too much without a TLA+ proof that it’s correct (unless it’s something low stakes like document editing).


I think SSB is "Secure Scuttlebutt": http://ssbc.github.io/ssb-db/


Some of our most important DB sync'ing algorithms were only very recently verified. It was okay to run the whole world's systems, including hospitals and armed forces this way. Very very few industry algos receive formal verification.


CRDTs solve merge conflicts by construction, because they necessarily don't permit merge conflicts in the first place.


By definition they don't have conflicts in the technical sense of the word, but they can easily get into bad states after a merge.

It's not technically a conflict since there is a path forward, but the resulting output can be nonsensical.

For example you could employ a strategy where given two concurrent edits, the merge "randomly*" picks one of the two edits for each property/fact, and abandons the other.

That's a construction of a conflict free replicated data type, but it says nothing of the quality of the final result.

Ultimately, the quality comes down to how well you can produce merge strategies for your domain.

* "Randomly" in a deterministic way, e.g. by comparing a hashes of transaction IDs or similar.


> By definition they don't have conflicts in the technical sense of the word...

...which is the sense that we're talking about, when we talk about CRDTs.


When someone goes into detail about correct versus convergent, they are explaining how the technical sense is not what people really want, and you bluntly replying that it's solved (because it's technically solved) is unhelpful at best and misleading at worse.


A CRDT is a purely technical construct! Correctness (in terms of merge conflicts) is well-defined in that context and it's orthogonal to any kind of semantic/application definition!

This is a bonkers conversation. It's like claiming integer division isn't correct because 5/2 should be 2.5 instead of 2.


Alright, try this in Google Doc.

Two users simultaneously select the same word, and replace it by a different word each.

You might end up with: "apple" -> "carrot orange"

Sure it technically merged, and did not lose data. But you still need humans to reconcile the final state anyways. Worse, you might not even notice that you need to cleanup.


Bad example. Google docs doesn’t use CRDTs but uses OT instead. CRDTs may handle your scenario just fine depending on how they decide to handle this scenario.

The easier scenario that is independent of the specific algorithm is the one I described (have follow up comments describing it in more explicit detail) where you have a logical meta instruction within the document and then the document is updated based on outdated meta instructions offline but when they come back online the meta instructions were changed.

Then there’s not even a merge conflict to really worry about.


> Bad example. Google docs doesn’t use CRDTs but uses OT instead. CRDTs may handle your scenario just fine depending on how they decide to handle this scenario.

The CRDT may pick one or the other replacement word, but who is to say that either choice is correct? Perhaps including both words is correct.

> Then there’s not even a merge conflict...

Agree, this is what CRDTs are all about.

> ...to really worry about.

I think it is important to make clear that CRDTs do not "solve" the merging problem, they merely make it possible to solve in a deterministic way across replicas.

Often, CRDTs do not capture higher level schema invariants, and so a "conflict free" CRDT merge can produce an invalid state for a particular application.

There is also the example above, where at the application level, one particular merge outcome may be preferred over another.

So, it isn't as simple as having nothing to worry about. When using CRDTs, often, there are some pretty subtle things that must be worried about. :-)


> The CRDT may pick one or the other replacement word, but who is to say that either choice is correct?

You, as the application developer.

> I think it is important to make clear that CRDTs do not "solve" the merging problem

They literally do, in the context in which they are defined. Which is about data consistency, not semantic correctness.

> Often, CRDTs do not capture higher level schema invariants,

CRDTs never capture higher level schema invariants. Just like TCP doesn't enforce HTTP session authentication. Orthogonal concerns.


Yup 100% agreed.


No, they don't, unfortunately. I feel like the term CRDT leads to misunderstandings like this. CRDTs are more like packaging and abstractions on top of various strategies for building multi-user, local-first UX, they let end users program against recognisable data structures like strings, lists and dictionaries. How they handle conflicts is implementation dependant and isn't anything new, many will fall back to LWW for example.


The specific case of an edit to a line being applied to a different line than what it was intended for because a previous change deleted said line is exactly what CRDTs rectify. I'm not referring to merge conflicts per say, or that the final document will make human sense. But CRDTs will prevent changesets from different participats from being applied in the worst possible way. And there will be no data loss.


You’re misunderstanding what I’m saying.

Original document has a list of instructions. The first instruction says “delete line 3”. User 1 offline deletes line 3. User 2 updates the instructions so that step 1 says “delete line 4”. Merged result when user 1 comes back online: step 1 says “delete line 4” and line 3 is deleted with the users none the wiser about what happened unless they double check the merged result. Physically the result is merged consistently but logically the end result of the document is not what was intended.


That's why relative positions are not used in CRDTs in the absolute sense. Entities are added via IDs and operations are recorded via IDs. If user one deletes line 3 and user 2 deletes line 4, user one will see lines 1, 2 and 5 and onwards. The commands that are captured by CRDTs only operate on the state of the document as the user sees it at the time, their own changes might be overwritten later during a merge, but their changes won't interfere with parts they were not destined to interfere with.


You seem to still not be getting it. Here’s the original online document:

> 1. Please delete line 3.

> 2. Blank line

> 3. This is line 3

> 4. This is line 4.

User 1 is accessing the document offline, sees the instruction on line 1 and changes their local copy to:

> 1. Please delete line 3

> 2. Blank line

> 3. This is line 4

User 2 concurrently changes the document to update the instruction:

> 1. Please delete line 4

> 2. Blank line

> 3. This is line 3

> 4. This is line 4

When user 1 & user 2 merge their documents, they’ll get:

> 1. Please delete line 4

> 2. Blank line

> 3. This is line 4

You can also split this into two documents with a similar effect (one contains instructions, the other contents). The point is that it’s not about how changes are represented. It’s that the logical and semantic structure isn’t represented in the CRDT and thus while some eventually consistent result is attained by all nodes, the logical and semantic structure may end up broken. That’s why you generally don’t see CRDTs for code editors - the merge conflicts still need to be resolved by hand and doing it automatically can result in worse and harder to understand results than normal.


For certain scenarios there will be conflicts, take a boolean value. Client A sets it and client B unsets it. There can only be one winner.

But that might be a benefit from the proposed log sync, because these conflicting situations can be shown and marked for human review in the UI. Each step of change is well documented and the history can fully be reviewed.


Note that in the example I gave there’s not even a conflict to notice. You just have meta instructions telling you what to do within a document. You apply those instructions offline but in the mean time your meta instructions are updated online. You come back and synchronize your changes but there’s nothing that suggests your offline sync against the meta instructions was stale so your edits are applied based on stale meta instructions resulting in a logically undesirable result.

This is basically a TOCTOU race in a distributed context and CRDTs do not magically solve this problem because it’s not solvable. That’s why we have PAXOS and RAFT to do such things and why many many papers have been written trying to solve the Byzantine generals problem in constrained scenarios with well defined criteria.


> And there will be no data loss.

CRDT's can absolutely lose data!


Not when the log is append-only.


Yes, and CRDT's can absolutely lose data. That's the point that should be delivered across when someone claims they can't. Nobody needs to be warned that if you save endless data then you could save all data.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: