Hacker News | shunza's comments

The authors gave an example showing that WOOT can also cause the interleaving phenomenon.

WOOT was the first CRDT algorithm for text co-editing, yet the authors have shown that this first CRDT algorithm has a serious flaw, 17 years after it was invented.


A comment from https://news.ycombinator.com/item?id=36208585:

A Critical Examination of "The Art of the Fugue" Paper in Relation to OT

The introductory paragraph of the paper's abstract states:

"Existing algorithms for replicated lists, which are widely used in collaborative text editors, suffer from a problem: when two users concurrently insert text at the same position in the document, the merged outcome may interleave the inserted text passages, resulting in corrupted and potentially unreadable text. The problem has gone unnoticed for decades, and it affects both CRDTs and Operational Transformation."

The issue of text interleaving in certain CRDT solutions, such as Logoot, is not a recent or overlooked problem. In fact, it has been independently reported by several researchers and practitioners as early as 2018. For detailed visual representations, please refer to Figure 2 and Section 4.4 in [1]. An earlier version of this paper was also published in 2018 and can be accessed at https://arxiv.org/abs/1810.02137.

What sets "The Art of the Fugue" paper apart is its identification of the interleaving problem in several early OT solutions, namely JUPITER OT [1994], the adOPTed control algorithm [1996], the GOT control algorithm [1998], and the TTF functions [2006]. However, even if we were to accept the reported interleaving problem in these co-editing solutions as valid (which, as shown later, is not the case), it would be highly unjustifiable to make sweeping claims such as "existing algorithms for replicated lists, which are widely used in collaborative text editors," suffer from the interleaving problem, or "all text collaboration algorithms have an interleaving problem" (claimed elsewhere by a co-author). This is due to the existence of numerous other co-editing solutions not covered in the paper, including GOTO (ACM CSCW 1998), NICE (ACM CSCW 2002), TIBOT (IEEE ICPADS 2004), COT (ACM CSCW 2006), GOOGLE WAVE and Docs OT (2010), and POT (IEEE TPDS 2016), to name just a subset of the vast co-editing landscape. It is important to acknowledge that these examples do not even encompass the considerable number of industry and open-source OT solutions.

Having been involved in co-editing since the 1990s, I have extensive familiarity with various OT solutions, including those mentioned in "The Art of the Fugue" paper (refer to [2]). However, I must admit that I was previously unaware of the interleaving problem in any of these solutions. Naturally, the report in "The Art of the Fugue" paper has sparked my curiosity, prompting me to review these early works to identify any overlooked aspects and evaluate the validity of "The Art of the Fugue" paper's claims.

Regrettably, after thorough review, I find the claims in the paper regarding these co-editing solutions to be unfounded. To share my findings with fellow researchers and practitioners, I will publish a series of reviews on HN. These reviews will address each solution in relation to "The Art of the Fugue" paper, aiming to contribute to knowledge advancement in co-editing and provide accurate information to those who depend on it.

What's wrong with "The Art of the Fugue" paper about OT (adOPTed)?

Let's examine the original text from the paper (A.1.1. page 18) that attempts to illustrate the "interleaving" problem in the adOPTed algorithm:

   “To demonstrate forward interleaving, replica A generates ins(1, a, A) followed by ins(2, b, A). Concurrently, replica B generates ins(1, x, B). Assume A < B and consider the execution at B:

   1. B executes ins(1, x, B), so the document is x.

   2. When B receives ins(1, a, A), it computes the transformation T(ins(1,a,A),ins(1,x,B))=ins(1,a,A) because 1=1 & A<B so a is inserted before x, yielding ax.

   3. When B receives ins(2, b, A), it computes the transformation T (ins(2, b, A), ins(1, x, B)) = ins(3, b, A) because 2 > 1, so b is inserted after x, yielding axb.”

However, there is a flaw in the illustration, particularly in Step 3. According to the paper, when replica B receives ins(2, b, A), it computes the transformation T(ins(2, b, A), ins(1, x, B)) = ins(3, b, A) because 2 > 1. However, the correct transformation under the adOPTed algorithm would be T(ins(2, b, A), ins(2, x, B)) = ins(2, b, A) because 2=2 & A<B. Therefore, b should be inserted before x, yielding abx, which is the correct and non-interleaved outcome.

Additionally, it is worth noting that the adOPTed control algorithm should produce ins(2, x, B) in Step 2 using the transformation T(ins(1, x, B), ins(1, a, A)) because 1=1 & A<B. This important aspect was missing in "The Art of the Fugue" paper.
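To make the tie-breaking rule concrete, here is a minimal sketch of the insert-insert transformation with position comparison and site-id tie-breaking, as used in the examples above (hypothetical code, not the actual adOPTed implementation):

```python
def transform_ins_ins(op1, op2):
    """Transform op1 against a concurrent, context-equivalent op2.
    Each op is (position, char, site). When the positions tie, the
    lower site id wins the earlier position (A < B)."""
    p1, c1, s1 = op1
    p2, c2, s2 = op2
    if p1 < p2 or (p1 == p2 and s1 < s2):
        return (p1, c1, s1)        # op1 keeps its position
    return (p1 + 1, c1, s1)        # shifted right past op2's insert
```

Under this rule, T(ins(1, x, B), ins(1, a, A)) = ins(2, x, B) and T(ins(2, b, A), ins(2, x, B)) = ins(2, b, A), matching the corrected derivation.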

To gain a deeper understanding of the underlying issues involved, let's further explore an illustration of what would occur at replica A (not depicted in "The Art of the Fugue" paper) under the same scenario:

     1. Replica A sequentially executes ins(1, a, A) and ins(2, b, A), resulting in the document being ab.

     2. When A receives ins(1, x, B), it computes two transformations in sequence: a. T(ins(1, x, B), ins(1, a, A)) = ins(2, x, B) because 1=1 & A<B. b. T(ins(2, x, B), ins(2, b, A)) = ins(3, x, B) because 2=2 & A<B.

   As a result, x is inserted after ab, yielding abx at replica A.

It is evident that the outcome abx at replica A differs from the result axb at replica B, as depicted in "The Art of the Fugue" paper. However, the result abx at replica A aligns with the corrected result abx at replica B, and both results exhibit non-interleaving behavior. With the inclusion of this additional illustration at replica A, it becomes apparent that what was depicted in "The Art of the Fugue" paper does not accurately reflect how the adOPTed algorithm truly functions. Instead, it portrays the behavior of the dOPT algorithm, which was the first OT control algorithm. The co-editing scenario presented in the paper represents an instance of the classic dOPT-puzzle, which highlights a flaw in the dOPT algorithm:

    (ins(1, a, A) -> ins(2, b, A)) || ins(1, x, B)

In this scenario, the operations at replica A are causally related and concurrent with the operation at replica B. It's well known that the dOPT algorithm produces inconsistent results in this straightforward scenario. This emphasizes the need to accurately represent the behavior of the adOPTed algorithm and distinguish it from dOPT to ensure a thorough understanding of the intricacies involved in co-editing solutions. Although the dOPT-puzzle was solved approximately 30 years ago, its resolution has remained largely unknown and under-appreciated by subsequent contributors in the field of co-editing. This lack of awareness of the insights and lessons gained from solving the puzzle is a significant factor behind numerous misconceptions and flawed arguments surrounding OT.

Therefore, it is important to highlight that the key aspect of solving the dOPT-puzzle, achieved by OT control algorithms like adOPTed, JUPITER, GOT, GOTO, COT, POT, and GOOGLE OT, is to ensure context-equivalence between the two input operations processed by the transformation function. Context-equivalence, as further explained in OTFAQ [2], means that the operations are defined on the same state. In the provided example, ins(2, b, A) is context-equivalent to ins(2, x, B), whereas it is context-inequivalent to ins(1, x, B). Emphasising this principle is crucial for a comprehensive understanding of OT and to avoid misconceptions in its application.

Lastly, I want to address why I excluded TTF from the previous discussions. The reason is that TTF does not bear relevance to illustrating the "interleaving" issue or resolving the dOPT-puzzle. When TTF is not integrated with a suitably correct OT control algorithm like adOPTed but instead combined with a flawed one like dOPT (as portrayed in "The Art of the Fugue" paper), inconsistencies inevitably arise. This is evident when we combine the illustration at replica B in the paper with the additional illustration at replica A, demonstrating the effect of integrating TTF with a dOPT-like algorithm.

In conclusion, it is vital to grasp the resolution to the dOPT-puzzle and recognise the significance of ensuring context-equivalence in OT control algorithms like adOPTed, JUPITER, GOT, GOTO, COT, POT, and GOOGLE OT, among others. This understanding is crucial in dispelling misconceptions and flawed arguments surrounding OT and preventing the repetition of similar mistakes in the future.

[1] David Sun, Chengzheng Sun, Agustina Ng, and Weiwei Cai. Real differences between OT and CRDT in correctness and complexity for consistency maintenance in co-editors. Proceedings of the ACM on Human-Computer Interaction, 4(CSCW1), May 2020. doi:10.1145/3392825. Also available at https://arxiv.org/abs/1905.01302, May 2, 2019.

[2] Chengzheng Sun, "OTFAQ: Operational Transformation Frequently Asked Questions and Answers," https://www3.ntu.edu.sg/scse/staff/czsun/projects/otfaq/


There is a new CRDT algorithm for text co-editing. Is anybody interested in this CRDT algorithm and the interleaving issues introduced by CRDT algorithms?


Representative CRDTs, at least Logoot and TreeDoc, were never truly proved correct. The correctness of Logoot and TreeDoc is claimed under the assumption that the required property of ids is preserved by the CRDT design. However, the existing id-generation algorithms cannot achieve the desired property.

To be specific, I will enumerate two examples.

First, in Logoot (https://hal.inria.fr/inria-00432368/document), the function generateLinePositions in Section 4.2 has a serious error, which can lead to a failure to generate new identifiers. This flawed design can also be found in follow-up research (Logoot-Undo, https://hal.inria.fr/hal-00450416/).

Second, in TreeDoc (https://hal.inria.fr/inria-00445975/document), the function newPosID described in Section 3.2 may generate incorrect ids. This function is meant to generate an id r between any two ids p and q (p<q) such that p<r<q. However, in line 5, if pn=0, then we would get r<p. Readers can refer to Figure 3: in the tree, the character dY precedes the character dZ in infix order, yet the id of dY (10(0:dY)) is greater than the id of dZ (100(1:dZ)), so their positions are inconsistent with their ids.
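The property at stake can be stated in a few lines of code. Over the rationals (which are dense) it is trivially satisfiable; the reported bugs are cases where the papers' path-based id schemes fail to guarantee it. A sketch of the requirement, not of either paper's actual algorithm:

```python
from fractions import Fraction

def id_between(p, q):
    """Return an id r with p < r < q. Over the rationals the
    midpoint always exists; a path-based id scheme must provide
    the same guarantee, and the cases above are where newPosID
    and generateLinePositions break it."""
    assert p < q, "caller must pass p < q"
    return (p + q) / 2
```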

In the literature, CRDT researchers always claim their CRDT designs are formally proved.

However, there are various scenarios whereby CRDTs would have inconsistency issues as stated above.

In my opinion, these claims do not have much validity.


Well, RGA has been proved formally [DOI 10.1145/2933057.2933090]. Regarding Figure 3 of the Treedoc paper, I believe the IDs of dY and dZ are in the correct order, according to the rules in the paper.


TTF is a CRDT-like approach. Both maintain an extra data structure that includes tombstones, and TTF operations are applied only to that internal data structure. To be applied to the document visible to users, TTF operations need to be translated, just like in a CRDT.

Your solution is to combine GOTO with TTF. Actually, it is a combination of OT and CRDT.


Sorry, I can't follow your statements.

What is a rope-based CRDT?

What is the meaning of "String splicing is O(N)"?

What does the String refer to, and what is N?


Except for WOOT, which CRDT for co-editing does not require causal ordering?


RON CT/RGA may consume inputs in arbitrary order.


Some background on the Causal Tree (CT) that gritzko is referring to: http://archagon.net/blog/2018/03/24/data-laced-with-history.


In the original RGA paper, RGA does require that operations be received in causal order, and it preserves causality via a state-vector technique.
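The state-vector condition for causal delivery can be sketched as follows (a generic version-vector check, hypothetical code rather than the RGA paper's pseudocode):

```python
def causally_ready(op_vv, op_site, local_vv):
    """Deliver an op only when its causal past has been seen:
    the op must be the next one expected from its own site, and
    the local replica must have seen at least as many ops from
    every other site as the op's vector records."""
    for site, count in op_vv.items():
        if site == op_site:
            if local_vv.get(site, 0) != count - 1:
                return False
        elif local_vv.get(site, 0) < count:
            return False
    return True
```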


Are you the inventor of CT?


The time complexity of most OT systems is not related to H.


I've been working in OT systems for years (G Wave, ShareJS, ShareDB, some other stuff). I'm consistently surprised by how badly academic papers predict OT systems will perform. In reality, they perform great. My little C implementation of text OT can handle about 20M text operation transforms / second[1].

Part of the gap is that many academic papers model text operations as just single character edits. If you do that, copy+pasting some text into an editor can inject thousands of insert operations into the system all at once. But that design is really sloppy - nobody actually designs real world OT systems like that. A much better way to design text OT operations uses a list of skip, insert and delete parts. That way you can arbitrarily compose edits together. This way an entire pasted string just gets inserted in one operation and performance is fine. (Or if the user goes offline, you can merge all the edits they do into a single change, then transform & commit it all at once).
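As a rough sketch of that list-of-parts model (hypothetical code, not libot's actual API), applying such an operation to a string looks like:

```python
def apply_op(doc, op):
    """Apply an operation expressed as a list of (type, arg) parts:
    ('skip', n) retains n chars, ('ins', s) inserts the string s,
    ('del', n) deletes n chars at the cursor."""
    out, i = [], 0
    for kind, arg in op:
        if kind == 'skip':
            out.append(doc[i:i + arg])
            i += arg
        elif kind == 'ins':
            out.append(arg)
        elif kind == 'del':
            i += arg
    out.append(doc[i:])  # implicit trailing skip to end of doc
    return ''.join(out)
```

A pasted string becomes one ('ins', ...) part rather than thousands of single-character operations.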

I've still never seen the OT algorithms of anything I've worked on have a meaningful impact on performance. Actually thinking about it I'm not sure if I've ever even seen the OT code show up in profile traces.

[1] https://github.com/ottypes/libot


My personal experience is different. OT tends to be quadratic in terms of computational complexity and codebase size. As long as you consider a simple case, 1^2 is 1. Then, add more data types (not just text), faulty servers, faulty network, non trivial network architectures... All these problems tend to interact. They easily go combinatorial, but again 1! is 1. I worked with an OT codebase that had C++ templates within C defines and I edited them using regexes (x types of operations => x^2 types of potential transformations).

OT aside, data sync in general is not a solved problem. Maybe I am especially unlucky, but I have yet to see a chat app that has no sync glitches. A chat is not the most difficult data structure to sync, but I saw bugs in Slack, Telegram, Instagram, Gchat, Skype, others... WhatsApp does not even try to sync...

A lot depends on the diversity of your use cases and the number of the nines you expect from the system...


This concurs with our experience.

When we profiled our string OT implementation on a single 2-3 GHz CPU (2-3 cycles per instruction), it was able to handle about 10M transformations per second. On the half dozen OT-based systems I've worked on, the OT implementation was never where we found the real engineering complexity or the performance bottleneck in working co-editors.

The significance of using a string-wise operation model in OT is under-appreciated and can't be emphasized enough for real-world systems. The first string-wise OT transformation functions can be traced back to "Achieving Convergence, Causality Preservation, and Intention Preservation in Real-Time Cooperative Editing Systems" (1998). We also found that skip, insert, and delete can improve compression of string edits.


I fully agree, I don't understand the emphasis on performance both in the literature and in comments on the subject.

Additionally, it is more common to see CRDT libraries that use character-wise operations and create an object for each character (e.g. y-js.org, github.com/google/ot-crdt-papers), which is not ideal.

Obviously there are also libraries such as Atom's Teletype that have string-wise operations.

The cynic in me feels like the CRDT vs. OT war misses the forest for the trees. What matters is missing features, and the feature that is most needed and least described is a systematic way to offer a diff editor matching the normal editing experience. Indeed, after having been offline for a while, one wants to see and select how their changes will be integrated into the shared resource.


I think you'll find the article supports your sentiment to an extent.

One of the key messages we tried to get across is that academic research on co-editing ought to go beyond theoretical analysis and instead drive research by pushing the envelope in new areas of application and/or features. OT's evolution has been symbiotic with new applications, while we haven't seen this in CRDT (for co-editing) research. The reality is that you'll find new CRDT papers every year that still (a) make fairly broad claims of benefits that are either theoretically dubious or not backed/validated by application/system implementations, and (b) dwell on technical issues that were resolved years ago. To move beyond this, we wanted to put these issues in context (a general transformation approach) and dispel the most common fallacies in theoretical analysis.

Related to the 'visual-merging' feature you mentioned: when I worked on making MS Word collaborative as a research project, one of the UX research questions was how to allow users to visualize the different combined effects of updates to objects (e.g. if you changed the background of object X to red and I changed it to yellow at the same time). We came up with a multi-versioning technique to display the potentially different combinations and help users select the most desirable one. It is definitely an interesting and challenging problem to consider for text documents.


My understanding of OT is that each change requires traveling backward through the edit history to find the first state that is consistent between sites, then traveling forward to transform the new change against past edits. In the worst case, an edit requires transformation against the document’s origin state.

Is that not the case?


Most OT systems only transform concurrent operations.


That's true. My performance claims are biased toward offline-capable editors, where the last consistent state between sites may be thousands or millions of edits ago. For online-only editors the set of edits between now and the most recent consistent state may be much smaller.


Be specific. Which statements are attacking CRDTs?


Why can't OT be applied to P2P networks? Does the fact that Google Docs uses a transformation-based server mean that all OT needs a central server?


Yeah, I thought the whole point of OT was that it allowed any client to do resolution (because all operations transform other operations in a way that makes them all resolve the same way no matter their order). Which is what made it seem like such overkill for Google Docs, where there is a central authoritative server, so a much simpler architecture could have done the job...


Simple OT algorithms do work a lot better with a centralized server.

With a centralized server, you can consider every resolution as happening between 2 nodes (server and client). This means transform / catchup is as simple as a for loop.
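That for loop might look like this (a sketch with single-character inserts and a server-wins tie-break, purely illustrative):

```python
def transform_pos(p, srv_p):
    """Shift a client position right past a server insert that
    landed at or before it (server wins ties in this sketch)."""
    return p + 1 if srv_p <= p else p

def catch_up(client_op, server_log):
    """Rebase a client's pending insert over every server op
    committed since the client's last known revision."""
    pos, ch = client_op
    for srv_pos, _srv_ch in server_log:
        pos = transform_pos(pos, srv_pos)
    return (pos, ch)
```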

In contrast, in a decentralized context OT code needs to satisfy Transform Property 2 [1] in order to converge correctly in all cases. TP2 dramatically complicates the OT implementation, and you need a much more complicated resolution algorithm to merge arbitrary changes between nodes.

For text, this means you need:

- Tombstones (or something like it) - eg https://github.com/josephg/TP2/blob/master/src/text.coffee

- Either an operation prune function (anti-transform) or full history, forever.

- A resolution algorithm that can flatten DAGs of operations into a transformed list. This stuff gets really hairy, and hard to implement efficiently. This is my attempt from a few years ago. It's correct, but super slow: https://github.com/josephg/tp2stuff/blob/master/node3.coffee
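For concreteness, TP2 itself can be written as an executable check. With a toy insert-only transform (hypothetical code, not a production transform), the property reads:

```python
def t(o1, o2):
    """Toy transform for (position, site) inserts: shift o1 right
    when o2 inserts before it, breaking position ties by site id."""
    p1, s1 = o1
    p2, s2 = o2
    if p2 < p1 or (p2 == p1 and s2 < s1):
        return (p1 + 1, s1)
    return (p1, s1)

def tp2_holds(a, b, c):
    """TP2: transforming c past a then T(b, a) must equal
    transforming c past b then T(a, b)."""
    left = t(t(c, a), t(b, a))
    right = t(t(c, b), t(a, b))
    return left == right
```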

Implementing high performance OT with centralized servers is easy. Implementing OT in a decentralized network is hard to do correctly, and much harder to implement in a highly performant way. For decentralized systems, CRDTs are a much better approach imo.

[1] https://en.wikipedia.org/wiki/Operational_transformation#Tra...


Several decentralized OT systems [1] can avoid the TP2 constraint on transformation functions.

[1] https://dl.acm.org/citation.cfm?id=2914099


Very insightful, thank you for the detailed comment. About the DAG flattening algorithm you mention - in what way does this differ from a topological sorting? (https://en.wikipedia.org/wiki/Topological_sorting ) E.g. does it need to know about expensive or cheap combinations of operations, and try to avoid the former?


Oh! I didn’t know that had a name. Yes - it is indeed a topological sort. At least the way I’ve implemented it, the performance problem is that moving an operation from position K to position N in the output requires O(|K-N|) steps. CRDTs can do this work in O(log(|K-N|)) steps.
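For reference, the flattening step maps directly onto the standard-library topological sort (hypothetical op names):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Operation DAG: each op maps to the set of ops it causally
# depends on; flattening the DAG into one executable sequence
# is exactly a topological sort.
deps = {
    'ins_a': set(),
    'ins_b': {'ins_a'},          # typed after ins_a
    'ins_x': set(),              # concurrent with ins_a/ins_b
    'merge': {'ins_b', 'ins_x'},
}
order = list(TopologicalSorter(deps).static_order())
```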


There are a few things related to TP2, aka Convergence Property 2 (CP2), that need to be unpacked here.

Does your system always need to preserve TP2/CP2 to converge correctly? Contrary to what many folks and papers claim, the answer is categorically no.

There are two general approaches to arrive at this desired outcome: (1) avoiding it or (2) preserving it. Let's look at the basics of what you need for each:

(1) Avoiding it - this can be done with a centralized server, as well as in a fully distributed, decentralized, peer-to-peer context. So I'm in agreement with josephg that a central server will avoid TP2/CP2, but I'd like to add that a decentralized avoidance strategy is available and just as easy to adopt. To back this up, this paper [1] looks at TP2/CP2-avoidance systematically in representative protocols and algorithms: JUPITER, NICE, Google Wave, COT, TIBOT/TIBOT2. TIBOT/TIBOT2 are two fully decentralized algorithms/protocols that avoid TP2/CP2.

In addition, [1] also answers the more interesting question of why these systems were able to avoid TP2/CP2, by giving the general conditions for TP2/CP2-avoidance. You can apply the conditions to design a new protocol or algorithm that avoids TP2/CP2.

[1] https://www.dropbox.com/s/u4mhm4ol4qq24uz/TPDS2412938-POT.pd...

(2) Preserving it - by TP2/CP2 preservation, we usually mean the task of writing transformation functions (e.g. between insert and delete, or delete and delete) that satisfy the aforementioned condition. This has been a solved problem for string-wise operations since 2016 [2]:

- You can take these string-wise transformation functions and use them with many of the existing OT algorithms in a fully decentralized context, satisfying both TP1 and TP2.

- The transformation functions are free of tombstones (or their variants) or other schemes (DAGs etc.) to ensure TP2.

Happy to say more about it if someone has an interest.

[2] https://dl.acm.org/citation.cfm?doid=2998181.2998252

You can implement correct and performant OT with either a centralized server or a fully decentralized topology, using TP2/CP2-avoidance and/or TP2/CP2-preservation. I've built multiple production systems in all these varieties. I generally prefer OT with TP2/CP2-avoidance on a server, but not because OT can't avoid TP2/CP2 without a server or preserve TP2/CP2; rather, there is not much real-world evidence showing clear benefits of moving a collaborative editing system to a decentralized topology, especially when most of the CRDT-based p2p proposals in papers I've come across involve a server one way or another.

TP2/CP2 in OT is often stood up as a straw man so folks can throw stones at OT's correctness, but it is really a solved problem. The unfortunate outcome is that you see a ton of papers adopting this strategy to get published, and over time the conversation becomes really convoluted.

We discuss these ideas and try to dispel many of these confusions in detail in the arXiv manuscript. A good starting point would be Section 4.1.3, "OT Solutions to the CP2 Issue and the FT Puzzle".


Thanks! That was super informative.

