The "evo" project, which has been discontinued by its creator, represents an ambitious attempt to replace Git with a CRDT (Conflict-Free Replicated Data Type) based version control system. While the vision is compelling—automatic merge conflict resolution and stable file identities—the project faces several insurmountable technical limitations that prevent it from scaling to compete with Git.[1]
The most critical limitation is unbounded memory growth. CRDTs require storing extensive metadata to enable automatic conflict resolution, and this overhead becomes prohibitive at scale.[2][3]
The Tombstone Problem: When text is deleted in a CRDT, the characters don't truly disappear—they become "tombstones" that persist forever in the data structure. This is necessary to correctly order incoming changes that reference deleted positions. For a version control system tracking thousands of commits across years, these tombstones accumulate indefinitely. As one researcher documented, "the size of the local memory in each node of the CRDT-based system grows continuously...this accumulation of historical data includes deleted content".[4][2]
For evo's RGA (Replicated Growable Array) CRDT specifically, each line operation requires storing:
- A Lamport timestamp
- A node ID
- A line ID
- Parent dependencies
- The actual content (or old content for reverts)
This metadata overhead is typically 64 bytes minimum per operation, meaning a simple keystroke can consume 10KB or more once the full CRDT structure is accounted for. In contrast, Git's delta compression can represent changes in just a few bytes.[3][5]
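To make the overhead concrete, here is a minimal sketch of what one line-level RGA operation has to carry, per the metadata list above (field names are illustrative, not evo's actual binary layout):

```python
from dataclasses import dataclass

@dataclass
class RgaOp:
    """One line-level RGA operation carrying the metadata listed above.
    Field names are illustrative, not evo's actual binary layout."""
    lamport: int           # Lamport timestamp for causal ordering
    node_id: str           # replica that produced the operation
    line_id: str           # stable identity of the line
    parent_id: str         # dependency: the line this one was inserted after
    content: str           # line text (retained even after deletion)
    deleted: bool = False  # a delete only flips this flag: a tombstone

log = [RgaOp(1, "nodeA", "L1", "ROOT", "hello")]
log[0].deleted = True   # "deleting" keeps the op and all its metadata
assert len(log) == 1    # nothing is ever freed from the log
```

Because a delete only flips a flag, the log never shrinks; storage cost tracks total edit history rather than current file size.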
CRDTs suffer from non-linear performance degradation as repositories grow. Benchmarks show that while CRDT systems may start fast, they slow dramatically over time:[5][6]
- Automerge (a popular CRDT implementation) processes roughly 900 edits per second initially, but in large documents a single edit can stall V8 for 1.8 seconds[5]
- Peak memory usage can reach 2.6GB for just 260,000 edits[5]
- The "every operation matters forever" model means performance continually degrades
Git, by contrast, maintains relatively constant performance through:
- Packfiles with delta compression
- Shallow clones that don't require full history
- Efficient indexing using SHA-1 hashing
- Local operations that don't traverse entire history
The Linux kernel repository—with 1.4 million commits—occupies only 5.5GB while maintaining excellent performance. A CRDT-based system would struggle immensely at this scale.[7]
Git's snapshot-based model is fundamentally more efficient than operation-based CRDTs:[8][9][7]
Git's Advantages:
- Commits are immutable snapshots, not accumulated operations. Git doesn't need to replay every keystroke; it just stores compressed versions of file states
- Delta compression in packfiles dramatically reduces storage. Git only stores differences between similar objects
- SHA-1 content addressing provides efficient deduplication and integrity checking
- Garbage collection can truly remove unreferenced data
- Shallow operations allow working with recent history without loading everything
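The storage gap can be illustrated with a toy delta: storing a new revision as a compressed diff against its predecessor (the idea behind Git's packfiles, though real Git uses its own binary delta format) is far smaller than storing the revision outright:

```python
import difflib
import zlib

old = "\n".join(f"line {i}" for i in range(1000))
new = old.replace("line 500", "line 500 changed", 1)

# Snapshot-plus-delta: store the new revision as a diff against the old
# one, then compress it (the packfile idea, simplified).
delta = "".join(difflib.unified_diff(old.splitlines(True),
                                     new.splitlines(True)))
packed_delta = zlib.compress(delta.encode())

# Storing the whole new revision, even compressed, costs far more.
packed_full = zlib.compress(new.encode())
assert len(packed_delta) < len(packed_full)
```

An operation-based CRDT, by contrast, must keep every individual edit that produced `new`, so its storage grows with edit count rather than with the size of the change.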
Evo's Disadvantages:
- Operation logs grow unbounded. Every single change ever made must be kept
- No garbage collection possible without breaking CRDT convergence guarantees
- All operations must be traversable to compute current state
- Metadata overhead of 64+ bytes per operation vs. Git's ~50 bytes per commit snapshot
- Binary format complexity doesn't solve the fundamental mathematical constraints
Evo's CRDT approach also faces challenges with distributed replication:[3]
- Naïve reconciliation requires multiple network roundtrips to fetch missing parent dependencies
- No efficient negotiation protocol like Git's sophisticated push/pull algorithms
- Each replica needs complete operation history rather than just commits
- Network overhead from transmitting verbose CRDT metadata
Git's negotiation protocol, refined over 20 years, efficiently determines what objects need transfer using algorithms like bitmap indexing and multi-ack protocols.
The promise of "zero merge conflicts" is somewhat misleading. CRDTs don't eliminate conflicts—they hide them through automatic resolution.[10][11][12][1]
For text editing, this might work (characters inserted concurrently simply appear in some order). But for source code:
- Automatic merges can create syntactically invalid code
- Semantic conflicts (two changes that individually work but together break functionality) are invisible
- Developers lose the opportunity to consciously resolve conflicts
As one developer noted: "you don't actually want your repo to be a CRDT because a CRDT resolves all conflicts and that would mean merge conflicts get resolved in an arbitrary way (leading to unexpected results and bad code)".[11]
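A toy illustration of the point, assuming a naive line-level auto-merge (not evo's actual merge code): each side's edit is individually valid, the merge reports no conflict, and the result is broken:

```python
# Base version both developers start from.
base = ["def greet():", "    pass", "", "greet()"]

# Dev A renames the function; Dev B adds another call to the old name.
dev_a = ["def say_hello():", "    pass", "", "say_hello()"]
dev_b = ["def greet():", "    pass", "", "greet()", "greet()"]

# A CRDT-style merge keeps both sides' changes with no conflict markers:
# A's edits to the existing lines, plus B's appended line.
merged = dev_a + dev_b[len(base):]

# The merge "succeeded", but the result calls a name that no longer exists.
assert "greet()" in merged
assert "def greet():" not in merged
```

Nothing in the merge machinery can flag this: both replicas converge to the same, semantically broken, document.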
Beyond technical limitations, evo faces adoption barriers:
- Incomplete implementation: The README itself notes features like "server-based PR flows," "performance optimizations," and "packfiles" were never completed[1]
- Binary compatibility: Git's ecosystem (GitHub, GitLab, Bitbucket, CI/CD tools) is deeply entrenched
- No migration path: Converting Git repositories to evo would be extremely difficult
- Team coordination: Git's branching model, while complex, is well-understood by millions of developers
Even more mature CRDT-based version control systems like Pijul face similar challenges:[13][11]
- Still in alpha/experimental status after years of development
- Format changes and repository corruption issues damaged credibility
- Performance problems with certain merge scenarios
- Small ecosystem compared to Git's massive tooling infrastructure
Evo cannot scale versus Git because:
- Memory grows unbounded due to tombstones and operation logs
- Performance degrades non-linearly as history accumulates
- Network sync is inefficient compared to Git's protocols
- Storage overhead is 10-100x worse than Git's compressed snapshots
- No true garbage collection is possible without breaking CRDT guarantees
- The metadata-to-content ratio becomes increasingly unfavorable
Git's 20 years of optimization have produced a system finely tuned for version control specifically, with features like packfiles, partial clones, shallow history, and efficient network protocols. CRDTs solve a different problem (real-time collaborative editing) and force compromises that make them unsuitable for version control at scale.[14][15][23]
The creator's decision to discontinue the project reflects these fundamental technical barriers rather than just the harassment mentioned in the README. While academically interesting, CRDT-based version control remains impractical for production use.[1]
Yes, theoretically, every challenge identified can be overcome, but doing so requires moving beyond the "naive" CRDT implementation used in Evo.
The technical solutions exist in state-of-the-art research and newer projects (like Loro, Automerge 2.0, and Pijul), but implementing them essentially requires building a completely different system than what Evo currently is.
Here is how the specific limitations of Evo could be solved:
The Challenge: In Evo, every deleted character leaves a "tombstone" (metadata) behind so future merges can be placed correctly. This causes the repo to grow forever, even if you delete all files.
The Solution: Garbage Collection (GC) with Consensus. You can delete tombstones, but only if you can prove that every other copy of the repository has also seen the deletion.
- How it works: The system tracks a "stability vector"—a list of which versions every user has synced. Once a deletion is "stable" (synced to everyone), the tombstone can be safely purged from disk.
- The Trade-off: This breaks the "pure offline" promise. You effectively need a "garbage collection synchronization" phase where peers talk to each other (or a central server) to agree on what can be deleted.
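A minimal sketch of the idea, with illustrative names (not from any real implementation): a tombstone may be purged only once every replica's entry in the stability vector has advanced past the deletion's timestamp:

```python
def stable(delete_ts: int, stability_vector: dict) -> bool:
    """True if every replica has acknowledged all ops up to delete_ts."""
    return all(synced >= delete_ts for synced in stability_vector.values())

def purge_tombstones(tombstones, stability_vector):
    """Keep only tombstones that some replica might still need."""
    return [t for t in tombstones if not stable(t["ts"], stability_vector)]

tombs = [{"id": "L1", "ts": 5}, {"id": "L2", "ts": 12}]
# Replica C has only synced up to op 9, so the deletion at ts=12 must stay.
vector = {"A": 20, "B": 15, "C": 9}
remaining = purge_tombstones(tombs, vector)
assert [t["id"] for t in remaining] == ["L2"]
```

Note the trade-off the text describes: a single lagging (or permanently offline) replica pins every tombstone newer than its last sync.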
The Challenge: Evo's RGA (Replicated Growable Array) structure requires traversing a linked list or tree for every operation, which gets slower as history grows.
The Solution: Columnar Compression & RLE. Newer libraries like Loro (written in Rust) and Automerge 2.0 use techniques from high-performance databases:
- Columnar Storage: Instead of storing objects like `{id: A, value: "a"}, {id: B, value: "b"}`, they store arrays: `ids: [A, B]`, `values: ["a", "b"]`. This allows for massive compression.
- Run-Length Encoding (RLE): If you type "hello", instead of 5 separate operations, the system compresses them into a single "insert 'hello' at position X" block.
- Result: Loro can load documents with millions of operations in milliseconds, whereas naive CRDTs (like Evo's) would hang for seconds or minutes.
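The columnar-plus-RLE idea can be sketched in a few lines (a simplification of what Loro and Automerge 2.0 actually do):

```python
# Row-oriented: one record per character insert, each repeating its keys.
rows = [{"id": ("A", i), "value": ch} for i, ch in enumerate("hello")]

# Columnar + run-length encoding: the five inserts share an origin and
# have consecutive counters, so the whole run collapses to one record.
columnar = {
    "origin": "A",        # shared by every op in the run
    "start_counter": 0,   # first id in the run
    "length": 5,          # RLE: five consecutive ops
    "values": "hello",    # contiguous payload as one string
}

def expand(col):
    """Recover the row-oriented form from the compressed columnar run."""
    return [{"id": (col["origin"], col["start_counter"] + i), "value": ch}
            for i, ch in enumerate(col["values"])]

assert expand(columnar) == rows  # lossless round trip
```

The compression works precisely because typing produces long runs of ops with near-identical metadata; only the payload differs.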
Solving the "Hidden Conflict" Problem
The Challenge: Evo merges line-by-line. If you change a function name and I change a call to that function, Evo merges both without error, but the code won't compile.
The Solution: Semantic Conflict Detection. Instead of treating code as "lines of text" (like Evo/Git), the VCS must treat code as an Abstract Syntax Tree (AST).
- Tree CRDTs: Systems like Pijul don't just merge text; they track "patches" and dependencies. If a patch depends on a specific context that has changed, Pijul flags a conflict instead of silently merging it.
- Syntactic Awareness: A "solved" Evo would need to parse the language (Go, JS, Rust) and understand that `func foo()` is a definition, preventing merges that break the syntax tree.
The Challenge: Evo stores raw CRDT logs, which are 10-100x larger than the actual source code.
The Solution: "Squashing" or Snapshotting. You can adopt a hybrid model (similar to how Git uses packfiles):
- Hot State: Keep the recent changes (last 2 weeks) in the full CRDT format to allow easy merging and time-travel.
- Cold State: "Squash" older history into immutable snapshots (like Git commits).
- The Trade-off: You lose the ability to easily "un-merge" or mathematically reorder commits from 5 years ago, but you gain massive storage efficiency. This is a compromise most production systems accept.
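A toy version of the hot/cold split, assuming timestamp-ordered line operations (illustrative, not evo's format): everything before the cutoff is squashed into a plain snapshot, tombstones included, while recent ops keep their CRDT form:

```python
def squash(ops, cutoff_ts):
    """Split a timestamp-ordered op log into (cold snapshot, hot tail)."""
    doc = {}
    hot = []
    for op in ops:
        if op["ts"] < cutoff_ts:          # cold: apply and discard
            if op["kind"] == "insert":
                doc[op["line_id"]] = op["content"]
            else:                         # delete drops line AND tombstone
                doc.pop(op["line_id"], None)
        else:                             # hot: keep full CRDT form
            hot.append(op)
    return doc, hot

ops = [
    {"ts": 1, "kind": "insert", "line_id": "a", "content": "x = 1"},
    {"ts": 2, "kind": "delete", "line_id": "a"},
    {"ts": 3, "kind": "insert", "line_id": "b", "content": "y = 2"},
    {"ts": 9, "kind": "insert", "line_id": "c", "content": "z = 3"},
]
snapshot, hot = squash(ops, cutoff_ts=5)
assert snapshot == {"b": "y = 2"}   # the deleted line is truly gone
assert len(hot) == 1                # only recent ops stay mergeable
```

This mirrors the trade-off above: the squashed region can no longer be re-merged or reordered, but its tombstones and metadata are genuinely reclaimed.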
If someone were to reboot Evo today to make it viable, it would need to look like this:
| Feature | Evo (Current) | "Viable Evo" (Hypothetical) |
|---|---|---|
| Data Structure | Naive RGA (Line-based) | Columnar Loro/Automerge v2 |
| Storage | Infinite Log Growth | GC + Snapshotting |
| Conflict Model | "No conflicts" (dangerous) | Explicit Conflict States (Pijul-style) |
| Network | Raw Log Transfer | Bitmap Indexing (Git-style) |
Can it work? Yes. Pijul is the closest real-world attempt at this. It is mathematically sound, fast, and solves the merge conflict problem properly. The fact that Pijul exists proves the concept is valid, even if the Evo implementation failed.
Footnotes
1. https://munin.uit.no/bitstream/handle/10037/33830/thesis.pdf?sequence=2&isAllowed=y
2. https://www.bartoszsypytkowski.com/crdt-authentication/
3. https://www.geeksforgeeks.org/git/git-vs-other-version-control-systems-why-git-stands-out/
4. https://www.designveloper.com/blog/git-concepts-architecture/
5. https://initialcommit.com/blog/pijul-version-control-system
6. https://www.codecentric.de/en/knowledge-hub/blog/concurrency-and-automatic-conflict-resolution
7. https://highscalability.com/paper-crdts-consistency-without-concurrency-control/
8. https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type
9. https://www.reddit.com/r/SoftwareEngineering/comments/18naw7m/making_crdts_98_more_efficient/
10. https://www.reddit.com/r/rust/comments/1msk8n1/need_help_exploring_crdts_and_implementing_them/
11. https://iankduncan.com/engineering/2025-11-27-crdt-dictionary/
12. https://www.reddit.com/r/rust/comments/kh706i/commutation_and_scalability_in_pijul_the_new/
13. https://www.reddit.com/r/datascience/comments/1gjh3w6/how_do_you_handle_version_control_at_a_high_level/
14. https://www.reddit.com/r/rust/comments/1gb3pdp/announcing_loro_10_a_highperformance_crdts/
15. https://discourse.pijul.org/t/patch-based-version-control/513
16. https://rhodecode.com/blog/156/version-control-systems-popularity-in-2025
17. https://munin.uit.no/handle/10037/34244?locale-attribute=en
18. https://stackoverflow.blog/2023/01/09/beyond-git-the-other-version-control-systems-developers-use/
19. https://forum.solidproject.org/t/request-for-comments-crdtish-approach-to-solid/4211
20. https://en.wikipedia.org/wiki/Comparison_of_version-control_software
21. https://docs.gitlab.com/administration/reference_architectures/
22. https://www.bartoszsypytkowski.com/operation-based-crdts-arrays-2/
23. https://stackoverflow.com/questions/6804443/how-much-overhead-time-and-learning-time-to-use-version-control
24. https://mattweidner.com/2025/05/21/text-without-crdts.html
25. https://stackoverflow.com/questions/4994772/ways-to-improve-git-status-performance
26. https://stackoverflow.com/questions/53856808/when-using-2p-crdt-data-structures-for-example-2p-set-how-do-you-free-up-spac
27. https://www.harness.io/harness-devops-academy/optimizing-git-performance-at-scale
28. https://graphite.com/guides/git-monorepo-best-practices-for-scalability
29. https://www.specbee.com/blogs/git-best-practices-how-make-most-git
30. https://www.reddit.com/r/programming/comments/ov6ov3/5000x_faster_crdts_an_adventure_in_optimization/
31. https://garagehq.deuxfleurs.fr/documentation/design/internals/
32. https://munin.uit.no/bitstream/handle/10037/33830/thesis.pdf?sequence=2&isAllowed=y
33. https://www.reddit.com/r/rust/comments/kr4smo/reflecting_on_two_months_of_pijul_10alpha_pijul/
34. https://initialcommit.com/blog/pijul-version-control-system
35. https://www.reddit.com/r/programming/comments/1dtdx6v/pijul_is_a_distributed_version_control_system/
36. https://stackoverflow.com/questions/52131250/how-does-git-perform-on-compressed-files
37. https://www.linkedin.com/pulse/gits-delta-compression-algorithm-technical-deep-dive-maheshwari
38. https://inkandswitch.github.io/automerge-rs/post/towards-production/
39. https://pages.lip6.fr/Marc.Shapiro/papers/icdcs09-treedoc.pdf
40. http://archagon.net/blog/2018/03/24/data-laced-with-history/
The Evo project has been discontinued due to fundamental scalability and technical challenges inherent to CRDT-based version control, combined with community issues. While the vision was ambitious, several critical limitations prevent it from competing with Git at scale.
CRDTs suffer from continuous memory growth because they must retain all historical metadata to ensure convergence. In Evo's line-based RGA (Replicated Growable Array) implementation, each character or line requires a unique identifier and tombstone markers for deletions. This metadata accumulates indefinitely—even for deleted content—causing memory usage to balloon proportionally with edit history. Git avoids this by using delta compression and packfiles that efficiently store only meaningful diffs.[1][2]
CRDT operations have inherent performance penalties that worsen with repository size:[2]
- Upstream complexity (local edits): Linear O(N) time, requiring identifier generation and retrieval for every modification
- Position calculation: Must traverse tombstones to determine correct insertion points, degrading responsiveness below the critical 50ms threshold for interactive editing[3]
- Merge operations: While CRDTs eliminate merge conflicts conceptually, the computational cost of converging large documents with extensive edit histories becomes prohibitive
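The traversal cost is easy to see in a sketch: mapping a user-visible position onto the internal sequence means walking past every tombstone that precedes it, so the work scales with total history rather than with visible document size (illustrative code, not Evo's):

```python
def internal_index(seq, visible_index):
    """Translate a user-visible index into an index in the internal
    sequence, skipping tombstones along the way (O(N) in history)."""
    seen = -1
    for i, item in enumerate(seq):
        if not item["deleted"]:
            seen += 1
            if seen == visible_index:
                return i
    raise IndexError(visible_index)

seq = [
    {"ch": "a", "deleted": False},
    {"ch": "b", "deleted": True},   # tombstone
    {"ch": "c", "deleted": True},   # tombstone
    {"ch": "d", "deleted": False},
]
# The user sees "ad"; visible index 1 maps to internal index 3.
assert internal_index(seq, 1) == 3
```

A document whose history is mostly deletions pays this tombstone tax on every single edit, which is why heavily edited files degrade even when their visible size stays small.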
Git's snapshot-based model with three-way merges scales far better for large codebases, as evidenced by projects like the Linux kernel.
CRDTs cannot enforce application-level constraints because they use hardcoded merge rules. For version control, this means Evo cannot prevent semantically invalid merges—like concurrent edits that break code compilation or violate invariants. Git's explicit conflict resolution allows developers to apply domain knowledge, whereas CRDT automatic merging may produce "converged" but broken code.[4][5]
The accumulation of CRDT metadata creates cascading scalability problems:[1]
- Limited support for large datasets or growing user bases
- Increased bandwidth consumption for synchronization
- Storage inefficiency compared to Git's object database
- No equivalent to Git's shallow clones or partial checkouts
While Evo attempted large file support through stubs, the core CRDT structure still faces these fundamental constraints that Git's architecture inherently avoids.
The project's discontinuation reflects the practical reality that CRDT-based version control, while theoretically elegant, cannot match Git's proven scalability for real-world software development.
Yes, many of these challenges can be partially overcome, though not completely eliminated. Active research has produced several promising mitigation strategies, though fundamental tradeoffs remain.
The memory growth problem has practical solutions. Researchers have demonstrated that safe garbage collection is possible when all replicas coordinate to prune obsolete metadata. Tombstone pruning algorithms can remove deleted content markers once all replicas acknowledge the deletion, reducing memory overhead by 40-60% in some implementations. However, this requires tracking replica states and introduces partial coordination—undermining CRDT's coordination-free advantage.[22][23][24]
A promising approach involves offloading historical data to disk storage rather than keeping everything in memory. By applying partial persistence techniques, CRDTs can maintain recent operations in fast memory while archiving older metadata to disk. This addresses scalability constraints for large datasets while preserving convergence guarantees. The tradeoff is increased complexity and slower access to historical states.[25]
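A toy sketch of the partial-persistence idea (the class and file layout are invented for illustration): recent operations stay in memory, older ones are appended to a disk archive, and full-history reads take the slow path through the archive:

```python
import json
import os
import tempfile

class OpLog:
    """Keep the newest ops in memory; spill older ones to a disk archive."""
    def __init__(self, archive_path, hot_limit=2):
        self.archive_path = archive_path
        self.hot_limit = hot_limit
        self.hot = []                  # recent ops, held in memory

    def append(self, op):
        self.hot.append(op)
        if len(self.hot) > self.hot_limit:
            with open(self.archive_path, "a") as f:
                while len(self.hot) > self.hot_limit:
                    f.write(json.dumps(self.hot.pop(0)) + "\n")

    def full_history(self):
        """Slow path: re-read the archive, then append the hot tail."""
        cold = []
        if os.path.exists(self.archive_path):
            with open(self.archive_path) as f:
                cold = [json.loads(line) for line in f]
        return cold + self.hot

path = os.path.join(tempfile.mkdtemp(), "ops.jsonl")
oplog = OpLog(path, hot_limit=2)
for ts in range(5):
    oplog.append({"ts": ts})
assert len(oplog.hot) == 2                                  # memory bounded
assert [op["ts"] for op in oplog.full_history()] == [0, 1, 2, 3, 4]
```

Memory stays bounded by the hot window, but note that nothing is deleted: the archive itself still grows without limit, which is exactly the tradeoff the paragraph describes.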
CRDTs can achieve significant memory savings through compressed representations of operations and metadata. Research shows that specialized encoding formats can reduce the space overhead substantially, though the optimal compression strategy remains an open problem.[26][27]
Effective garbage collection requires knowing when all replicas have processed certain operations—which necessitates coordination. This creates a paradox: solving CRDT's scalability problems reintroduces the very coordination overhead CRDTs aim to eliminate. Dynamic environments where nodes join and leave unpredictably make this especially challenging.[28][23][24]
The problem of enforcing semantic correctness remains fundamentally unsolvable with pure CRDTs. CRDTs guarantee convergence, not correctness. For version control, automatic merging cannot prevent code that compiles individually from becoming broken when combined. This isn't a performance issue—it's an inherent limitation of conflict-free approaches.[29]
While optimizations improve CRDT performance, tombstone traversal and metadata management still impose computational costs that grow with edit history. Real-time collaborative editing benchmarks show performance degradation over time, even with garbage collection.[30][22]
A hybrid approach would likely work better than pure CRDT-based version control: using CRDTs for specific workflows (like concurrent documentation editing) while preserving Git's explicit conflict resolution for code where semantic correctness matters. The ConcoRDanT research project acknowledges that CRDTs are not a universal solution and work best when combined appropriately with other techniques.[29]
The challenges are surmountable for specific use cases, but replacing Git entirely remains impractical given the fundamental tradeoffs between coordination-free operation and semantic guarantees.
Footnotes
1. https://munin.uit.no/bitstream/handle/10037/33830/thesis.pdf?sequence=2&isAllowed=y
2. https://pages.lip6.fr/Marc.Shapiro/papers/rgasplit-group2016-11.pdf
3. https://members.loria.fr/CIgnat/files/pdf/AhmedNacerDocEng11.pdf
4. https://www.sciencedirect.com/science/article/pii/S0167739X22004186
5. https://repositorio.inesctec.pt/bitstreams/c7fa58dc-20bd-4061-9437-e718b7686195/download
6. https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type
7. https://ably.com/blog/crdts-distributed-data-consistency-challenges
8. https://highscalability.com/paper-crdts-consistency-without-concurrency-control/
9. https://iankduncan.com/engineering/2025-11-27-crdt-dictionary/
10. http://www.diva-portal.org/smash/get/diva2:1441174/FULLTEXT01.pdf
11. http://soft.vub.ac.be/Publications/2019/vub-soft-poster-19-02.pdf
12. https://cris.vub.be/ws/portalfiles/portal/75725375/official_paper_final.pdf
13. https://munin.uit.no/bitstream/handle/10037/33830/thesis.pdf?sequence=2&isAllowed=y
14. https://nacrooks.github.io/bibliography/publications/2023-vldb-calm.pdf
15. https://jakelazaroff.com/words/an-interactive-intro-to-crdts/
16. https://members.loria.fr/CIgnat/files/pdf/AhmedNacerDocEng11.pdf
17. https://www.linkedin.com/pulse/understanding-garbage-collection-mark-sweep-algorithm-elham-moharrami-exhkf
18. https://christophermeiklejohn.com/erlang/lasp/2019/03/08/monotonicity.html
19. https://www.bartoszsypytkowski.com/the-state-of-a-state-based-crdts/
20. https://repositum.tuwien.at/bitstream/20.500.12708/198345/1/Goronjic Valentin - 2024 - CRDT-based serverless middleware for stateful objects...pdf
21. https://stackoverflow.com/questions/53856808/when-using-2p-crdt-data-structures-for-example-2p-set-how-do-you-free-up-spac
22. https://seed.hyper.media/blog/seed-hypermedia-bringing-scalable-collaboration-to-the-decentralized-web
23. https://www.synthesia.io/post/scalable-architecture-collaborative-video-editor
The Evo project has been discontinued by its developer due to community harassment, but examining its technical architecture reveals fundamental scalability challenges that would have emerged regardless.
The most severe limitation of Evo's CRDT-based approach is unbounded metadata accumulation. RGA (Replicated Growable Array) CRDTs must store metadata for every operation ever performed—each line insertion, deletion, and update maintains tombstones and positional information that never gets cleaned up. In practice, this means:[1][2]
- A 1MB text file edited 1000 times could require 5-16MB of metadata storage
- Memory consumption grows linearly with edit history, not just current file size
- Unlike Git's packfiles which compress delta chains efficiently, CRDT operation logs accumulate indefinitely
RGA CRDTs exhibit non-uniform performance characteristics that worsen as repositories grow:[3][4]
- Indexing operations require O(n) linear searches through operation histories
- Each merge operation must replay all CRDT operations from scratch
- The slowest operations in benchmarks can stall for 1.8+ seconds on moderately-sized documents
- Git's packfile compression and delta storage vastly outperform CRDT metadata for large histories
While Git repositories typically grow sub-linearly due to delta compression, Evo faces the opposite problem:[5]
- Every concurrent edit creates permanent operation records
- Binary format for `.evo/ops/<stream>/<fileID>.bin` stores full operational metadata
- Large file handling with stubs doesn't solve the metadata bloat problem
- A repository with millions of historical operations becomes unwieldy
Git stores snapshots with delta compression applied at the storage layer, not the data structure layer. This means:[5]
- Git can discard intermediate states and reconstruct files efficiently
- Packfiles compress similar objects together regardless of operational history
- Repository size grows with unique content, not edit count
Git benefits from decades of optimization for large-scale projects:
- Shallow clones, sparse checkouts, and partial clones reduce data transfer
- Pack files are highly optimized with sophisticated compression algorithms
- The Linux kernel (well over a million commits, 30+ years of history) demonstrates Git's scalability ceiling
The "zero merge conflicts" promise of CRDTs comes with unavoidable costs:[6]
- CAP theorem constraints: CRDTs choose availability over strong consistency, making them unsuitable for scenarios requiring precise control
- Semantic conflicts remain: While syntactic conflicts are eliminated, logical conflicts (like incompatible refactorings) still require human resolution
- Garbage collection complexity: Safely pruning historical CRDT operations in distributed systems is an unsolved research problem
Evo's CRDT-based approach fundamentally trades Git's efficient snapshot-and-compress model for automatic convergence properties. This results in unbounded memory growth, performance degradation over time, and repository bloat that make it unsuitable for large-scale or long-lived projects. While CRDTs excel at real-time collaborative editing (like Google Docs), version control demands different trade-offs where Git's battle-tested architecture remains superior.
Some of these challenges can be partially mitigated, but fundamental trade-offs remain unavoidable. Here's what's possible and what isn't:
The tombstone accumulation problem has proven solutions with specific constraints:[22][23]
- Coordinated GC: Systems like Garage successfully delete tombstones when all nodes confirm they've seen the deletion, preventing "zombie" values from reappearing
- Lossless compaction: Operations that no longer affect the final state can be dropped while preserving convergence properties[24]
- Time-based pruning: After a guaranteed synchronization window, old tombstones can be safely removed
However, these approaches require either centralized coordination (defeating CRDT's decentralized promise) or strict time guarantees that don't exist in truly offline-first scenarios.
The most promising direction involves combining CRDTs with snapshots:[25][26]
- Periodically create Git-style snapshots of converged state
- New replicas clone from snapshots rather than replaying full operation history
- Only recent operations need CRDT metadata
- Once all replicas sync past a snapshot, earlier operations can be discarded
This approach essentially makes CRDTs an optimization layer on top of traditional version control, gaining conflict-free merging for recent work while maintaining Git's scalability.
Modern CRDT implementations achieve significant speedups:[27]
- Specialized data structures can make sequence CRDTs roughly 5000x faster than naive implementations
- Run-length encoding compresses consecutive operations
- Efficient indexing reduces O(n) searches
These optimizations make CRDTs practical for documents with thousands of edits, but they don't eliminate the fundamental metadata overhead.
CRDTs face an inescapable mathematical constraint: to guarantee convergence without coordination, they must retain enough information to correctly order any future operation. This creates fundamental limits:[28][29]
- True decentralization requires permanent metadata retention
- Any pruning strategy either requires coordination or risks divergence
- A CAP-style trade-off applies: a system cannot offer partition tolerance, availability, and globally coordinated garbage collection simultaneously
"Zero merge conflicts" is misleading—CRDTs only eliminate syntactic conflicts:[30]
- If two developers refactor the same function differently, CRDT will merge both changes, producing broken code
- Logical inconsistencies (incompatible API changes, race conditions) require human resolution
- Git's merge conflicts often surface semantic issues that CRDT silently combines into incorrect code
Unlike Git where repository size scales with unique content, CRDT-based systems face history-dependent growth:
- A file rewritten 1000 times carries metadata for all 1000 versions
- Git's packfiles compress similar files regardless of lineage
- Compaction techniques help but cannot match snapshot-based compression ratios
The most viable approach combines both paradigms:[24][25]
For Active Development:
- Use CRDTs for conflict-free merging during active collaboration
- Provide automatic convergence for concurrent edits
- Maintain full operation history for recent work
For Long-term Storage:
- Periodically snapshot converged state using Git-style compression
- Prune old CRDT operations once all replicas sync
- Keep traditional branching/merging for released versions
This hybrid approach acknowledges that CRDTs and Git solve different problems: CRDTs excel at real-time convergence while Git excels at efficient long-term storage. Trying to replace Git entirely with CRDTs fights against fundamental computer science trade-offs, but using CRDTs as a collaboration layer on top of Git-style storage could offer the best of both worlds.
Footnotes
1. https://munin.uit.no/bitstream/handle/10037/33830/thesis.pdf?sequence=2&isAllowed=y
2. https://blog.kevinjahns.de/are-crdts-suitable-for-shared-editing
3. https://www.sciencedirect.com/science/article/pii/S0167739X22004186
4. https://repositorio.inesctec.pt/bitstreams/c7fa58dc-20bd-4061-9437-e718b7686195/download
5. https://www.reddit.com/r/AskProgramming/comments/ndtjge/is_using_a_version_control_system_other_than_git/
6. https://dev.to/foxgem/crdts-achieving-eventual-consistency-in-distributed-systems-296g
7. https://highscalability.com/paper-crdts-consistency-without-concurrency-control/
8. https://iankduncan.com/engineering/2025-11-27-crdt-dictionary/
9. https://garagehq.deuxfleurs.fr/documentation/design/internals/
10. http://archagon.net/blog/2018/03/24/data-laced-with-history/
11. https://mwhittaker.github.io/papers/html/maddox2016decibel.html
12. https://github.com/ipfs-inactive/dynamic-data-and-capabilities/issues/2
13. https://iankduncan.com/engineering/2025-11-27-crdt-dictionary/
14. https://decomposition.al/CMPS290S-2018-09/2018/11/12/implementing-a-garbage-collected-graph-crdt-part-1-of-2.html
15. https://blog.helsing.ai/dson-a-delta-state-crdt-for-resilient-peer-to-peer-communication-7823349a042c
16. https://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-2.pdf
17. https://docs.riak.com/riak/cs/2.1.1/cookbooks/garbage-collection/index.html
18. https://members.loria.fr/CIgnat/files/pdf/AhmedNacerDocEng11.pdf
19. https://p2panda.org/2025/08/27/notes-convergent-access-control-crdt.html
20. https://munin.uit.no/bitstream/handle/10037/33830/thesis.pdf?sequence=2&isAllowed=y
Evo is a discontinued project that attempted to use CRDT (Conflict-Free Replicated Data Type) technology for version control, but it faces fundamental scalability and performance problems that make it unviable compared to Git.
Memory Overhead Growth
CRDTs suffer from inherent memory inflation because they must retain metadata for all operations to ensure convergence. Evo's RGA (Replicated Growable Array) CRDT assigns each line a unique identifier with a Lamport timestamp and node ID. As repositories grow, this metadata accumulates indefinitely—every line ever inserted, even if later deleted, leaves permanent tombstones in the operation log. Git avoids this by storing snapshots rather than preserving full operation history.[1][2][3]
Performance Degradation
RGA-based CRDTs have linear upstream complexity, meaning edit operations slow down proportionally with document size. While Evo claims to use a binary format for speed, research shows RGA implementations perform poorly under certain edit patterns—particularly when content is prepended rather than appended. In large repositories with hundreds of thousands of operations, individual edits can stall for multiple seconds.[4][5][6]
Storage Inefficiency
Evo stores operation logs per file (`.evo/ops/<stream>/<fileID>.bin`) rather than Git's efficient pack files. Each line operation requires storing identifiers, timestamps, node IDs, and content. Research indicates CRDT-based collaborative editing systems create 16+ bytes of metadata per content string, causing storage to grow significantly faster than Git's compressed deltas.[5][4]
Repository Size
Git repositories with millions of commits and files remain performant because git uses content-addressable storage with delta compression and packed references. Evo's CRDT logs grow unbounded with every edit to every file across all streams, making Linux-kernel-sized repositories (10+ million commits) effectively impossible.23
Merge Performance
While Evo promises "zero merge conflicts," this comes at severe cost. Merging in Evo requires replaying all CRDT operations from the source stream into the target, with O(N) complexity where N is the total number of operations. Git's three-way merge algorithm operates on snapshots, making merge cost independent of total repository history size.4
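The replay-based merge can be sketched as a deduplicating scan over the source stream's entire operation log. Everything here (the op shape, the `(node, lamport)` identity key) is an assumption for illustration; the point is that cost is proportional to total operations, not to the size of the divergence.

```python
def merge_by_replay(target_ops: list[dict], source_ops: list[dict]) -> int:
    """Replay-style merge sketch: walk every source op and apply the
    ones the target hasn't seen. Cost is O(N) in total operations."""
    seen = {(op["node"], op["lamport"]) for op in target_ops}
    replayed = 0
    for op in source_ops:  # must scan the ENTIRE source log
        key = (op["node"], op["lamport"])
        if key not in seen:
            target_ops.append(op)
            seen.add(key)
            replayed += 1
    return replayed

target = [{"node": "a", "lamport": i} for i in range(3)]
source = [{"node": "a", "lamport": i} for i in range(5)]  # shares first 3
print(merge_by_replay(target, source))  # 2 new ops replayed
print(len(target))                      # 5
```

Git's three-way merge instead diffs three snapshots (base and both tips), so a one-line change merges in time proportional to the files touched, regardless of how many commits the repository contains.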
Garbage Collection Challenges
CRDTs fundamentally cannot discard historical operation metadata without breaking convergence guarantees, whereas git regularly packs, prunes, and optimizes its history. Evo's promise of "renames made simple" via stable UUIDs means file IDs persist forever, even for files deleted years ago.12
The project was discontinued not only due to community issues, but because the CRDT approach creates insurmountable scalability problems for version control at git's scale.37891011
Some challenges can be partially addressed, but fundamental tradeoffs remain that make pure CRDT approaches difficult for version control at git's scale.
The memory growth problem from tombstones can be mitigated through stability-based garbage collection. Once all replicas have received an operation, tombstones become unnecessary and can be pruned using vector clocks to track synchronization. Projects like Yorkie have implemented this successfully for collaborative editing. However, this requires knowing when operations are "stable" across all replicas—problematic for offline-first version control where contributors may remain disconnected for months.121314
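The stability check behind this kind of garbage collection can be sketched with per-replica vector clocks: a tombstone is prunable only once every known replica has acknowledged the deleting operation. This is a simplified model in the spirit of Yorkie's design, with hypothetical replica names; the offline-contributor problem is visible directly in the code.

```python
def prunable(op_stamp: int, op_node: str,
             acked: dict[str, dict[str, int]]) -> bool:
    """A tombstone is stable (safe to prune) once EVERY replica's
    acknowledged vector clock covers the operation that created it."""
    return all(clock.get(op_node, 0) >= op_stamp
               for clock in acked.values())

acked = {
    "alice": {"alice": 10, "bob": 7},
    "bob":   {"alice": 9,  "bob": 7},
    "carol": {"alice": 4,  "bob": 2},  # offline for months
}
print(prunable(3, "alice", acked))  # True: everyone has seen alice's op 3
print(prunable(8, "alice", acked))  # False: carol is still behind
```

One stale replica blocks pruning of every operation it has not acknowledged, which is why indefinite offline work and tombstone garbage collection are in direct tension.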
Memory overhead can be reduced by offloading historical operations to disk storage rather than keeping everything in RAM. This "partial persistence" approach maintains recent operations in memory while archiving older ones, similar to how databases handle transaction logs. This addresses memory exhaustion but doesn't solve the underlying storage growth problem.15
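A minimal sketch of this partial-persistence idea: keep a bounded window of recent operations in memory and spill older ones to an append-only file. File names and the window size are illustrative assumptions.

```python
import json, os, tempfile

class OpLog:
    """Partial-persistence sketch: recent ops stay in RAM, older ops
    are archived to an append-only file so memory stays bounded."""
    def __init__(self, path: str, keep_in_ram: int = 100):
        self.path = path
        self.keep = keep_in_ram
        self.recent: list[dict] = []

    def append(self, op: dict):
        self.recent.append(op)
        if len(self.recent) > self.keep:
            # Archive the overflow to disk; RAM stays at `keep` entries
            overflow = self.recent[:-self.keep]
            with open(self.path, "a") as f:
                for old in overflow:
                    f.write(json.dumps(old) + "\n")
            self.recent = self.recent[-self.keep:]

log_path = os.path.join(tempfile.mkdtemp(), "ops.log")
log = OpLog(log_path, keep_in_ram=10)
for i in range(25):
    log.append({"lamport": i})
print(len(log.recent))                 # 10 ops in RAM
print(sum(1 for _ in open(log_path)))  # 15 ops archived on disk
```

Note that total storage still grows with every operation ever made; the spill only caps resident memory, which is exactly the limitation the paragraph above points out.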
Performance can be improved through delta-state CRDTs that transmit only changes rather than full state, and by using efficient encoding like bitmaps for tombstone sets. The Collabs framework demonstrates that optimized CRDT implementations can scale to 100+ simultaneous collaborative editors.131617
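The delta-state idea can be shown with a grow-only counter: increments are buffered as a small delta, and only that delta crosses the wire instead of the full state map. This is a generic textbook sketch, not the Collabs API.

```python
class DeltaGCounter:
    """Delta-state grow-only counter sketch: ship only what changed."""
    def __init__(self, node: str):
        self.node = node
        self.state: dict[str, int] = {}  # per-node counts (full state)
        self.delta: dict[str, int] = {}  # entries changed since last sync

    def incr(self, n: int = 1):
        self.state[self.node] = self.state.get(self.node, 0) + n
        self.delta[self.node] = self.state[self.node]

    def take_delta(self) -> dict[str, int]:
        d, self.delta = self.delta, {}
        return d

    def merge(self, delta: dict[str, int]):
        # Per-entry max is idempotent, so re-delivered deltas are harmless
        for node, count in delta.items():
            self.state[node] = max(self.state.get(node, 0), count)

    def value(self) -> int:
        return sum(self.state.values())

a, b = DeltaGCounter("a"), DeltaGCounter("b")
a.incr(3); b.incr(2)
b.merge(a.take_delta())  # only {"a": 3} crosses the wire
print(b.value())         # 5
```

The bandwidth win is real, but note the full state map still lives on every replica, so delta shipping does not address the storage-growth problem either.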
CRDTs mathematically guarantee convergence by preserving all operation metadata. Aggressive garbage collection breaks this guarantee—you cannot safely discard tombstones without knowing all replicas have synchronized. Version control systems must support indefinite offline work, making stability detection impossible. This creates an unsolvable tension between scalability and CRDT guarantees.1418
While optimizations help, they don't change the fundamental complexity class. Git repositories with 10+ million commits succeed because git stores snapshots and computes diffs on-demand. CRDTs must maintain operation logs proportional to every edit ever made. No amount of optimization changes this asymptotic difference for massive, long-lived repositories.15
The most promising path forward involves hybrid systems that combine CRDT benefits with traditional version control. GenericVC demonstrates unifying MVCC concurrency control with Git-style versioning, getting benefits of both. A practical approach might use CRDTs for real-time collaborative editing sessions while falling back to snapshot-based storage for long-term history—essentially treating CRDTs as a short-term conflict resolution layer rather than the fundamental storage model.191820
The core insight: CRDTs excel at resolving merge conflicts during active collaboration, but version control requires archival storage spanning decades. These goals conflict fundamentally.181321
Footnotes
-
https://highscalability.com/paper-crdts-consistency-without-concurrency-control/ ↩ ↩2
-
https://munin.uit.no/bitstream/handle/10037/33830/thesis.pdf?sequence=2\&isAllowed=y ↩ ↩2 ↩3
-
https://pages.lip6.fr/Marc.Shapiro/papers/rgasplit-group2016-11.pdf ↩ ↩2
-
https://blog.kevinjahns.de/are-crdts-suitable-for-shared-editing ↩
-
https://www.dpss.inesc-id.pt/~ler/reports/hugoguerreiro-midterm.pdf ↩
-
https://iankduncan.com/engineering/2025-11-27-crdt-dictionary/ ↩
-
https://github.com/yorkie-team/yorkie/blob/main/design/garbage-collection.md ↩
-
https://decomposition.al/CMPS290S-2018-09/2018/11/12/implementing-a-garbage-collected-graph-crdt-part-1-of-2.html ↩ ↩2 ↩3
-
https://archive.casouri.cc/note/2025/practical-intro-ot/ ↩ ↩2
-
https://munin.uit.no/bitstream/handle/10037/33830/thesis.pdf?sequence=2\&isAllowed=y ↩ ↩2
-
https://dev.to/foxgem/crdts-achieving-eventual-consistency-in-distributed-systems-296g ↩
-
https://dev.to/nyxtom/introduction-to-crdts-for-realtime-collaboration-2eb1 ↩ ↩2 ↩3
-
https://gitprotect.io/blog/git-and-why-version-control-systems-are-now-more-critical-than-ever/ ↩
-
https://iankduncan.com/engineering/2025-11-27-crdt-dictionary/ ↩

Automatic Conflict Resolution vs. Manual Merges
The most fundamental advantage CRDTs have over Git is automatic, deterministic conflict resolution. While Git requires manual intervention when concurrent edits create merge conflicts, CRDTs are mathematically designed to resolve conflicts automatically without human input. This is achieved through commutative operations that produce the same result regardless of the order they're applied.[1][2][3]
In Git, when two developers edit the same lines of code in different branches, Git pauses the merge and asks humans to resolve the conflict. CRDTs eliminate this entirely—all concurrent operations automatically converge to a consistent state across all replicas. This makes CRDTs "conflict-free" by design, whereas Git is fundamentally not a CRDT because its merge operations are order-dependent and require manual resolution.[2][3][4][1]
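Order-independent resolution can be demonstrated with a last-writer-wins register, one of the simplest CRDTs: the merge is a deterministic maximum over `(timestamp, node_id)` pairs, so it is commutative by construction. The tuple encoding here is an illustrative sketch, not any particular library's API.

```python
def lww_merge(a: tuple, b: tuple) -> tuple:
    """Last-writer-wins merge: deterministic tie-break on
    (lamport, node_id), commutative by construction."""
    return max(a, b)  # tuples compare lexicographically

# (lamport_timestamp, node_id, value) -- concurrent edits, same timestamp
x = (5, "alice", "let x = 1")
y = (5, "bob",   "let x = 2")

print(lww_merge(x, y) == lww_merge(y, x))  # True: merge order is irrelevant
print(lww_merge(x, y)[2])                  # every replica converges on this
```

No human ever resolves this conflict; the trade-off, discussed later in this piece, is that the deterministic winner may not be the semantically correct one.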
Peer-to-Peer Synchronization Without Central Coordination
CRDTs excel at decentralized, peer-to-peer collaboration where Git typically requires a central repository. While Git can technically operate in a peer-to-peer manner, the practical reality is that teams use centralized workflows with a canonical "main" repository to coordinate work.[5][6][7][8][9][10]
CRDTs remove the need for a central coordinator entirely. Each replica can accept updates independently and asynchronously, then sync with other replicas in any topology—peer-to-peer, hub-and-spoke, or mesh networks. This architectural flexibility makes CRDTs ideal for applications where a central server is impractical due to scale, network partitions, or the need for true decentralization.[3][9][2]
Offline-First Applications with Guaranteed Convergence
CRDTs provide superior offline-first support with strong eventual consistency guarantees. When users work offline, CRDTs allow local updates that automatically merge when connectivity returns, without conflicts. This property is baked into the data structure itself.[6][11][12][2][3]
Git does support offline work, but merging offline changes often requires manual conflict resolution. More critically, Git's merge behavior isn't guaranteed to be consistent—two different people might resolve the same conflict in different ways, breaking the convergence guarantee that CRDTs provide. With CRDTs, all replicas are mathematically guaranteed to converge to identical states once they've received the same set of updates, regardless of order.[13][14][1][2][3]
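The convergence guarantee can be checked exhaustively for a small grow-only set: because set union is commutative and associative, every permutation of the same updates yields an identical state. This is a generic sketch of strong eventual consistency, not tied to any specific library.

```python
import itertools

def apply_all(ops: list[str]) -> set[str]:
    """Grow-only set sketch: the state after applying a batch of adds.
    Union is commutative and associative, so delivery order is irrelevant."""
    state: set[str] = set()
    for op in ops:
        state |= {op}
    return state

ops = ["add:a", "add:b", "add:c"]
# Apply the SAME updates in every possible order and collect the results
states = {frozenset(apply_all(list(p))) for p in itertools.permutations(ops)}
print(len(states))  # 1: all 6 orderings converge to the same state
```

With Git, by contrast, two people resolving the same textual conflict by hand can legitimately produce two different trees, so no such order-independence property holds.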
Real-Time Collaborative Editing
CRDTs enable real-time collaborative editing in applications like Google Docs, Figma, and Notion. Users can simultaneously edit the same document, and changes appear instantly across all participants without coordination delays.[15][16][17][6]
Git's model is fundamentally asynchronous and batch-oriented—developers work on local copies, then explicitly push and pull changes. While this works well for code development, it's unsuitable for real-time collaboration where users expect immediate feedback. CRDTs naturally support this use case through their conflict-free merging properties.[18][16][7][6][15]
Low-Latency Local Operations
Every operation in a CRDT system can be processed locally with low latency, since replicas don't need to coordinate with a central authority or wait for locks. Users can make changes immediately, and those changes propagate in the background.[19][20][3]
Git also allows local commits, but the merge step—where changes from different sources combine—may require coordination and manual resolution. With CRDTs, the merge is automatic and immediate, maintaining the low-latency experience even during synchronization.[21][1][3][19]
Better for Distributed Systems at Scale
CRDTs are designed for high availability and partition tolerance in distributed systems. They continue functioning correctly even during network partitions, ensuring that each node remains operational and accepts writes independently.[22][2][3][13]
While Git is distributed in architecture, it doesn't provide the same partition tolerance guarantees. Git assumes eventual connectivity and human intervention to resolve divergence. CRDTs, by contrast, are built from the ground up to handle network failures gracefully with automatic convergence.[23][1][3][13]
Use Cases Where CRDTs Shine
CRDTs excel in scenarios that Git wasn't designed for: real-time collaborative editing, offline-first applications, and peer-to-peer synchronization without a central server.[24][9]
The Trade-Offs
CRDTs aren't universally better than Git—they make different trade-offs. CRDTs typically require more memory and bandwidth to store metadata for conflict resolution. For text editing, CRDTs store additional information about insertion positions, version vectors, and causal relationships.[5][18][6]
Additionally, CRDTs resolve conflicts deterministically but without semantic awareness. While they guarantee convergence, the merged result may not match human intent; for example, with text edits, a CRDT might interleave words in unexpected ways. Git's manual conflict resolution allows humans to make semantic decisions about which changes to keep.[1][23]
For version control of source code with thoughtful, reviewed changes, Git's model of explicit commits and manual merge resolution remains highly appropriate. CRDTs shine when automatic merging, real-time collaboration, and offline-first operation are paramount.[11][27][23][6][9]