Skip to main content

[Google Docs] Data Model


Why we need editor state model? Not just directly use DOM?

There are some reasons why we need a data model:

  1. ❌ Not able to support undo/redo
  2. ❌ Hard to handle custom elements
  3. ❌ DOM is too large
  4. ❌ DOM is often messed by browser extensions

Apparently, we need a data model to store the document state efficiently.


Data Model

To store data model, there are 2 ways to store the document:

For example, we have a document like this:

<root>
<h1>Benson's doc</h1>
<p>Hello <strong>World</strong></p>
</root>

Simple version: Tree structure

The most simple way to store document is to imitate the DOM tree structure, but in JSON format.

DOM like tree structure


And here is the representation in code:

{
type: "root",
children: [
{
type: "h1",
children: [
{ type: "text", text: "Benson's doc" },
]
},
{
type: "paragraph",
children: [
{ type: "text", text: "Hello " },
{ type: "text", text: "World", format: 'bold' }
]
}
]
}




Better version: Map with pointer children

The most performant way to store document is to use a map with pointer children because :

  • we can get and set the children in O(1) time.
  • we still can make children point to other nodes, to keep the tree relationship.

Node map

And here is the representation in code:

const nodeMap = {
Root: {
type: "root",
children: ["1", "3"]
},
"1": {
type: "h1",
children: ["2"]
},
"2": {
type: "text",
text: "Benson's doc"
},
"3": {
type: "paragraph",
children: ["4", "5"]
},
"4": {
type: "text",
text: "Hello "
},
"5": {
type: "text",
text: "World",
format: "bold"
}
}

Advantages:

  • ✅ Single layer map, easy to clone, same for shallow and deep clone.
  • ✅ O(1) access if know NodeKey

Disadvantages:

  • ❌ O(n) if children array is inserted new node or removed node.



Performant version: Map with Linked List children

Because inserted new node or removing node is still not ideal, we can use a linked list to store the children.

Node linked list type


And here is the representation in code:

interface EditorNode {
key: NodeKey | null,
parent: NodeKey | null,
prev: NodeKey | null,
next: NodeKey | null,
}

interface ElementNode extends EditorNode {
firstChild: NodeKey | null,
lastChild: NodeKey | null,
}

interface TextNode extends EditorNode {
text: string,
}




With the linked list node, we have the node map like this:

Node map tree with linked list



And the tree will be like this:

Node map tree with linked list

Advantages:

  • ✅ Single layer map, easy to clone, same for shallow and deep clone.
  • ✅ O(1) access if know NodeKey
  • ✅ O(1) if children array is inserted new node or removed node.

Disadvantages:

  • ❌ Data structure is more complex, harder to understand.



Comparison

PerformanceTree structureMap with pointer childrenMap with linked list children
Clone shallow
AccessO(n)O(m)O(1)
Insert/removeO(n)O(1)O(1)
Complexity🟢🟡🔴
  • n : all nodes length
  • m : children array length

For performance reason, we should use the Map with linked list children.



References