[Google Docs] Data Model

Why we need editor state model? Not just directly use DOM?

There are some reasons why we need a data model:

❌ Not able to support undo/redo
❌ Hard to handle custom elements
❌ DOM is too large
❌ DOM is often messed by browser extensions

Apparently, we need a data model to store the document state efficiently.

Data Model

To store data model, there are 2 ways to store the document:

DOM like tree structure (e.g. Slate, Quill)
Map with pointer children (e.g. Lexical)

For example, we have a document like this:

<root>
  <h1>Benson's doc</h1>
  <p>Hello <strong>World</strong></p>
</root>

Simple version: Tree structure

The most simple way to store document is to imitate the DOM tree structure, but in JSON format.

And here is the representation in code:

{
  type: "root",
  children: [
    {
      type: "h1",
      children: [
        { type: "text", text: "Benson's doc" },
      ]
    },
    {
      type: "paragraph",
      children: [
        { type: "text", text: "Hello " },
        { type: "text", text: "World", format: 'bold' }
      ]
    }
  ]
}

Better version: Map with pointer children

The most performant way to store document is to use a map with pointer children because :

we can get and set the children in O(1) time.
we still can make children point to other nodes, to keep the tree relationship.

And here is the representation in code:

const nodeMap = {
  Root: {
    type: "root",
    children: ["1", "3"]
  },
  "1": {
    type: "h1",
    children: ["2"]
  },
  "2": {
    type: "text",
    text: "Benson's doc"
  },
  "3": {
    type: "paragraph",
    children: ["4", "5"]
  },
  "4": {
    type: "text",
    text: "Hello "
  },
  "5": {
    type: "text",
    text: "World",
    format: "bold"
  }
}

Advantages:

✅ Single layer map, easy to clone, same for shallow and deep clone.
✅ O(1) access if know NodeKey

Disadvantages:

❌ O(n) if children array is inserted new node or removed node.

Performant version: Map with Linked List children

Because inserted new node or removing node is still not ideal, we can use a linked list to store the children.

And here is the representation in code:

interface EditorNode {
  key: NodeKey | null,
  parent: NodeKey | null,
  prev: NodeKey | null,
  next: NodeKey | null,
}

interface ElementNode extends EditorNode {
  firstChild: NodeKey | null,
  lastChild: NodeKey | null,
}

interface TextNode extends EditorNode {
  text: string,
}

With the linked list node, we have the node map like this:

And the tree will be like this:

And the code is like this:

const nodeMap = {
  Root: {
    type: "root",
    firstChild: "1",
    lastChild: "3"
  },
  "1": {
    type: "h1",
    parent: "Root",
    prev: null,
    next: "3",
    firstChild: "2",
    lastChild: "2"
  },
  "2": {
    type: "text",
    text: "Benson's doc",
    parent: "1",
    prev: null,
    next: null
  },
  "3": {
    type: "paragraph",
    parent: "Root",
    prev: "1",
    next: null,
    firstChild: "4",
    lastChild: "5"
  },
  "4": {
    type: "text",
    text: "Hello ",
    parent: "3",
    prev: null,
    next: "5"
  },
  "5": {
    type: "text",
    text: "World",
    format: "bold",
    parent: "3",
    prev: "4",
    next: null
  }
};

Advantages:

✅ Single layer map, easy to clone, same for shallow and deep clone.
✅ O(1) access if know NodeKey
✅ O(1) if children array is inserted new node or removed node.

Disadvantages:

❌ Data structure is more complex, harder to understand.

Comparison

Performance	Tree structure	Map with pointer children	Map with linked list children
Clone shallow	❌	✅	✅
Access	O(n)	O(m)	O(1)
Insert/remove	O(n)	O(n)	O(1)
Complexity	🟢	🟡	🔴

n : all nodes length
m : children array length

For performance reason, we should use the Map with linked list children.

References

Great Frontend - Rich Text Editor - Data Model

Why we need editor state model? Not just directly use DOM?​

Data Model​

Simple version: Tree structure​

Better version: Map with pointer children​

Performant version: Map with Linked List children​

Comparison​

References​

Why we need editor state model? Not just directly use DOM?

Data Model

Simple version: Tree structure

Better version: Map with pointer children

Performant version: Map with Linked List children

Comparison

References