Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
155 views
in Technique[技术] by (71.8m points)

javascript - How to build a tree array into which / out of which items can be spliced, which only allows arrays of 1, 2, 4, 8, 16, or 32 items?

So there is a very elegant answer to a similar problem. The problem there was to build an array tree where every array had only 1, 2, 4, 8, 16, or 32 items, and where every item was at the same nesting level. I formulated this problem without having the entire system in mind (doing rapid prototyping I guess), but the current system I don't think will really work for deleting items from the middle of the array, or adding items into the middle of the array. Unfortunately.

I need the ability to add/remove items in the middle of the array because this will be used for arrays in bucketed hash tables, or general arrays which items are rapidly added and removed (like managing memory blocks). So I am thinking how to balance that with the desire to have memory block sizes of 1, 2, 4, 8, 16, or 32 items only. Hence the tree, but I think the tree needs to work slightly differently from the problem posed in that question.

What I'm thinking is having a system like follows. Each array in the nested array tree can have 1, 2, 4, 8, 16, or 32 items, but the items don't need to sit at the same level. The reason for putting items at the same level is because there is a very efficient algorithm for getItemAt(index) if they are at the same level. But it has the problem of not allowing efficient inserts/deletes. But I think this can be solved where items in an array are at different levels by having each parent array "container" keep track of how many deeply nested children it has. It would essentially keep track of the size of the subtree. Then to find the item with getItemAt(index), you would traverse the top level of the tree, count the top-level tree sizes, and then narrow your search down the tree like that.

In addition, the leaf arrays (which have 1, 2, 4, 8, 16, or 32 items each) can have items removed, and then you only have to adjust that short array item's positions. So you'd go from this:

[1, 2, 3, 4, 5, 6, 7, 8]

...delete 6, and get this (where - is null):

[1, 2, 3, 4, 5, 7, 8, -]

Then if you add an item 9 at say position 3, it would result in:

[1, 2, 9, 3, 4, 5, 7, 8]

This is nice because say you have a million items array. You now only have to adjust a single array with up to 32 items, rather than shifting a million items.

BUT, it gets a bit complicated when you add an item in the "middle of this tree array", but at the end of a 32-item array. You would first think you would have to shift every single subsequent array. But there is a way to make it so you don't have to do this shifting! Here is one case.

We start here:

[
  [ 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ,
    9 , 10, 11, 12, 13, 14, 15, 16,
    17, 18, 19, 20, 21, 22, 23, 24,
    25, 26, 27, 28, 29, 30, 31, 32],
  [ 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ,
    9 , 10, 11, 12, 13, 14, 15, 16,
    17, 18, 19, 20, 21, 22, 23, 24,
    25, 26, 27, 28, 29, 30, 31, 32]
]

Now we add an item 90 at the 16th position. We should end up with this, since this array must grow to be 4 in length:

[
  [ 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ,
    9 , 10, 11, 12, 13, 14, 15, 90,
    16, 17, 18, 19, 20, 21, 22, 23,
    24, 25, 26, 27, 28, 29, 30, 21],
  [32],
  -,
  [ 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ,
    9 , 10, 11, 12, 13, 14, 15, 16,
    17, 18, 19, 20, 21, 22, 23, 24,
    25, 26, 27, 28, 29, 30, 31, 32]
]

If we delete 90 now, we would end with this:

[
  [ 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ,
    9 , 10, 11, 12, 13, 14, 15, 16,
    17, 18, 19, 20, 21, 22, 23, 24,
    25, 26, 27, 28, 29, 30, 31, - ],
  [32],
  -,
  [ 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ,
    9 , 10, 11, 12, 13, 14, 15, 16,
    17, 18, 19, 20, 21, 22, 23, 24,
    25, 26, 27, 28, 29, 30, 31, 32]
]

Basically, it is minimizing the changes that are made. To getByIndex(index) it would work like this, with more metadata on the arrays:

{
  size: 64,
  list: [
    {
      size: 31,
      list: [
        1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ,
        9 , 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24,
        25, 26, 27, 28, 29, 30, 31, - ] },
    {
      size: 1,
      list: [32] },
    null,
    {
      size: 32,
      list: [
        1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ,
        9 , 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24,
        25, 26, 27, 28, 29, 30, 31, 32] }
  ]
}

So my question is, how can you build such a tree that only has 1, 2, 4, 8, 16, or 32 nodes at each level, which allows for inserting or removing nodes at any place in the overall conceptual "array", where the leaf nodes in the tree don't need to be at any specific level? How to implement the splice method?

For this question, don't worry about compactification just yet, I will try and see if I can figure that out on my own first. For this question, just leave junk and nulls around wherever they end up being, which is less than ideal. I mean, if you know how to compactify things easily, then by all means include it, but I think it will add significantly to the answer, so the default should be to leave it out of the answer :)

Also note, the arrays should be treated as if they are static arrays, i.e. they can't dynamically grow in size.

The algorithm for insertItemAt(index) would work something like this:

  • Find the appropriate leaf array to put the item in. (By traversing down based on size information).
  • If the leaf has some room in it (as null pointers at the end of the leaf array), then just shift the items to make room for the item at its exact index.
  • If the leaf is too short, replace it with a longer one, and place the item in that leaf.
  • If the leaf is max length (32), then attempt to add another leaf sibling. Can't just easily do that if there are 32 siblings... Or if there are not already null siblings in place if it's shorter.
  • If the leaf is max length and there aren't max length siblings, then check for a free null sibling. If there are no more free siblings, then double the number of siblings with null pointers, and create the next array and put it there.
  • If the leaf is max length and the siblings are max length, and the parent is max length, I have a hard time imagining what the algorithm should do exactly in order to grow while adhering to these constraints, which is why I struggle here.

The algorithm for removeItemAt(index) (the second piece of functionality of splice) would do something like this:

  • Find the appropriate item based on index and size information of each array node in the tree.
  • Set it to null.
  • Compactify surrounding null pointers if there are multiple of them at the same level. Bring them down so they equal 1, 2, 4, 8, 16, or 32 (probably since we're deleting it will never equal 32). But this portion of the algorithm can be left out, I can probably eventually figure this out, unless you know how to do it quickly.

Here is what I have basically so far.

const createTree = () => createTreeLeaf()

const createTreeContainer = () => {
  return {
    type: 'container',
    list: [],
    size: 0
  }
}

const createTreeLeaf = () => {
  return {
    type: 'leaf',
    list: [],
    size: 0
  }
}

const setItemAt = (tree, item, index) => {
  let nodes = [tree]
  let startSize = 0
  a:
  while (true) {
    b:
    for (let i = 0, n = nodes.length; i < n; i++) {
      let node = nodes[i]
      let endSize = startSize + node.size

      if (startSize <= index && index < endSize) {
        // it's within here.
        if (node.type == 'container') {
          nodes = node.list
          break b
        } else {
          let relativeIndex = index - startSize
          // grow if less than max
          if (relativeIndex > node.size - 1) {
            if (node.size == 32) {
              // grow somehow
            } else {
              let newArray = new Array(node.size * 2)
              node.list.forEach((x, i) => newArray[i] = x)
              node.list = newArray
            }
          }
          if (node.list[relativeIndex] == null) {
            node.size++
          }
          node.list[relativeIndex] = item
          break a
        }
      }
    }
  }
}

const insertItemAt = (tree, item, index) => {
  let nodes = [tree]
  let startSize = 0
  a:
  while (true) {
    b:
    for (let i = 0, n = nodes.length; i < n; i++) {
      let node = nodes[i]
      let endSize = startSize + node.size

      if (startSize <= index && index < endSize) {
        // it's within here.
        if (node.type == 'container') {
          nodes = node.list
          break b
        } else {
          let relativeIndex = index - startSize
          // grow if less than max
          if (relativeIndex > node.size - 1 || isPowerOfTwo(node.size)) {
            if (node.size == 32) {
              // grow somehow
            } else {
              let newArray = new Array(node.size * 2)
              node.list.forEach((x, i) => newArray[i] = x)
              node.list = newArray
            }
          }
          // todo: shift the items over to make room for this item
          break a
        }
      }
    }
  }
}

const removeItemAt = (tree, item, index) => {

}

const appendItemToEndOfTree = (tree, item) => {

}

const getLeafContainingIndex = (tree, index) => {
  if (index > tree.size - 1) {
    return { node: null, index: -1 }
  }

  let nodes = [tree]
  let startSize = 0
  a:
  while (true) {
    b:
    for (let i = 0, n = nodes.length; i < n; i++) {
      let node = nodes[i]
      let endSize = startSize + node.size
      if (startSize <= index && index < endSize) {
        if (node.type == 'container') {
          nodes = node.list
          break b
        } else {
          let relativeIndex = index - startSize
          return { node, index: relativeIndex }
        }
      } else {
        startSize = endSize
      }
    }
  }
}

const getItemAt = (tree, getIndex) => {
  const { node, index } = getLeafContainingIndex(tree, getIndex)
  if (!node) return null
  return node.list[index]
}

const isPowerOfTwo = (x) => {
  return (x != 0) && ((x & (x - 1)) == 0)
}

const log = tree => console.log(JSON.stringify(tree.list))

// demo
const tree = createTree()

setItemAt(tree, { pos: '1st', attempt: 1 }, 0)
log(tree)
setItemAt(tree, { pos: '2nd', attempt: 2 }, 1)
log(tree)
setItemAt(tree, { pos: '2nd', attempt: 3 }, 1)
log(tree)
setItemAt(tree, { pos: '3rd', attempt: 4 }, 2)
log(tree)
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The requirements are close to what a B+ tree offers. A 32-ary B+ tree would provide these properties:

  • The leaves of the tree are all at the same level
  • The actual data is stored only in the leaves
  • All internal nodes, except the root, have at least 16 children (not counting the null fillers)
  • All leaves, except the root, have at least 16 values (not counting the null fillers)
  • If the root is not a leaf, then it will have at least 2 children

In addition it has this useful feature:

  • Leaf nodes are commonly maintained in a linked list, so that iteration of that list will visit the data in its intended order

B+ trees are search trees, which is a feature that does not match your requirements. So that means the typical keys, that are stored in the internal nodes of a B+ tree, are not needed here.

On the other hand, you require that elements can be identified by index. As you already suggested, you can extend each node with a property that provides the total count of data values that are stored in the leaves of the subtree rooted in that node. This will allow to find a data value by index in logarithmic time.

Dynamic node sizes

As to the requirement that node sizes should be powers of 2 up to 32: B+ trees do not provide variable node sizes. But note that all nodes in this B+ tree are guaranteed to have at least 16 used entries. The only exception is the root, which could have any number of used entries.

So in a first version of my answer I did not focus too much on this requirement: implementing it would mean that sometimes you could save some space in an non-root node by limiting its size to 16 instead of 32. But the very next insertion in that node would require it to extend (again) to a size of 32. So I considered it might not be worth the effort. Adapting the record size for the root would also not contribute much gain as it just applies to that single node.

After a comment about this, I adapted the implementation, so each node will reduce or extend its size to the previous/next power of 2 as soon as possible/needed. This means that non-root nodes may sometimes get their size reduced to 16 (while they have 16 items/children), and the root can have any of the possible powers (1, 2, 4, 8, 16, or 32) as size.

Other implementation choices

In line with your preference I avoided the use of recursion.

I opted to include the following properties in each node:

  • children: this is the list of either child nodes or (in case of a leaf) of data values. This list is an array with 1, 2, 4, 8, 16 or 32 slots, where the non-used slots are filled with null. This is very un-JavaScript like, but I understand you are actually targeting a different language, so I went with it.
  • childCount: this indicates how many slots in the above array are actually used. We could do without this, if we could assume that null can never be used as data, and an occurrence would indicate the end of the real content. But anyway, I went for this property. So in theory, the content could now actually include intended null values.
  • treeSize. this is the total number of data items that are in the leaves of the subtree that is rooted in this node. If this is itself a leaf node, then it will always be equal to childCount.
  • parent. B+ trees don't really need a back reference from a child to a parent, but I went with it for this implementation, also because it makes it somewhat easier to provide a non-recursion based implementation (which you seem to prefer).
  • prev, next: these properties reference sibling nodes (in the same level), so that each level of the tree has its nodes in one, doubly linked list. B+ trees commonly have this at the bottom level only, but it is also handy to have at the other levels.

B+ Tree Algorithm

Insertion

You already sketched an algorithm for insertion. It would indeed go like this:

  • Locate the leaf where the value should be inserted
  • If that leaf has room for it, insert the value there and update the treeSize (you called it size), propagating that increase upward in the tree up to the root.
  • Otherwise, check if the node has a neighbor to which it could shift some items, so to make room for the new value. If so, go do it, and stop.
  • Otherwise, create a new leaf, and move half of the node's values into it. Then there is room to insert the value. Depending on the index, it will be in the old or new node
  • Now the task is to insert the new node as sibling in the parent node. Use the same algorithm to perform that action, but on the level above.

If the root needs to split, then create a new root node that will have the two split nodes as its children.

During the execution of this algorithm, the properties of the impacted nodes should of course be well maintained.

Deletion

  • Locate the leaf where the value should be deleted.
  • Remove the value from that leaf, and update the treeSize property, and also upward in the tree up to the root.
  • If the node is the root, or the node has more than 16 used slots, then stop.
  • The node has too few values, so look at a neighbor to see whether it could merge with this one, or otherwise share some of its entries for redistribution.
  • If the items in the chosen neighbor and this one could not fit in one node, then redistribute those items, so each still has at least 16 of them, and stop. There is a boundary case where the node has 16 items, but the neighbor has 17. In that case there is no advantage in redistributing values. Then also stop.
  • Otherwise merge the items from the chosen neighbor into the current node.
  • Repeat this algorithm for deleting the now empty neighbor from the level above.

If the root ends up with just one child node, then make the root to be that single child node, removing the top level.

Implementation

Below you'll find an implementation of a Tree class. It includes the methods you were asking for:

  • getItemAt
  • setItemAt
  • removeItemAt
  • insertItemAt

It also includes some extra methods:

  • Symbol.iterator: this makes the tree instance iterable. This allows for easy iteration of values, using the linked list of the bottom level of the tree.

  • print: speaks for itself

  • verify: this visits every node of the tree in a breadth-first traversal, checking that all conditions are met, and no inconsistencies exist. Among many other tests, it also verifies that each node has a fill factor of more than 50%, or otherwise put: that the array size is the least possible power of two to host the content. If a test fails, a simple exception will be thrown. I didn't put any effort in providing context to the error: they should not ever occur.

Snippet

Running the snippet below will create one tree instance, and perform 1000 insertions, 1000 update/retrievals, and 1000 deletions. In parallel the same actions are done on a 1-dimensional array, using splice. After each step, the values from the tree iteration are compared with the array, and the consistency of the tree is verified. The test is performed with a maximum node capacity of 8 (instead of 32), so the tree grows faster vertically, and a lot more shifting, splitting and merging needs to happen.

class Node {
    constructor(capacity) {
        // Mimic fixed-size array (avoid accidentally growing it)
        this.children = Object.seal(Array(capacity).fill(null));
        this.childCount = 0; // Number of used slots in children array
        this.treeSize = 0; // Total number of values in this subtree
        // Maintain back-link to parent.
        this.parent = null;
        // Per level in the tree, maintain a doubly linked list
        this.prev = this.next = null;
    }
    setCapacity(capacity) {
        if (capacity < 1) return;
        // Here we make a new array, and copy the data into it
        let children = Object.seal(Array(capacity).fill(null));
        for (let i = 0; i < this.childCount; i++) children[i] = this.children[i];
        this.children = children;
    }
    isLeaf() {
        return !(this.children[0] instanceof Node);
    }
    index() {
        return this.parent.children.indexOf(this);
    }
    updateTreeSize(start, end, sign=1) {        
        let sum = 0;
        if (this.isLeaf()) {
            sum = end - start;
        } else {
            for (let i = start; i < end; i++) sum += this.children[i].treeSize;
        }
        if (!sum) return;
        sum *= sign;
        // Apply the sum change to this node and all its ancestors
        for (let node = this; node; node = node.parent) {
            node.treeSize += sum;
        }
    }
    wipe(start, end) {
        this.updateTreeSize(start, end, -1);
        this.children.copyWithin(start, end, this.childCount);
        for (let i = this.childCount - end + start; i < this.childCount; i++) {
            this.children[i] = null;
        }
        this.childCount -= end - start;
        // Reduce allocated size if possible
        if (this.childCount * 2 <= this.children.length) this.setCapacity(this.children.length / 2);
    }
    moveFrom(neighbor, target, start, count=1) {
        // Note: `start` can have two meanings:
        //   if neighbor is null, it is the value/Node to move to the target
        //   if neighbor is a Node, it is the index from where value(s) have to be moved to the target
        // Make room in target node
        if (this.childCount + count > this.children.length) this.setCapacity(this.children.length * 2);
        this.children.copyWithin(target + count, target, Math.max(target + count, this.childCount));
        this.childCount += count;
        if (neighbor !== null) {
            // Copy the children
            for (let i = 0; i < count; i++) {
                this.children[target + i] = neighbor.children[start + i];
            }
            // Remove the original references
            neighbor.wipe(start, start + count);
        } else {
            this.children[target] = start; // start is value to insert
        }
        this.updateTreeSize(target, target + count, 1);
        // Set parent link(s)
        if (!this.isLeaf()) {
            for (let i = 0; i < count; i++) {
                this.children[target + i].parent = this;
            }
        }
    }
    moveToNext(count) {
        this.next.moveFrom(this, 0, this.childCount - count, count);
    }
    moveFromNext(count) {
        this.moveFrom(this.next, this.childCount, 0, count);
    }
    basicRemove(index) {
        if (!this.isLeaf()) {
            // Take node out of the level's linked list
            let prev = this.children[index].prev;
            let next = this.children[index].next;
            if (prev) prev.next = next;
            if (next) next.prev = prev;
        }
        this.wipe(index, index + 1);
    }
    basicInsert(index, value) {
        this.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...