The prio_tree.c code indexes vmas using 3 different indexes:
        * heap_index  = vm_pgoff + vm_size_in_pages : end_vm_pgoff
        * radix_index = vm_pgoff : start_vm_pgoff
        * size_index  = vm_size_in_pages

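The three index definitions above can be sketched in C as follows. This is
only an illustration in this document's notation: struct vma_sketch, struct
prio_indexes and get_indexes() are hypothetical names that do not come from
prio_tree.c, and vm_size_in_pages is taken exactly as this document uses it
(so that heap_index = radix_index + size_index, as in the figure's triples).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two vma properties used in this document. */
struct vma_sketch {
	unsigned long vm_pgoff;		/* start_vm_pgoff */
	unsigned long vm_size_in_pages;	/* size, in this document's notation */
};

/* The three indexes, in the order used by the figure's triples. */
struct prio_indexes {
	unsigned long radix_index;
	unsigned long size_index;
	unsigned long heap_index;
};

static struct prio_indexes get_indexes(const struct vma_sketch *vma)
{
	struct prio_indexes idx;

	assert(vma != NULL);
	idx.radix_index = vma->vm_pgoff;
	idx.size_index = vma->vm_size_in_pages;
	idx.heap_index = vma->vm_pgoff + vma->vm_size_in_pages;
	/* end_vm_pgoff must not wrap around */
	assert(idx.heap_index >= idx.radix_index);
	return idx;
}
```

For instance, a vma with vm_pgoff = 4 and vm_size_in_pages = 3 yields the
triple [4,3,7] that appears in the example tree.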
A regular radix-priority-search-tree indexes vmas using only heap_index and
radix_index. The conditions for indexing are:
        * ->heap_index >= ->left->heap_index &&
                ->heap_index >= ->right->heap_index
        * if (->heap_index == ->left->heap_index)
                then ->radix_index < ->left->radix_index;
        * if (->heap_index == ->right->heap_index)
                then ->radix_index < ->right->radix_index;
        * nodes are hashed to left or right subtree using radix_index
          similar to a pure binary radix tree.

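The heap conditions listed above can be expressed as a small checker. This is
a sketch under stated assumptions, not code from prio_tree.c: struct
sketch_node, child_ok() and heap_conditions_hold() are hypothetical, a missing
child is represented by NULL, and only the heap-ordering conditions are
checked (not the radix hashing of nodes into left and right subtrees).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout; NULL marks a missing child in this sketch. */
struct sketch_node {
	unsigned long radix_index;
	unsigned long heap_index;
	struct sketch_node *left, *right;
};

/* Returns 1 if parent and child satisfy the heap conditions, else 0. */
static int child_ok(const struct sketch_node *parent,
		    const struct sketch_node *child)
{
	assert(parent != NULL);
	if (child == NULL)
		return 1;
	/* parent's heap_index must not be smaller than the child's */
	if (parent->heap_index < child->heap_index)
		return 0;
	/* on a heap_index tie, the parent needs the smaller radix_index */
	if (parent->heap_index == child->heap_index &&
	    parent->radix_index >= child->radix_index)
		return 0;
	return 1;
}

/* Recursively checks the heap conditions for a whole subtree. */
static int heap_conditions_hold(const struct sketch_node *node)
{
	if (node == NULL)
		return 1;
	return child_ok(node, node->left) && child_ok(node, node->right) &&
	       heap_conditions_hold(node->left) &&
	       heap_conditions_hold(node->right);
}
```

With the top of the example tree, a root [0,7,7] with children [1,6,7] and
[4,3,7] passes: the heap indices tie at 7 and the root has the smaller
radix_index.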
A regular radix-priority-search-tree helps to store and query
intervals (vmas). However, it is only suitable for storing vmas with
different radix indices (vm_pgoff).

Therefore, prio_tree.c extends the regular radix-priority-search-tree
to handle many vmas with the same vm_pgoff. Such vmas are handled in
2 different ways: 1) all vmas with the same radix _and_ heap indices are
linked using vm_set.list; 2) if there are many vmas with the same radix
index but different heap indices, and the regular radix-priority-search
tree cannot index them all, we build an overflow-sub-tree that indexes such
vmas using heap and size indices instead of heap and radix indices. For
example, in the figure below some vmas with vm_pgoff = 0 (zero) are
indexed by the regular radix-priority-search-tree whereas others are pushed
into an overflow-sub-tree. Note that all vmas in an overflow-sub-tree have
the same vm_pgoff (radix_index), and if necessary we build a separate
overflow-sub-tree for each radix_index that needs one. For example,
in the figure we have 3 overflow-sub-trees corresponding to radix indices
0, 2, and 4.

In the final tree the first few (prio_tree_root->index_bits) levels
are indexed using heap and radix indices whereas the overflow-sub-trees below
those levels (i.e. levels prio_tree_root->index_bits + 1 and higher) are
indexed using heap and size indices. In overflow-sub-trees the size_index
is used for hashing the nodes to appropriate places.

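The choice between left and right subtree at each level can be sketched as
below. This is an illustration of the scheme as described here, not the
prio_tree.c implementation: goes_right() and SKETCH_BITS_PER_LONG are
hypothetical names, and the sketch assumes that bits are consumed from the
most significant one downwards (index_bits bits of radix_index first, then
the full width of size_index in the overflow-sub-trees).

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Returns 1 if a node placed at `level` (root is level 0, so level >= 1)
 * belongs in the right subtree of its parent, 0 for the left subtree.
 */
static int goes_right(unsigned int level, unsigned int index_bits,
		      unsigned long radix_index, unsigned long size_index)
{
	unsigned int bit;

	assert(level >= 1);
	assert(index_bits >= 1 && index_bits <= SKETCH_BITS_PER_LONG);
	if (level <= index_bits) {
		/* heap-and-radix part: consume radix bits from the top */
		bit = index_bits - level;
		return (int)((radix_index >> bit) & 1UL);
	}
	/* overflow-sub-tree: consume size bits from the top of the word */
	assert(level - index_bits <= SKETCH_BITS_PER_LONG);
	bit = SKETCH_BITS_PER_LONG - (level - index_bits);
	return (int)((size_index >> bit) & 1UL);
}
```

With index_bits = 3, radix_index 4 (binary 100) goes right at level 1 and
radix_index 1 (binary 001) goes left. Small size_index values have all
their high bits clear, so in the overflow-sub-trees they keep going left.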
Now, an example prio_tree:

  vmas are represented as [radix_index, size_index, heap_index]
                    i.e., [start_vm_pgoff, vm_size_in_pages, end_vm_pgoff]

level  prio_tree_root->index_bits = 3
-----
                                                                                _
0                                          [0,7,7]                              |
                                            /   \                               |
                                 -----------     -----------                    |  Regular
                                /                           \                   |  radix priority
1                          [1,6,7]                         [4,3,7]              |  search tree
                            /   \                           /   \               |
                         ---     ---                     ---     ---            |  heap-and-radix
                        /           \                   /           \           |  indexed
2                  [0,6,6]         [2,5,7]         [5,2,7]         [6,1,7]      |
                    /   \           /   \           /   \           /   \       |
3              [0,5,5] [1,5,6] [2,4,6] [3,4,7] [4,2,6] [5,1,6] [6,0,6] [7,0,7]  |
                 /               /               /                              _
                /               /               /                               _
4           [0,4,4]         [2,3,5]         [4,1,5]                             |
              /               /               /                                 |
5         [0,3,3]         [2,2,4]         [4,0,4]                               |  Overflow-sub-trees
            /               /                                                   |
6       [0,2,2]         [2,1,3]                                                 |  heap-and-size
          /               /                                                     |  indexed
7     [0,1,1]         [2,0,2]                                                   |
        /                                                                       |
8   [0,0,0]                                                                     |
                                                                                _

Note that we use prio_tree_root->index_bits to optimize the height
of the heap-and-radix indexed tree. Since prio_tree_root->index_bits is
set according to the maximum end_vm_pgoff mapped, we are sure that all
bits (in vm_pgoff) above prio_tree_root->index_bits are 0 (zero). Therefore,
we only use the lowest prio_tree_root->index_bits bits of vm_pgoff as
radix_index. Whenever index_bits is increased in prio_tree_expand, we
shuffle the tree to make sure that the first prio_tree_root->index_bits
levels of the tree are indexed properly using heap and radix indices.

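The relationship between the maximum end_vm_pgoff and index_bits can be
sketched as follows. The helper names max_index_for_bits() and
index_bits_needed() are hypothetical, and starting from a minimum of 1 bit
is an assumption of this sketch. It only shows how many bits are needed to
cover a given maximum heap_index, not how prio_tree_expand reshuffles the
tree.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Largest index representable in `bits` bits, i.e. 2^bits - 1. */
static unsigned long max_index_for_bits(unsigned int bits)
{
	assert(bits >= 1 && bits <= SKETCH_BITS_PER_LONG);
	if (bits == SKETCH_BITS_PER_LONG)
		return ~0UL;	/* avoid shifting by the full word width */
	return (1UL << bits) - 1;
}

/* Smallest index_bits (at least 1) that can hold max_heap_index. */
static unsigned int index_bits_needed(unsigned long max_heap_index)
{
	unsigned int bits = 1;

	while (max_heap_index > max_index_for_bits(bits))
		bits++;
	return bits;
}
```

In the example tree the maximum heap_index is 7, which needs 3 bits and
matches prio_tree_root->index_bits = 3. Mapping a vma whose end_vm_pgoff
is 8 would require 4 bits.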
We do not optimize the height of overflow-sub-trees using index_bits.
The reason is: there can be many such overflow-sub-trees and all of
them have to be shuffled whenever the index_bits increases. This may involve
walking the whole prio_tree in the prio_tree_insert->prio_tree_expand code
path, which is not desirable. Hence, we do not optimize the height of the
heap-and-size indexed overflow-sub-trees using prio_tree->index_bits.
Instead the overflow-sub-trees are indexed using the full BITS_PER_LONG bits
of size_index. This may lead to skewed sub-trees because most of the
higher-order bits of the size_index are likely to be 0 (zero). In
the example above, all 3 overflow-sub-trees are skewed. This may marginally
affect the performance. However, processes rarely map many vmas with the
same start_vm_pgoff but different end_vm_pgoffs. Therefore, we normally
do not require overflow-sub-trees to index all vmas.

From the above discussion it is clear that the maximum height of
a prio_tree can be prio_tree_root->index_bits + BITS_PER_LONG.
However, in most of the common cases we do not need overflow-sub-trees,
so the tree height in the common cases will be prio_tree_root->index_bits.

It is fair to mention here that prio_tree_root->index_bits
is increased on demand; however, index_bits is not decreased when
vmas are removed from the prio_tree. That's tricky to do. Hence, it's
left as a homework problem.