blob: f30ce304351db725cfe5b2986426fa1d64328bfa [file] [log] [blame]
Guido van Rossum4b48d6b2002-08-02 20:23:56 +00001# -*- coding: Latin-1 -*-
2
Guido van Rossum0a824382002-08-02 16:44:32 +00003"""Heap queue algorithm (a.k.a. priority queue).
4
5Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
6all k, counting elements from 0. For the sake of comparison,
7non-existing elements are considered to be infinite. The interesting
8property of a heap is that a[0] is always its smallest element.
9
10Usage:
11
12heap = [] # creates an empty heap
13heappush(heap, item) # pushes a new item on the heap
14item = heappop(heap) # pops the smallest item from the heap
15item = heap[0] # smallest item on the heap without popping it
Tim Peters28c25522002-08-02 21:48:06 +000016heapify(heap) # transform list into a heap, in-place, in linear time
Guido van Rossum0a824382002-08-02 16:44:32 +000017
18Our API differs from textbook heap algorithms as follows:
19
20- We use 0-based indexing. This makes the relationship between the
21 index for a node and the indexes for its children slightly less
22 obvious, but is more suitable since Python uses 0-based indexing.
23
24- Our heappop() method returns the smallest item, not the largest.
25
26These two make it possible to view the heap as a regular Python list
27without surprises: heap[0] is the smallest item, and heap.sort()
28maintains the heap invariant!
29"""
30
Guido van Rossum37c3b272002-08-02 16:50:58 +000031# Code by Kevin O'Connor
32
Guido van Rossum0a824382002-08-02 16:44:32 +000033__about__ = """Heap queues
34
35[explanation by François Pinard]
36
37Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
38all k, counting elements from 0. For the sake of comparison,
39non-existing elements are considered to be infinite. The interesting
40property of a heap is that a[0] is always its smallest element.
41
42The strange invariant above is meant to be an efficient memory
43representation for a tournament. The numbers below are `k', not a[k]:
44
45 0
46
47 1 2
48
49 3 4 5 6
50
51 7 8 9 10 11 12 13 14
52
53 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
54
55
56In the tree above, each cell `k' is topping `2*k+1' and `2*k+2'. In
57an usual binary tournament we see in sports, each cell is the winner
58over the two cells it tops, and we can trace the winner down the tree
59to see all opponents s/he had. However, in many computer applications
60of such tournaments, we do not need to trace the history of a winner.
61To be more memory efficient, when a winner is promoted, we try to
62replace it by something else at a lower level, and the rule becomes
63that a cell and the two cells it tops contain three different items,
64but the top cell "wins" over the two topped cells.
65
66If this heap invariant is protected at all time, index 0 is clearly
67the overall winner. The simplest algorithmic way to remove it and
68find the "next" winner is to move some loser (let's say cell 30 in the
69diagram above) into the 0 position, and then percolate this new 0 down
70the tree, exchanging values, until the invariant is re-established.
71This is clearly logarithmic on the total number of items in the tree.
72By iterating over all items, you get an O(n ln n) sort.
73
74A nice feature of this sort is that you can efficiently insert new
75items while the sort is going on, provided that the inserted items are
76not "better" than the last 0'th element you extracted. This is
77especially useful in simulation contexts, where the tree holds all
78incoming events, and the "win" condition means the smallest scheduled
79time. When an event schedule other events for execution, they are
80scheduled into the future, so they can easily go into the heap. So, a
81heap is a good structure for implementing schedulers (this is what I
82used for my MIDI sequencer :-).
83
84Various structures for implementing schedulers have been extensively
85studied, and heaps are good for this, as they are reasonably speedy,
86the speed is almost constant, and the worst case is not much different
87than the average case. However, there are other representations which
88are more efficient overall, yet the worst cases might be terrible.
89
90Heaps are also very useful in big disk sorts. You most probably all
91know that a big sort implies producing "runs" (which are pre-sorted
92sequences, which size is usually related to the amount of CPU memory),
93followed by a merging passes for these runs, which merging is often
94very cleverly organised[1]. It is very important that the initial
95sort produces the longest runs possible. Tournaments are a good way
96to that. If, using all the memory available to hold a tournament, you
97replace and percolate items that happen to fit the current run, you'll
98produce runs which are twice the size of the memory for random input,
99and much better for input fuzzily ordered.
100
101Moreover, if you output the 0'th item on disk and get an input which
102may not fit in the current tournament (because the value "wins" over
103the last output value), it cannot fit in the heap, so the size of the
104heap decreases. The freed memory could be cleverly reused immediately
105for progressively building a second heap, which grows at exactly the
106same rate the first heap is melting. When the first heap completely
107vanishes, you switch heaps and start a new run. Clever and quite
108effective!
109
110In a word, heaps are useful memory structures to know. I use them in
111a few applications, and I think it is good to keep a `heap' module
112around. :-)
113
114--------------------
115[1] The disk balancing algorithms which are current, nowadays, are
116more annoying than clever, and this is a consequence of the seeking
117capabilities of the disks. On devices which cannot seek, like big
118tape drives, the story was quite different, and one had to be very
119clever to ensure (far in advance) that each tape movement will be the
120most effective possible (that is, will best participate at
121"progressing" the merge). Some tapes were even able to read
122backwards, and this was also used to avoid the rewinding time.
123Believe me, real good tape sorts were quite spectacular to watch!
124From all times, sorting has always been a Great Art! :-)
125"""
126
127def heappush(heap, item):
128 """Push item onto heap, maintaining the heap invariant."""
129 pos = len(heap)
130 heap.append(None)
131 while pos:
Tim Petersd9ea39d2002-08-02 19:16:44 +0000132 parentpos = (pos - 1) >> 1
Guido van Rossum0a824382002-08-02 16:44:32 +0000133 parent = heap[parentpos]
134 if item >= parent:
135 break
136 heap[pos] = parent
137 pos = parentpos
138 heap[pos] = item
139
Tim Peters28c25522002-08-02 21:48:06 +0000140# The child indices of heap index pos are already heaps, and we want to make
141# a heap at index pos too.
142def _siftdown(heap, pos):
143 endpos = len(heap)
144 assert pos < endpos
145 item = heap[pos]
146 # Sift item into position, down from pos, moving the smaller
Tim Peters62abc2f2002-08-02 20:09:14 +0000147 # child up, until finding pos such that item <= pos's children.
148 childpos = 2*pos + 1 # leftmost child position
149 while childpos < endpos:
150 # Set childpos and child to reflect smaller child.
151 child = heap[childpos]
152 rightpos = childpos + 1
153 if rightpos < endpos:
154 rightchild = heap[rightpos]
155 if rightchild < child:
156 childpos = rightpos
157 child = rightchild
158 # If item is no larger than smaller child, we're done, else
159 # move the smaller child up.
160 if item <= child:
161 break
162 heap[pos] = child
163 pos = childpos
164 childpos = 2*pos + 1
Guido van Rossum0a824382002-08-02 16:44:32 +0000165 heap[pos] = item
Tim Peters28c25522002-08-02 21:48:06 +0000166
167def heappop(heap):
168 """Pop the smallest item off the heap, maintaining the heap invariant."""
169 lastelt = heap.pop() # raises appropriate IndexError if heap is empty
170 if heap:
171 returnitem = heap[0]
172 heap[0] = lastelt
173 _siftdown(heap, 0)
174 else:
175 returnitem = lastelt
Guido van Rossum0a824382002-08-02 16:44:32 +0000176 return returnitem
177
Tim Peters28c25522002-08-02 21:48:06 +0000178def heapify(heap):
179 """Transform list heap into a heap, in-place, in O(len(heap)) time."""
180 n = len(heap)
181 # Transform bottom-up. The largest index there's any point to looking at
182 # is the largest with a child index in-range, so must have 2*i + 1 < n,
183 # or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
184 # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is
185 # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
186 for i in xrange(n//2 - 1, -1, -1):
187 _siftdown(heap, i)
188
Guido van Rossum0a824382002-08-02 16:44:32 +0000189if __name__ == "__main__":
190 # Simple sanity test
191 heap = []
192 data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
193 for item in data:
194 heappush(heap, item)
195 sort = []
196 while heap:
197 sort.append(heappop(heap))
198 print sort