\section{\module{heapq} ---
         Heap queue algorithm}

\declaremodule{standard}{heapq}
\modulesynopsis{Heap queue algorithm (a.k.a. priority queue).}
\sectionauthor{Guido van Rossum}{guido@python.org}
% Implementation contributed by Kevin O'Connor
% Theoretical explanation by François Pinard


This module provides an implementation of the heap queue algorithm,
also known as the priority queue algorithm.
\versionadded{2.3}

Heaps are arrays for which
\code{\var{heap}[\var{k}] <= \var{heap}[2*\var{k}+1]} and
\code{\var{heap}[\var{k}] <= \var{heap}[2*\var{k}+2]}
for all \var{k}, counting elements from zero.  For the sake of
comparison, non-existing elements are considered to be infinite.  The
interesting property of a heap is that \code{\var{heap}[0]} is always
its smallest element.

The API below differs from textbook heap algorithms in two aspects:
(a) We use zero-based indexing.  This makes the relationship between the
index for a node and the indexes for its children slightly less
obvious, but is more suitable since Python uses zero-based indexing.
(b) Our pop method returns the smallest item, not the largest.

These two aspects make it possible to view the heap as a regular Python
list without surprises: \code{\var{heap}[0]} is the smallest item, and
\code{\var{heap}.sort()} maintains the heap invariant!
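
Both claims can be checked directly.  The following sketch (illustrative
only, not part of the module) builds a heap and verifies the invariant by
hand, then confirms that sorting the list also leaves a valid heap:

```python
from heapq import heappush

# Illustrative check: push items in arbitrary order, then verify the
# invariant heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2].
heap = []
for item in [19, 9, 4, 10, 11, 8, 2]:
    heappush(heap, item)

n = len(heap)
for k in range(n):
    for child in (2 * k + 1, 2 * k + 2):
        assert child >= n or heap[k] <= heap[child]

assert heap[0] == min(heap)

# A sorted list is also a valid heap, since every element is <= the
# elements at larger indexes, including its two children.
heap.sort()
for k in range(n):
    for child in (2 * k + 1, 2 * k + 2):
        assert child >= n or heap[k] <= heap[child]
```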

To create a heap, use a list initialized to \code{[]}.

The following functions are provided:

\begin{funcdesc}{heappush}{heap, item}
Push the value \var{item} onto the \var{heap}, maintaining the
heap invariant.
\end{funcdesc}

\begin{funcdesc}{heappop}{heap}
Pop and return the smallest item from the \var{heap}, maintaining the
heap invariant.
\end{funcdesc}

Example of use:

\begin{verbatim}
>>> from heapq import heappush, heappop
>>> heap = []
>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
>>> for item in data:
...     heappush(heap, item)
...
>>> sorted = []
>>> while heap:
...     sorted.append(heappop(heap))
...
>>> print sorted
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> data.sort()
>>> print data == sorted
True
>>>
\end{verbatim}


\subsection{Theory}

(This explanation is due to François Pinard.  The Python
code for this module was contributed by Kevin O'Connor.)

Heaps are arrays for which \code{a[\var{k}] <= a[2*\var{k}+1]} and
\code{a[\var{k}] <= a[2*\var{k}+2]}
for all \var{k}, counting elements from 0.  For the sake of comparison,
non-existing elements are considered to be infinite.  The interesting
property of a heap is that \code{a[0]} is always its smallest element.

The strange invariant above is meant to be an efficient memory
representation for a tournament.  The numbers below are \var{k}, not
\code{a[\var{k}]}:

\begin{verbatim}
                               0

               1                                 2

       3               4                 5               6

   7       8       9       10       11      12      13      14

 15 16   17 18   19 20   21 22    23 24   25 26   27 28   29 30
\end{verbatim}

In the tree above, each cell \var{k} is topping \code{2*\var{k}+1} and
\code{2*\var{k}+2}.
In a usual binary tournament we see in sports, each cell is the winner
over the two cells it tops, and we can trace the winner down the tree
to see all opponents s/he had.  However, in many computer applications
of such tournaments, we do not need to trace the history of a winner.
To be more memory efficient, when a winner is promoted, we try to
replace it by something else at a lower level, and the rule becomes
that a cell and the two cells it tops contain three different items,
but the top cell "wins" over the two topped cells.
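
The index arithmetic behind the diagram can be written down directly.
These helper names are ours, chosen for illustration; they are not part
of the module:

```python
# Illustrative helpers, not part of heapq.
def children(k):
    # Cell k tops cells 2*k+1 and 2*k+2.
    return 2 * k + 1, 2 * k + 2

def parent(k):
    # Inverse relation: the cell topping cell k.
    return (k - 1) // 2

print(children(0))   # (1, 2)
print(children(4))   # (9, 10)
print(parent(30))    # 14
```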

If this heap invariant is protected at all times, index 0 is clearly
the overall winner.  The simplest algorithmic way to remove it and
find the "next" winner is to move some loser (let's say cell 30 in the
diagram above) into the 0 position, and then percolate this new 0 down
the tree, exchanging values, until the invariant is re-established.
This is clearly logarithmic on the total number of items in the tree.
By iterating over all items, you get an O(n log n) sort.
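
The percolate-down step just described can be sketched as follows.  This
is a simplified illustration; the module's own implementation of
\function{heappop} is more refined:

```python
def pop_smallest(heap):
    # Remove and return heap[0]: move the last leaf (a "loser") into
    # position 0, then percolate it down, swapping with the smaller
    # child until the invariant is re-established.
    smallest = heap[0]
    loser = heap.pop()
    if heap:
        heap[0] = loser
        k, n = 0, len(heap)
        while True:
            child = 2 * k + 1
            if child >= n:
                break
            if child + 1 < n and heap[child + 1] < heap[child]:
                child += 1          # pick the smaller child
            if heap[k] <= heap[child]:
                break               # invariant holds again
            heap[k], heap[child] = heap[child], heap[k]
            k = child
    return smallest
```

Popping every item this way yields the O(n log n) sort mentioned above.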

A nice feature of this sort is that you can efficiently insert new
items while the sort is going on, provided that the inserted items are
not "better" than the last 0'th element you extracted.  This is
especially useful in simulation contexts, where the tree holds all
incoming events, and the "win" condition means the smallest scheduled
time.  When an event schedules other events for execution, they are
scheduled into the future, so they can easily go into the heap.  So, a
heap is a good structure for implementing schedulers (this is what I
used for my MIDI sequencer :-).
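
A minimal scheduler along these lines can be sketched with the heap
holding \code{(time, event)} pairs, so the earliest event always
surfaces first.  The event names and times below are invented for
illustration:

```python
from heapq import heappush, heappop

# Minimal event-loop sketch; event names and times are invented.
events = []
heappush(events, (10, 'check sensors'))
heappush(events, (5, 'start motor'))
heappush(events, (15, 'stop motor'))

# A running event may schedule further events into the future.
fired = []
while events:
    time, action = heappop(events)
    fired.append(action)
    if action == 'start motor':
        heappush(events, (7, 'motor warmed up'))   # later than time 5

print(fired)
```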

Various structures for implementing schedulers have been extensively
studied, and heaps are good for this, as they are reasonably speedy,
the speed is almost constant, and the worst case is not much different
from the average case.  However, there are other representations which
are more efficient overall, yet the worst cases might be terrible.

Heaps are also very useful in big disk sorts.  You most probably all
know that a big sort implies producing "runs" (which are pre-sorted
sequences, whose size is usually related to the amount of CPU memory),
followed by merging passes for these runs, which merging is often
very cleverly organised\footnote{The disk balancing algorithms which
are current, nowadays, are
more annoying than clever, and this is a consequence of the seeking
capabilities of the disks.  On devices which cannot seek, like big
tape drives, the story was quite different, and one had to be very
clever to ensure (far in advance) that each tape movement would be the
most effective possible (that is, would best participate at
"progressing" the merge).  Some tapes were even able to read
backwards, and this was also used to avoid the rewinding time.
Believe me, real good tape sorts were quite spectacular to watch!
From all times, sorting has always been a Great Art! :-)}.
It is very important that the initial
sort produces the longest runs possible.  Tournaments are a good way
to achieve that.  If, using all the memory available to hold a
tournament, you replace and percolate items that happen to fit the
current run, you'll produce runs which are twice the size of the
memory for random input, and much better for fuzzily ordered input.

Moreover, if you output the 0'th item on disk and get an input which
may not fit in the current tournament (because the value "wins" over
the last output value), it cannot fit in the heap, so the size of the
heap decreases.  The freed memory could be cleverly reused immediately
for progressively building a second heap, which grows at exactly the
same rate the first heap is melting.  When the first heap completely
vanishes, you switch heaps and start a new run.  Clever and quite
effective!
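
This run-building scheme (classically known as replacement selection)
can be sketched with two heaps.  In the sketch below, \code{memory}
plays the role of the tournament size; the function name and parameters
are ours, chosen for illustration:

```python
from heapq import heappush, heappop

def sorted_runs(iterable, memory=4):
    # Replacement-selection sketch: `memory` is the tournament size.
    # Items that still fit the current run re-enter the first heap;
    # items that "win" over the last output go into a second heap,
    # which becomes the starting heap of the next run.
    it = iter(iterable)
    heap, next_heap = [], []
    for x in it:
        heappush(heap, x)
        if len(heap) == memory:
            break
    runs, run = [], []
    sentinel = object()
    while heap:
        x = heappop(heap)
        run.append(x)
        y = next(it, sentinel)
        if y is not sentinel:
            if y >= x:
                heappush(heap, y)       # still fits the current run
            else:
                heappush(next_heap, y)  # saved for the next run
        if not heap:                    # the first heap has melted away
            runs.append(run)
            run, heap, next_heap = [], next_heap, []
    if run:
        runs.append(run)
    return runs
```

Each run comes out sorted, and for random input the runs are roughly
twice as long as \code{memory}, as described above.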

In a word, heaps are useful memory structures to know.  I use them in
a few applications, and I think it is good to keep a `heap' module
around. :-)