:mod:`heapq` --- Heap queue algorithm
=====================================

.. module:: heapq
   :synopsis: Heap queue algorithm (a.k.a. priority queue).
.. moduleauthor:: Kevin O'Connor
.. sectionauthor:: Guido van Rossum <guido@python.org>
.. sectionauthor:: François Pinard
.. sectionauthor:: Raymond Hettinger

.. versionadded:: 2.3

This module provides an implementation of the heap queue algorithm, also known
as the priority queue algorithm.

Heaps are arrays for which ``heap[k] <= heap[2*k+1]`` and ``heap[k] <=
heap[2*k+2]`` for all *k*, counting elements from zero.  For the sake of
comparison, non-existing elements are considered to be infinite.  The
interesting property of a heap is that ``heap[0]`` is always its smallest
element.

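The invariant can be checked directly.  The following is only a sketch;
``is_heap`` is a hypothetical helper written for illustration and is not part
of this module::

   from heapq import heapify

   def is_heap(heap):
       """Hypothetical helper: True if every parent is <= its children."""
       n = len(heap)
       for k in range(n):
           for child in (2*k + 1, 2*k + 2):
               # Children past the end of the list count as infinite.
               if child < n and heap[k] > heap[child]:
                   return False
       return True

   data = [5, 3, 8, 1, 9, 2]
   before = is_heap(data)     # 5 > 3, so the invariant does not hold yet
   heapify(data)
   after = is_heap(data)      # heapify establishes the invariant
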
The API below differs from textbook heap algorithms in two aspects: (a) We use
zero-based indexing.  This makes the relationship between the index for a node
and the indexes for its children slightly less obvious, but is more suitable
since Python uses zero-based indexing. (b) Our pop method returns the smallest
item, not the largest (called a "min heap" in textbooks; a "max heap" is more
common in texts because of its suitability for in-place sorting).

These two make it possible to view the heap as a regular Python list without
surprises: ``heap[0]`` is the smallest item, and ``heap.sort()`` maintains the
heap invariant!

To create a heap, use a list initialized to ``[]``, or you can transform a
populated list into a heap via the :func:`heapify` function.

The following functions are provided:


.. function:: heappush(heap, item)

   Push the value *item* onto the *heap*, maintaining the heap invariant.


.. function:: heappop(heap)

   Pop and return the smallest item from the *heap*, maintaining the heap
   invariant.  If the heap is empty, :exc:`IndexError` is raised.

.. function:: heappushpop(heap, item)

   Push *item* on the heap, then pop and return the smallest item from the
   *heap*.  The combined action runs more efficiently than :func:`heappush`
   followed by a separate call to :func:`heappop`.

   .. versionadded:: 2.6

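As an illustration of :func:`heappushpop`, here is a short sketch with made-up
values.  Note that the item just pushed is itself returned when it is smaller
than everything already on the heap::

   from heapq import heapify, heappushpop

   heap = [2, 4, 6]
   heapify(heap)

   # 5 is pushed, then the smallest item (2) is popped.
   first = heappushpop(heap, 5)

   # 1 is smaller than everything on the heap, so it comes straight back.
   second = heappushpop(heap, 1)
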
.. function:: heapify(x)

   Transform list *x* into a heap, in-place, in linear time.


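A brief sketch of :func:`heapify` on made-up data.  The list is rearranged in
place, after which repeated calls to :func:`heappop` yield the items in
ascending order::

   from heapq import heapify, heappop

   data = [9, 4, 7, 1, 8, 2]
   heapify(data)              # rearranges data in place
   smallest = data[0]         # the smallest item is now at index 0
   drained = [heappop(data) for _ in range(len(data))]
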
.. function:: heapreplace(heap, item)

   Pop and return the smallest item from the *heap*, and also push the new *item*.
   The heap size doesn't change.  If the heap is empty, :exc:`IndexError` is raised.
   This is more efficient than :func:`heappop` followed by :func:`heappush`, and
   can be more appropriate when using a fixed-size heap.  Note that the value
   returned may be larger than *item*!  That constrains reasonable uses of this
   routine unless written as part of a conditional replacement::

      if item > heap[0]:
          item = heapreplace(heap, item)

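The conditional replacement above is the core of a fixed-size heap.  The sketch
below uses a hypothetical helper, ``keep_largest``, which is not part of this
module and assumes ``n >= 1``; it keeps only the *n* largest items of a stream::

   from heapq import heapify, heapreplace
   from itertools import islice

   def keep_largest(iterable, n):
       """Hypothetical helper: the n largest items, largest first (n >= 1)."""
       it = iter(iterable)
       heap = list(islice(it, n))     # start with the first n items
       heapify(heap)
       for item in it:
           # heap[0] is the smallest item kept so far.
           if item > heap[0]:
               heapreplace(heap, item)
       return sorted(heap, reverse=True)

   top3 = keep_largest([5, 1, 9, 3, 7, 8, 2], 3)

The heap never grows beyond *n* entries, so memory use stays bounded however
long the input is.
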
Example of use:

   >>> from heapq import heappush, heappop
   >>> heap = []
   >>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
   >>> for item in data:
   ...     heappush(heap, item)
   ...
   >>> ordered = []
   >>> while heap:
   ...     ordered.append(heappop(heap))
   ...
   >>> print ordered
   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
   >>> data.sort()
   >>> print data == ordered
   True

Using a heap to insert items at the correct place in a priority queue:

   >>> heap = []
   >>> data = [(1, 'J'), (4, 'N'), (3, 'H'), (2, 'O')]
   >>> for item in data:
   ...     heappush(heap, item)
   ...
   >>> while heap:
   ...     print heappop(heap)[1]
   J
   O
   H
   N


The module also offers three general purpose functions based on heaps.


.. function:: merge(*iterables)

   Merge multiple sorted inputs into a single sorted output (for example, merge
   timestamped entries from multiple log files).  Returns an :term:`iterator`
   over the sorted values.

   Similar to ``sorted(itertools.chain(*iterables))`` but returns an iterable, does
   not pull the data into memory all at once, and assumes that each of the input
   streams is already sorted (smallest to largest).

   .. versionadded:: 2.6


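A small sketch of the log-merging use case for :func:`merge`.  The log entries
here are invented ``(timestamp, message)`` pairs, and each input is already
sorted by timestamp::

   from heapq import merge

   log_a = [(1, 'boot'), (4, 'login'), (9, 'logout')]
   log_b = [(2, 'mount'), (3, 'network up'), (7, 'cron')]

   # merge() is lazy; list() is used here only to show the whole result.
   merged = list(merge(log_a, log_b))
   timestamps = [stamp for stamp, message in merged]
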
.. function:: nlargest(n, iterable[, key])

   Return a list with the *n* largest elements from the dataset defined by
   *iterable*.  *key*, if provided, specifies a function of one argument that is
   used to extract a comparison key from each element in the iterable (for
   example, ``key=str.lower``).  Equivalent to: ``sorted(iterable, key=key,
   reverse=True)[:n]``.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      Added the optional *key* argument.


.. function:: nsmallest(n, iterable[, key])

   Return a list with the *n* smallest elements from the dataset defined by
   *iterable*.  *key*, if provided, specifies a function of one argument that is
   used to extract a comparison key from each element in the iterable (for
   example, ``key=str.lower``).  Equivalent to: ``sorted(iterable, key=key)[:n]``.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      Added the optional *key* argument.

The latter two functions perform best for smaller values of *n*.  For larger
values, it is more efficient to use the :func:`sorted` function.  Also, when
``n==1``, it is more efficient to use the built-in :func:`min` and :func:`max`
functions.


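A short sketch of both functions with a *key* argument, using a made-up list of
words::

   from heapq import nlargest, nsmallest

   words = ['pear', 'Apple', 'fig', 'banana', 'Kiwi']

   # The two longest words, longest first.
   longest = nlargest(2, words, key=len)

   # The first two words in case-insensitive alphabetical order.
   first_alpha = nsmallest(2, words, key=str.lower)
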
Priority Queue Implementation Notes
-----------------------------------

A `priority queue <http://en.wikipedia.org/wiki/Priority_queue>`_ is a common use
for a heap, and it presents several implementation challenges:

* Sort stability:  how do you get two tasks with equal priorities to be returned
  in the order they were originally added?

* In the future with Python 3, tuple comparison breaks for (priority, task)
  pairs if the priorities are equal and the tasks do not have a default
  comparison order.

* If the priority of a task changes, how do you move it to a new position in
  the heap?

* Or if a pending task needs to be deleted, how do you find it and remove it
  from the queue?

A solution to the first two challenges is to store entries as 3-element lists
containing the priority, an entry count, and the task.  The entry count serves as
a tie-breaker so that two tasks with the same priority are returned in the order
they were added.  And since no two valid entries share an entry count, comparing
two valid entries never falls through to comparing the tasks themselves.

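The tie-breaking effect of the entry count can be seen in a small sketch.  The
task names and priorities below are invented for illustration::

   import itertools
   from heapq import heappush, heappop

   pq = []
   counter = itertools.count(1)       # unique, increasing entry count

   jobs = [(2, 'write'), (1, 'plan'), (2, 'test'), (1, 'read')]
   for priority, task in jobs:
       heappush(pq, [priority, next(counter), task])

   # Equal priorities come back in the order they were added.
   order = [heappop(pq)[2] for _ in range(len(pq))]
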
The remaining challenges revolve around finding a pending task and making
changes to its priority or removing it entirely.  Finding a task can be done
with a dictionary pointing to an entry in the queue.

Removing the entry or changing its priority is more difficult because it would
break the heap structure invariants.  So, a possible solution is to mark an
entry as invalid and optionally add a new entry with the revised priority::

    import itertools
    from heapq import heappush, heappop

    pq = []                         # the priority queue list
    counter = itertools.count(1)    # unique sequence count
    task_finder = {}                # mapping of tasks to entries
    INVALID = 0                     # mark an entry as deleted

    def add_task(priority, task, count=None):
        if count is None:
            count = next(counter)
        entry = [priority, count, task]
        task_finder[task] = entry
        heappush(pq, entry)

    def get_top_priority():
        while True:
            priority, count, task = heappop(pq)
            # Skip invalidated entries; only a valid entry owns the mapping.
            if count != INVALID:
                del task_finder[task]
                return task

    def delete_task(task):
        # Drop the mapping now; the dead entry is skipped when popped.
        entry = task_finder.pop(task)
        entry[1] = INVALID

    def reprioritize(priority, task):
        entry = task_finder[task]
        add_task(priority, task, entry[1])
        entry[1] = INVALID


Theory
------

(This explanation is due to François Pinard.  The Python code for this module
was contributed by Kevin O'Connor.)

Heaps are arrays for which ``a[k] <= a[2*k+1]`` and ``a[k] <= a[2*k+2]`` for all
*k*, counting elements from 0.  For the sake of comparison, non-existing
elements are considered to be infinite.  The interesting property of a heap is
that ``a[0]`` is always its smallest element.

The strange invariant above is meant to be an efficient memory representation
for a tournament.  The numbers below are *k*, not ``a[k]``::

                                  0

                 1                                 2

         3               4                5               6

     7       8       9       10      11      12      13      14

   15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30

In the tree above, each cell *k* is topping ``2*k+1`` and ``2*k+2``.  In a usual
binary tournament we see in sports, each cell is the winner over the two cells
it tops, and we can trace the winner down the tree to see all opponents s/he
had.  However, in many computer applications of such tournaments, we do not need
to trace the history of a winner.  To be more memory efficient, when a winner is
promoted, we try to replace it by something else at a lower level, and the rule
becomes that a cell and the two cells it tops contain three different items, but
the top cell "wins" over the two topped cells.

If this heap invariant is protected at all times, index 0 is clearly the overall
winner.  The simplest algorithmic way to remove it and find the "next" winner is
to move some loser (let's say cell 30 in the diagram above) into the 0 position,
and then percolate this new 0 down the tree, exchanging values, until the
invariant is re-established.  This is clearly logarithmic in the total number of
items in the tree.  By iterating over all items, you get an O(n log n) sort.

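The removal step just described can be sketched in a few lines.  This is a
simplified illustration, not the code this module actually uses;
``pop_smallest`` and ``heap_sort`` are hypothetical names::

   from heapq import heapify

   def pop_smallest(a):
       """Hypothetical helper: remove a[0] and percolate a loser down."""
       last = a.pop()                 # some "loser" from the bottom level
       if not a:
           return last
       winner, a[0] = a[0], last      # move the loser into position 0
       k, n = 0, len(a)
       while True:
           child = 2*k + 1
           if child >= n:
               break
           # Pick the smaller of the two cells topped by k.
           if child + 1 < n and a[child + 1] < a[child]:
               child += 1
           if a[k] <= a[child]:
               break                  # invariant re-established
           a[k], a[child] = a[child], a[k]
           k = child
       return winner

   def heap_sort(items):
       """Hypothetical helper: sort by repeatedly removing the winner."""
       a = list(items)
       heapify(a)
       return [pop_smallest(a) for _ in range(len(a))]

   result = heap_sort([5, 2, 9, 1, 5, 6])

Each call to ``pop_smallest`` walks at most one path from the root to a leaf,
which is where the logarithmic cost per item comes from.
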
A nice feature of this sort is that you can efficiently insert new items while
the sort is going on, provided that the inserted items are not "better" than the
last 0'th element you extracted.  This is especially useful in simulation
contexts, where the tree holds all incoming events, and the "win" condition
means the smallest scheduled time.  When an event schedules other events for
execution, they are scheduled into the future, so they can easily go into the
heap.  So, a heap is a good structure for implementing schedulers (this is what
I used for my MIDI sequencer :-).

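A toy event loop shows the idea.  Everything in this sketch is invented for
illustration: the ``run`` function, the event names, and the rule that each
``'tick'`` schedules another one three time units later::

   from heapq import heappush, heappop

   def run(events, until):
       """Hypothetical helper: process (when, name) events up to ``until``."""
       log = []
       while events and events[0][0] <= until:
           when, name = heappop(events)
           log.append((when, name))
           if name == 'tick':
               # An event may schedule another one, always in the future.
               heappush(events, (when + 3, 'tick'))
       return log

   events = []
   heappush(events, (0, 'tick'))
   heappush(events, (4, 'alarm'))
   log = run(events, until=7)
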
Various structures for implementing schedulers have been extensively studied,
and heaps are good for this, as they are reasonably speedy, the speed is almost
constant, and the worst case is not much different than the average case.
However, there are other representations which are more efficient overall, yet
the worst cases might be terrible.

Heaps are also very useful in big disk sorts.  You most probably all know that a
big sort implies producing "runs" (which are pre-sorted sequences, whose size is
usually related to the amount of CPU memory), followed by merging passes for
these runs, which merging is often very cleverly organised [#]_.  It is very
important that the initial sort produces the longest runs possible.  Tournaments
are a good way to achieve that.  If, using all the memory available to hold a
tournament, you replace and percolate items that happen to fit the current run,
you'll produce runs which are twice the size of the memory for random input, and
much better for input fuzzily ordered.

Moreover, if you output the 0'th item on disk and get an input which may not fit
in the current tournament (because the value "wins" over the last output value),
it cannot fit in the heap, so the size of the heap decreases.  The freed memory
could be cleverly reused immediately for progressively building a second heap,
which grows at exactly the same rate the first heap is melting.  When the first
heap completely vanishes, you switch heaps and start a new run.  Clever and
quite effective!

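The two-heap scheme can be sketched as follows.  This is an in-memory
illustration only: ``make_runs`` is a hypothetical helper, lists stand in for
disk files, and ``memory`` (assumed to be at least 1) is the number of items
that fit in memory at once::

   from heapq import heapify, heappop, heappush
   from itertools import islice

   def make_runs(items, memory):
       """Hypothetical helper: split items into sorted runs."""
       it = iter(items)
       current = list(islice(it, memory))   # heap feeding the current run
       heapify(current)
       pending = []                         # heap growing for the next run
       runs, run = [], []
       while current:
           smallest = heappop(current)
           run.append(smallest)             # "output the 0'th item on disk"
           for item in islice(it, 1):       # read at most one new input
               if item >= smallest:
                   heappush(current, item)  # still fits the current run
               else:
                   heappush(pending, item)  # must wait for the next run
           if not current:
               # The first heap has vanished: switch heaps, start a new run.
               runs.append(run)
               run = []
               current, pending = pending, []
       return runs

   runs = make_runs([5, 1, 9, 3, 7, 8, 2, 6, 4], memory=3)

With only three slots of memory, the first run in this small example comes out
six items long.  The resulting runs are each sorted, so they can be combined
with :func:`merge`.
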
In a word, heaps are useful memory structures to know.  I use them in a few
applications, and I think it is good to keep a 'heap' module around. :-)

.. rubric:: Footnotes

.. [#] The disk balancing algorithms which are current, nowadays, are more annoying
   than clever, and this is a consequence of the seeking capabilities of the disks.
   On devices which cannot seek, like big tape drives, the story was quite
   different, and one had to be very clever to ensure (far in advance) that each
   tape movement will be the most effective possible (that is, will best
   participate in "progressing" the merge).  Some tapes were even able to read
   backwards, and this was also used to avoid the rewinding time. Believe me, real
   good tape sorts were quite spectacular to watch! From all times, sorting has
   always been a Great Art! :-)
