:mod:`heapq` --- Heap queue algorithm
=====================================

.. module:: heapq
   :synopsis: Heap queue algorithm (a.k.a. priority queue).
.. moduleauthor:: Kevin O'Connor
.. sectionauthor:: Guido van Rossum <guido@python.org>
.. sectionauthor:: François Pinard

This module provides an implementation of the heap queue algorithm, also known
as the priority queue algorithm.

Heaps are arrays for which ``heap[k] <= heap[2*k+1]`` and
``heap[k] <= heap[2*k+2]`` for all *k*, counting elements from zero. For the
sake of comparison, non-existing elements are considered to be infinite. The
interesting property of a heap is that ``heap[0]`` is always its smallest
element.
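
As an illustration only, the invariant can be checked directly. The helper
``is_heap`` below is hypothetical (it is not part of :mod:`heapq`); it treats
missing children as infinite by simply skipping them:

```python
import heapq


def is_heap(heap):
    """Return True if every element is <= each of its existing children.

    Children that fall past the end of the list count as infinite, so they
    can never violate the invariant and are skipped.
    """
    n = len(heap)
    for k in range(n):
        for child in (2 * k + 1, 2 * k + 2):
            if child < n and heap[k] > heap[child]:
                return False
    return True


data = [5, 3, 8, 1, 9, 2]
heapq.heapify(data)
print(is_heap(data))          # True
print(data[0] == min(data))   # True: heap[0] is the smallest element
print(is_heap([3, 1, 2]))     # False: 3 > 1 violates the invariant
```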

The API below differs from textbook heap algorithms in two aspects: (a) We use
zero-based indexing. This makes the relationship between the index for a node
and the indexes for its children slightly less obvious, but is more suitable
since Python uses zero-based indexing. (b) Our pop method returns the smallest
item, not the largest (called a "min heap" in textbooks; a "max heap" is more
common in texts because of its suitability for in-place sorting).

These two make it possible to view the heap as a regular Python list without
surprises: ``heap[0]`` is the smallest item, and ``heap.sort()`` maintains the
heap invariant!
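
As a quick check of that last claim (a sketch with arbitrary data), a list
sorted in ascending order satisfies the invariant, so :func:`heappop` can
operate on it directly:

```python
import heapq

heap = [7, 2, 9, 4, 1]
heap.sort()  # heap is now [1, 2, 4, 7, 9]

# Every element is <= the elements at 2*k+1 and 2*k+2 (where they exist).
invariant_holds = all(
    heap[k] <= heap[c]
    for k in range(len(heap))
    for c in (2 * k + 1, 2 * k + 2)
    if c < len(heap)
)
print(invariant_holds)  # True

# ...so the heap functions work on the sorted list without heapify().
smallest = heapq.heappop(heap)
print(smallest)  # 1
```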

To create a heap, use a list initialized to ``[]``, or you can transform a
populated list into a heap via the function :func:`heapify`.
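
Both ways of creating a heap, sketched with made-up data:

```python
import heapq

# Start from an empty list and push items one at a time...
heap = []
for value in [5, 1, 4]:
    heapq.heappush(heap, value)

# ...or transform an already populated list, in place and in linear time.
data = [9, 3, 7, 1]
heapq.heapify(data)

print(heap[0])  # 1
print(data[0])  # 1
```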

The following functions are provided:


.. function:: heappush(heap, item)

   Push the value *item* onto the *heap*, maintaining the heap invariant.


.. function:: heappop(heap)

   Pop and return the smallest item from the *heap*, maintaining the heap
   invariant. If the heap is empty, :exc:`IndexError` is raised.


.. function:: heapify(x)

   Transform list *x* into a heap, in-place, in linear time.


.. function:: heapreplace(heap, item)

   Pop and return the smallest item from the *heap*, and also push the new *item*.
   The heap size doesn't change. If the heap is empty, :exc:`IndexError` is raised.
   This is more efficient than :func:`heappop` followed by :func:`heappush`, and
   can be more appropriate when using a fixed-size heap. Note that the value
   returned may be larger than *item*! That constrains reasonable uses of this
   routine unless written as part of a conditional replacement::

      if item > heap[0]:
          item = heapreplace(heap, item)
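
A sketch of the fixed-size use case, assuming the goal is to keep the *k*
largest items of a stream. The function ``top_k`` is a hypothetical name used
only for this example; it applies the conditional replacement shown above:

```python
import heapq


def top_k(iterable, k):
    """Return the k largest items, largest first, using a min-heap of size k.

    heap[0] is the smallest of the current top k, so a new item belongs in
    the heap only if it is larger than heap[0].
    """
    if k <= 0:
        return []
    heap = []
    for item in iterable:
        if len(heap) < k:
            heapq.heappush(heap, item)       # still filling the heap
        elif item > heap[0]:
            # Conditional replacement: pop heap[0] and push item in one step.
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)


print(top_k([5, 1, 9, 3, 7, 2, 8], 3))  # [9, 8, 7]
```

Because the heap never holds more than *k* items, each step costs O(log k).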

Example of use::

   >>> from heapq import heappush, heappop
   >>> heap = []
   >>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
   >>> for item in data:
   ...     heappush(heap, item)
   ...
   >>> ordered = []
   >>> while heap:
   ...     ordered.append(heappop(heap))
   ...
   >>> ordered
   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
   >>> data.sort()
   >>> data == ordered
   True

The module also offers three general purpose functions based on heaps.


.. function:: merge(*iterables)

   Merge multiple sorted inputs into a single sorted output (for example, merge
   timestamped entries from multiple log files). Returns an :term:`iterator`
   over the sorted values.

   Similar to ``sorted(itertools.chain(*iterables))`` but returns an iterator, does
   not pull the data into memory all at once, and assumes that each of the input
   streams is already sorted (smallest to largest).
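
For instance, with two already-sorted lists of ``(timestamp, message)`` pairs
standing in for log files (the data is invented):

```python
import heapq

# Each input is already sorted by timestamp.
log_a = [(1, "a: start"), (4, "a: work"), (9, "a: stop")]
log_b = [(2, "b: start"), (3, "b: work"), (10, "b: stop")]

merged = heapq.merge(log_a, log_b)  # an iterator; nothing is consumed yet
print([ts for ts, _ in merged])     # [1, 2, 3, 4, 9, 10]
```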


.. function:: nlargest(n, iterable[, key])

   Return a list with the *n* largest elements from the dataset defined by
   *iterable*. *key*, if provided, specifies a function of one argument that is
   used to extract a comparison key from each element in the iterable (for
   example, ``key=str.lower``). Equivalent to:
   ``sorted(iterable, key=key, reverse=True)[:n]``.
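
A short illustration with made-up data, compared against the :func:`sorted`
equivalent given above:

```python
import heapq

words = ["banana", "Apple", "cherry", "date"]

# Compare case-insensitively by passing str.lower as the key.
top2 = heapq.nlargest(2, words, key=str.lower)
print(top2)  # ['date', 'cherry']

# Same result as the sorted() formulation.
print(top2 == sorted(words, key=str.lower, reverse=True)[:2])  # True
```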


.. function:: nsmallest(n, iterable[, key])

   Return a list with the *n* smallest elements from the dataset defined by
   *iterable*. *key*, if provided, specifies a function of one argument that is
   used to extract a comparison key from each element in the iterable (for
   example, ``key=str.lower``). Equivalent to: ``sorted(iterable, key=key)[:n]``.
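
Similarly, with hypothetical ``(name, price)`` records and a *key* that selects
the price:

```python
import heapq

items = [("pen", 1.5), ("book", 12.0), ("mug", 6.25), ("clip", 0.1)]

# The key extracts the price, so records are compared on rec[1] only.
cheapest = heapq.nsmallest(2, items, key=lambda rec: rec[1])
print(cheapest)  # [('clip', 0.1), ('pen', 1.5)]
print(cheapest == sorted(items, key=lambda rec: rec[1])[:2])  # True
```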


The latter two functions perform best for smaller values of *n*. For larger
values, it is more efficient to use the :func:`sorted` function. Also, when
``n==1``, it is more efficient to use the built-in :func:`min` and :func:`max`
functions.


Theory
------

(This explanation is due to François Pinard. The Python code for this module
was contributed by Kevin O'Connor.)

Heaps are arrays for which ``a[k] <= a[2*k+1]`` and ``a[k] <= a[2*k+2]`` for all
*k*, counting elements from 0. For the sake of comparison, non-existing
elements are considered to be infinite. The interesting property of a heap is
that ``a[0]`` is always its smallest element.

The strange invariant above is meant to be an efficient memory representation
for a tournament. The numbers below are *k*, not ``a[k]``::

                                  0

                  1                               2

          3               4               5               6

      7       8       9      10      11      12      13      14

    15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30

In the tree above, each cell *k* is topping ``2*k+1`` and ``2*k+2``. In a usual
binary tournament we see in sports, each cell is the winner over the two cells
it tops, and we can trace the winner down the tree to see all opponents s/he
had. However, in many computer applications of such tournaments, we do not need
to trace the history of a winner. To be more memory efficient, when a winner is
promoted, we try to replace it by something else at a lower level, and the rule
becomes that a cell and the two cells it tops contain three different items, but
the top cell "wins" over the two topped cells.
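
The index arithmetic behind the diagram can be sketched as follows; ``children``
and ``parent`` are hypothetical helper names, not part of :mod:`heapq`:

```python
def children(k):
    """Cells topped by cell k in the zero-based tournament tree."""
    return 2 * k + 1, 2 * k + 2


def parent(k):
    """Cell that tops cell k; only meaningful for k > 0."""
    return (k - 1) // 2


print(children(0))  # (1, 2)
print(children(6))  # (13, 14)
print(parent(30))   # 14

# Rebuild the rows of the diagram: depth d holds cells 2**d - 1 .. 2**(d+1) - 2.
for depth in range(5):
    first = 2 ** depth - 1
    print(list(range(first, 2 * first + 1)))  # [0], [1, 2], [3, 4, 5, 6], ...
```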

If this heap invariant is protected at all times, index 0 is clearly the overall
winner. The simplest algorithmic way to remove it and find the "next" winner is
to move some loser (let's say cell 30 in the diagram above) into the 0 position,
and then percolate this new 0 down the tree, exchanging values, until the
invariant is re-established. This is clearly logarithmic in the total number of
items in the tree. By iterating over all items, you get an O(n log n) sort.
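
A hand-written sketch of that removal step and the resulting sort. The names
``pop_winner`` and ``heap_sort`` are hypothetical; in real code use
:func:`heappop`, which may organise its comparisons differently:

```python
import heapq


def pop_winner(heap):
    """Remove and return heap[0], then percolate a "loser" down to fix the heap."""
    last = heap.pop()              # take the loser in the last cell
    if not heap:                   # it was the only item, hence the winner
        return last
    winner = heap[0]
    heap[0] = last                 # move the loser into position 0
    k, n = 0, len(heap)
    while True:
        # Find the smallest among cell k and the two cells it tops.
        smallest = k
        for child in (2 * k + 1, 2 * k + 2):
            if child < n and heap[child] < heap[smallest]:
                smallest = child
        if smallest == k:          # invariant re-established
            return winner
        heap[k], heap[smallest] = heap[smallest], heap[k]
        k = smallest               # one level deeper, so O(log n) iterations


def heap_sort(items):
    """O(n log n) sort: heapify, then repeatedly extract the winner."""
    heap = list(items)
    heapq.heapify(heap)
    return [pop_winner(heap) for _ in range(len(heap))]


print(heap_sort([5, 2, 8, 1, 9, 3]))  # [1, 2, 3, 5, 8, 9]
```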

A nice feature of this sort is that you can efficiently insert new items while
the sort is going on, provided that the inserted items are not "better" than the
last 0'th element you extracted. This is especially useful in simulation
contexts, where the tree holds all incoming events, and the "win" condition
means the smallest scheduled time. When an event schedules other events for
execution, they are scheduled into the future, so they can easily go into the
heap. So, a heap is a good structure for implementing schedulers (this is what
I used for my MIDI sequencer :-).
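
A minimal discrete-event loop in this spirit, assuming events are
``(time, name)`` tuples and that a hypothetical ``"tick"`` event reschedules
itself five time units later; ``run`` and the event names are illustrative
only:

```python
import heapq


def run(initial_events, horizon):
    """Process (time, name) events in time order until `horizon` is passed.

    Follow-up events are always scheduled in the future (when + delay), so
    they are never "better" than the event just popped.
    """
    queue = list(initial_events)
    heapq.heapify(queue)
    log = []
    while queue:
        when, name = heapq.heappop(queue)
        if when > horizon:
            break
        log.append((when, name))
        if name == "tick":  # hypothetical recurring event
            heapq.heappush(queue, (when + 5, "tick"))
    return log


print(run([(0, "tick"), (3, "note-on"), (7, "note-off")], horizon=12))
# [(0, 'tick'), (3, 'note-on'), (5, 'tick'), (7, 'note-off'), (10, 'tick')]
```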

Various structures for implementing schedulers have been extensively studied,
and heaps are good for this, as they are reasonably speedy, the speed is almost
constant, and the worst case is not much different than the average case.
However, there are other representations which are more efficient overall, yet
the worst cases might be terrible.

Heaps are also very useful in big disk sorts. You most probably all know that a
big sort implies producing "runs" (which are pre-sorted sequences, whose size is
usually related to the amount of CPU memory), followed by merging passes for
these runs, and this merging is often very cleverly organised [#]_. It is very
important that the initial sort produces the longest runs possible. Tournaments
are a good way to achieve that. If, using all the memory available to hold a
tournament, you replace and percolate items that happen to fit the current run,
you'll produce runs which are twice the size of the memory for random input, and
much better for input fuzzily ordered.

Moreover, if you output the 0'th item on disk and get an input which may not fit
in the current tournament (because the value "wins" over the last output value),
it cannot fit in the heap, so the size of the heap decreases. The freed memory
could be cleverly reused immediately for progressively building a second heap,
which grows at exactly the same rate at which the first heap is melting. When
the first heap completely vanishes, you switch heaps and start a new run. Clever
and quite effective!
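
The technique described in the last two paragraphs is commonly known as
replacement selection. The sketch below is an illustration under simplifying
assumptions: runs are collected as in-memory lists instead of being written to
disk, and both "heaps" share one list by tagging each entry with its run
number. ``replacement_selection`` is a hypothetical name:

```python
import heapq
from itertools import islice

_END = object()  # sentinel marking the end of the input


def replacement_selection(iterable, memory):
    """Split iterable into sorted runs using a heap holding `memory` items.

    Heap entries are (run_number, value) pairs. A value smaller than the one
    just written cannot join the current run, so it is tagged for the next
    run; such entries sort after every current-run entry, acting as the
    "second heap" that grows while the first one melts.
    """
    if memory < 1:
        raise ValueError("memory must be at least 1")
    it = iter(iterable)
    heap = [(0, value) for value in islice(it, memory)]
    heapq.heapify(heap)
    runs = []
    while heap:
        run, value = heap[0]
        if run == len(runs):             # first item of a new run
            runs.append([])
        runs[run].append(value)          # "output the 0'th item"
        incoming = next(it, _END)
        if incoming is _END:
            heapq.heappop(heap)          # no more input: the heap shrinks
        elif incoming >= value:
            heapq.heapreplace(heap, (run, incoming))      # fits this run
        else:
            heapq.heapreplace(heap, (run + 1, incoming))  # waits for next run
    return runs


print(replacement_selection([5, 1, 9, 3, 7, 2, 8, 6, 4], memory=3))
# [[1, 3, 5, 7, 8, 9], [2, 4, 6]]
```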

In a word, heaps are useful memory structures to know. I use them in a few
applications, and I think it is good to keep a 'heap' module around. :-)

.. rubric:: Footnotes

.. [#] The disk balancing algorithms which are current, nowadays, are more annoying
   than clever, and this is a consequence of the seeking capabilities of the disks.
   On devices which cannot seek, like big tape drives, the story was quite
   different, and one had to be very clever to ensure (far in advance) that each
   tape movement would be as effective as possible (that is, would best
   participate in "progressing" the merge). Some tapes were even able to read
   backwards, and this was also used to avoid the rewinding time. Believe me, really
   good tape sorts were quite spectacular to watch! Sorting has always been a
   Great Art! :-)
