Blame - Doc/library/heapq.rst - platform/external/python/cpython3

Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	1	:mod:`heapq` --- Heap queue algorithm
				2	=====================================
				3
				4	.. module:: heapq
				5	:synopsis: Heap queue algorithm (a.k.a. priority queue).
				6	.. moduleauthor:: Kevin O'Connor
				7	.. sectionauthor:: Guido van Rossum <guido@python.org>
				8	.. sectionauthor:: François Pinard
				9
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	10	.. versionadded:: 2.3
				11
				12	This module provides an implementation of the heap queue algorithm, also known
				13	as the priority queue algorithm.
				14
				15	Heaps are arrays for which ``heap[k] <= heap[2*k+1]`` and ``heap[k] <=
				16	heap[2*k+2]`` for all *k*, counting elements from zero. For the sake of
				17	comparison, non-existing elements are considered to be infinite. The
				18	interesting property of a heap is that ``heap[0]`` is always its smallest
				19	element.
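
The invariant can be checked directly. This is a minimal sketch; ``is_heap`` is a hypothetical helper written for illustration and is not part of the module:

```python
def is_heap(heap):
    """Return True if heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k."""
    n = len(heap)
    for k in range(n):
        for child in (2 * k + 1, 2 * k + 2):
            # Children past the end of the list count as infinite,
            # so only existing children can violate the invariant.
            if child < n and heap[child] < heap[k]:
                return False
    return True


valid = is_heap([0, 1, 2, 6, 3, 5, 4])   # every parent <= both children
invalid = is_heap([3, 1, 2])             # heap[0] is larger than heap[1]
```

Note that a heap is only partially ordered: the first list above satisfies the invariant even though it is not sorted.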
				20
				21	The API below differs from textbook heap algorithms in two aspects: (a) We use
				22	zero-based indexing. This makes the relationship between the index for a node
				23	and the indexes for its children slightly less obvious, but is more suitable
				24	since Python uses zero-based indexing. (b) Our pop method returns the smallest
				25	item, not the largest (called a "min heap" in textbooks; a "max heap" is more
				26	common in texts because of its suitability for in-place sorting).
				27
				28	These two make it possible to view the heap as a regular Python list without
				29	surprises: ``heap[0]`` is the smallest item, and ``heap.sort()`` maintains the
				30	heap invariant!
				31
				32	To create a heap, use a list initialized to ``[]``, or you can transform a
				33	populated list into a heap via function :func:`heapify`.
				34
				35	The following functions are provided:
				36
				37
				38	.. function:: heappush(heap, item)
				39
				40	Push the value *item* onto the *heap*, maintaining the heap invariant.
				41
				42
				43	.. function:: heappop(heap)
				44
				45	Pop and return the smallest item from the heap, maintaining the heap
				46	invariant. If the heap is empty, :exc:`IndexError` is raised.
				47
Raymond Hettinger	53bdf09	2008-03-13 19:03:51 +0000	[diff] [blame^]	48	.. function:: heappushpop(heap, item)
				49
				50	Push item on the heap, then pop and return the smallest item from the
				51	heap. The combined action runs more efficiently than :func:`heappush`
				52	followed by a separate call to :func:`heappop`.
				53
				54	.. versionadded:: 2.6
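
As a small illustration (the sample values are arbitrary), the combined call returns the same item as a push followed by a pop:

```python
from heapq import heapify, heappop, heappush, heappushpop

a = [2, 4, 6]
b = list(a)
heapify(a)
heapify(b)

# One combined call on heap a ...
combined = heappushpop(a, 1)

# ... versus two separate calls on an identical heap b.
heappush(b, 1)
separate = heappop(b)

# When the pushed item is larger than the current minimum,
# the old minimum is returned and the new item stays on the heap.
replaced = heappushpop(a, 5)
```

Here ``combined`` and ``separate`` are both ``1``, since the pushed item is itself the smallest, and ``replaced`` is ``2``.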
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	55
				56	.. function:: heapify(x)
				57
				58	Transform list *x* into a heap, in-place, in linear time.
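
A short sketch of turning an existing list into a heap and then draining it in order (the sample data is arbitrary):

```python
from heapq import heapify, heappop

data = [5, 1, 8, 3, 9, 2]
heapify(data)                # rearranges the list in place and returns None
smallest = data[0]           # the smallest item now sits at index 0

# Popping repeatedly yields the items in ascending order.
drained = [heappop(data) for _ in range(len(data))]
```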
				59
				60
				61	.. function:: heapreplace(heap, item)
				62
				63	Pop and return the smallest item from the heap, and also push the new item.
				64	The heap size doesn't change. If the heap is empty, :exc:`IndexError` is raised.
				65	This is more efficient than :func:`heappop` followed by :func:`heappush`, and
				66	can be more appropriate when using a fixed-size heap. Note that the value
				67	returned may be larger than item! That constrains reasonable uses of this
				68	routine unless written as part of a conditional replacement::
				69
				70	if item > heap[0]:
				71	    item = heapreplace(heap, item)
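
One possible use of the conditional replacement is a fixed-size heap that tracks the largest items of a stream. This is a sketch only; ``largest_n`` is a hypothetical helper name, and :func:`nlargest` is the ready-made way to do this:

```python
from heapq import heapify, heapreplace
from itertools import islice


def largest_n(stream, n):
    """Return the n largest items of stream, largest first (illustrative helper)."""
    if n <= 0:
        return []
    it = iter(stream)
    heap = list(islice(it, n))   # the first n items seed the fixed-size heap
    heapify(heap)
    for item in it:
        # heap[0] is the smallest of the current candidates; replace it
        # only when the new item is larger, so the heap size never changes.
        if item > heap[0]:
            heapreplace(heap, item)
    return sorted(heap, reverse=True)


top3 = largest_n([1, 3, 5, 7, 9, 2, 4, 6, 8, 0], 3)
```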
				72
				73	Example of use::
				74
				75	>>> from heapq import heappush, heappop
				76	>>> heap = []
				77	>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
				78	>>> for item in data:
				79	...     heappush(heap, item)
				80	...
				81	>>> ordered = []
				82	>>> while heap:
				83	...     ordered.append(heappop(heap))
				84	...
				85	>>> print(ordered)
				86	[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
				87	>>> data.sort()
				88	>>> print(data == ordered)
				89	True
				90	>>>
				91
				92	The module also offers three general purpose functions based on heaps.
				93
				94
				95	.. function:: merge(*iterables)
				96
				97	Merge multiple sorted inputs into a single sorted output (for example, merge
Georg Brandl	e7a0990	2007-10-21 12:10:28 +0000	[diff] [blame]	98	timestamped entries from multiple log files). Returns an :term:`iterator`
				99	over the sorted values.
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	100
				101	Similar to ``sorted(itertools.chain(*iterables))`` but returns an iterable, does
				102	not pull the data into memory all at once, and assumes that each of the input
				103	streams is already sorted (smallest to largest).
				104
				105	.. versionadded:: 2.6
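
A brief sketch of the log-file case mentioned above; the two lists are made-up stand-ins for already-sorted log streams:

```python
from heapq import merge

# Each input must already be sorted by timestamp (the first tuple field).
log_a = [(1, "a: start"), (4, "a: stop")]
log_b = [(2, "b: start"), (3, "b: stop")]

# merge() returns an iterator; list() is used here only to show the result.
merged = list(merge(log_a, log_b))
```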
				106
				107
				108	.. function:: nlargest(n, iterable[, key])
				109
				110	Return a list with the *n* largest elements from the dataset defined by
				111	*iterable*. *key*, if provided, specifies a function of one argument that is
				112	used to extract a comparison key from each element in the iterable (for
				113	example, ``key=str.lower``). Equivalent to: ``sorted(iterable, key=key,
				114	reverse=True)[:n]``.
				115
				116	.. versionadded:: 2.4
				117
				118	.. versionchanged:: 2.5
				119	Added the optional key argument.
				120
				121
				122	.. function:: nsmallest(n, iterable[, key])
				123
				124	Return a list with the *n* smallest elements from the dataset defined by
				125	*iterable*. *key*, if provided, specifies a function of one argument that is
				126	used to extract a comparison key from each element in the iterable (for
				127	example, ``key=str.lower``). Equivalent to: ``sorted(iterable, key=key)[:n]``.
				128
				129	.. versionadded:: 2.4
				130
				131	.. versionchanged:: 2.5
				132	Added the optional key argument.
				133
				134	The latter two functions perform best for smaller values of n. For larger
				135	values, it is more efficient to use the :func:`sorted` function. Also, when
				136	``n==1``, it is more efficient to use the builtin :func:`min` and :func:`max`
				137	functions.
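
A small example of both functions with a *key* argument (the word list is arbitrary and chosen so that no two words share a length):

```python
from heapq import nlargest, nsmallest

words = ["pear", "Apple", "fig", "Banana", "cherries"]

longest = nlargest(2, words, key=len)            # compare by length
shortest = nsmallest(2, words, key=len)
first_alpha = nsmallest(2, words, key=str.lower) # case-insensitive ordering
```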
				138
				139
				140	Theory
				141	------
				142
				143	(This explanation is due to François Pinard. The Python code for this module
				144	was contributed by Kevin O'Connor.)
				145
				146	Heaps are arrays for which ``a[k] <= a[2*k+1]`` and ``a[k] <= a[2*k+2]`` for all
				147	*k*, counting elements from 0. For the sake of comparison, non-existing
				148	elements are considered to be infinite. The interesting property of a heap is
				149	that ``a[0]`` is always its smallest element.
				150
				151	The strange invariant above is meant to be an efficient memory representation
				152	for a tournament. The numbers below are k, not ``a[k]``::
				153
				154	                                  0
				155
				156	                 1                                 2
				157
				158	         3               4                5               6
				159
				160	     7       8       9       10      11      12      13      14
				161
				162	   15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30
				163
				164	In the tree above, each cell *k* is topping ``2*k+1`` and ``2*k+2``. In a usual
				165	binary tournament we see in sports, each cell is the winner over the two cells
				166	it tops, and we can trace the winner down the tree to see all opponents s/he
				167	had. However, in many computer applications of such tournaments, we do not need
				168	to trace the history of a winner. To be more memory efficient, when a winner is
				169	promoted, we try to replace it by something else at a lower level, and the rule
				170	becomes that a cell and the two cells it tops contain three different items, but
				171	the top cell "wins" over the two topped cells.
				172
				173	If this heap invariant is protected at all times, index 0 is clearly the overall
				174	winner. The simplest algorithmic way to remove it and find the "next" winner is
				175	to move some loser (let's say cell 30 in the diagram above) into the 0 position,
				176	and then percolate this new 0 down the tree, exchanging values, until the
				177	invariant is re-established. This is clearly logarithmic in the total number of
				178	items in the tree. By iterating over all items, you get an O(n log n) sort.
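
The sort described here can be sketched with the module's own functions. ``heapsort`` is a name chosen for this illustration, not a function the module provides:

```python
from heapq import heappop, heappush


def heapsort(iterable):
    """Return the items of iterable in ascending order (illustrative helper)."""
    heap = []
    for value in iterable:
        heappush(heap, value)        # each push is O(log n)
    # Each pop removes the current winner and restores the invariant.
    return [heappop(heap) for _ in range(len(heap))]


result = heapsort([1, 3, 5, 7, 9, 2, 4, 6, 8, 0])
```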
				179
				180	A nice feature of this sort is that you can efficiently insert new items while
				181	the sort is going on, provided that the inserted items are not "better" than the
				182	last 0'th element you extracted. This is especially useful in simulation
				183	contexts, where the tree holds all incoming events, and the "win" condition
				184	means the smallest scheduled time. When an event schedules other events for
				185	execution, they are scheduled into the future, so they can easily go into the
				186	heap. So, a heap is a good structure for implementing schedulers (this is what
				187	I used for my MIDI sequencer :-).
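
A toy scheduler along these lines might look as follows. This is a sketch under assumptions: the ``(time, name, period)`` event layout and the ``run`` function are invented for the example, and periods are assumed to be positive or ``None``:

```python
from heapq import heappop, heappush


def run(events, horizon):
    """Process (time, name, period) events in time order up to horizon."""
    queue = []
    for event in events:
        heappush(queue, event)
    log = []
    while queue and queue[0][0] <= horizon:
        time, name, period = heappop(queue)   # the earliest scheduled event wins
        log.append((time, name))
        if period:
            # A periodic event reschedules itself into the future, so it can
            # go back into the heap while the queue is being processed.
            heappush(queue, (time + period, name, period))
    return log


log = run([(0, "tick", 3), (1, "tock", 4)], horizon=8)
```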
				188
				189	Various structures for implementing schedulers have been extensively studied,
				190	and heaps are good for this, as they are reasonably speedy, the speed is almost
				191	constant, and the worst case is not much different than the average case.
				192	However, there are other representations which are more efficient overall, yet
				193	the worst cases might be terrible.
				194
				195	Heaps are also very useful in big disk sorts. You most probably all know that a
				196	big sort implies producing "runs" (which are pre-sorted sequences, whose size is
				197	usually related to the amount of CPU memory), followed by merging passes for
				198	these runs, where the merging is often very cleverly organised [#]_. It is very
				199	important that the initial sort produces the longest runs possible. Tournaments
				200	are a good way to achieve that. If, using all the memory available to hold a
				201	tournament, you replace and percolate items that happen to fit the current run,
				202	you'll produce runs which are twice the size of the memory for random input, and
				203	much better for input fuzzily ordered.
				204
				205	Moreover, if you output the 0'th item on disk and get an input which may not fit
				206	in the current tournament (because the value "wins" over the last output value),
				207	it cannot fit in the heap, so the size of the heap decreases. The freed memory
				208	could be cleverly reused immediately for progressively building a second heap,
				209	which grows at exactly the same rate the first heap is melting. When the first
				210	heap completely vanishes, you switch heaps and start a new run. Clever and
				211	quite effective!
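
The two-heap scheme can be sketched in memory as follows. This is an illustration only, assuming a positive ``memory`` capacity; ``replacement_selection`` is a hypothetical name, and a real disk sort would write each run to a file instead of collecting lists:

```python
from heapq import heapify, heappop, heappush
from itertools import islice


def replacement_selection(items, memory):
    """Split items into sorted runs using at most `memory` items at a time."""
    it = iter(items)
    heap = list(islice(it, memory))   # fill the available memory
    heapify(heap)
    pending = []                      # second heap, for the next run
    run, runs = [], []
    while heap:
        smallest = heappop(heap)
        run.append(smallest)          # "output" the current winner
        for incoming in islice(it, 1):
            if incoming >= smallest:
                heappush(heap, incoming)      # still fits the current run
            else:
                heappush(pending, incoming)   # must wait for the next run
        if not heap:
            # The first heap has melted away: close the run and switch heaps.
            runs.append(run)
            run = []
            heap, pending = pending, []
    return runs


runs = replacement_selection([5, 1, 8, 3, 9, 2, 7, 4, 6, 0], memory=3)
```

With room for only three items, the first run in this example already holds five, which shows how runs can grow beyond the size of the memory.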
				212
				213	In a word, heaps are useful memory structures to know. I use them in a few
				214	applications, and I think it is good to keep a 'heap' module around. :-)
				215
				216	.. rubric:: Footnotes
				217
				218	.. [#] The disk balancing algorithms which are current, nowadays, are more annoying
				219	than clever, and this is a consequence of the seeking capabilities of the disks.
				220	On devices which cannot seek, like big tape drives, the story was quite
				221	different, and one had to be very clever to ensure (far in advance) that each
				222	tape movement will be the most effective possible (that is, will best
				223	participate at "progressing" the merge). Some tapes were even able to read
				224	backwards, and this was also used to avoid the rewinding time. Believe me, real
				225	good tape sorts were quite spectacular to watch! From all times, sorting has
				226	always been a Great Art! :-)
				227