Blame - Lib/heapq.py - platform/external/python/cpython3

blob: 9fb4e70824028b5d81a5ec5c02426c012cec0884

Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	1	# -*- coding: Latin-1 -*-
				2
				3	"""Heap queue algorithm (a.k.a. priority queue).
				4
				5	Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
				6	all k, counting elements from 0. For the sake of comparison,
				7	non-existing elements are considered to be infinite. The interesting
				8	property of a heap is that a[0] is always its smallest element.
				9
				10	Usage:
				11
				12	heap = []            # creates an empty heap
				13	heappush(heap, item) # pushes a new item on the heap
				14	item = heappop(heap) # pops the smallest item from the heap
				15	item = heap[0]       # smallest item on the heap without popping it
				16	heapify(x)           # transforms list into a heap, in-place, in linear time
				17	item = heapreplace(heap, item) # pops and returns smallest item, and adds
				18	                               # new item; the heap size is unchanged
				19
				20	Our API differs from textbook heap algorithms as follows:
				21
				22	- We use 0-based indexing. This makes the relationship between the
				23	  index for a node and the indexes for its children slightly less
				24	  obvious, but is more suitable since Python uses 0-based indexing.
				25
				26	- Our heappop() method returns the smallest item, not the largest.
				27
				28	These two make it possible to view the heap as a regular Python list
				29	without surprises: heap[0] is the smallest item, and heap.sort()
				30	maintains the heap invariant!
				31	"""
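
The usage summary in the docstring above can be tried directly. The following is a minimal sketch, assuming a current Python 3 interpreter and its standard-library `heapq` module (which exposes the same four functions); the sample data is made up for illustration.

```python
import heapq

heap = []                               # an empty list is already a valid heap
for item in [5, 1, 4, 2, 3]:            # illustrative data, not from the source
    heapq.heappush(heap, item)

smallest = heap[0]                      # peek at the smallest item without popping
first = heapq.heappop(heap)             # remove and return the smallest item

data = [9, 7, 8, 6]
heapq.heapify(data)                     # rearrange the list into a heap, in place
replaced = heapq.heapreplace(data, 10)  # pop the smallest, then add 10; size unchanged

# Popping until the heap is empty yields the remaining items in ascending order.
drained = [heapq.heappop(heap) for _ in range(len(heap))]
```

Because the heap is an ordinary list, `heap[0]` and `len(heap)` work without any wrapper type.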
				32
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	33	# Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	34
				35	__about__ = """Heap queues
				36
				37	[explanation by François Pinard]
				38
				39	Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
				40	all k, counting elements from 0. For the sake of comparison,
				41	non-existing elements are considered to be infinite. The interesting
				42	property of a heap is that a[0] is always its smallest element.
				43
				44	The strange invariant above is meant to be an efficient memory
				45	representation for a tournament. The numbers below are `k', not a[k]:
				46
				47	                                   0
				48
				49	                  1                                 2
				50
				51	          3               4                5               6
				52
				53	      7       8       9       10      11      12      13      14
				54
				55	    15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30
				56
				57
				58	In the tree above, each cell `k' is topping `2*k+1' and `2*k+2'. In
				59	a usual binary tournament we see in sports, each cell is the winner
				60	over the two cells it tops, and we can trace the winner down the tree
				61	to see all opponents s/he had. However, in many computer applications
				62	of such tournaments, we do not need to trace the history of a winner.
				63	To be more memory efficient, when a winner is promoted, we try to
				64	replace it by something else at a lower level, and the rule becomes
				65	that a cell and the two cells it tops contain three different items,
				66	but the top cell "wins" over the two topped cells.
				67
				68	If this heap invariant is protected at all times, index 0 is clearly
				69	the overall winner. The simplest algorithmic way to remove it and
				70	find the "next" winner is to move some loser (let's say cell 30 in the
				71	diagram above) into the 0 position, and then percolate this new 0 down
				72	the tree, exchanging values, until the invariant is re-established.
				73	This is clearly logarithmic on the total number of items in the tree.
				74	By iterating over all items, you get an O(n log n) sort.
				75
				76	A nice feature of this sort is that you can efficiently insert new
				77	items while the sort is going on, provided that the inserted items are
				78	not "better" than the last 0'th element you extracted. This is
				79	especially useful in simulation contexts, where the tree holds all
				80	incoming events, and the "win" condition means the smallest scheduled
				81	time. When an event schedules other events for execution, they are
				82	scheduled into the future, so they can easily go into the heap. So, a
				83	heap is a good structure for implementing schedulers (this is what I
				84	used for my MIDI sequencer :-).
				85
				86	Various structures for implementing schedulers have been extensively
				87	studied, and heaps are good for this, as they are reasonably speedy,
				88	the speed is almost constant, and the worst case is not much different
				89	than the average case. However, there are other representations which
				90	are more efficient overall, yet the worst cases might be terrible.
				91
				92	Heaps are also very useful in big disk sorts. You most probably all
				93	know that a big sort implies producing "runs" (which are pre-sorted
				94	sequences, whose size is usually related to the amount of CPU memory),
				95	followed by merging passes for these runs, which merging is often
				96	very cleverly organised[1]. It is very important that the initial
				97	sort produces the longest runs possible. Tournaments are a good way
				98	to achieve that. If, using all the memory available to hold a tournament, you
				99	replace and percolate items that happen to fit the current run, you'll
				100	produce runs which are twice the size of the memory for random input,
				101	and much better for input fuzzily ordered.
				102
				103	Moreover, if you output the 0'th item on disk and get an input which
				104	may not fit in the current tournament (because the value "wins" over
				105	the last output value), it cannot fit in the heap, so the size of the
				106	heap decreases. The freed memory could be cleverly reused immediately
				107	for progressively building a second heap, which grows at exactly the
				108	same rate the first heap is melting. When the first heap completely
				109	vanishes, you switch heaps and start a new run. Clever and quite
				110	effective!
				111
				112	In a word, heaps are useful memory structures to know. I use them in
				113	a few applications, and I think it is good to keep a `heap' module
				114	around. :-)
				115
				116	--------------------
				117	[1] The disk balancing algorithms which are current, nowadays, are
				118	more annoying than clever, and this is a consequence of the seeking
				119	capabilities of the disks. On devices which cannot seek, like big
				120	tape drives, the story was quite different, and one had to be very
				121	clever to ensure (far in advance) that each tape movement will be the
				122	most effective possible (that is, will best participate at
				123	"progressing" the merge). Some tapes were even able to read
				124	backwards, and this was also used to avoid the rewinding time.
				125	Believe me, real good tape sorts were quite spectacular to watch!
				126	From all times, sorting has always been a Great Art! :-)
				127	"""
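
The run-generation idea described in the text above (replace and percolate items that still fit the current run, defer the rest to the next run) can be sketched as follows. This is an illustrative simplification, not code from this module: instead of two heaps that grow and shrink in step, it keeps a single heap of `(run_number, value)` pairs so that deferred items sort after everything in the current run. The names `replacement_selection`, `memory_size` and `_EXHAUSTED` are hypothetical, and the sketch assumes Python 3 with the standard-library `heapq`.

```python
import heapq
from itertools import islice

_EXHAUSTED = object()  # hypothetical sentinel marking the end of the input


def replacement_selection(items, memory_size):
    """Yield sorted runs while holding at most memory_size items in the heap."""
    it = iter(items)
    # Tag each item with the run it belongs to; tuples compare by run first.
    heap = [(0, value) for value in islice(it, memory_size)]
    heapq.heapify(heap)
    current_run, run = 0, []
    while heap:
        run_no, value = heapq.heappop(heap)
        if run_no != current_run:
            # The current run has melted away; emit it and switch runs.
            yield run
            current_run, run = run_no, []
        run.append(value)
        incoming = next(it, _EXHAUSTED)
        if incoming is not _EXHAUSTED:
            # An item smaller than the last output cannot join the current run.
            target = current_run if incoming >= value else current_run + 1
            heapq.heappush(heap, (target, incoming))
    if run:
        yield run
```

With already sorted input the whole input comes out as one run, while reverse-sorted input is the worst case and produces runs no longer than `memory_size`.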
				128
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	129	__all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'nlargest',
				130	'nsmallest']
				131
				132	from itertools import islice, repeat
Raymond Hettinger	b25aa36	2004-06-12 08:33:36 +0000	[diff] [blame]	133	import bisect
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	134
				135	def heappush(heap, item):
				136	    """Push item onto heap, maintaining the heap invariant."""
				137	    heap.append(item)
				138	    _siftdown(heap, 0, len(heap)-1)
				139
				140	def heappop(heap):
				141	    """Pop the smallest item off the heap, maintaining the heap invariant."""
				142	    lastelt = heap.pop()    # raises appropriate IndexError if heap is empty
				143	    if heap:
				144	        returnitem = heap[0]
				145	        heap[0] = lastelt
				146	        _siftup(heap, 0)
				147	    else:
				148	        returnitem = lastelt
				149	    return returnitem
				150
				151	def heapreplace(heap, item):
				152	    """Pop and return the current smallest value, and add the new item.
				153
				154	    This is more efficient than heappop() followed by heappush(), and can be
				155	    more appropriate when using a fixed-size heap. Note that the value
				156	    returned may be larger than item! That constrains reasonable uses of
Raymond Hettinger	28224f8	2004-06-20 09:07:53 +0000	[diff] [blame^]	157	    this routine unless written as part of a larger expression:
				158
				159	        result = item <= heap[0] and item or heapreplace(heap, item)
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	160	    """
				161	    returnitem = heap[0]    # raises appropriate IndexError if heap is empty
				162	    heap[0] = item
				163	    _siftup(heap, 0)
				164	    return returnitem
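
The warning in the heapreplace() docstring is easy to miss, so here is a small sketch of the pitfall and of a guarded call. It assumes Python 3 and the standard-library `heapq`; `replace_if_larger` is a hypothetical helper written for this illustration. An explicit `if` is used rather than the `and ... or` idiom from the docstring, because that idiom falls through to heapreplace() whenever `item` itself is falsy (for example `0`).

```python
import heapq


def replace_if_larger(heap, item):
    """Hypothetical helper: swap item in only if it exceeds the current minimum.

    Returns whichever value ends up outside the heap.
    """
    if heap and heap[0] < item:
        return heapq.heapreplace(heap, item)
    return item


# Unguarded call: the returned value (1) is larger than the item pushed (0).
heap = [1, 3, 5]
returned = heapq.heapreplace(heap, 0)

# Guarded calls: a too-small item bounces straight back and the heap is untouched.
guarded = [1, 3, 5]
bounced = replace_if_larger(guarded, 0)
evicted = replace_if_larger(guarded, 4)
```

The guarded form is what a fixed-size "keep the largest items seen so far" heap usually wants.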
				165
				166	def heapify(x):
				167	    """Transform list into a heap, in-place, in O(len(x)) time."""
				168	    n = len(x)
				169	    # Transform bottom-up. The largest index there's any point to looking at
				170	    # is the largest with a child index in-range, so must have 2*i + 1 < n,
				171	    # or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
				172	    # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is
				173	    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
				174	    for i in reversed(xrange(n//2)):
				175	        _siftup(x, i)
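
The index argument in the heapify() comment (the last index worth sifting is n//2 - 1) can be checked by brute force. The sketch below assumes Python 3; `last_parent_index` and `is_heap` are hypothetical helpers introduced only for this check.

```python
import heapq


def last_parent_index(n):
    """Largest i whose left child 2*i + 1 is in range for a list of length n.

    Found by brute force; returns -1 when no index has a child.
    """
    candidates = [i for i in range(n) if 2 * i + 1 < n]
    return candidates[-1] if candidates else -1


def is_heap(a):
    """True if every element is >= its parent, i.e. the heap invariant holds."""
    return all(a[(k - 1) >> 1] <= a[k] for k in range(1, len(a)))


# The closed form from the comment agrees with brute force for small n.
matches = all(last_parent_index(n) == n // 2 - 1 for n in range(200))

data = [9, 4, 7, 1, 8, 2, 6, 3, 5, 0]   # illustrative data
heapq.heapify(data)
heap_ok = is_heap(data)
```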
				176
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	177	def nlargest(iterable, n):
				178	    """Find the n largest elements in a dataset.
				179
				180	    Equivalent to: sorted(iterable, reverse=True)[:n]
				181	    """
				182	    it = iter(iterable)
				183	    result = list(islice(it, n))
				184	    if not result:
				185	        return result
				186	    heapify(result)
				187	    _heapreplace = heapreplace
				188	    sol = result[0]         # sol --> smallest of the nlargest
				189	    for elem in it:
				190	        if elem <= sol:
				191	            continue
				192	        _heapreplace(result, elem)
				193	        sol = result[0]
				194	    result.sort(reverse=True)
				195	    return result
				196
				197	def nsmallest(iterable, n):
				198	    """Find the n smallest elements in a dataset.
				199
				200	    Equivalent to: sorted(iterable)[:n]
				201	    """
Raymond Hettinger	b25aa36	2004-06-12 08:33:36 +0000	[diff] [blame]	202	    if hasattr(iterable, '__len__') and n * 10 <= len(iterable):
				203	        # For smaller values of n, the bisect method is faster than a minheap.
				204	        # It is also memory efficient, consuming only n elements of space.
				205	        it = iter(iterable)
				206	        result = sorted(islice(it, 0, n))
				207	        if not result:
				208	            return result
				209	        insort = bisect.insort
				210	        pop = result.pop
				211	        los = result[-1]    # los --> Largest of the nsmallest
				212	        for elem in it:
				213	            if los <= elem:
				214	                continue
				215	            insort(result, elem)
				216	            pop()
				217	            los = result[-1]
				218	        return result
				219	    # An alternative approach manifests the whole iterable in memory but
				220	    # saves comparisons by heapifying all at once. Also, saves time
				221	    # over bisect.insort() which has O(n) data movement time for every
				222	    # insertion. Finding the n smallest of an m length iterable requires
				223	    # O(m) + O(n log m) comparisons.
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	224	    h = list(iterable)
				225	    heapify(h)
				226	    return map(heappop, repeat(h, min(n, len(h))))
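
The "Equivalent to" lines in the nlargest() and nsmallest() docstrings can be checked against the standard library. Note one assumption that differs from this revision: the sketch uses Python 3's `heapq`, where the argument order is `(n, iterable)`, the reverse of the `(iterable, n)` order defined here. The seed and data are arbitrary.

```python
import heapq
import random

random.seed(2004)                       # arbitrary seed, for reproducibility only
data = [random.randrange(1000) for _ in range(50)]

top3 = heapq.nlargest(3, data)          # released API: n first, then the iterable
bottom3 = heapq.nsmallest(3, data)

# Asking for more items than exist simply returns everything, sorted.
everything = heapq.nsmallest(100, data)
```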
				227
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	228	# 'heap' is a heap at all indices >= startpos, except possibly for pos. pos
				229	# is the index of a leaf with a possibly out-of-order value. Restore the
				230	# heap invariant.
				231	def _siftdown(heap, startpos, pos):
				232	    newitem = heap[pos]
				233	    # Follow the path to the root, moving parents down until finding a place
				234	    # newitem fits.
				235	    while pos > startpos:
				236	        parentpos = (pos - 1) >> 1
				237	        parent = heap[parentpos]
				238	        if parent <= newitem:
				239	            break
				240	        heap[pos] = parent
				241	        pos = parentpos
				242	    heap[pos] = newitem
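
The parent computation `(pos - 1) >> 1` used by _siftdown() is the inverse of the child formulas `2*k+1` and `2*k+2` from the module docstring. A minimal sketch of that relationship, assuming Python 3; the helper names `parent`, `children` and `path_to_root` are hypothetical:

```python
def parent(pos):
    """Index of the parent of pos, for pos >= 1 (same arithmetic as _siftdown)."""
    return (pos - 1) >> 1


def children(pos):
    """Indices of the left and right children of pos (either may be out of range)."""
    return 2 * pos + 1, 2 * pos + 2


def path_to_root(pos):
    """Indices visited when walking from pos up to the root at index 0."""
    path = [pos]
    while pos > 0:
        pos = parent(pos)
        path.append(pos)
    return path


# Every child maps back to the parent it came from.
round_trip = all(parent(c) == k for k in range(100) for c in children(k))
```

For example, `path_to_root(30)` follows cell 30 of the tournament diagram up through 14, 6 and 2 to the root.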
				243
				244	# The child indices of heap index pos are already heaps, and we want to make
				245	# a heap at index pos too. We do this by bubbling the smaller child of
				246	# pos up (and so on with that child's children, etc) until hitting a leaf,
				247	# then using _siftdown to move the oddball originally at index pos into place.
				248	#
				249	# We could break out of the loop as soon as we find a pos where newitem <=
				250	# both its children, but turns out that's not a good idea, and despite that
				251	# many books write the algorithm that way. During a heap pop, the last array
				252	# element is sifted in, and that tends to be large, so that comparing it
				253	# against values starting from the root usually doesn't pay (= usually doesn't
				254	# get us out of the loop early). See Knuth, Volume 3, where this is
				255	# explained and quantified in an exercise.
				256	#
				257	# Cutting the # of comparisons is important, since these routines have no
				258	# way to extract "the priority" from an array element, so that intelligence
				259	# is likely to be hiding in custom __cmp__ methods, or in array elements
				260	# storing (priority, record) tuples. Comparisons are thus potentially
				261	# expensive.
				262	#
				263	# On random arrays of length 1000, making this change cut the number of
				264	# comparisons made by heapify() a little, and those made by exhaustive
				265	# heappop() a lot, in accord with theory. Here are typical results from 3
				266	# runs (3 just to demonstrate how small the variance is):
				267	#
				268	# Compares needed by heapify     Compares needed by 1000 heappops
				269	# --------------------------     --------------------------------
				270	# 1837 cut to 1663               14996 cut to 8680
				271	# 1855 cut to 1659               14966 cut to 8678
				272	# 1847 cut to 1660               15024 cut to 8703
				273	#
				274	# Building the heap by using heappush() 1000 times instead required
				275	# 2198, 2148, and 2219 compares: heapify() is more efficient, when
				276	# you can use it.
				277	#
				278	# The total compares needed by list.sort() on the same lists were 8627,
				279	# 8627, and 8632 (this should be compared to the sum of heapify() and
				280	# heappop() compares): list.sort() is (unsurprisingly!) more efficient
				281	# for sorting.
				282
				283	def _siftup(heap, pos):
				284	    endpos = len(heap)
				285	    startpos = pos
				286	    newitem = heap[pos]
				287	    # Bubble up the smaller child until hitting a leaf.
				288	    childpos = 2*pos + 1    # leftmost child position
				289	    while childpos < endpos:
				290	        # Set childpos to index of smaller child.
				291	        rightpos = childpos + 1
				292	        if rightpos < endpos and heap[rightpos] <= heap[childpos]:
				293	            childpos = rightpos
				294	        # Move the smaller child up.
				295	        heap[pos] = heap[childpos]
				296	        pos = childpos
				297	        childpos = 2*pos + 1
				298	    # The leaf at pos is empty now. Put newitem there, and bubble it up
				299	    # to its final resting place (by sifting its parents down).
				300	    heap[pos] = newitem
				301	    _siftdown(heap, startpos, pos)
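
The comparison counts quoted in the comment block before _siftup() can be reproduced in spirit with a counting wrapper. The sketch below is a Python 3 re-implementation written for illustration, not the module's own code: `Counted`, `siftup_to_leaf`, `siftup_early_exit` and `pop_all` are hypothetical names, only `<` is used for comparisons, and the exact counts depend on the random data, so none are claimed here.

```python
import random


class Counted:
    """Wraps a value and counts every comparison made through it."""
    count = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.count += 1
        return self.value < other.value


def siftdown(heap, startpos, pos):
    """Move heap[pos] toward the root until its parent is not larger."""
    newitem = heap[pos]
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if not (newitem < parent):
            break
        heap[pos] = parent
        pos = parentpos
    heap[pos] = newitem


def siftup_to_leaf(heap, pos):
    """Strategy from this module: bubble the smaller child up to a leaf, then sift back."""
    endpos, startpos, newitem = len(heap), pos, heap[pos]
    childpos = 2 * pos + 1
    while childpos < endpos:
        rightpos = childpos + 1
        if rightpos < endpos and not (heap[childpos] < heap[rightpos]):
            childpos = rightpos
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2 * pos + 1
    heap[pos] = newitem
    siftdown(heap, startpos, pos)


def siftup_early_exit(heap, pos):
    """Textbook strategy: stop as soon as newitem is no larger than both children."""
    endpos, newitem = len(heap), heap[pos]
    childpos = 2 * pos + 1
    while childpos < endpos:
        rightpos = childpos + 1
        if rightpos < endpos and heap[rightpos] < heap[childpos]:
            childpos = rightpos
        if not (heap[childpos] < newitem):
            break
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2 * pos + 1
    heap[pos] = newitem


def pop_all(values, siftup):
    """Heapify values, then pop everything; return (sorted values, compares during pops)."""
    heap = [Counted(v) for v in values]
    for i in reversed(range(len(heap) // 2)):
        siftup(heap, i)
    Counted.count = 0                   # count only the comparisons made while popping
    out = []
    while heap:
        last = heap.pop()
        if heap:
            out.append(heap[0].value)
            heap[0] = last
            siftup(heap, 0)
        else:
            out.append(last.value)
    return out, Counted.count


random.seed(0)                          # arbitrary seed
values = random.sample(range(100000), 1000)
sorted_leaf, compares_leaf = pop_all(values, siftup_to_leaf)
sorted_early, compares_early = pop_all(values, siftup_early_exit)
print("early exit:", compares_early, "to-leaf:", compares_leaf)
```

Both strategies pop the values in sorted order; the to-leaf variant needs roughly one comparison per level on the way down, against two for the early-exit variant, which is the effect the comment describes.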
				302
				303	# If available, use C implementation
				304	try:
Raymond Hettinger	2e3dfaf	2004-06-13 05:26:33 +0000	[diff] [blame]	305	    from _heapq import heappush, heappop, heapify, heapreplace, nlargest, nsmallest
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	306	except ImportError:
				307	    pass
				308
				309	if __name__ == "__main__":
				310	    # Simple sanity test
				311	    heap = []
				312	    data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
				313	    for item in data:
				314	        heappush(heap, item)
				315	    sort = []
				316	    while heap:
				317	        sort.append(heappop(heap))
				318	    print sort