Blame - Lib/heapq.py - platform/external/python/cpython3

blob: 464663a78a5360f007ddfc25592e54ea9d52bee8

Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	1	"""Heap queue algorithm (a.k.a. priority queue).
				2
				3	Heaps are arrays for which a[k] <= a[2k+1] and a[k] <= a[2k+2] for
				4	all k, counting elements from 0. For the sake of comparison,
				5	non-existing elements are considered to be infinite. The interesting
				6	property of a heap is that a[0] is always its smallest element.
				7
				8	Usage:
				9
				10	heap = []            # creates an empty heap
				11	heappush(heap, item) # pushes a new item on the heap
				12	item = heappop(heap) # pops the smallest item from the heap
				13	item = heap[0]       # smallest item on the heap without popping it
				14	heapify(x)           # transforms list into a heap, in-place, in linear time
				15	item = heapreplace(heap, item) # pops and returns smallest item, and adds
				16	                               # new item; the heap size is unchanged
				17
				18	Our API differs from textbook heap algorithms as follows:
				19
				20	- We use 0-based indexing. This makes the relationship between the
				21	index for a node and the indexes for its children slightly less
				22	obvious, but is more suitable since Python uses 0-based indexing.
				23
				24	- Our heappop() method returns the smallest item, not the largest.
				25
				26	These two make it possible to view the heap as a regular Python list
				27	without surprises: heap[0] is the smallest item, and heap.sort()
				28	maintains the heap invariant!
				29	"""
				30
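The usage summary in the module docstring can be exercised with a short script. This is a minimal sketch that assumes the standard `heapq` module is importable; the sample values are made up for illustration.

```python
import heapq

heap = []                               # an empty list is already a valid heap
for value in [5, 1, 4, 2, 3]:
    heapq.heappush(heap, value)         # each push keeps heap[0] the smallest

smallest = heap[0]                      # peek at the smallest item without popping
popped = heapq.heappop(heap)            # remove and return the smallest item

data = [9, 7, 8, 6]
heapq.heapify(data)                     # rearrange in place so data[0] is the minimum
replaced = heapq.heapreplace(data, 10)  # pop the old minimum, push 10; size unchanged

heap.sort()                             # a sorted list still satisfies the heap invariant
print(smallest, popped, replaced, data[0], heap[0])
```

Because a sorted list satisfies a[k] <= a[2k+1] and a[k] <= a[2k+2], calling `heap.sort()` leaves a valid heap behind, which is the "without surprises" property the docstring mentions.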
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	31	# Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	32
				33	__about__ = """Heap queues
				34
Mark Dickinson	b4a17a8	2010-07-04 19:23:49 +0000	[diff] [blame]	35	[explanation by François Pinard]
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	36
				37	Heaps are arrays for which a[k] <= a[2k+1] and a[k] <= a[2k+2] for
				38	all k, counting elements from 0. For the sake of comparison,
				39	non-existing elements are considered to be infinite. The interesting
				40	property of a heap is that a[0] is always its smallest element.
				41
				42	The strange invariant above is meant to be an efficient memory
				43	representation for a tournament. The numbers below are `k', not a[k]:
				44
				45	                               0
				46
				47	              1                                 2
				48
				49	      3               4                5               6
				50
				51	  7       8       9       10      11      12      13      14
				52
				53	15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30
				54
				55
				56	In the tree above, each cell `k' is topping `2k+1' and `2k+2'. In
				57	a usual binary tournament we see in sports, each cell is the winner
				58	over the two cells it tops, and we can trace the winner down the tree
				59	to see all opponents s/he had. However, in many computer applications
				60	of such tournaments, we do not need to trace the history of a winner.
				61	To be more memory efficient, when a winner is promoted, we try to
				62	replace it by something else at a lower level, and the rule becomes
				63	that a cell and the two cells it tops contain three different items,
				64	but the top cell "wins" over the two topped cells.
				65
				66	If this heap invariant is protected at all times, index 0 is clearly
				67	the overall winner. The simplest algorithmic way to remove it and
				68	find the "next" winner is to move some loser (let's say cell 30 in the
				69	diagram above) into the 0 position, and then percolate this new 0 down
				70	the tree, exchanging values, until the invariant is re-established.
				71	This is clearly logarithmic on the total number of items in the tree.
				72	By iterating over all items, you get an O(n log n) sort.
				73
				74	A nice feature of this sort is that you can efficiently insert new
				75	items while the sort is going on, provided that the inserted items are
				76	not "better" than the last 0'th element you extracted. This is
				77	especially useful in simulation contexts, where the tree holds all
				78	incoming events, and the "win" condition means the smallest scheduled
				79	time. When an event schedules other events for execution, they are
				80	scheduled into the future, so they can easily go into the heap. So, a
				81	heap is a good structure for implementing schedulers (this is what I
				82	used for my MIDI sequencer :-).
				83
				84	Various structures for implementing schedulers have been extensively
				85	studied, and heaps are good for this, as they are reasonably speedy,
				86	the speed is almost constant, and the worst case is not much different
				87	than the average case. However, there are other representations which
				88	are more efficient overall, yet the worst cases might be terrible.
				89
				90	Heaps are also very useful in big disk sorts. You most probably all
				91	know that a big sort implies producing "runs" (which are pre-sorted
				92	sequences, whose size is usually related to the amount of CPU memory),
				93	followed by merging passes for these runs, which merging is often
				94	very cleverly organised[1]. It is very important that the initial
				95	sort produces the longest runs possible. Tournaments are a good way
				96	to achieve that. If, using all the memory available to hold a tournament, you
				97	replace and percolate items that happen to fit the current run, you'll
				98	produce runs which are twice the size of the memory for random input,
				99	and much better for input fuzzily ordered.
				100
				101	Moreover, if you output the 0'th item on disk and get an input which
				102	may not fit in the current tournament (because the value "wins" over
				103	the last output value), it cannot fit in the heap, so the size of the
				104	heap decreases. The freed memory could be cleverly reused immediately
				105	for progressively building a second heap, which grows at exactly the
				106	same rate the first heap is melting. When the first heap completely
				107	vanishes, you switch heaps and start a new run. Clever and quite
				108	effective!
				109
				110	In a word, heaps are useful memory structures to know. I use them in
				111	a few applications, and I think it is good to keep a `heap' module
				112	around. :-)
				113
				114	--------------------
				115	[1] The disk balancing algorithms which are current, nowadays, are
				116	more annoying than clever, and this is a consequence of the seeking
				117	capabilities of the disks. On devices which cannot seek, like big
				118	tape drives, the story was quite different, and one had to be very
				119	clever to ensure (far in advance) that each tape movement will be the
				120	most effective possible (that is, will best participate at
				121	"progressing" the merge). Some tapes were even able to read
				122	backwards, and this was also used to avoid the rewinding time.
				123	Believe me, real good tape sorts were quite spectacular to watch!
				124	From all times, sorting has always been a Great Art! :-)
				125	"""
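The scheduler use case described in the explanation above can be sketched as follows. This is an illustration only, not part of the module: the function name `run_events`, the `(time, name)` event tuples and the `follow_ups` mapping are hypothetical, and delays are assumed to be non-negative so that newly scheduled events never "win" over the event just extracted.

```python
import heapq

def run_events(initial_events, follow_ups):
    """Process (time, name) events in time order and return them as fired.

    `follow_ups` maps an event name to a list of (delay, name) pairs that
    the event schedules when it fires (hypothetical structure).
    """
    queue = list(initial_events)
    heapq.heapify(queue)
    fired = []
    while queue:
        time, name = heapq.heappop(queue)        # smallest scheduled time wins
        fired.append((time, name))
        for delay, next_name in follow_ups.get(name, []):
            # New events land in the future, so they can go straight into the heap.
            heapq.heappush(queue, (time + delay, next_name))
    return fired

timeline = run_events(
    [(0, "start"), (5, "tick")],
    {"start": [(2, "load"), (7, "stop")], "load": [(1, "play")]},
)
print(timeline)
```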
				126
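The run-producing technique described for big disk sorts (replace and percolate items that fit the current run, and set aside those that do not for a second heap) can be sketched like this. It is a simplified in-memory illustration under stated assumptions: `replacement_selection` and `memory_size` are hypothetical names, runs are returned as lists rather than written to disk, and the held-back items are kept in a plain list that is heapified only when the first heap is exhausted.

```python
import heapq
from itertools import islice

_END = object()   # sentinel marking the end of the input

def replacement_selection(items, memory_size):
    """Yield sorted runs from `items`, holding at most `memory_size` items."""
    it = iter(items)
    current = list(islice(it, memory_size))    # heap feeding the current run
    heapq.heapify(current)
    pending = []                               # items held back for the next run
    run = []
    while current:
        smallest = heapq.heappop(current)
        run.append(smallest)                   # "output" the 0'th item
        incoming = next(it, _END)
        if incoming is not _END:
            if incoming >= smallest:
                heapq.heappush(current, incoming)   # still fits the current run
            else:
                pending.append(incoming)            # would break the run's order
        if not current:                        # the first heap has melted away
            yield run
            run = []
            current, pending = pending, []     # switch heaps, start a new run
            heapq.heapify(current)

runs = list(replacement_selection([5, 1, 4, 2, 3, 0, 9, 8, 7, 6], 3))
print(runs)
```

With three slots of memory the first run here holds seven items, which illustrates why runs can be longer than the memory used to produce them.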
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	127	__all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'merge',
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	128	'nlargest', 'nsmallest', 'heappushpop']
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	129
Benjamin Peterson	18e9512	2009-01-18 22:46:33 +0000	[diff] [blame]	130	from itertools import islice, repeat, count, tee, chain
Raymond Hettinger	b25aa36	2004-06-12 08:33:36 +0000	[diff] [blame]	131	import bisect
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	132
				133	def heappush(heap, item):
				134	    """Push item onto heap, maintaining the heap invariant."""
				135	    heap.append(item)
				136	    _siftdown(heap, 0, len(heap)-1)
				137
				138	def heappop(heap):
				139	    """Pop the smallest item off the heap, maintaining the heap invariant."""
				140	    lastelt = heap.pop()    # raises appropriate IndexError if heap is empty
				141	    if heap:
				142	        returnitem = heap[0]
				143	        heap[0] = lastelt
				144	        _siftup(heap, 0)
				145	    else:
				146	        returnitem = lastelt
				147	    return returnitem
				148
				149	def heapreplace(heap, item):
				150	    """Pop and return the current smallest value, and add the new item.
				151
				152	    This is more efficient than heappop() followed by heappush(), and can be
				153	    more appropriate when using a fixed-size heap. Note that the value
				154	    returned may be larger than item! That constrains reasonable uses of
Raymond Hettinger	8158e84	2004-09-06 07:04:09 +0000	[diff] [blame]	155	    this routine unless written as part of a conditional replacement:
Raymond Hettinger	28224f8	2004-06-20 09:07:53 +0000	[diff] [blame]	156
Raymond Hettinger	8158e84	2004-09-06 07:04:09 +0000	[diff] [blame]	157	        if item > heap[0]:
				158	            item = heapreplace(heap, item)
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	159	    """
				160	    returnitem = heap[0]    # raises appropriate IndexError if heap is empty
				161	    heap[0] = item
				162	    _siftup(heap, 0)
				163	    return returnitem
				164
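The conditional replacement suggested in the heapreplace() docstring is the core of a fixed-size "keep the largest items" loop. The sketch below assumes the standard `heapq` module; `keep_largest` is a hypothetical helper name, not part of this file.

```python
import heapq

def keep_largest(stream, size):
    """Return the `size` largest items of `stream`, smallest first."""
    heap = []
    for item in stream:
        if len(heap) < size:
            heapq.heappush(heap, item)       # still filling the fixed-size heap
        elif heap and item > heap[0]:
            heapq.heapreplace(heap, item)    # drop the current minimum, add item
    return sorted(heap)

print(keep_largest([5, 1, 9, 3, 7, 8], 3))
```

The guard `item > heap[0]` matters: without it, heapreplace() would happily evict the current minimum in favour of an even smaller item.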
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	165	def heappushpop(heap, item):
				166	    """Fast version of a heappush followed by a heappop."""
Georg Brandl	f78e02b	2008-06-10 17:40:04 +0000	[diff] [blame]	167	    if heap and heap[0] < item:
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	168	        item, heap[0] = heap[0], item
				169	        _siftup(heap, 0)
				170	    return item
				171
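The difference between heappushpop() and heapreplace() is only the order of the two steps, which shows up when the new item is smaller than everything on the heap. A small comparison, assuming the standard `heapq` module and made-up sample values:

```python
import heapq

a = [2, 4, 6]          # already a valid heap
b = list(a)
c = list(a)

# heappushpop: push first, then pop, so the new item itself may come back.
r1 = heapq.heappushpop(a, 1)

# The same result using two separate calls.
heapq.heappush(b, 1)
r2 = heapq.heappop(b)

# heapreplace: pop first, then push, so the old minimum is returned instead.
r3 = heapq.heapreplace(c, 1)

print(r1, r2, r3, a, sorted(c))
```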
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	172	def heapify(x):
				173	    """Transform list into a heap, in-place, in O(len(x)) time."""
				174	    n = len(x)
				175	    # Transform bottom-up. The largest index there's any point to looking at
				176	    # is the largest with a child index in-range, so must have 2*i + 1 < n,
				177	    # or i < (n-1)/2. If n is even = 2j, this is (2j-1)/2 = j-1/2 so
				178	    # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is
				179	    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
Guido van Rossum	805365e	2007-05-07 22:24:25 +0000	[diff] [blame]	180	    for i in reversed(range(n//2)):
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	181	        _siftup(x, i)
				182
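The invariant and the index bound used by heapify() can be checked directly. In this sketch `is_heap` and `last_parent` are hypothetical helper names, not functions of this module; the check relies on the parent of index i being (i - 1) >> 1, which is the inverse of the child positions 2k+1 and 2k+2.

```python
import heapq

def is_heap(a):
    """True if every element is >= its parent, i.e. a[k] <= a[2k+1], a[2k+2]."""
    return all(a[(i - 1) >> 1] <= a[i] for i in range(1, len(a)))

def last_parent(n):
    """Largest index that still has a child in a list of length n (n >= 2)."""
    return n // 2 - 1

data = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
before = is_heap(data)
heapq.heapify(data)
after = is_heap(data)

for n in (7, 10):
    i = last_parent(n)
    # i has a left child in range, while i + 1 has none.
    print(n, i, 2 * i + 1 < n, 2 * (i + 1) + 1 >= n)
```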
Raymond Hettinger	e1defa4	2004-11-29 05:54:48 +0000	[diff] [blame]	183	def nlargest(n, iterable):
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	184	    """Find the n largest elements in a dataset.
				185
				186	    Equivalent to: sorted(iterable, reverse=True)[:n]
				187	    """
				188	    it = iter(iterable)
				189	    result = list(islice(it, n))
				190	    if not result:
				191	        return result
				192	    heapify(result)
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	193	    _heappushpop = heappushpop
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	194	    for elem in it:
Benjamin Peterson	5c6d787	2009-02-06 02:40:07 +0000	[diff] [blame]	195	        _heappushpop(result, elem)
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	196	    result.sort(reverse=True)
				197	    return result
				198
Raymond Hettinger	e1defa4	2004-11-29 05:54:48 +0000	[diff] [blame]	199	def nsmallest(n, iterable):
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	200	    """Find the n smallest elements in a dataset.
				201
				202	    Equivalent to: sorted(iterable)[:n]
				203	    """
Raymond Hettinger	b25aa36	2004-06-12 08:33:36 +0000	[diff] [blame]	204	    if hasattr(iterable, '__len__') and n * 10 <= len(iterable):
				205	        # For smaller values of n, the bisect method is faster than a minheap.
				206	        # It is also memory efficient, consuming only n elements of space.
				207	        it = iter(iterable)
				208	        result = sorted(islice(it, 0, n))
				209	        if not result:
				210	            return result
				211	        insort = bisect.insort
				212	        pop = result.pop
				213	        los = result[-1]    # los --> Largest of the nsmallest
				214	        for elem in it:
				215	            if los <= elem:
				216	                continue
				217	            insort(result, elem)
				218	            pop()
				219	            los = result[-1]
				220	        return result
				221	    # An alternative approach manifests the whole iterable in memory but
				222	    # saves comparisons by heapifying all at once. Also, saves time
				223	    # over bisect.insort() which has O(n) data movement time for every
				224	    # insertion. Finding the n smallest of an m length iterable requires
				225	    #    O(m) + O(n log m) comparisons.
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	226	    h = list(iterable)
				227	    heapify(h)
Guido van Rossum	c1f779c	2007-07-03 08:25:58 +0000	[diff] [blame]	228	    return list(map(heappop, repeat(h, min(n, len(h)))))
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	229
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	230	# 'heap' is a heap at all indices >= startpos, except possibly for pos. pos
				231	# is the index of a leaf with a possibly out-of-order value. Restore the
				232	# heap invariant.
				233	def _siftdown(heap, startpos, pos):
				234	    newitem = heap[pos]
				235	    # Follow the path to the root, moving parents down until finding a place
				236	    # newitem fits.
				237	    while pos > startpos:
				238	        parentpos = (pos - 1) >> 1
				239	        parent = heap[parentpos]
Georg Brandl	f78e02b	2008-06-10 17:40:04 +0000	[diff] [blame]	240	        if newitem < parent:
				241	            heap[pos] = parent
				242	            pos = parentpos
				243	            continue
				244	        break
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	245	    heap[pos] = newitem
				246
				247	# The child indices of heap index pos are already heaps, and we want to make
				248	# a heap at index pos too. We do this by bubbling the smaller child of
				249	# pos up (and so on with that child's children, etc) until hitting a leaf,
				250	# then using _siftdown to move the oddball originally at index pos into place.
				251	#
				252	# We could break out of the loop as soon as we find a pos where newitem <=
				253	# both its children, but turns out that's not a good idea, and despite that
				254	# many books write the algorithm that way. During a heap pop, the last array
				255	# element is sifted in, and that tends to be large, so that comparing it
				256	# against values starting from the root usually doesn't pay (= usually doesn't
				257	# get us out of the loop early). See Knuth, Volume 3, where this is
				258	# explained and quantified in an exercise.
				259	#
				260	# Cutting the # of comparisons is important, since these routines have no
				261	# way to extract "the priority" from an array element, so that intelligence
Mark Dickinson	a56c467	2009-01-27 18:17:45 +0000	[diff] [blame]	262	# is likely to be hiding in custom comparison methods, or in array elements
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	263	# storing (priority, record) tuples. Comparisons are thus potentially
				264	# expensive.
				265	#
				266	# On random arrays of length 1000, making this change cut the number of
				267	# comparisons made by heapify() a little, and those made by exhaustive
				268	# heappop() a lot, in accord with theory. Here are typical results from 3
				269	# runs (3 just to demonstrate how small the variance is):
				270	#
				271	# Compares needed by heapify     Compares needed by 1000 heappops
				272	# --------------------------     --------------------------------
				273	# 1837 cut to 1663               14996 cut to 8680
				274	# 1855 cut to 1659               14966 cut to 8678
				275	# 1847 cut to 1660               15024 cut to 8703
				276	#
				277	# Building the heap by using heappush() 1000 times instead required
				278	# 2198, 2148, and 2219 compares: heapify() is more efficient, when
				279	# you can use it.
				280	#
				281	# The total compares needed by list.sort() on the same lists were 8627,
				282	# 8627, and 8632 (this should be compared to the sum of heapify() and
				283	# heappop() compares): list.sort() is (unsurprisingly!) more efficient
				284	# for sorting.
				285
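Comparison counts like the ones quoted in the comments above can be measured with a wrapper whose `__lt__` increments a counter, since these routines only compare elements with `<`. This is a rough sketch and not the harness used for the original figures: the `Counted` class, `count_compares` helper and the seeded random data are assumptions, and the exact numbers will differ from those in the comments.

```python
import heapq
import random

class Counted:
    """Wrap a value and count every `<` comparison made on any instance."""
    compares = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.compares += 1
        return self.value < other.value

def count_compares(action):
    """Run `action` and return how many `<` comparisons it triggered."""
    Counted.compares = 0
    action()
    return Counted.compares

rng = random.Random(0)
values = [rng.random() for _ in range(1000)]

def build_with_heapify():
    heap = [Counted(v) for v in values]
    heapq.heapify(heap)

def build_with_pushes():
    heap = []
    for v in values:
        heapq.heappush(heap, Counted(v))

heapify_compares = count_compares(build_with_heapify)
push_compares = count_compares(build_with_pushes)
print(heapify_compares, push_compares)
```

On random input of this size, the heapify() count should come out lower than the count for 1000 separate pushes, in line with the remark that heapify() is more efficient when you can use it.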
				286	def _siftup(heap, pos):
				287	    endpos = len(heap)
				288	    startpos = pos
				289	    newitem = heap[pos]
				290	    # Bubble up the smaller child until hitting a leaf.
				291	    childpos = 2*pos + 1    # leftmost child position
				292	    while childpos < endpos:
				293	        # Set childpos to index of smaller child.
				294	        rightpos = childpos + 1
Georg Brandl	f78e02b	2008-06-10 17:40:04 +0000	[diff] [blame]	295	        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	296	            childpos = rightpos
				297	        # Move the smaller child up.
				298	        heap[pos] = heap[childpos]
				299	        pos = childpos
				300	        childpos = 2*pos + 1
				301	    # The leaf at pos is empty now. Put newitem there, and bubble it up
				302	    # to its final resting place (by sifting its parents down).
				303	    heap[pos] = newitem
				304	    _siftdown(heap, startpos, pos)
				305
				306	# If available, use C implementation
				307	try:
Raymond Hettinger	0dd737b	2009-03-29 19:30:50 +0000	[diff] [blame]	308	    from _heapq import *
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	309	except ImportError:
				310	    pass
				311
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	312	def merge(*iterables):
				313	    '''Merge multiple sorted inputs into a single sorted output.
				314
Guido van Rossum	d8faa36	2007-04-27 19:54:29 +0000	[diff] [blame]	315	    Similar to sorted(itertools.chain(*iterables)) but returns a generator,
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	316	    does not pull the data into memory all at once, and assumes that each of
				317	    the input streams is already sorted (smallest to largest).
				318
				319	    >>> list(merge([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
				320	    [0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
				321
				322	    '''
				323	    _heappop, _heapreplace, _StopIteration = heappop, heapreplace, StopIteration
				324
				325	    h = []
				326	    h_append = h.append
				327	    for itnum, it in enumerate(map(iter, iterables)):
				328	        try:
Georg Brandl	a18af4e	2007-04-21 15:47:16 +0000	[diff] [blame]	329	            next = it.__next__
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	330	            h_append([next(), itnum, next])
				331	        except _StopIteration:
				332	            pass
				333	    heapify(h)
				334
				335	    while 1:
				336	        try:
				337	            while 1:
				338	                v, itnum, next = s = h[0]   # raises IndexError when h is empty
				339	                yield v
				340	                s[0] = next()               # raises StopIteration when exhausted
				341	                _heapreplace(h, s)          # restore heap condition
				342	        except _StopIteration:
				343	            _heappop(h)                     # remove empty iterator
				344	        except IndexError:
				345	            return
				346
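Because merge() is a generator, it can be consumed lazily and its inputs can themselves be generators. The short example below assumes the standard `heapq` module and uses made-up inputs. In the implementation above, each heap entry is a `[value, itnum, next]` list, so the iterator number breaks ties between equal values before the comparison could ever reach the bound `next` method.

```python
import heapq

evens = (n for n in range(0, 10, 2))    # 0, 2, 4, 6, 8
odds = (n for n in range(1, 10, 2))     # 1, 3, 5, 7, 9

merged = heapq.merge(evens, odds, [4.5])

first_three = [next(merged) for _ in range(3)]   # only pulls what is needed
rest = list(merged)                              # drain the remainder
print(first_three, rest)
```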
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	347	# Extend the implementations of nsmallest and nlargest to use a key= argument
				348	_nsmallest = nsmallest
				349	def nsmallest(n, iterable, key=None):
				350	    """Find the n smallest elements in a dataset.
				351
				352	    Equivalent to: sorted(iterable, key=key)[:n]
				353	    """
Benjamin Peterson	18e9512	2009-01-18 22:46:33 +0000	[diff] [blame]	354	    # Short-cut for n==1 is to use min() when len(iterable)>0
				355	    if n == 1:
				356	        it = iter(iterable)
				357	        head = list(islice(it, 1))
				358	        if not head:
				359	            return []
				360	        if key is None:
				361	            return [min(chain(head, it))]
				362	        return [min(chain(head, it), key=key)]
				363
				364	    # When n>=size, it's faster to use sort()
				365	    try:
				366	        size = len(iterable)
				367	    except (TypeError, AttributeError):
				368	        pass
				369	    else:
				370	        if n >= size:
				371	            return sorted(iterable, key=key)[:n]
				372
				373	    # When key is None, use simpler decoration
Georg Brandl	3a9b062	2009-01-03 22:07:57 +0000	[diff] [blame]	374	    if key is None:
				375	        it = zip(iterable, count())                 # decorate
				376	        result = _nsmallest(n, it)
Raymond Hettinger	ba86fa9	2009-02-21 23:20:57 +0000	[diff] [blame]	377	        return [r[0] for r in result]               # undecorate
Benjamin Peterson	18e9512	2009-01-18 22:46:33 +0000	[diff] [blame]	378
				379	    # General case, slowest method
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	380	    in1, in2 = tee(iterable)
Georg Brandl	3a9b062	2009-01-03 22:07:57 +0000	[diff] [blame]	381	    it = zip(map(key, in1), count(), in2)           # decorate
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	382	    result = _nsmallest(n, it)
Raymond Hettinger	ba86fa9	2009-02-21 23:20:57 +0000	[diff] [blame]	383	    return [r[2] for r in result]                   # undecorate
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	384
				385	_nlargest = nlargest
				386	def nlargest(n, iterable, key=None):
				387	    """Find the n largest elements in a dataset.
				388
				389	    Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
				390	    """
Benjamin Peterson	18e9512	2009-01-18 22:46:33 +0000	[diff] [blame]	391
				392	    # Short-cut for n==1 is to use max() when len(iterable)>0
				393	    if n == 1:
				394	        it = iter(iterable)
				395	        head = list(islice(it, 1))
				396	        if not head:
				397	            return []
				398	        if key is None:
				399	            return [max(chain(head, it))]
				400	        return [max(chain(head, it), key=key)]
				401
				402	    # When n>=size, it's faster to use sort()
				403	    try:
				404	        size = len(iterable)
				405	    except (TypeError, AttributeError):
				406	        pass
				407	    else:
				408	        if n >= size:
				409	            return sorted(iterable, key=key, reverse=True)[:n]
				410
				411	    # When key is None, use simpler decoration
Georg Brandl	3a9b062	2009-01-03 22:07:57 +0000	[diff] [blame]	412	    if key is None:
Raymond Hettinger	bd171bc	2009-02-21 22:10:18 +0000	[diff] [blame]	413	        it = zip(iterable, count(0,-1))             # decorate
Georg Brandl	3a9b062	2009-01-03 22:07:57 +0000	[diff] [blame]	414	        result = _nlargest(n, it)
Raymond Hettinger	ba86fa9	2009-02-21 23:20:57 +0000	[diff] [blame]	415	        return [r[0] for r in result]               # undecorate
Benjamin Peterson	18e9512	2009-01-18 22:46:33 +0000	[diff] [blame]	416
				417	    # General case, slowest method
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	418	    in1, in2 = tee(iterable)
Raymond Hettinger	bd171bc	2009-02-21 22:10:18 +0000	[diff] [blame]	419	    it = zip(map(key, in1), count(0,-1), in2)       # decorate
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	420	    result = _nlargest(n, it)
Raymond Hettinger	ba86fa9	2009-02-21 23:20:57 +0000	[diff] [blame]	421	    return [r[2] for r in result]                   # undecorate
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	422
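The key= variants decorate each element as (key, counter, element) so that ties on the key are settled by the counter and the elements themselves are never compared. From the caller's side this behaves like the sorted() expressions in the docstrings. A brief usage sketch with made-up data, assuming the standard `heapq` module:

```python
import heapq

words = ["pear", "fig", "banana", "kiwi", "apple"]

longest = heapq.nlargest(2, words, key=len)
shortest = heapq.nsmallest(2, words, key=len)

# The docstrings describe these as equivalent to slicing a full sort.
same_as_sorted_largest = sorted(words, key=len, reverse=True)[:2]
same_as_sorted_smallest = sorted(words, key=len)[:2]

print(longest, shortest)
```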
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	423	if __name__ == "__main__":
				424	    # Simple sanity test
				425	    heap = []
				426	    data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
				427	    for item in data:
				428	        heappush(heap, item)
				429	    sort = []
				430	    while heap:
				431	        sort.append(heappop(heap))
Guido van Rossum	be19ed7	2007-02-09 05:37:30 +0000	[diff] [blame]	432	    print(sort)
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	433
				434	    import doctest
				435	    doctest.testmod()