Blame - Lib/heapq.py - platform/external/python/cpython3

Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	1	# -*- coding: Latin-1 -*-
				2
				3	"""Heap queue algorithm (a.k.a. priority queue).
				4
				5	Heaps are arrays for which a[k] <= a[2k+1] and a[k] <= a[2k+2] for
				6	all k, counting elements from 0. For the sake of comparison,
				7	non-existing elements are considered to be infinite. The interesting
				8	property of a heap is that a[0] is always its smallest element.
				9
				10	Usage:
				11
				12	heap = []            # creates an empty heap
				13	heappush(heap, item) # pushes a new item on the heap
				14	item = heappop(heap) # pops the smallest item from the heap
				15	item = heap[0]       # smallest item on the heap without popping it
				16	heapify(x)           # transforms list into a heap, in-place, in linear time
				17	item = heapreplace(heap, item) # pops and returns smallest item, and adds
				18	                               # new item; the heap size is unchanged
				19
				20	Our API differs from textbook heap algorithms as follows:
				21
				22	- We use 0-based indexing. This makes the relationship between the
				23	index for a node and the indexes for its children slightly less
				24	obvious, but is more suitable since Python uses 0-based indexing.
				25
				26	- Our heappop() method returns the smallest item, not the largest.
				27
				28	These two make it possible to view the heap as a regular Python list
				29	without surprises: heap[0] is the smallest item, and heap.sort()
				30	maintains the heap invariant!
				31	"""
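
The invariant stated in the docstring above can be checked mechanically. The following is a minimal sketch, not part of this module: `is_heap` is a hypothetical helper name, and it imports the installed standard-library `heapq` rather than the listing shown here.

```python
import heapq

def is_heap(a):
    """Return True if a[k] <= a[2k+1] and a[k] <= a[2k+2] for every valid k."""
    n = len(a)
    for k in range(n):
        # Children that fall outside the list count as "infinite" and are skipped.
        for child in (2 * k + 1, 2 * k + 2):
            if child < n and a[child] < a[k]:
                return False
    return True

data = [9, 4, 7, 1, 8, 2]
heapq.heapify(data)
print(is_heap(data))          # True
print(data[0] == min(data))   # True: a[0] is the smallest element
print(is_heap([3, 1, 2]))     # False: a[0] > a[1]
```

A sorted list always passes this check, which is why the docstring can say that heap.sort() maintains the invariant.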
				32
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	33	# Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	34
				35	__about__ = """Heap queues
				36
				37	[explanation by François Pinard]
				38
				39	Heaps are arrays for which a[k] <= a[2k+1] and a[k] <= a[2k+2] for
				40	all k, counting elements from 0. For the sake of comparison,
				41	non-existing elements are considered to be infinite. The interesting
				42	property of a heap is that a[0] is always its smallest element.
				43
				44	The strange invariant above is meant to be an efficient memory
				45	representation for a tournament. The numbers below are `k', not a[k]:
				46
				47	                                   0
				48
				49	                  1                                 2
				50
				51	          3               4                5               6
				52
				53	      7       8       9       10      11      12      13      14
				54
				55	    15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30
				56
				57
				58	In the tree above, each cell `k' is topping `2k+1' and `2k+2'. In
				59	a usual binary tournament we see in sports, each cell is the winner
				60	over the two cells it tops, and we can trace the winner down the tree
				61	to see all opponents s/he had. However, in many computer applications
				62	of such tournaments, we do not need to trace the history of a winner.
				63	To be more memory efficient, when a winner is promoted, we try to
				64	replace it by something else at a lower level, and the rule becomes
				65	that a cell and the two cells it tops contain three different items,
				66	but the top cell "wins" over the two topped cells.
				67
				68	If this heap invariant is protected at all times, index 0 is clearly
				69	the overall winner. The simplest algorithmic way to remove it and
				70	find the "next" winner is to move some loser (let's say cell 30 in the
				71	diagram above) into the 0 position, and then percolate this new 0 down
				72	the tree, exchanging values, until the invariant is re-established.
				73	This is clearly logarithmic on the total number of items in the tree.
				74	By iterating over all items, you get an O(n log n) sort.
				75
				76	A nice feature of this sort is that you can efficiently insert new
				77	items while the sort is going on, provided that the inserted items are
				78	not "better" than the last 0'th element you extracted. This is
				79	especially useful in simulation contexts, where the tree holds all
				80	incoming events, and the "win" condition means the smallest scheduled
				81	time. When an event schedules other events for execution, they are
				82	scheduled into the future, so they can easily go into the heap. So, a
				83	heap is a good structure for implementing schedulers (this is what I
				84	used for my MIDI sequencer :-).
				85
				86	Various structures for implementing schedulers have been extensively
				87	studied, and heaps are good for this, as they are reasonably speedy,
				88	the speed is almost constant, and the worst case is not much different
				89	than the average case. However, there are other representations which
				90	are more efficient overall, yet the worst cases might be terrible.
				91
				92	Heaps are also very useful in big disk sorts. You most probably all
				93	know that a big sort implies producing "runs" (which are pre-sorted
				94	sequences, whose size is usually related to the amount of CPU memory),
				95	followed by merging passes for these runs; this merging is often
				96	very cleverly organised[1]. It is very important that the initial
				97	sort produces the longest runs possible. Tournaments are a good way
				98	to achieve that. If, using all the memory available to hold a tournament, you
				99	replace and percolate items that happen to fit the current run, you'll
				100	produce runs which are twice the size of the memory for random input,
				101	and much better for input fuzzily ordered.
				102
				103	Moreover, if you output the 0'th item on disk and get an input which
				104	may not fit in the current tournament (because the value "wins" over
				105	the last output value), it cannot fit in the heap, so the size of the
				106	heap decreases. The freed memory could be cleverly reused immediately
				107	for progressively building a second heap, which grows at exactly the
				108	same rate the first heap is melting. When the first heap completely
				109	vanishes, you switch heaps and start a new run. Clever and quite
				110	effective!
				111
				112	In a word, heaps are useful memory structures to know. I use them in
				113	a few applications, and I think it is good to keep a `heap' module
				114	around. :-)
				115
				116	--------------------
				117	[1] The disk balancing algorithms which are current, nowadays, are
				118	more annoying than clever, and this is a consequence of the seeking
				119	capabilities of the disks. On devices which cannot seek, like big
				120	tape drives, the story was quite different, and one had to be very
				121	clever to ensure (far in advance) that each tape movement will be the
				122	most effective possible (that is, will best participate at
				123	"progressing" the merge). Some tapes were even able to read
				124	backwards, and this was also used to avoid the rewinding time.
				125	Believe me, real good tape sorts were quite spectacular to watch!
				126	From all times, sorting has always been a Great Art! :-)
				127	"""
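
The run-building scheme described in the explanation above (often called replacement selection) can be sketched as follows. This is one possible reading under stated assumptions, not code from this module: `replacement_selection` and its `memory` parameter are hypothetical names, and instead of managing two separate heaps the sketch tags each item with a run number so that items belonging to the next run sort after everything in the current run.

```python
import heapq
from itertools import islice

def replacement_selection(items, memory):
    """Split items into sorted runs, holding at most `memory` items in the heap."""
    it = iter(items)
    # Fill the available memory; every initial item belongs to run 0.
    heap = [(0, x) for x in islice(it, memory)]
    heapq.heapify(heap)
    runs = []
    while heap:
        run, value = heapq.heappop(heap)
        if run == len(runs):
            runs.append([])          # the first item of a new run was popped
        runs[run].append(value)
        for nxt in islice(it, 1):    # read at most one replacement item
            # The new item fits the current run only if it is not smaller
            # than the value just written out; otherwise it waits for the next run.
            heapq.heappush(heap, (run if nxt >= value else run + 1, nxt))
    return runs

runs = replacement_selection([5, 1, 9, 3, 7, 2, 8, 6, 4, 0], memory=3)
print(runs)   # [[1, 3, 5, 7, 8, 9], [2, 4, 6], [0]]
```

With room for only three items, the first run here already holds six, which illustrates how runs can grow longer than the memory used to build them.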
				128
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	129	__all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'merge',
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	130	'nlargest', 'nsmallest', 'heappushpop']
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	131
Raymond Hettinger	736c0ab	2008-03-13 02:09:15 +0000	[diff] [blame]	132	from itertools import islice, repeat, count, tee
Thomas Wouters	902d6eb	2007-01-09 23:18:33 +0000	[diff] [blame]	133	from operator import itemgetter, neg
Raymond Hettinger	b25aa36	2004-06-12 08:33:36 +0000	[diff] [blame]	134	import bisect
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	135
				136	def heappush(heap, item):
				137	    """Push item onto heap, maintaining the heap invariant."""
				138	    heap.append(item)
				139	    _siftdown(heap, 0, len(heap)-1)
				140
				141	def heappop(heap):
				142	    """Pop the smallest item off the heap, maintaining the heap invariant."""
				143	    lastelt = heap.pop()    # raises appropriate IndexError if heap is empty
				144	    if heap:
				145	        returnitem = heap[0]
				146	        heap[0] = lastelt
				147	        _siftup(heap, 0)
				148	    else:
				149	        returnitem = lastelt
				150	    return returnitem
				151
				152	def heapreplace(heap, item):
				153	    """Pop and return the current smallest value, and add the new item.
				154
				155	    This is more efficient than heappop() followed by heappush(), and can be
				156	    more appropriate when using a fixed-size heap. Note that the value
				157	    returned may be larger than item! That constrains reasonable uses of
Raymond Hettinger	8158e84	2004-09-06 07:04:09 +0000	[diff] [blame]	158	    this routine unless written as part of a conditional replacement:
Raymond Hettinger	28224f8	2004-06-20 09:07:53 +0000	[diff] [blame]	159
Raymond Hettinger	8158e84	2004-09-06 07:04:09 +0000	[diff] [blame]	160	        if item > heap[0]:
				161	            item = heapreplace(heap, item)
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	162	    """
				163	    returnitem = heap[0]    # raises appropriate IndexError if heap is empty
				164	    heap[0] = item
				165	    _siftup(heap, 0)
				166	    return returnitem
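
The conditional replacement mentioned in the docstring above is the usual way to keep a fixed-size heap of the largest items seen so far. A minimal sketch, assuming the installed standard-library `heapq`; `keep_largest` is a hypothetical helper name, not part of this module.

```python
import heapq

def keep_largest(stream, size):
    """Return the `size` largest items of stream, largest first."""
    if size <= 0:
        return []
    heap = []
    for item in stream:
        if len(heap) < size:
            heapq.heappush(heap, item)       # still filling the heap
        elif item > heap[0]:
            # Only replace when the new item beats the current smallest;
            # otherwise heapreplace would evict a larger value.
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)

print(keep_largest([5, 1, 9, 3, 7, 2, 8], 3))   # [9, 8, 7]
```

The guard matters because heapreplace() always stores the new item, even when it is smaller than everything already in the heap.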
				167
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	168	def heappushpop(heap, item):
				169	    """Fast version of a heappush followed by a heappop."""
Georg Brandl	f78e02b	2008-06-10 17:40:04 +0000	[diff] [blame]	170	    if heap and heap[0] < item:
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	171	        item, heap[0] = heap[0], item
				172	        _siftup(heap, 0)
				173	    return item
				174
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	175	def heapify(x):
				176	    """Transform list into a heap, in-place, in O(len(x)) time."""
				177	    n = len(x)
				178	    # Transform bottom-up. The largest index there's any point to looking at
				179	    # is the largest with a child index in-range, so must have 2*i + 1 < n,
				180	    # or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
				181	    # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is
				182	    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
Guido van Rossum	805365e	2007-05-07 22:24:25 +0000	[diff] [blame]	183	    for i in reversed(range(n//2)):
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	184	        _siftup(x, i)
				185
Raymond Hettinger	e1defa4	2004-11-29 05:54:48 +0000	[diff] [blame]	186	def nlargest(n, iterable):
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	187	    """Find the n largest elements in a dataset.
				188
				189	    Equivalent to: sorted(iterable, reverse=True)[:n]
				190	    """
				191	    it = iter(iterable)
				192	    result = list(islice(it, n))
				193	    if not result:
				194	        return result
				195	    heapify(result)
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	196	    _heappushpop = heappushpop
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	197	    for elem in it:
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	198	        _heappushpop(result, elem)  # use the local alias bound above
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	199	    result.sort(reverse=True)
				200	    return result
				201
Raymond Hettinger	e1defa4	2004-11-29 05:54:48 +0000	[diff] [blame]	202	def nsmallest(n, iterable):
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	203	    """Find the n smallest elements in a dataset.
				204
				205	    Equivalent to: sorted(iterable)[:n]
				206	    """
Raymond Hettinger	b25aa36	2004-06-12 08:33:36 +0000	[diff] [blame]	207	    if hasattr(iterable, '__len__') and n * 10 <= len(iterable):
				208	        # For smaller values of n, the bisect method is faster than a minheap.
				209	        # It is also memory efficient, consuming only n elements of space.
				210	        it = iter(iterable)
				211	        result = sorted(islice(it, 0, n))
				212	        if not result:
				213	            return result
				214	        insort = bisect.insort
				215	        pop = result.pop
				216	        los = result[-1]    # los --> Largest of the nsmallest
				217	        for elem in it:
				218	            if los <= elem:
				219	                continue
				220	            insort(result, elem)
				221	            pop()
				222	            los = result[-1]
				223	        return result
				224	    # An alternative approach manifests the whole iterable in memory but
				225	    # saves comparisons by heapifying all at once. Also, saves time
				226	    # over bisect.insort() which has O(n) data movement time for every
				227	    # insertion. Finding the n smallest of an m length iterable requires
				228	    # O(m) + O(n log m) comparisons.
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	229	    h = list(iterable)
				230	    heapify(h)
Guido van Rossum	c1f779c	2007-07-03 08:25:58 +0000	[diff] [blame]	231	    return list(map(heappop, repeat(h, min(n, len(h)))))
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	232
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	233	# 'heap' is a heap at all indices >= startpos, except possibly for pos. pos
				234	# is the index of a leaf with a possibly out-of-order value. Restore the
				235	# heap invariant.
				236	def _siftdown(heap, startpos, pos):
				237	    newitem = heap[pos]
				238	    # Follow the path to the root, moving parents down until finding a place
				239	    # newitem fits.
				240	    while pos > startpos:
				241	        parentpos = (pos - 1) >> 1
				242	        parent = heap[parentpos]
Georg Brandl	f78e02b	2008-06-10 17:40:04 +0000	[diff] [blame]	243	        if newitem < parent:
				244	            heap[pos] = parent
				245	            pos = parentpos
				246	            continue
				247	        break
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	248	    heap[pos] = newitem
				249
				250	# The child indices of heap index pos are already heaps, and we want to make
				251	# a heap at index pos too. We do this by bubbling the smaller child of
				252	# pos up (and so on with that child's children, etc) until hitting a leaf,
				253	# then using _siftdown to move the oddball originally at index pos into place.
				254	#
				255	# We could break out of the loop as soon as we find a pos where newitem <=
				256	# both its children, but turns out that's not a good idea, and despite that
				257	# many books write the algorithm that way. During a heap pop, the last array
				258	# element is sifted in, and that tends to be large, so that comparing it
				259	# against values starting from the root usually doesn't pay (= usually doesn't
				260	# get us out of the loop early). See Knuth, Volume 3, where this is
				261	# explained and quantified in an exercise.
				262	#
				263	# Cutting the # of comparisons is important, since these routines have no
				264	# way to extract "the priority" from an array element, so that intelligence
				265	# is likely to be hiding in custom __lt__ methods, or in array elements
				266	# storing (priority, record) tuples. Comparisons are thus potentially
				267	# expensive.
				268	#
				269	# On random arrays of length 1000, making this change cut the number of
				270	# comparisons made by heapify() a little, and those made by exhaustive
				271	# heappop() a lot, in accord with theory. Here are typical results from 3
				272	# runs (3 just to demonstrate how small the variance is):
				273	#
				274	# Compares needed by heapify     Compares needed by 1000 heappops
				275	# --------------------------     --------------------------------
				276	# 1837 cut to 1663               14996 cut to 8680
				277	# 1855 cut to 1659               14966 cut to 8678
				278	# 1847 cut to 1660               15024 cut to 8703
				279	#
				280	# Building the heap by using heappush() 1000 times instead required
				281	# 2198, 2148, and 2219 compares: heapify() is more efficient, when
				282	# you can use it.
				283	#
				284	# The total compares needed by list.sort() on the same lists were 8627,
				285	# 8627, and 8632 (this should be compared to the sum of heapify() and
				286	# heappop() compares): list.sort() is (unsurprisingly!) more efficient
				287	# for sorting.
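
Comparison counts like the ones quoted in the comment above can be gathered with a wrapper class that counts calls to its `<` operator. This is a rough sketch under stated assumptions: `Counted`, `count_compares`, and the seed are illustrative choices, not part of this module, and the sketch runs against the installed standard-library `heapq`, so the exact numbers may differ from the figures in the comment.

```python
import heapq
import random

class Counted:
    """Wrap a value and count every `<` comparison made between instances."""
    compares = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.compares += 1
        return self.value < other.value

def count_compares(build):
    """Run build() and return how many comparisons it triggered."""
    Counted.compares = 0
    build()
    return Counted.compares

random.seed(12345)  # arbitrary seed, only so that reruns are repeatable
values = [random.random() for _ in range(1000)]

def with_heapify():
    heap = [Counted(v) for v in values]
    heapq.heapify(heap)

def with_pushes():
    heap = []
    for v in values:
        heapq.heappush(heap, Counted(v))

heapify_compares = count_compares(with_heapify)
push_compares = count_compares(with_pushes)
print("heapify:", heapify_compares, "1000 heappush calls:", push_compares)
```

Counting through `__lt__` works here because the heap routines compare elements only with `<`.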
				288
				289	def _siftup(heap, pos):
				290	    endpos = len(heap)
				291	    startpos = pos
				292	    newitem = heap[pos]
				293	    # Bubble up the smaller child until hitting a leaf.
				294	    childpos = 2*pos + 1    # leftmost child position
				295	    while childpos < endpos:
				296	        # Set childpos to index of smaller child.
				297	        rightpos = childpos + 1
Georg Brandl	f78e02b	2008-06-10 17:40:04 +0000	[diff] [blame]	298	        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	299	            childpos = rightpos
				300	        # Move the smaller child up.
				301	        heap[pos] = heap[childpos]
				302	        pos = childpos
				303	        childpos = 2*pos + 1
				304	    # The leaf at pos is empty now. Put newitem there, and bubble it up
				305	    # to its final resting place (by sifting its parents down).
				306	    heap[pos] = newitem
				307	    _siftdown(heap, startpos, pos)
				308
				309	# If available, use C implementation
				310	try:
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	311	    from _heapq import heappush, heappop, heapify, heapreplace, nlargest, nsmallest, heappushpop
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	312	except ImportError:
				313	    pass
				314
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	315	def merge(*iterables):
				316	    '''Merge multiple sorted inputs into a single sorted output.
				317
Guido van Rossum	d8faa36	2007-04-27 19:54:29 +0000	[diff] [blame]	318	    Similar to sorted(itertools.chain(*iterables)) but returns a generator,
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	319	    does not pull the data into memory all at once, and assumes that each of
				320	    the input streams is already sorted (smallest to largest).
				321
				322	    >>> list(merge([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
				323	    [0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
				324
				325	    '''
				326	    _heappop, _heapreplace, _StopIteration = heappop, heapreplace, StopIteration
				327
				328	    h = []
				329	    h_append = h.append
				330	    for itnum, it in enumerate(map(iter, iterables)):
				331	        try:
Georg Brandl	a18af4e	2007-04-21 15:47:16 +0000	[diff] [blame]	332	            next = it.__next__
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	333	            h_append([next(), itnum, next])
				334	        except _StopIteration:
				335	            pass
				336	    heapify(h)
				337
				338	    while 1:
				339	        try:
				340	            while 1:
				341	                v, itnum, next = s = h[0]   # raises IndexError when h is empty
				342	                yield v
				343	                s[0] = next()               # raises StopIteration when exhausted
				344	                _heapreplace(h, s)          # restore heap condition
				345	        except _StopIteration:
				346	            _heappop(h)                     # remove empty iterator
				347	        except IndexError:
				348	            return
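
Because merge() yields items lazily, it can combine sorted streams that never end, as long as the caller stops consuming at some point. A minimal sketch using the installed standard-library `heapq.merge`; the two `count` streams are illustrative inputs chosen for this example.

```python
import heapq
from itertools import count, islice

evens = count(0, 2)   # 0, 2, 4, ... (infinite, already sorted)
odds = count(1, 2)    # 1, 3, 5, ... (infinite, already sorted)

# Only the first ten merged values are ever pulled from the inputs.
first_ten = list(islice(heapq.merge(evens, odds), 10))
print(first_ten)      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The same call with sorted(itertools.chain(...)) would never return, since sorting needs every input item in memory first.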
				349
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	350	# Extend the implementations of nsmallest and nlargest to use a key= argument
				351	_nsmallest = nsmallest
				352	def nsmallest(n, iterable, key=None):
				353	    """Find the n smallest elements in a dataset.
				354
				355	    Equivalent to: sorted(iterable, key=key)[:n]
				356	    """
Georg Brandl	3a9b062	2009-01-03 22:07:57 +0000	[diff] [blame^]	357	    if key is None:
				358	        it = zip(iterable, count())                 # decorate
				359	        result = _nsmallest(n, it)
				360	        return list(map(itemgetter(0), result))     # undecorate
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	361	    in1, in2 = tee(iterable)
Georg Brandl	3a9b062	2009-01-03 22:07:57 +0000	[diff] [blame^]	362	    it = zip(map(key, in1), count(), in2)           # decorate
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	363	    result = _nsmallest(n, it)
Guido van Rossum	c1f779c	2007-07-03 08:25:58 +0000	[diff] [blame]	364	    return list(map(itemgetter(2), result))         # undecorate
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	365
				366	_nlargest = nlargest
				367	def nlargest(n, iterable, key=None):
				368	    """Find the n largest elements in a dataset.
				369
				370	    Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
				371	    """
Georg Brandl	3a9b062	2009-01-03 22:07:57 +0000	[diff] [blame^]	372	    if key is None:
				373	        it = zip(iterable, map(neg, count()))               # decorate
				374	        result = _nlargest(n, it)
				375	        return list(map(itemgetter(0), result))             # undecorate
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	376	    in1, in2 = tee(iterable)
Georg Brandl	3a9b062	2009-01-03 22:07:57 +0000	[diff] [blame^]	377	    it = zip(map(key, in1), map(neg, count()), in2)         # decorate
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	378	    result = _nlargest(n, it)
Guido van Rossum	c1f779c	2007-07-03 08:25:58 +0000	[diff] [blame]	379	    return list(map(itemgetter(2), result))                 # undecorate
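
The decorate/undecorate steps in the two wrappers above pair each item with its key and its position, so ties on the key are broken by input order and the items themselves are never compared. The following sketch, using the installed standard-library `heapq` and an illustrative word list, spells the decoration out by hand and checks it against the `key=` form.

```python
import heapq

words = ["pear", "fig", "apple", "kiwi", "date"]

# Decorate by hand: (key, position, item). The position breaks ties on the
# key, so two words of equal length are never compared with each other.
decorated = [(len(w), i, w) for i, w in enumerate(words)]
by_hand = [w for _, _, w in heapq.nsmallest(2, decorated)]   # undecorate

with_key = heapq.nsmallest(2, words, key=len)
print(by_hand)    # ['fig', 'pear']
print(with_key)   # ['fig', 'pear']
```

"pear" wins over "kiwi" and "date" because it appears first in the input, matching what sorted(words, key=len)[:2] returns.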
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	380
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	381	if __name__ == "__main__":
				382	    # Simple sanity test
				383	    heap = []
				384	    data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
				385	    for item in data:
				386	        heappush(heap, item)
				387	    sort = []
				388	    while heap:
				389	        sort.append(heappop(heap))
Guido van Rossum	be19ed7	2007-02-09 05:37:30 +0000	[diff] [blame]	390	    print(sort)
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	391
				392	    import doctest
				393	    doctest.testmod()