Blame - Lib/heapq.py - platform/external/python/cpython3

blob: b74818e2ba7282295feacd71b1852ed3f0366e34

Benjamin Peterson	aa06900	2009-01-23 03:26:36 +0000	[diff] [blame]	1	# -*- coding: latin-1 -*-
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	2
				3	"""Heap queue algorithm (a.k.a. priority queue).
				4
				5	Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
				6	all k, counting elements from 0. For the sake of comparison,
				7	non-existing elements are considered to be infinite. The interesting
				8	property of a heap is that a[0] is always its smallest element.
				9
				10	Usage:
				11
				12	heap = [] # creates an empty heap
				13	heappush(heap, item) # pushes a new item on the heap
				14	item = heappop(heap) # pops the smallest item from the heap
				15	item = heap[0] # smallest item on the heap without popping it
				16	heapify(x) # transforms list into a heap, in-place, in linear time
				17	item = heapreplace(heap, item) # pops and returns smallest item, and adds
				18	# new item; the heap size is unchanged
				19
				20	Our API differs from textbook heap algorithms as follows:
				21
				22	- We use 0-based indexing. This makes the relationship between the
				23	index for a node and the indexes for its children slightly less
				24	obvious, but is more suitable since Python uses 0-based indexing.
				25
				26	- Our heappop() method returns the smallest item, not the largest.
				27
				28	These two make it possible to view the heap as a regular Python list
				29	without surprises: heap[0] is the smallest item, and heap.sort()
				30	maintains the heap invariant!
				31	"""
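# A minimal usage sketch of the calls listed in the docstring above, run
# against the standard library's heapq module. is_min_heap is a hypothetical
# helper written only for this illustration; it is not part of the module.

```python
import heapq

def is_min_heap(a):
    """Check a[k] <= a[2*k+1] and a[k] <= a[2*k+2] wherever those indices exist."""
    # Every non-root index i has its parent at (i - 1) // 2.
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))

heap = []                               # an empty list is already a valid heap
for value in [5, 1, 8, 3, 2]:
    heapq.heappush(heap, value)

smallest = heap[0]                      # peek at the smallest item without popping
popped = heapq.heappop(heap)            # removes and returns the smallest item
replaced = heapq.heapreplace(heap, 7)   # pops the smallest, then adds 7

print(smallest, popped, replaced, is_min_heap(heap))
```

# The heap stays an ordinary list throughout, so len(), indexing and
# iteration work as usual; only the ordering beyond heap[0] is unspecified.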
				32
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	33	# Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	34
				35	__about__ = """Heap queues
				36
				37	[explanation by François Pinard]
				38
				39	Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
				40	all k, counting elements from 0. For the sake of comparison,
				41	non-existing elements are considered to be infinite. The interesting
				42	property of a heap is that a[0] is always its smallest element.
				43
				44	The strange invariant above is meant to be an efficient memory
				45	representation for a tournament. The numbers below are `k', not a[k]:
				46
				47	0
				48
				49	1 2
				50
				51	3 4 5 6
				52
				53	7 8 9 10 11 12 13 14
				54
				55	15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
				56
				57
				58	In the tree above, each cell `k' is topping `2*k+1' and `2*k+2'. In
				59	a usual binary tournament we see in sports, each cell is the winner
				60	over the two cells it tops, and we can trace the winner down the tree
				61	to see all opponents s/he had. However, in many computer applications
				62	of such tournaments, we do not need to trace the history of a winner.
				63	To be more memory efficient, when a winner is promoted, we try to
				64	replace it by something else at a lower level, and the rule becomes
				65	that a cell and the two cells it tops contain three different items,
				66	but the top cell "wins" over the two topped cells.
				67
				68	If this heap invariant is protected at all times, index 0 is clearly
				69	the overall winner. The simplest algorithmic way to remove it and
				70	find the "next" winner is to move some loser (let's say cell 30 in the
				71	diagram above) into the 0 position, and then percolate this new 0 down
				72	the tree, exchanging values, until the invariant is re-established.
				73	This is clearly logarithmic on the total number of items in the tree.
				74	By iterating over all items, you get an O(n log n) sort.
				75
				76	A nice feature of this sort is that you can efficiently insert new
				77	items while the sort is going on, provided that the inserted items are
				78	not "better" than the last 0'th element you extracted. This is
				79	especially useful in simulation contexts, where the tree holds all
				80	incoming events, and the "win" condition means the smallest scheduled
				81	time. When an event schedules other events for execution, they are
				82	scheduled into the future, so they can easily go into the heap. So, a
				83	heap is a good structure for implementing schedulers (this is what I
				84	used for my MIDI sequencer :-).
				85
				86	Various structures for implementing schedulers have been extensively
				87	studied, and heaps are good for this, as they are reasonably speedy,
				88	the speed is almost constant, and the worst case is not much different
				89	than the average case. However, there are other representations which
				90	are more efficient overall, yet the worst cases might be terrible.
				91
				92	Heaps are also very useful in big disk sorts. You most probably all
				93	know that a big sort implies producing "runs" (which are pre-sorted
				94	sequences, whose size is usually related to the amount of CPU memory),
				95	followed by merging passes for these runs, which merging is often
				96	very cleverly organised[1]. It is very important that the initial
				97	sort produces the longest runs possible. Tournaments are a good way
				98	to achieve that. If, using all the memory available to hold a tournament, you
				99	replace and percolate items that happen to fit the current run, you'll
				100	produce runs which are twice the size of the memory for random input,
				101	and much better for input fuzzily ordered.
				102
				103	Moreover, if you output the 0'th item on disk and get an input which
				104	may not fit in the current tournament (because the value "wins" over
				105	the last output value), it cannot fit in the heap, so the size of the
				106	heap decreases. The freed memory could be cleverly reused immediately
				107	for progressively building a second heap, which grows at exactly the
				108	same rate the first heap is melting. When the first heap completely
				109	vanishes, you switch heaps and start a new run. Clever and quite
				110	effective!
				111
				112	In a word, heaps are useful memory structures to know. I use them in
				113	a few applications, and I think it is good to keep a `heap' module
				114	around. :-)
				115
				116	--------------------
				117	[1] The disk balancing algorithms which are current, nowadays, are
				118	more annoying than clever, and this is a consequence of the seeking
				119	capabilities of the disks. On devices which cannot seek, like big
				120	tape drives, the story was quite different, and one had to be very
				121	clever to ensure (far in advance) that each tape movement will be the
				122	most effective possible (that is, will best participate at
				123	"progressing" the merge). Some tapes were even able to read
				124	backwards, and this was also used to avoid the rewinding time.
				125	Believe me, real good tape sorts were quite spectacular to watch!
				126	From all times, sorting has always been a Great Art! :-)
				127	"""
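# The run-building idea described in the text above can be sketched as
# follows. This is an illustration under stated assumptions, not code from
# this module: replacement_selection and its parameters are hypothetical
# names, memory_size is assumed to be at least 1, and instead of two
# physical heaps the sketch tags each item with a run number so that items
# held back for the next run sort after everything in the current run.

```python
import heapq
from itertools import islice

def replacement_selection(items, memory_size):
    """Yield sorted runs while holding at most memory_size items in a heap."""
    it = iter(items)
    heap = [(0, x) for x in islice(it, memory_size)]   # entries are (run, value)
    heapq.heapify(heap)
    missing = object()            # sentinel marking an exhausted input
    current_run, run = 0, []
    while heap:
        run_no, value = heapq.heappop(heap)
        if run_no != current_run:
            # The current run's part of the heap has melted away; start a new run.
            yield run
            current_run, run = run_no, []
        run.append(value)
        incoming = next(it, missing)
        if incoming is not missing:
            # An item smaller than the last output cannot join the current run.
            target = run_no if incoming >= value else run_no + 1
            heapq.heappush(heap, (target, incoming))
    if run:
        yield run

data = [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]
runs = list(replacement_selection(data, 3))
print(runs)
```

# With three slots of memory the first run here already holds six items,
# which is the effect the text describes: runs longer than the memory used.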
				128
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	129	__all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'merge',
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	130	'nlargest', 'nsmallest', 'heappushpop']
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	131
Benjamin Peterson	18e9512	2009-01-18 22:46:33 +0000	[diff] [blame]	132	from itertools import islice, repeat, count, tee, chain
Raymond Hettinger	b25aa36	2004-06-12 08:33:36 +0000	[diff] [blame]	133	import bisect
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	134
				135	def heappush(heap, item):
				136	    """Push item onto heap, maintaining the heap invariant."""
				137	    heap.append(item)
				138	    _siftdown(heap, 0, len(heap)-1)
				139
				140	def heappop(heap):
				141	    """Pop the smallest item off the heap, maintaining the heap invariant."""
				142	    lastelt = heap.pop() # raises appropriate IndexError if heap is empty
				143	    if heap:
				144	        returnitem = heap[0]
				145	        heap[0] = lastelt
				146	        _siftup(heap, 0)
				147	    else:
				148	        returnitem = lastelt
				149	    return returnitem
				150
				151	def heapreplace(heap, item):
				152	    """Pop and return the current smallest value, and add the new item.
				153
				154	    This is more efficient than heappop() followed by heappush(), and can be
				155	    more appropriate when using a fixed-size heap. Note that the value
				156	    returned may be larger than item! That constrains reasonable uses of
Raymond Hettinger	8158e84	2004-09-06 07:04:09 +0000	[diff] [blame]	157	    this routine unless written as part of a conditional replacement:
Raymond Hettinger	28224f8	2004-06-20 09:07:53 +0000	[diff] [blame]	158
Raymond Hettinger	8158e84	2004-09-06 07:04:09 +0000	[diff] [blame]	159	        if item > heap[0]:
				160	            item = heapreplace(heap, item)
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	161	    """
				162	    returnitem = heap[0] # raises appropriate IndexError if heap is empty
				163	    heap[0] = item
				164	    _siftup(heap, 0)
				165	    return returnitem
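
# The conditional replacement mentioned in the heapreplace() docstring is
# typically used to keep a fixed-size heap of the best items seen so far.
# A small sketch against the standard heapq module; k_largest is a
# hypothetical helper name used only for this illustration.

```python
import heapq
from itertools import islice

def k_largest(iterable, k):
    """Return the k largest items in ascending order using a size-k min-heap."""
    it = iter(iterable)
    heap = list(islice(it, k))    # seed the heap with the first k items
    if not heap:
        return []
    heapq.heapify(heap)
    for item in it:
        # heap[0] is the smallest of the current top k; replace it only
        # when the new item beats it, so the heap never grows.
        if item > heap[0]:
            heapq.heapreplace(heap, item)
    return sorted(heap)

top_three = k_largest([4, 9, 1, 7, 3, 8], 3)
print(top_three)
```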
				166
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	167	def heappushpop(heap, item):
				168	    """Fast version of a heappush followed by a heappop."""
Georg Brandl	f78e02b	2008-06-10 17:40:04 +0000	[diff] [blame]	169	    if heap and heap[0] < item:
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	170	        item, heap[0] = heap[0], item
				171	        _siftup(heap, 0)
				172	    return item
				173
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	174	def heapify(x):
				175	    """Transform list into a heap, in-place, in O(len(x)) time."""
				176	    n = len(x)
				177	    # Transform bottom-up. The largest index there's any point to looking at
				178	    # is the largest with a child index in-range, so must have 2*i + 1 < n,
				179	    # or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
				180	    # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is
				181	    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
Guido van Rossum	805365e	2007-05-07 22:24:25 +0000	[diff] [blame]	182	    for i in reversed(range(n//2)):
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	183	        _siftup(x, i)
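
# The index arithmetic in the heapify() comment can be checked by brute
# force. The sketch below compares the largest index that still has a child
# in range against n//2 - 1 for small n; last_parent_index is a hypothetical
# helper written for this check, not part of the module.

```python
def last_parent_index(n):
    """Largest i with 2*i + 1 < n, or -1 when no index has a child."""
    candidates = [i for i in range(n) if 2 * i + 1 < n]
    return candidates[-1] if candidates else -1

# Each row is (n, brute-force answer, closed form used by heapify).
checks = [(n, last_parent_index(n), n // 2 - 1) for n in range(12)]
for n, brute, closed in checks:
    print(n, brute, closed)
```

# Both columns agree for even and odd n, which is why the loop can start at
# n//2 - 1 and skip every leaf.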
				184
Raymond Hettinger	e1defa4	2004-11-29 05:54:48 +0000	[diff] [blame]	185	def nlargest(n, iterable):
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	186	    """Find the n largest elements in a dataset.
				187
				188	    Equivalent to: sorted(iterable, reverse=True)[:n]
				189	    """
				190	    it = iter(iterable)
				191	    result = list(islice(it, n))
				192	    if not result:
				193	        return result
				194	    heapify(result)
Christian Heimes	dd15f6c	2008-03-16 00:07:10 +0000	[diff] [blame]	195	    _heappushpop = heappushpop
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	196	    for elem in it:
Benjamin Peterson	5c6d787	2009-02-06 02:40:07 +0000	[diff] [blame]	197	        _heappushpop(result, elem)
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	198	    result.sort(reverse=True)
				199	    return result
				200
Raymond Hettinger	e1defa4	2004-11-29 05:54:48 +0000	[diff] [blame]	201	def nsmallest(n, iterable):
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	202	    """Find the n smallest elements in a dataset.
				203
				204	    Equivalent to: sorted(iterable)[:n]
				205	    """
Raymond Hettinger	b25aa36	2004-06-12 08:33:36 +0000	[diff] [blame]	206	    if hasattr(iterable, '__len__') and n * 10 <= len(iterable):
				207	        # For smaller values of n, the bisect method is faster than a minheap.
				208	        # It is also memory efficient, consuming only n elements of space.
				209	        it = iter(iterable)
				210	        result = sorted(islice(it, 0, n))
				211	        if not result:
				212	            return result
				213	        insort = bisect.insort
				214	        pop = result.pop
				215	        los = result[-1] # los --> Largest of the nsmallest
				216	        for elem in it:
				217	            if los <= elem:
				218	                continue
				219	            insort(result, elem)
				220	            pop()
				221	            los = result[-1]
				222	        return result
				223	    # An alternative approach manifests the whole iterable in memory but
				224	    # saves comparisons by heapifying all at once. Also, saves time
				225	    # over bisect.insort() which has O(n) data movement time for every
				226	    # insertion. Finding the n smallest of an m length iterable requires
				227	    # O(m) + O(n log m) comparisons.
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	228	    h = list(iterable)
				229	    heapify(h)
Guido van Rossum	c1f779c	2007-07-03 08:25:58 +0000	[diff] [blame]	230	    return list(map(heappop, repeat(h, min(n, len(h)))))
Raymond Hettinger	33ecffb	2004-06-10 05:03:17 +0000	[diff] [blame]	231
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	232	# 'heap' is a heap at all indices >= startpos, except possibly for pos. pos
				233	# is the index of a leaf with a possibly out-of-order value. Restore the
				234	# heap invariant.
				235	def _siftdown(heap, startpos, pos):
				236	    newitem = heap[pos]
				237	    # Follow the path to the root, moving parents down until finding a place
				238	    # newitem fits.
				239	    while pos > startpos:
				240	        parentpos = (pos - 1) >> 1
				241	        parent = heap[parentpos]
Georg Brandl	f78e02b	2008-06-10 17:40:04 +0000	[diff] [blame]	242	        if newitem < parent:
				243	            heap[pos] = parent
				244	            pos = parentpos
				245	            continue
				246	        break
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	247	    heap[pos] = newitem
				248
				249	# The child indices of heap index pos are already heaps, and we want to make
				250	# a heap at index pos too. We do this by bubbling the smaller child of
				251	# pos up (and so on with that child's children, etc) until hitting a leaf,
				252	# then using _siftdown to move the oddball originally at index pos into place.
				253	#
				254	# We could break out of the loop as soon as we find a pos where newitem <=
				255	# both its children, but turns out that's not a good idea, and despite that
				256	# many books write the algorithm that way. During a heap pop, the last array
				257	# element is sifted in, and that tends to be large, so that comparing it
				258	# against values starting from the root usually doesn't pay (= usually doesn't
				259	# get us out of the loop early). See Knuth, Volume 3, where this is
				260	# explained and quantified in an exercise.
				261	#
				262	# Cutting the # of comparisons is important, since these routines have no
				263	# way to extract "the priority" from an array element, so that intelligence
Mark Dickinson	a56c467	2009-01-27 18:17:45 +0000	[diff] [blame]	264	# is likely to be hiding in custom comparison methods, or in array elements
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	265	# storing (priority, record) tuples. Comparisons are thus potentially
				266	# expensive.
				267	#
				268	# On random arrays of length 1000, making this change cut the number of
				269	# comparisons made by heapify() a little, and those made by exhaustive
				270	# heappop() a lot, in accord with theory. Here are typical results from 3
				271	# runs (3 just to demonstrate how small the variance is):
				272	#
				273	# Compares needed by heapify     Compares needed by 1000 heappops
				274	# --------------------------     --------------------------------
				275	# 1837 cut to 1663               14996 cut to 8680
				276	# 1855 cut to 1659               14966 cut to 8678
				277	# 1847 cut to 1660               15024 cut to 8703
				278	#
				279	# Building the heap by using heappush() 1000 times instead required
				280	# 2198, 2148, and 2219 compares: heapify() is more efficient, when
				281	# you can use it.
				282	#
				283	# The total compares needed by list.sort() on the same lists were 8627,
				284	# 8627, and 8632 (this should be compared to the sum of heapify() and
				285	# heappop() compares): list.sort() is (unsurprisingly!) more efficient
				286	# for sorting.
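
# Comparison counts like the ones quoted above can be gathered with a small
# wrapper class. The sketch below is an assumption about how such a
# measurement might be set up, not the script that produced the figures in
# the comment; Counted and count_compares are hypothetical names, and the
# counts it reports depend on the heapq implementation and the random input.

```python
import heapq
import random

class Counted:
    """Wrap a value and count every < comparison made on any instance."""
    compares = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.compares += 1
        return self.value < other.value

def count_compares(action):
    """Run action() and return how many < comparisons it triggered."""
    Counted.compares = 0
    action()
    return Counted.compares

rng = random.Random(0)                       # fixed seed, chosen arbitrarily
values = [rng.random() for _ in range(1000)]

heap = [Counted(v) for v in values]
heapify_compares = count_compares(lambda: heapq.heapify(heap))

popped = []
pop_compares = count_compares(
    lambda: popped.extend(heapq.heappop(heap) for _ in range(1000)))

print(heapify_compares, pop_compares)
```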
				287
				288	def _siftup(heap, pos):
				289	    endpos = len(heap)
				290	    startpos = pos
				291	    newitem = heap[pos]
				292	    # Bubble up the smaller child until hitting a leaf.
				293	    childpos = 2*pos + 1 # leftmost child position
				294	    while childpos < endpos:
				295	        # Set childpos to index of smaller child.
				296	        rightpos = childpos + 1
Georg Brandl	f78e02b	2008-06-10 17:40:04 +0000	[diff] [blame]	297	        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	298	            childpos = rightpos
				299	        # Move the smaller child up.
				300	        heap[pos] = heap[childpos]
				301	        pos = childpos
				302	        childpos = 2*pos + 1
				303	    # The leaf at pos is empty now. Put newitem there, and bubble it up
				304	    # to its final resting place (by sifting its parents down).
				305	    heap[pos] = newitem
				306	    _siftdown(heap, startpos, pos)
				307
				308	# If available, use C implementation
				309	try:
Raymond Hettinger	0dd737b	2009-03-29 19:30:50 +0000	[diff] [blame]	310	    from _heapq import *
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	311	except ImportError:
				312	    pass
				313
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	314	def merge(*iterables):
				315	    '''Merge multiple sorted inputs into a single sorted output.
				316
Guido van Rossum	d8faa36	2007-04-27 19:54:29 +0000	[diff] [blame]	317	    Similar to sorted(itertools.chain(*iterables)) but returns a generator,
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	318	    does not pull the data into memory all at once, and assumes that each of
				319	    the input streams is already sorted (smallest to largest).
				320
				321	    >>> list(merge([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
				322	    [0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
				323
				324	    '''
				325	    _heappop, _heapreplace, _StopIteration = heappop, heapreplace, StopIteration
				326
				327	    h = []
				328	    h_append = h.append
				329	    for itnum, it in enumerate(map(iter, iterables)):
				330	        try:
Georg Brandl	a18af4e	2007-04-21 15:47:16 +0000	[diff] [blame]	331	            next = it.__next__
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	332	            h_append([next(), itnum, next])
				333	        except _StopIteration:
				334	            pass
				335	    heapify(h)
				336
				337	    while 1:
				338	        try:
				339	            while 1:
				340	                v, itnum, next = s = h[0] # raises IndexError when h is empty
				341	                yield v
				342	                s[0] = next() # raises StopIteration when exhausted
				343	                _heapreplace(h, s) # restore heap condition
				344	        except _StopIteration:
				345	            _heappop(h) # remove empty iterator
				346	        except IndexError:
				347	            return
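
# Because merge() is a generator, it can be consumed lazily. A short sketch
# using the standard heapq module; the input lists are made-up sample data
# and each one is assumed to be sorted already.

```python
import heapq

odds = [1, 3, 5, 7]
evens = [0, 2, 4, 8]
merged = heapq.merge(odds, evens, [5, 10])

# Pull only the first three items; the rest of the inputs is not touched yet.
first_three = [next(merged) for _ in range(3)]
rest = list(merged)

print(first_three, rest)
```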
				348
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	349	# Extend the implementations of nsmallest and nlargest to use a key= argument
				350	_nsmallest = nsmallest
				351	def nsmallest(n, iterable, key=None):
				352	    """Find the n smallest elements in a dataset.
				353
				354	    Equivalent to: sorted(iterable, key=key)[:n]
				355	    """
Benjamin Peterson	18e9512	2009-01-18 22:46:33 +0000	[diff] [blame]	356	    # Short-cut for n==1 is to use min() when len(iterable)>0
				357	    if n == 1:
				358	        it = iter(iterable)
				359	        head = list(islice(it, 1))
				360	        if not head:
				361	            return []
				362	        if key is None:
				363	            return [min(chain(head, it))]
				364	        return [min(chain(head, it), key=key)]
				365
				366	    # When n>=size, it's faster to use sort()
				367	    try:
				368	        size = len(iterable)
				369	    except (TypeError, AttributeError):
				370	        pass
				371	    else:
				372	        if n >= size:
				373	            return sorted(iterable, key=key)[:n]
				374
				375	    # When key is None, use simpler decoration
Georg Brandl	3a9b062	2009-01-03 22:07:57 +0000	[diff] [blame]	376	    if key is None:
				377	        it = zip(iterable, count()) # decorate
				378	        result = _nsmallest(n, it)
Raymond Hettinger	ba86fa9	2009-02-21 23:20:57 +0000	[diff] [blame]	379	        return [r[0] for r in result] # undecorate
Benjamin Peterson	18e9512	2009-01-18 22:46:33 +0000	[diff] [blame]	380
				381	    # General case, slowest method
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	382	    in1, in2 = tee(iterable)
Georg Brandl	3a9b062	2009-01-03 22:07:57 +0000	[diff] [blame]	383	    it = zip(map(key, in1), count(), in2) # decorate
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	384	    result = _nsmallest(n, it)
Raymond Hettinger	ba86fa9	2009-02-21 23:20:57 +0000	[diff] [blame]	385	    return [r[2] for r in result] # undecorate
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	386
				387	_nlargest = nlargest
				388	def nlargest(n, iterable, key=None):
				389	    """Find the n largest elements in a dataset.
				390
				391	    Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
				392	    """
Benjamin Peterson	18e9512	2009-01-18 22:46:33 +0000	[diff] [blame]	393
				394	    # Short-cut for n==1 is to use max() when len(iterable)>0
				395	    if n == 1:
				396	        it = iter(iterable)
				397	        head = list(islice(it, 1))
				398	        if not head:
				399	            return []
				400	        if key is None:
				401	            return [max(chain(head, it))]
				402	        return [max(chain(head, it), key=key)]
				403
				404	    # When n>=size, it's faster to use sort()
				405	    try:
				406	        size = len(iterable)
				407	    except (TypeError, AttributeError):
				408	        pass
				409	    else:
				410	        if n >= size:
				411	            return sorted(iterable, key=key, reverse=True)[:n]
				412
				413	    # When key is None, use simpler decoration
Georg Brandl	3a9b062	2009-01-03 22:07:57 +0000	[diff] [blame]	414	    if key is None:
Raymond Hettinger	bd171bc	2009-02-21 22:10:18 +0000	[diff] [blame]	415	        it = zip(iterable, count(0,-1)) # decorate
Georg Brandl	3a9b062	2009-01-03 22:07:57 +0000	[diff] [blame]	416	        result = _nlargest(n, it)
Raymond Hettinger	ba86fa9	2009-02-21 23:20:57 +0000	[diff] [blame]	417	        return [r[0] for r in result] # undecorate
Benjamin Peterson	18e9512	2009-01-18 22:46:33 +0000	[diff] [blame]	418
				419	    # General case, slowest method
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	420	    in1, in2 = tee(iterable)
Raymond Hettinger	bd171bc	2009-02-21 22:10:18 +0000	[diff] [blame]	421	    it = zip(map(key, in1), count(0,-1), in2) # decorate
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	422	    result = _nlargest(n, it)
Raymond Hettinger	ba86fa9	2009-02-21 23:20:57 +0000	[diff] [blame]	423	    return [r[2] for r in result] # undecorate
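
# A short sketch of the key= variants, using the standard heapq module. The
# records and the age() key function are made-up sample data for this
# illustration. The results are checked against the sorted() expressions
# that the docstrings above give as equivalents.

```python
import heapq

records = [("carol", 35), ("alice", 30), ("bob", 25), ("dave", 30)]

def age(record):
    """Key function: order records by their second field."""
    return record[1]

youngest_two = heapq.nsmallest(2, records, key=age)
oldest_two = heapq.nlargest(2, records, key=age)

print(youngest_two)
print(oldest_two)
```

# The count() element in the decorated tuples is what breaks ties between
# records with equal keys, so the records themselves are never compared.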
Raymond Hettinger	4901a1f	2004-12-02 08:59:14 +0000	[diff] [blame]	424
Raymond Hettinger	c46cb2a	2004-04-19 19:06:21 +0000	[diff] [blame]	425	if __name__ == "__main__":
				426	    # Simple sanity test
				427	    heap = []
				428	    data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
				429	    for item in data:
				430	        heappush(heap, item)
				431	    sort = []
				432	    while heap:
				433	        sort.append(heappop(heap))
Guido van Rossum	be19ed7	2007-02-09 05:37:30 +0000	[diff] [blame]	434	    print(sort)
Thomas Wouters	cf297e4	2007-02-23 15:07:44 +0000	[diff] [blame]	435
				436	    import doctest
				437	    doctest.testmod()