Blame - Doc/library/itertools.rst - platform/external/python/cpython3

blob: d6e3291af955b5f5b96f9f91135b4219f6649310 [file] [log] [blame]

Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1
				2	:mod:`itertools` --- Functions creating iterators for efficient looping
				3	=======================================================================
				4
				5	.. module:: itertools
				6	:synopsis: Functions creating iterators for efficient looping.
				7	.. moduleauthor:: Raymond Hettinger <python@rcn.com>
				8	.. sectionauthor:: Raymond Hettinger <python@rcn.com>
				9
				10
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	11	This module implements a number of iterator building blocks inspired by
				12	constructs from the Haskell and SML programming languages. Each has been recast
				13	in a form suitable for Python.
				14
				15	The module standardizes a core set of fast, memory efficient tools that are
				16	useful by themselves or in combination. Standardization helps avoid the
				17	readability and reliability problems which arise when many different individuals
				18	create their own slightly varying implementations, each with their own quirks
				19	and naming conventions.
				20
				21	The tools are designed to combine readily with one another. This makes it easy
				22	to construct more specialized tools succinctly and efficiently in pure Python.
				23
				24	For instance, SML provides a tabulation tool: ``tabulate(f)`` which produces a
				25	sequence ``f(0), f(1), ...``. This toolbox provides :func:`imap` and
				26	:func:`count` which can be combined to form ``imap(f, count())`` and produce an
				27	equivalent result.
				28
				29	Likewise, the functional tools are designed to work well with the high-speed
				30	functions provided by the :mod:`operator` module.
				31
				32	The module author welcomes suggestions for other basic building blocks to be
				33	added to future versions of the module.
				34
				35	Whether cast in pure python form or compiled code, tools that use iterators are
				36	more memory efficient (and faster) than their list based counterparts. Adopting
				37	the principles of just-in-time manufacturing, they create data when and where
				38	needed instead of consuming memory with the computer equivalent of "inventory".
				39
				40	The performance advantage of iterators becomes more acute as the number of
				41	elements increases -- at some point, lists grow large enough to severely impact
				42	memory cache performance and start running slowly.
				43
				44
				45	.. seealso::
				46
				47	The Standard ML Basis Library, `The Standard ML Basis Library
				48	<http://www.standardml.org/Basis/>`_.
				49
				50	Haskell, A Purely Functional Language, `Definition of Haskell and the Standard
				51	Libraries <http://www.haskell.org/definition/>`_.
				52
				53
				54	.. _itertools-functions:
				55
				56	Itertool functions
				57	------------------
				58
				59	The following module functions all construct and return iterators. Some provide
				60	streams of infinite length, so they should only be accessed by functions or
				61	loops that truncate the stream.
				62
				63
				64	.. function:: chain(*iterables)
				65
				66	Make an iterator that returns elements from the first iterable until it is
				67	exhausted, then proceeds to the next iterable, until all of the iterables are
				68	exhausted. Used for treating consecutive sequences as a single sequence.
				69	Equivalent to::
				70
				71	def chain(*iterables):
				72	for it in iterables:
				73	for element in it:
				74	yield element
				75
				76
				77	.. function:: count([n])
				78
				79	Make an iterator that returns consecutive integers starting with n. If not
				80	specified n defaults to zero. Does not currently support python long
				81	integers. Often used as an argument to :func:`imap` to generate consecutive
				82	data points. Also, used with :func:`izip` to add sequence numbers. Equivalent
				83	to::
				84
				85	def count(n=0):
				86	while True:
				87	yield n
				88	n += 1
				89
				90	Note, :func:`count` does not check for overflow and will return negative numbers
				91	after exceeding ``sys.maxint``. This behavior may change in the future.
				92
				93
				94	.. function:: cycle(iterable)
				95
				96	Make an iterator returning elements from the iterable and saving a copy of each.
				97	When the iterable is exhausted, return elements from the saved copy. Repeats
				98	indefinitely. Equivalent to::
				99
				100	def cycle(iterable):
				101	saved = []
				102	for element in iterable:
				103	yield element
				104	saved.append(element)
				105	while saved:
				106	for element in saved:
				107	yield element
				108
				109	Note, this member of the toolkit may require significant auxiliary storage
				110	(depending on the length of the iterable).
				111
				112
				113	.. function:: dropwhile(predicate, iterable)
				114
				115	Make an iterator that drops elements from the iterable as long as the predicate
				116	is true; afterwards, returns every element. Note, the iterator does not produce
				117	any output until the predicate first becomes false, so it may have a lengthy
				118	start-up time. Equivalent to::
				119
				120	def dropwhile(predicate, iterable):
				121	iterable = iter(iterable)
				122	for x in iterable:
				123	if not predicate(x):
				124	yield x
				125	break
				126	for x in iterable:
				127	yield x
				128
				129
				130	.. function:: groupby(iterable[, key])
				131
				132	Make an iterator that returns consecutive keys and groups from the iterable.
				133	The key is a function computing a key value for each element. If not
				134	specified or is ``None``, key defaults to an identity function and returns
				135	the element unchanged. Generally, the iterable needs to already be sorted on
				136	the same key function.
				137
				138	The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix. It
				139	generates a break or new group every time the value of the key function changes
				140	(which is why it is usually necessary to have sorted the data using the same key
				141	function). That behavior differs from SQL's GROUP BY which aggregates common
				142	elements regardless of their input order.
				143
				144	The returned group is itself an iterator that shares the underlying iterable
				145	with :func:`groupby`. Because the source is shared, when the :func:`groupby`
				146	object is advanced, the previous group is no longer visible. So, if that data
				147	is needed later, it should be stored as a list::
				148
				149	groups = []
				150	uniquekeys = []
				151	data = sorted(data, key=keyfunc)
				152	for k, g in groupby(data, keyfunc):
				153	groups.append(list(g)) # Store group iterator as a list
				154	uniquekeys.append(k)
				155
				156	:func:`groupby` is equivalent to::
				157
				158	class groupby(object):
				159	def __init__(self, iterable, key=None):
				160	if key is None:
				161	key = lambda x: x
				162	self.keyfunc = key
				163	self.it = iter(iterable)
				164	self.tgtkey = self.currkey = self.currvalue = []
				165	def __iter__(self):
				166	return self
				167	def __next__(self):
				168	while self.currkey == self.tgtkey:
				169	self.currvalue = next(self.it) # Exit on StopIteration
				170	self.currkey = self.keyfunc(self.currvalue)
				171	self.tgtkey = self.currkey
				172	return (self.currkey, self._grouper(self.tgtkey))
				173	def _grouper(self, tgtkey):
				174	while self.currkey == tgtkey:
				175	yield self.currvalue
				176	self.currvalue = next(self.it) # Exit on StopIteration
				177	self.currkey = self.keyfunc(self.currvalue)
				178
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	179
				180	.. function:: ifilter(predicate, iterable)
				181
				182	Make an iterator that filters elements from iterable returning only those for
				183	which the predicate is ``True``. If predicate is ``None``, return the items
				184	that are true. Equivalent to::
				185
				186	def ifilter(predicate, iterable):
				187	if predicate is None:
				188	predicate = bool
				189	for x in iterable:
				190	if predicate(x):
				191	yield x
				192
				193
				194	.. function:: ifilterfalse(predicate, iterable)
				195
				196	Make an iterator that filters elements from iterable returning only those for
				197	which the predicate is ``False``. If predicate is ``None``, return the items
				198	that are false. Equivalent to::
				199
				200	def ifilterfalse(predicate, iterable):
				201	if predicate is None:
				202	predicate = bool
				203	for x in iterable:
				204	if not predicate(x):
				205	yield x
				206
				207
				208	.. function:: imap(function, *iterables)
				209
				210	Make an iterator that computes the function using arguments from each of the
				211	iterables. If function is set to ``None``, then :func:`imap` returns the
				212	arguments as a tuple. Like :func:`map` but stops when the shortest iterable is
				213	exhausted instead of filling in ``None`` for shorter iterables. The reason for
				214	the difference is that infinite iterator arguments are typically an error for
				215	:func:`map` (because the output is fully evaluated) but represent a common and
				216	useful way of supplying arguments to :func:`imap`. Equivalent to::
				217
				218	def imap(function, *iterables):
				219	iterables = map(iter, iterables)
				220	while True:
				221	args = [next(i) for i in iterables]
				222	if function is None:
				223	yield tuple(args)
				224	else:
				225	yield function(*args)
				226
				227
				228	.. function:: islice(iterable, [start,] stop [, step])
				229
				230	Make an iterator that returns selected elements from the iterable. If start is
				231	non-zero, then elements from the iterable are skipped until start is reached.
				232	Afterward, elements are returned consecutively unless step is set higher than
				233	one which results in items being skipped. If stop is ``None``, then iteration
				234	continues until the iterator is exhausted, if at all; otherwise, it stops at the
				235	specified position. Unlike regular slicing, :func:`islice` does not support
				236	negative values for start, stop, or step. Can be used to extract related
				237	fields from data where the internal structure has been flattened (for example, a
				238	multi-line report may list a name field on every third line). Equivalent to::
				239
				240	def islice(iterable, *args):
				241	s = slice(*args)
				242	it = iter(range(s.start or 0, s.stop or sys.maxint, s.step or 1))
				243	nexti = next(it)
				244	for i, element in enumerate(iterable):
				245	if i == nexti:
				246	yield element
				247	nexti = next(it)
				248
				249	If start is ``None``, then iteration starts at zero. If step is ``None``,
				250	then the step defaults to one.
				251
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	252
				253	.. function:: izip(*iterables)
				254
				255	Make an iterator that aggregates elements from each of the iterables. Like
				256	:func:`zip` except that it returns an iterator instead of a list. Used for
				257	lock-step iteration over several iterables at a time. Equivalent to::
				258
				259	def izip(*iterables):
				260	iterables = map(iter, iterables)
				261	while iterables:
				262	result = [next(it) for it in iterables]
				263	yield tuple(result)
				264
Georg Brandl	55ac8f0	2007-09-01 13:51:09 +0000	[diff] [blame]	265	When no iterables are specified, return a zero length iterator.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	266
				267	Note, the left-to-right evaluation order of the iterables is guaranteed. This
				268	makes possible an idiom for clustering a data series into n-length groups using
				269	``izip([iter(s)]n)``. For data that doesn't fit n-length groups exactly, the
				270	last tuple can be pre-padded with fill values using ``izip(*[chain(s,
				271	[None](n-1))]n)``.
				272
				273	Note, when :func:`izip` is used with unequal length inputs, subsequent
				274	iteration over the longer iterables cannot reliably be continued after
				275	:func:`izip` terminates. Potentially, up to one entry will be missing from
				276	each of the left-over iterables. This occurs because a value is fetched from
				277	each iterator in- turn, but the process ends when one of the iterators
				278	terminates. This leaves the last fetched values in limbo (they cannot be
				279	returned in a final, incomplete tuple and they are cannot be pushed back into
				280	the iterator for retrieval with ``next(it)``). In general, :func:`izip`
				281	should only be used with unequal length inputs when you don't care about
				282	trailing, unmatched values from the longer iterables.
				283
				284
				285	.. function:: izip_longest(*iterables[, fillvalue])
				286
				287	Make an iterator that aggregates elements from each of the iterables. If the
				288	iterables are of uneven length, missing values are filled-in with fillvalue.
				289	Iteration continues until the longest iterable is exhausted. Equivalent to::
				290
				291	def izip_longest(args, *kwds):
				292	fillvalue = kwds.get('fillvalue')
				293	def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
				294	yield counter() # yields the fillvalue, or raises IndexError
				295	fillers = repeat(fillvalue)
				296	iters = [chain(it, sentinel(), fillers) for it in args]
				297	try:
				298	for tup in izip(*iters):
				299	yield tup
				300	except IndexError:
				301	pass
				302
				303	If one of the iterables is potentially infinite, then the :func:`izip_longest`
				304	function should be wrapped with something that limits the number of calls (for
				305	example :func:`islice` or :func:`takewhile`).
				306
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	307
				308	.. function:: repeat(object[, times])
				309
				310	Make an iterator that returns object over and over again. Runs indefinitely
				311	unless the times argument is specified. Used as argument to :func:`imap` for
				312	invariant parameters to the called function. Also used with :func:`izip` to
				313	create an invariant part of a tuple record. Equivalent to::
				314
				315	def repeat(object, times=None):
				316	if times is None:
				317	while True:
				318	yield object
				319	else:
				320	for i in range(times):
				321	yield object
				322
				323
				324	.. function:: starmap(function, iterable)
				325
				326	Make an iterator that computes the function using arguments tuples obtained from
				327	the iterable. Used instead of :func:`imap` when argument parameters are already
				328	grouped in tuples from a single iterable (the data has been "pre-zipped"). The
				329	difference between :func:`imap` and :func:`starmap` parallels the distinction
				330	between ``function(a,b)`` and ``function(*c)``. Equivalent to::
				331
				332	def starmap(function, iterable):
				333	iterable = iter(iterable)
				334	while True:
				335	yield function(*next(iterable))
				336
				337
				338	.. function:: takewhile(predicate, iterable)
				339
				340	Make an iterator that returns elements from the iterable as long as the
				341	predicate is true. Equivalent to::
				342
				343	def takewhile(predicate, iterable):
				344	for x in iterable:
				345	if predicate(x):
				346	yield x
				347	else:
				348	break
				349
				350
				351	.. function:: tee(iterable[, n=2])
				352
				353	Return n independent iterators from a single iterable. The case where ``n==2``
				354	is equivalent to::
				355
				356	def tee(iterable):
				357	def gen(next, data={}, cnt=[0]):
				358	for i in count():
				359	if i == cnt[0]:
				360	item = data[i] = next()
				361	cnt[0] += 1
				362	else:
				363	item = data.pop(i)
				364	yield item
				365	it = iter(iterable)
				366	return (gen(it.__next__), gen(it.__next__))
				367
				368	Note, once :func:`tee` has made a split, the original iterable should not be
				369	used anywhere else; otherwise, the iterable could get advanced without the tee
				370	objects being informed.
				371
				372	Note, this member of the toolkit may require significant auxiliary storage
				373	(depending on how much temporary data needs to be stored). In general, if one
				374	iterator is going to use most or all of the data before the other iterator, it
				375	is faster to use :func:`list` instead of :func:`tee`.
				376
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	377
				378	.. _itertools-example:
				379
				380	Examples
				381	--------
				382
				383	The following examples show common uses for each tool and demonstrate ways they
				384	can be combined. ::
				385
				386	>>> amounts = [120.15, 764.05, 823.14]
				387	>>> for checknum, amount in izip(count(1200), amounts):
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame^]	388	... print('Check %d is for $%.2f' % (checknum, amount))
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	389	...
				390	Check 1200 is for $120.15
				391	Check 1201 is for $764.05
				392	Check 1202 is for $823.14
				393
				394	>>> import operator
				395	>>> for cube in imap(operator.pow, range(1,5), repeat(3)):
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame^]	396	... print(cube)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	397	...
				398	1
				399	8
				400	27
				401	64
				402
				403	>>> reportlines = ['EuroPython', 'Roster', '', 'alex', '', 'laura',
				404	... '', 'martin', '', 'walter', '', 'mark']
				405	>>> for name in islice(reportlines, 3, None, 2):
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame^]	406	... print(name.title())
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	407	...
				408	Alex
				409	Laura
				410	Martin
				411	Walter
				412	Mark
				413
				414	# Show a dictionary sorted and grouped by value
				415	>>> from operator import itemgetter
				416	>>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3)
				417	>>> di = sorted(d.iteritems(), key=itemgetter(1))
				418	>>> for k, g in groupby(di, key=itemgetter(1)):
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame^]	419	... print(k, map(itemgetter(0), g))
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	420	...
				421	1 ['a', 'c', 'e']
				422	2 ['b', 'd', 'f']
				423	3 ['g']
				424
				425	# Find runs of consecutive numbers using groupby. The key to the solution
				426	# is differencing with a range so that consecutive numbers all appear in
				427	# same group.
				428	>>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
				429	>>> for k, g in groupby(enumerate(data), lambda t:t[0]-t[1]):
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame^]	430	... print(map(operator.itemgetter(1), g))
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	431	...
				432	[1]
				433	[4, 5, 6]
				434	[10]
				435	[15, 16, 17, 18]
				436	[22]
				437	[25, 26, 27, 28]
				438
				439
				440
				441	.. _itertools-recipes:
				442
				443	Recipes
				444	-------
				445
				446	This section shows recipes for creating an extended toolset using the existing
				447	itertools as building blocks.
				448
				449	The extended tools offer the same high performance as the underlying toolset.
				450	The superior memory performance is kept by processing elements one at a time
				451	rather than bringing the whole iterable into memory all at once. Code volume is
				452	kept small by linking the tools together in a functional style which helps
				453	eliminate temporary variables. High speed is retained by preferring
				454	"vectorized" building blocks over the use of for-loops and generators which
				455	incur interpreter overhead. ::
				456
				457	def take(n, seq):
				458	return list(islice(seq, n))
				459
				460	def enumerate(iterable):
				461	return izip(count(), iterable)
				462
				463	def tabulate(function):
				464	"Return function(0), function(1), ..."
				465	return imap(function, count())
				466
				467	def iteritems(mapping):
				468	return izip(mapping.iterkeys(), mapping.itervalues())
				469
				470	def nth(iterable, n):
				471	"Returns the nth item or raise StopIteration"
				472	return islice(iterable, n, None).next()
				473
				474	def all(seq, pred=None):
				475	"Returns True if pred(x) is true for every element in the iterable"
				476	for elem in ifilterfalse(pred, seq):
				477	return False
				478	return True
				479
				480	def any(seq, pred=None):
				481	"Returns True if pred(x) is true for at least one element in the iterable"
				482	for elem in ifilter(pred, seq):
				483	return True
				484	return False
				485
				486	def no(seq, pred=None):
				487	"Returns True if pred(x) is false for every element in the iterable"
				488	for elem in ifilter(pred, seq):
				489	return False
				490	return True
				491
				492	def quantify(seq, pred=None):
				493	"Count how many times the predicate is true in the sequence"
				494	return sum(imap(pred, seq))
				495
				496	def padnone(seq):
				497	"""Returns the sequence elements and then returns None indefinitely.
				498
				499	Useful for emulating the behavior of the built-in map() function.
				500	"""
				501	return chain(seq, repeat(None))
				502
				503	def ncycles(seq, n):
				504	"Returns the sequence elements n times"
				505	return chain(*repeat(seq, n))
				506
				507	def dotproduct(vec1, vec2):
				508	return sum(imap(operator.mul, vec1, vec2))
				509
				510	def flatten(listOfLists):
				511	return list(chain(*listOfLists))
				512
				513	def repeatfunc(func, times=None, *args):
				514	"""Repeat calls to func with specified arguments.
				515
				516	Example: repeatfunc(random.random)
				517	"""
				518	if times is None:
				519	return starmap(func, repeat(args))
				520	else:
				521	return starmap(func, repeat(args, times))
				522
				523	def pairwise(iterable):
				524	"s -> (s0,s1), (s1,s2), (s2, s3), ..."
				525	a, b = tee(iterable)
				526	next(b, None)
				527	return izip(a, b)
				528
				529	def grouper(n, iterable, padvalue=None):
				530	"grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
				531	return izip([chain(iterable, repeat(padvalue, n-1))]n)
				532
				533
				534