:mod:`itertools` --- Functions creating iterators for efficient looping
=======================================================================

.. module:: itertools
   :synopsis: Functions creating iterators for efficient looping.
.. moduleauthor:: Raymond Hettinger <python@rcn.com>
.. sectionauthor:: Raymond Hettinger <python@rcn.com>


.. versionadded:: 2.3

This module implements a number of :term:`iterator` building blocks inspired by
constructs from the Haskell and SML programming languages. Each has been recast
in a form suitable for Python.

The module standardizes a core set of fast, memory-efficient tools that are
useful by themselves or in combination. Standardization helps avoid the
readability and reliability problems which arise when many different individuals
create their own slightly varying implementations, each with their own quirks
and naming conventions.

The tools are designed to combine readily with one another. This makes it easy
to construct more specialized tools succinctly and efficiently in pure Python.

For instance, SML provides a tabulation tool: ``tabulate(f)`` which produces a
sequence ``f(0), f(1), ...``. This toolbox provides :func:`imap` and
:func:`count` which can be combined to form ``imap(f, count())`` and produce an
equivalent result.

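The tabulation idea can be sketched as follows. This is a minimal illustration, not part of the module: the ``square`` function is a made-up example, and the ``try``/``except`` fallback is an assumption added so the sketch also runs on Python 3, where the built-in ``map`` is already lazy and ``imap`` no longer exists.

```python
from itertools import count, islice

try:
    from itertools import imap  # Python 2: the lazy map described in this document
except ImportError:
    imap = map  # assumption: on Python 3 the built-in map is already lazy


def square(x):
    """Hypothetical function to tabulate."""
    return x * x


# imap(square, count()) is an infinite stream square(0), square(1), ...
# islice truncates it so the loop terminates.
first_five = list(islice(imap(square, count()), 5))
print(first_five)
```

Because the stream is infinite, it must always be consumed through something that truncates it, such as :func:`islice` here.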

Likewise, the functional tools are designed to work well with the high-speed
functions provided by the :mod:`operator` module.

The module author welcomes suggestions for other basic building blocks to be
added to future versions of the module.

Whether cast in pure Python form or compiled code, tools that use iterators are
more memory efficient (and faster) than their list-based counterparts. Adopting
the principles of just-in-time manufacturing, they create data when and where
needed instead of consuming memory with the computer equivalent of "inventory".

The performance advantage of iterators becomes more acute as the number of
elements increases -- at some point, lists grow large enough to severely impact
memory cache performance and start running slowly.


.. seealso::

   The Standard ML Basis Library, `The Standard ML Basis Library
   <http://www.standardml.org/Basis/>`_.

   Haskell, A Purely Functional Language, `Definition of Haskell and the Standard
   Libraries <http://www.haskell.org/definition/>`_.


.. _itertools-functions:

Itertool functions
------------------

The following module functions all construct and return iterators. Some provide
streams of infinite length, so they should only be accessed by functions or
loops that truncate the stream.


.. function:: chain(*iterables)

   Make an iterator that returns elements from the first iterable until it is
   exhausted, then proceeds to the next iterable, until all of the iterables are
   exhausted. Used for treating consecutive sequences as a single sequence.
   Equivalent to::

      def chain(*iterables):
          for it in iterables:
              for element in it:
                  yield element

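A short usage sketch for ``chain``: the inputs do not have to be of the same type, only iterable. The variable names and sample data are made up for illustration.

```python
from itertools import chain

header = ['name', 'id']   # a list
rows = ('alice', 'bob')   # a tuple
extra = iter(['carol'])   # a one-shot iterator

# chain accepts any mix of iterables and yields their items back to back,
# without building an intermediate concatenated list.
combined = list(chain(header, rows, extra))
print(combined)
```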

.. function:: count([n])

   Make an iterator that returns consecutive integers starting with n. If not
   specified, n defaults to zero. Often used as an argument to :func:`imap` to
   generate consecutive data points. Also used with :func:`izip` to add sequence
   numbers. Equivalent to::

      def count(n=0):
          while True:
              yield n
              n += 1


.. function:: cycle(iterable)

   Make an iterator returning elements from the iterable and saving a copy of each.
   When the iterable is exhausted, return elements from the saved copy. Repeats
   indefinitely. Equivalent to::

      def cycle(iterable):
          saved = []
          for element in iterable:
              yield element
              saved.append(element)
          while saved:
              for element in saved:
                  yield element

   Note, this member of the toolkit may require significant auxiliary storage
   (depending on the length of the iterable).

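The behavior of ``cycle`` can be seen in a small sketch. The sample data is illustrative only. Since the result never ends for a non-empty input, the example bounds it with :func:`islice`.

```python
from itertools import cycle, islice

colors = cycle(['red', 'green', 'blue'])

# cycle never stops on a non-empty input, so islice bounds the consumption.
first_seven = list(islice(colors, 7))
print(first_seven)

# With an empty input there is nothing saved to replay, so iteration ends at once.
nothing = list(cycle([]))
```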

.. function:: dropwhile(predicate, iterable)

   Make an iterator that drops elements from the iterable as long as the predicate
   is true; afterwards, returns every element. Note, the iterator does not produce
   any output until the predicate first becomes false, so it may have a lengthy
   start-up time. Equivalent to::

      def dropwhile(predicate, iterable):
          iterable = iter(iterable)
          for x in iterable:
              if not predicate(x):
                  yield x
                  break
          for x in iterable:
              yield x

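One point that is easy to miss is that ``dropwhile`` stops testing the predicate once it has failed. The following sketch, with made-up input lines, skips a leading comment block but keeps a later comment line.

```python
from itertools import dropwhile

lines = ['# comment', '# another', 'data 1', '# not dropped', 'data 2']

# Only the leading run of matching lines is dropped; after the first
# non-matching line, every element is passed through untested.
body = list(dropwhile(lambda s: s.startswith('#'), lines))
print(body)
```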

.. function:: groupby(iterable[, key])

   Make an iterator that returns consecutive keys and groups from the iterable.
   The key is a function computing a key value for each element. If key is not
   specified or is ``None``, it defaults to an identity function and returns
   the element unchanged. Generally, the iterable needs to already be sorted on
   the same key function.

   The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix. It
   generates a break or new group every time the value of the key function changes
   (which is why it is usually necessary to have sorted the data using the same key
   function). That behavior differs from SQL's GROUP BY which aggregates common
   elements regardless of their input order.

   The returned group is itself an iterator that shares the underlying iterable
   with :func:`groupby`. Because the source is shared, when the :func:`groupby`
   object is advanced, the previous group is no longer visible. So, if that data
   is needed later, it should be stored as a list::

      groups = []
      uniquekeys = []
      data = sorted(data, key=keyfunc)
      for k, g in groupby(data, keyfunc):
          groups.append(list(g))      # Store group iterator as a list
          uniquekeys.append(k)

   :func:`groupby` is equivalent to::

      class groupby(object):
          def __init__(self, iterable, key=None):
              if key is None:
                  key = lambda x: x
              self.keyfunc = key
              self.it = iter(iterable)
              # A fresh object() is a sentinel that equals no real key
              self.tgtkey = self.currkey = self.currvalue = object()
          def __iter__(self):
              return self
          def next(self):
              # Skip whatever is left of the previous group
              while self.currkey == self.tgtkey:
                  self.currvalue = self.it.next() # Exit on StopIteration
                  self.currkey = self.keyfunc(self.currvalue)
              self.tgtkey = self.currkey
              return (self.currkey, self._grouper(self.tgtkey))
          def _grouper(self, tgtkey):
              while self.currkey == tgtkey:
                  yield self.currvalue
                  self.currvalue = self.it.next() # Exit on StopIteration
                  self.currkey = self.keyfunc(self.currvalue)

   .. versionadded:: 2.4

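The difference between grouping sorted and unsorted input can be sketched as follows. The word list and the ``first_letter`` key function are made up for illustration.

```python
from itertools import groupby

words = ['apple', 'avocado', 'banana', 'apricot', 'blueberry']


def first_letter(word):
    """Hypothetical key function: group words by their initial."""
    return word[0]


# Unsorted input: a new group starts every time the key changes,
# so the same key can appear more than once.
unsorted_keys = [k for k, g in groupby(words, first_letter)]

# Sorted on the same key function: each key appears exactly once.
# Each group is turned into a list right away, before groupby advances.
ordered = sorted(words, key=first_letter)
grouped = [(k, list(g)) for k, g in groupby(ordered, first_letter)]
print(unsorted_keys)
print(grouped)
```

Note that ``sorted`` is stable, so words keep their original relative order within each group.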

.. function:: ifilter(predicate, iterable)

   Make an iterator that filters elements from iterable returning only those for
   which the predicate is ``True``. If predicate is ``None``, return the items
   that are true. Equivalent to::

      def ifilter(predicate, iterable):
          if predicate is None:
              predicate = bool
          for x in iterable:
              if predicate(x):
                  yield x


.. function:: ifilterfalse(predicate, iterable)

   Make an iterator that filters elements from iterable returning only those for
   which the predicate is ``False``. If predicate is ``None``, return the items
   that are false. Equivalent to::

      def ifilterfalse(predicate, iterable):
          if predicate is None:
              predicate = bool
          for x in iterable:
              if not predicate(x):
                  yield x

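The two filters are complements of each other, which a small sketch can show. The sample values and ``is_even`` are illustrative. The import fallback is an assumption added so the sketch also runs on Python 3, where the equivalents are the built-in ``filter`` and ``itertools.filterfalse``.

```python
try:
    from itertools import ifilter, ifilterfalse  # Python 2 names used in this document
except ImportError:
    # assumption: Python 3 equivalents of the same tools
    from itertools import filterfalse as ifilterfalse
    ifilter = filter

values = [0, 1, '', 'x', None, 2]

# With predicate None, the truth value of each item decides.
truthy = list(ifilter(None, values))
falsy = list(ifilterfalse(None, values))


def is_even(n):
    """Hypothetical predicate."""
    return n % 2 == 0


# Together the two filters partition the input.
evens = list(ifilter(is_even, range(6)))
odds = list(ifilterfalse(is_even, range(6)))
```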

.. function:: imap(function, *iterables)

   Make an iterator that computes the function using arguments from each of the
   iterables. If function is set to ``None``, then :func:`imap` returns the
   arguments as a tuple. Like :func:`map` but stops when the shortest iterable is
   exhausted instead of filling in ``None`` for shorter iterables. The reason for
   the difference is that infinite iterator arguments are typically an error for
   :func:`map` (because the output is fully evaluated) but represent a common and
   useful way of supplying arguments to :func:`imap`. Equivalent to::

      def imap(function, *iterables):
          iterables = map(iter, iterables)
          while True:
              args = [i.next() for i in iterables]
              if function is None:
                  yield tuple(args)
              else:
                  yield function(*args)


.. function:: islice(iterable, [start,] stop [, step])

   Make an iterator that returns selected elements from the iterable. If start is
   non-zero, then elements from the iterable are skipped until start is reached.
   Afterward, elements are returned consecutively unless step is set higher than
   one which results in items being skipped. If stop is ``None``, then iteration
   continues until the iterator is exhausted, if at all; otherwise, it stops at the
   specified position. Unlike regular slicing, :func:`islice` does not support
   negative values for start, stop, or step. Can be used to extract related
   fields from data where the internal structure has been flattened (for example, a
   multi-line report may list a name field on every third line). Equivalent to::

      def islice(iterable, *args):
          s = slice(*args)
          it = iter(xrange(s.start or 0, s.stop or sys.maxint, s.step or 1))
          nexti = it.next()
          for i, element in enumerate(iterable):
              if i == nexti:
                  yield element
                  nexti = it.next()

   If start is ``None``, then iteration starts at zero. If step is ``None``,
   then the step defaults to one.

   .. versionchanged:: 2.5
      accept ``None`` values for default start and step.

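The "every third line" use case and the restriction on negative arguments can be sketched like this. The report lines are invented sample data.

```python
from itertools import count, islice

report = ['name: alex', 'age: 30', 'city: Oslo',
          'name: laura', 'age: 25', 'city: Rome']

# start=0, stop=None, step=3: take every third line until the input runs out.
names = list(islice(report, 0, None, 3))

# A stop value makes it safe to slice an infinite iterator.
evens = list(islice(count(), 0, 10, 2))

# Unlike list slicing, negative arguments are rejected.
try:
    islice(report, -1)
    negative_ok = True
except ValueError:
    negative_ok = False
```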

.. function:: izip(*iterables)

   Make an iterator that aggregates elements from each of the iterables. Like
   :func:`zip` except that it returns an iterator instead of a list. Used for
   lock-step iteration over several iterables at a time. Equivalent to::

      def izip(*iterables):
          iterables = map(iter, iterables)
          while iterables:
              result = [it.next() for it in iterables]
              yield tuple(result)

   .. versionchanged:: 2.4
      When no iterables are specified, returns a zero length iterator instead of
      raising a :exc:`TypeError` exception.

   Note, the left-to-right evaluation order of the iterables is guaranteed. This
   makes possible an idiom for clustering a data series into n-length groups using
   ``izip(*[iter(s)]*n)``. For data that doesn't fit n-length groups exactly, the
   last tuple can be pre-padded with fill values using ``izip(*[chain(s,
   [None]*(n-1))]*n)``.

   Note, when :func:`izip` is used with unequal length inputs, subsequent
   iteration over the longer iterables cannot reliably be continued after
   :func:`izip` terminates. Potentially, up to one entry will be missing from
   each of the left-over iterables. This occurs because a value is fetched from
   each iterator in turn, but the process ends when one of the iterators
   terminates. This leaves the last fetched values in limbo (they cannot be
   returned in a final, incomplete tuple and they cannot be pushed back into
   the iterator for retrieval with ``it.next()``). In general, :func:`izip`
   should only be used with unequal length inputs when you don't care about
   trailing, unmatched values from the longer iterables.

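Both notes can be illustrated with a short sketch: the clustering idiom, and the value that goes missing with unequal length inputs. The ``clusters`` helper and the sample data are made up for illustration. The import fallback is an assumption so the sketch also runs on Python 3, where the built-in ``zip`` is already lazy.

```python
try:
    from itertools import izip  # Python 2 name used in this document
except ImportError:
    izip = zip  # assumption: on Python 3 the built-in zip is already lazy


def clusters(s, n):
    """Hypothetical helper wrapping the clustering idiom."""
    # [iter(s)] * n holds n references to the SAME iterator, so each
    # output tuple consumes n consecutive items from it.
    return izip(*[iter(s)] * n)


groups = list(clusters('abcdefg', 3))  # the trailing 'g' does not fill a group

# Unequal lengths: 'c' is fetched from the longer input before the shorter
# one is found to be exhausted, so it is lost.
longer = iter('abcd')
shorter = iter([1, 2])
pairs = list(izip(longer, shorter))
leftover = next(longer)
```

Here ``leftover`` is ``'d'`` rather than ``'c'``, which is the "in limbo" value described above.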

.. function:: izip_longest(*iterables[, fillvalue])

   Make an iterator that aggregates elements from each of the iterables. If the
   iterables are of uneven length, missing values are filled in with fillvalue.
   Iteration continues until the longest iterable is exhausted. Equivalent to::

      def izip_longest(*args, **kwds):
          fillvalue = kwds.get('fillvalue')
          # counter can be called len(args)-1 times before it raises IndexError
          def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
              yield counter()         # yields the fillvalue, or raises IndexError
          fillers = repeat(fillvalue)
          iters = [chain(it, sentinel(), fillers) for it in args]
          try:
              for tup in izip(*iters):
                  yield tup
          except IndexError:
              pass

   If one of the iterables is potentially infinite, then the :func:`izip_longest`
   function should be wrapped with something that limits the number of calls (for
   example :func:`islice` or :func:`takewhile`).

   .. versionadded:: 2.6

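A brief usage sketch for the fill behavior. The inputs are illustrative. The import fallback is an assumption so the sketch also runs on Python 3, where the function is named ``zip_longest``.

```python
try:
    from itertools import izip_longest  # Python 2 name used in this document
except ImportError:
    # assumption: the Python 3 name for the same function
    from itertools import zip_longest as izip_longest

# The shorter input is padded with the given fillvalue.
padded = list(izip_longest('ab', [1, 2, 3], fillvalue='-'))

# Without a fillvalue, missing entries are filled with None.
default = list(izip_longest('a', 'xy'))
```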

.. function:: repeat(object[, times])

   Make an iterator that returns object over and over again. Runs indefinitely
   unless the times argument is specified. Used as an argument to :func:`imap` for
   invariant parameters to the called function. Also used with :func:`izip` to
   create an invariant part of a tuple record. Equivalent to::

      def repeat(object, times=None):
          if times is None:
              while True:
                  yield object
          else:
              for i in xrange(times):
                  yield object


.. function:: starmap(function, iterable)

   Make an iterator that computes the function using arguments obtained from
   the iterable. Used instead of :func:`imap` when argument parameters are already
   grouped in tuples from a single iterable (the data has been "pre-zipped"). The
   difference between :func:`imap` and :func:`starmap` parallels the distinction
   between ``function(a,b)`` and ``function(*c)``. Equivalent to::

      def starmap(function, iterable):
          for args in iterable:
              yield function(*args)

   .. versionchanged:: 2.6
      Previously, :func:`starmap` required the function arguments to be tuples.
      Now, any iterable is allowed.

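The ``function(a,b)`` versus ``function(*c)`` distinction can be made concrete with a small sketch. The ``pairs`` data is made up, and the built-in ``map`` wrapped in ``list`` stands in for :func:`imap` so that the sketch runs on both Python 2 and Python 3.

```python
import operator
from itertools import starmap

pairs = [(2, 5), (3, 2), (10, 3)]  # arguments already grouped ("pre-zipped")

# starmap unpacks each tuple into the call: operator.pow(*pair)
powers = list(starmap(operator.pow, pairs))

# The map-style call needs the arguments as separate parallel sequences.
bases, exponents = zip(*pairs)
same = list(map(operator.pow, bases, exponents))
```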

.. function:: takewhile(predicate, iterable)

   Make an iterator that returns elements from the iterable as long as the
   predicate is true. Equivalent to::

      def takewhile(predicate, iterable):
          for x in iterable:
              if predicate(x):
                  yield x
              else:
                  break

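``takewhile`` is not a filter: it stops at the first failure, even if later elements would pass the test. A sketch with made-up readings shows the contrast.

```python
from itertools import takewhile

readings = [1, 3, 5, 8, 2, 1]

# Stops for good at 8, the first element that fails the test.
prefix = list(takewhile(lambda x: x < 6, readings))

# A plain filter keeps looking and also finds the trailing 2 and 1.
filtered = [x for x in readings if x < 6]
```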

.. function:: tee(iterable[, n=2])

   Return n independent iterators from a single iterable. The case where ``n==2``
   is equivalent to::

      def tee(iterable):
          # data is created once, so both generators share the same buffer
          def gen(next, data={}):
              for i in count():
                  if i in data:
                      # Already fetched by the other iterator
                      yield data.pop(i)
                  else:
                      # Fetch a new value and leave it for the other iterator
                      data[i] = next()
                      yield data[i]
          it = iter(iterable)
          return gen(it.next), gen(it.next)

   Note, once :func:`tee` has made a split, the original iterable should not be
   used anywhere else; otherwise, the iterable could get advanced without the tee
   objects being informed.

   Note, this member of the toolkit may require significant auxiliary storage
   (depending on how much temporary data needs to be stored). In general, if one
   iterator is going to use most or all of the data before the other iterator, it
   is faster to use :func:`list` instead of :func:`tee`.

   .. versionadded:: 2.4

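The independence of the two iterators, and the buffering it implies, can be sketched as follows. The sample data is illustrative.

```python
from itertools import tee

source = iter([1, 2, 3, 4])
a, b = tee(source)

# a runs ahead by two items; tee has to buffer them until b catches up.
ahead = [next(a), next(a)]

# b still sees the full sequence, and a continues where it left off.
all_of_b = list(b)
rest_of_a = list(a)
```

From this point on, ``source`` should not be used directly, for the reason given in the first note above.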

.. _itertools-example:

Examples
--------

The following examples show common uses for each tool and demonstrate ways they
can be combined. ::

   >>> amounts = [120.15, 764.05, 823.14]
   >>> for checknum, amount in izip(count(1200), amounts):
   ...     print 'Check %d is for $%.2f' % (checknum, amount)
   ...
   Check 1200 is for $120.15
   Check 1201 is for $764.05
   Check 1202 is for $823.14

   >>> import operator
   >>> for cube in imap(operator.pow, xrange(1,5), repeat(3)):
   ...     print cube
   ...
   1
   8
   27
   64

   >>> reportlines = ['EuroPython', 'Roster', '', 'alex', '', 'laura',
   ...                '', 'martin', '', 'walter', '', 'mark']
   >>> for name in islice(reportlines, 3, None, 2):
   ...     print name.title()
   ...
   Alex
   Laura
   Martin
   Walter
   Mark

   # Show a dictionary sorted and grouped by value
   >>> from operator import itemgetter
   >>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3)
   >>> di = sorted(d.iteritems(), key=itemgetter(1))
   >>> for k, g in groupby(di, key=itemgetter(1)):
   ...     print k, map(itemgetter(0), g)
   ...
   1 ['a', 'c', 'e']
   2 ['b', 'd', 'f']
   3 ['g']

   # Find runs of consecutive numbers using groupby. The key to the solution
   # is differencing with a range so that consecutive numbers all appear in
   # same group.
   >>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
   >>> for k, g in groupby(enumerate(data), lambda (i,x):i-x):
   ...     print map(operator.itemgetter(1), g)
   ...
   [1]
   [4, 5, 6]
   [10]
   [15, 16, 17, 18]
   [22]
   [25, 26, 27, 28]


.. _itertools-recipes:

Recipes
-------

This section shows recipes for creating an extended toolset using the existing
itertools as building blocks.

The extended tools offer the same high performance as the underlying toolset.
The superior memory performance is kept by processing elements one at a time
rather than bringing the whole iterable into memory all at once. Code volume is
kept small by linking the tools together in a functional style which helps
eliminate temporary variables. High speed is retained by preferring
"vectorized" building blocks over the use of for-loops and :term:`generator`\s
which incur interpreter overhead. ::

   def take(n, seq):
       return list(islice(seq, n))

   def enumerate(iterable):
       return izip(count(), iterable)

   def tabulate(function):
       "Return function(0), function(1), ..."
       return imap(function, count())

   def iteritems(mapping):
       return izip(mapping.iterkeys(), mapping.itervalues())

   def nth(iterable, n):
       "Returns the nth item or raises StopIteration"
       return islice(iterable, n, None).next()

   def all(seq, pred=None):
       "Returns True if pred(x) is true for every element in the iterable"
       for elem in ifilterfalse(pred, seq):
           return False
       return True

   def any(seq, pred=None):
       "Returns True if pred(x) is true for at least one element in the iterable"
       for elem in ifilter(pred, seq):
           return True
       return False

   def no(seq, pred=None):
       "Returns True if pred(x) is false for every element in the iterable"
       for elem in ifilter(pred, seq):
           return False
       return True

   def quantify(seq, pred=None):
       "Count how many times the predicate is true in the sequence"
       return sum(imap(pred, seq))

   def padnone(seq):
       """Returns the sequence elements and then returns None indefinitely.

       Useful for emulating the behavior of the built-in map() function.
       """
       return chain(seq, repeat(None))

   def ncycles(seq, n):
       "Returns the sequence elements n times"
       return chain(*repeat(seq, n))

   def dotproduct(vec1, vec2):
       return sum(imap(operator.mul, vec1, vec2))

   def flatten(listOfLists):
       return list(chain(*listOfLists))

   def repeatfunc(func, times=None, *args):
       """Repeat calls to func with specified arguments.

       Example:  repeatfunc(random.random)
       """
       if times is None:
           return starmap(func, repeat(args))
       else:
           return starmap(func, repeat(args, times))

   def pairwise(iterable):
       "s -> (s0,s1), (s1,s2), (s2, s3), ..."
       a, b = tee(iterable)
       try:
           b.next()
       except StopIteration:
           pass
       return izip(a, b)

   def grouper(n, iterable, padvalue=None):
       "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
       return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
