:mod:`itertools` --- Functions creating iterators for efficient looping
=======================================================================

.. module:: itertools
   :synopsis: Functions creating iterators for efficient looping.
.. moduleauthor:: Raymond Hettinger <python@rcn.com>
.. sectionauthor:: Raymond Hettinger <python@rcn.com>


.. versionadded:: 2.3

This module implements a number of :term:`iterator` building blocks inspired by
constructs from the Haskell and SML programming languages. Each has been recast
in a form suitable for Python.

The module standardizes a core set of fast, memory-efficient tools that are
useful by themselves or in combination. Standardization helps avoid the
readability and reliability problems that arise when many different individuals
create their own slightly varying implementations, each with its own quirks
and naming conventions.

The tools are designed to combine readily with one another. This makes it easy
to construct more specialized tools succinctly and efficiently in pure Python.

For instance, SML provides a tabulation tool, ``tabulate(f)``, which produces a
sequence ``f(0), f(1), ...``. This toolbox provides :func:`imap` and
:func:`count`, which can be combined to form ``imap(f, count())`` and produce an
equivalent result.

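As a rough illustration of the ``imap(f, count())`` combination, the sketch
below tabulates a hypothetical ``square`` function and truncates the infinite
stream with :func:`islice`. The fallback import is an assumption for
interpreters where :func:`imap` is not available (in Python 3 the built-in
``map`` is already lazy).

```python
from itertools import count, islice

try:
    from itertools import imap      # Python 2
except ImportError:
    imap = map                      # assumption: Python 3, where map() is lazy


def square(x):
    # Hypothetical function to tabulate; any one-argument callable works.
    return x * x


# imap(square, count()) is an infinite stream: square(0), square(1), ...
# islice() truncates it so that list() terminates.
first_five = list(islice(imap(square, count()), 5))
# first_five == [0, 1, 4, 9, 16]
```

Without the :func:`islice` wrapper, ``list()`` would never return, because
:func:`count` has no end.
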
Likewise, the functional tools are designed to work well with the high-speed
functions provided by the :mod:`operator` module.

The module author welcomes suggestions for other basic building blocks to be
added to future versions of the module.

Whether cast in pure Python form or compiled code, tools that use iterators are
more memory-efficient (and faster) than their list-based counterparts. Adopting
the principles of just-in-time manufacturing, they create data when and where
needed instead of consuming memory with the computer equivalent of "inventory".

The performance advantage of iterators becomes more acute as the number of
elements increases -- at some point, lists grow large enough to severely impact
memory cache performance and start running slowly.


.. seealso::

   The Standard ML Basis Library, `The Standard ML Basis Library
   <http://www.standardml.org/Basis/>`_.

   Haskell, A Purely Functional Language, `Definition of Haskell and the Standard
   Libraries <http://www.haskell.org/definition/>`_.


.. _itertools-functions:

Itertool functions
------------------

The following module functions all construct and return iterators. Some provide
streams of infinite length, so they should only be accessed by functions or
loops that truncate the stream.


.. function:: chain(*iterables)

   Make an iterator that returns elements from the first iterable until it is
   exhausted, then proceeds to the next iterable, until all of the iterables are
   exhausted. Used for treating consecutive sequences as a single sequence.
   Equivalent to::

      def chain(*iterables):
          for it in iterables:
              for element in it:
                  yield element


.. function:: count([n])

   Make an iterator that returns consecutive integers starting with n. If not
   specified, n defaults to zero. Often used as an argument to :func:`imap` to
   generate consecutive data points. Also used with :func:`izip` to add sequence
   numbers. Equivalent to::

      def count(n=0):
          while True:
              yield n
              n += 1


.. function:: cycle(iterable)

   Make an iterator returning elements from the iterable and saving a copy of each.
   When the iterable is exhausted, return elements from the saved copy. Repeats
   indefinitely. Equivalent to::

      def cycle(iterable):
          saved = []
          for element in iterable:
              yield element
              saved.append(element)
          while saved:
              for element in saved:
                  yield element

   Note, this member of the toolkit may require significant auxiliary storage
   (depending on the length of the iterable).


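A small sketch of how :func:`cycle` keeps repeating even when its input is a
one-shot iterator. The ``colors`` data is hypothetical, and :func:`islice` is
used only to truncate the endless stream.

```python
from itertools import cycle, islice

colors = iter(['red', 'green', 'blue'])   # a one-shot iterator, not a list

# cycle() saves each element on the first pass, so it can keep repeating
# after the source iterator has been exhausted.
striped = list(islice(cycle(colors), 7))
# striped == ['red', 'green', 'blue', 'red', 'green', 'blue', 'red']
```

The saved copy is the auxiliary storage mentioned above: it grows to the full
length of the input.
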
.. function:: dropwhile(predicate, iterable)

   Make an iterator that drops elements from the iterable as long as the predicate
   is true; afterwards, returns every element. Note, the iterator does not produce
   any output until the predicate first becomes false, so it may have a lengthy
   start-up time. Equivalent to::

      def dropwhile(predicate, iterable):
          iterable = iter(iterable)
          for x in iterable:
              if not predicate(x):
                  yield x
                  break
          for x in iterable:
              yield x


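The following sketch shows that :func:`dropwhile` discards only the leading run
of matching elements. The ``lines`` data and the ``is_comment`` predicate are
hypothetical names chosen for illustration.

```python
from itertools import dropwhile

lines = ['# comment', '# another', 'data 1', '# kept', 'data 2']


def is_comment(line):
    return line.startswith('#')


# Once the predicate fails for 'data 1', every remaining element passes
# through unchanged, including the later '# kept' line.
body = list(dropwhile(is_comment, lines))
# body == ['data 1', '# kept', 'data 2']
```

To remove every matching element rather than just the leading ones, a filter
such as :func:`ifilterfalse` is the more suitable tool.
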
.. function:: groupby(iterable[, key])

   Make an iterator that returns consecutive keys and groups from the iterable.
   The key is a function computing a key value for each element. If it is not
   specified or is ``None``, key defaults to an identity function and returns
   the element unchanged. Generally, the iterable needs to already be sorted on
   the same key function.

   The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix. It
   generates a break or new group every time the value of the key function changes
   (which is why it is usually necessary to have sorted the data using the same key
   function). That behavior differs from SQL's GROUP BY which aggregates common
   elements regardless of their input order.

   The returned group is itself an iterator that shares the underlying iterable
   with :func:`groupby`. Because the source is shared, when the :func:`groupby`
   object is advanced, the previous group is no longer visible. So, if that data
   is needed later, it should be stored as a list::

      groups = []
      uniquekeys = []
      data = sorted(data, key=keyfunc)
      for k, g in groupby(data, keyfunc):
          groups.append(list(g))      # Store group iterator as a list
          uniquekeys.append(k)

   :func:`groupby` is equivalent to::

      class groupby(object):
          def __init__(self, iterable, key=None):
              if key is None:
                  key = lambda x: x
              self.keyfunc = key
              self.it = iter(iterable)
              self.tgtkey = self.currkey = self.currvalue = object()
          def __iter__(self):
              return self
          def next(self):
              while self.currkey == self.tgtkey:
                  self.currvalue = self.it.next() # Exit on StopIteration
                  self.currkey = self.keyfunc(self.currvalue)
              self.tgtkey = self.currkey
              return (self.currkey, self._grouper(self.tgtkey))
          def _grouper(self, tgtkey):
              while self.currkey == tgtkey:
                  yield self.currvalue
                  self.currvalue = self.it.next() # Exit on StopIteration
                  self.currkey = self.keyfunc(self.currvalue)

   .. versionadded:: 2.4


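To make the ``uniq``-like behaviour concrete, the sketch below groups the same
hypothetical word list twice, once unsorted and once sorted on the key. The
names ``words`` and ``first_letter`` are illustrative only.

```python
from itertools import groupby

words = ['apple', 'avocado', 'banana', 'apricot', 'blueberry']


def first_letter(word):
    return word[0]


# Unsorted input: a new group starts every time the key changes, so the
# key 'a' appears twice.
unsorted_groups = [(k, list(g)) for k, g in groupby(words, first_letter)]
# [('a', ['apple', 'avocado']), ('b', ['banana']),
#  ('a', ['apricot']), ('b', ['blueberry'])]

# Sorting on the same key first yields one group per distinct key.
ordered = sorted(words, key=first_letter)
sorted_groups = [(k, list(g)) for k, g in groupby(ordered, first_letter)]
# [('a', ['apple', 'avocado', 'apricot']), ('b', ['banana', 'blueberry'])]
```

Each group is converted with ``list(g)`` immediately, because the group
iterator is no longer usable once :func:`groupby` advances to the next key.
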
.. function:: ifilter(predicate, iterable)

   Make an iterator that filters elements from iterable returning only those for
   which the predicate is ``True``. If predicate is ``None``, return the items
   that are true. Equivalent to::

      def ifilter(predicate, iterable):
          if predicate is None:
              predicate = bool
          for x in iterable:
              if predicate(x):
                  yield x


.. function:: ifilterfalse(predicate, iterable)

   Make an iterator that filters elements from iterable returning only those for
   which the predicate is ``False``. If predicate is ``None``, return the items
   that are false. Equivalent to::

      def ifilterfalse(predicate, iterable):
          if predicate is None:
              predicate = bool
          for x in iterable:
              if not predicate(x):
                  yield x


.. function:: imap(function, *iterables)

   Make an iterator that computes the function using arguments from each of the
   iterables. If function is set to ``None``, then :func:`imap` returns the
   arguments as a tuple. Like :func:`map` but stops when the shortest iterable is
   exhausted instead of filling in ``None`` for shorter iterables. The reason for
   the difference is that infinite iterator arguments are typically an error for
   :func:`map` (because the output is fully evaluated) but represent a common and
   useful way of supplying arguments to :func:`imap`. Equivalent to::

      def imap(function, *iterables):
          iterables = map(iter, iterables)
          while True:
              args = [i.next() for i in iterables]
              if function is None:
                  yield tuple(args)
              else:
                  yield function(*args)


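A brief sketch of why an infinite argument is useful with :func:`imap`: the
result stops with the shortest input. The ``prices`` list is hypothetical, and
the fallback to the built-in ``map`` is an assumption for Python 3, where
``map`` already has these lazy, stop-at-shortest semantics.

```python
import operator
from itertools import count

try:
    from itertools import imap      # Python 2
except ImportError:
    imap = map                      # assumption: Python 3

prices = [2.5, 4.0, 1.25]

# count(1) never ends, but imap() stops as soon as prices is exhausted.
totals = list(imap(operator.mul, prices, count(1)))
# totals == [2.5, 8.0, 3.75]
```

Pairing the data with :func:`count` this way also shows how the functions in
the :mod:`operator` module slot in as the mapped callable.
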
.. function:: islice(iterable, [start,] stop [, step])

   Make an iterator that returns selected elements from the iterable. If start is
   non-zero, then elements from the iterable are skipped until start is reached.
   Afterward, elements are returned consecutively unless step is set higher than
   one which results in items being skipped. If stop is ``None``, then iteration
   continues until the iterator is exhausted, if at all; otherwise, it stops at the
   specified position. Unlike regular slicing, :func:`islice` does not support
   negative values for start, stop, or step. Can be used to extract related
   fields from data where the internal structure has been flattened (for example, a
   multi-line report may list a name field on every third line). Equivalent to::

      import sys

      def islice(iterable, *args):
          s = slice(*args)
          it = iter(xrange(s.start or 0, s.stop or sys.maxint, s.step or 1))
          nexti = it.next()
          for i, element in enumerate(iterable):
              if i == nexti:
                  yield element
                  nexti = it.next()

   If start is ``None``, then iteration starts at zero. If step is ``None``,
   then the step defaults to one.

   .. versionchanged:: 2.5
      accept ``None`` values for default start and step.


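The argument forms of :func:`islice` can be sketched as follows. The
``report`` list is hypothetical data laid out with a name on every third line,
as in the use case described above.

```python
from itertools import count, islice

report = ['name: alex', 'age: 30', '---',
          'name: laura', 'age: 25', '---']

# start=0, stop=None, step=3: every third line, until the input runs out.
names = list(islice(report, 0, None, 3))
# names == ['name: alex', 'name: laura']

# A finite stop makes it safe to slice an infinite iterator.
first_evens = list(islice(count(), 0, 10, 2))
# first_evens == [0, 2, 4, 6, 8]

# A single numeric argument is interpreted as stop.
head = list(islice('abcdef', 2))
# head == ['a', 'b']
```
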
.. function:: izip(*iterables)

   Make an iterator that aggregates elements from each of the iterables. Like
   :func:`zip` except that it returns an iterator instead of a list. Used for
   lock-step iteration over several iterables at a time. Equivalent to::

      def izip(*iterables):
          iterables = map(iter, iterables)
          while iterables:
              result = [it.next() for it in iterables]
              yield tuple(result)

   .. versionchanged:: 2.4
      When no iterables are specified, returns a zero length iterator instead of
      raising a :exc:`TypeError` exception.

   Note, the left-to-right evaluation order of the iterables is guaranteed. This
   makes possible an idiom for clustering a data series into n-length groups using
   ``izip(*[iter(s)]*n)``. For data that doesn't fit n-length groups exactly, the
   last tuple can be pre-padded with fill values using ``izip(*[chain(s,
   [None]*(n-1))]*n)``.

   Note, when :func:`izip` is used with unequal length inputs, subsequent
   iteration over the longer iterables cannot reliably be continued after
   :func:`izip` terminates. Potentially, up to one entry will be missing from
   each of the left-over iterables. This occurs because a value is fetched from
   each iterator in turn, but the process ends when one of the iterators
   terminates. This leaves the last fetched values in limbo (they cannot be
   returned in a final, incomplete tuple and they cannot be pushed back into
   the iterator for retrieval with ``it.next()``). In general, :func:`izip`
   should only be used with unequal length inputs when you don't care about
   trailing, unmatched values from the longer iterables.


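Both notes can be illustrated with a short sketch. The input values are
hypothetical, and the fallback to the built-in ``zip`` is an assumption for
Python 3, where ``zip`` is lazy and fetches values in the same left-to-right
order.

```python
try:
    from itertools import izip      # Python 2
except ImportError:
    izip = zip                      # assumption: Python 3

# Clustering idiom: the *same* iterator object appears n times, so each
# output tuple pulls n consecutive items from it.
s = 'abcdefg'
n = 3
clusters = list(izip(*[iter(s)] * n))
# clusters == [('a', 'b', 'c'), ('d', 'e', 'f')]; the trailing 'g' is dropped

# Unequal lengths: the longer iterator is asked first, so 'c' has already
# been fetched when the shorter one runs out, and that value is lost.
longer = iter('abcd')
shorter = iter([1, 2])
pairs = list(izip(longer, shorter))
leftover = list(longer)
# pairs == [('a', 1), ('b', 2)] and leftover == ['d']
```

Placing the shorter iterable first avoids the loss in this particular case,
but relying on that ordering is fragile, which is why the note recommends
against continuing to use the longer inputs.
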
.. function:: izip_longest(*iterables[, fillvalue])

   Make an iterator that aggregates elements from each of the iterables. If the
   iterables are of uneven length, missing values are filled-in with fillvalue.
   Iteration continues until the longest iterable is exhausted. Equivalent to::

      def izip_longest(*args, **kwds):
          fillvalue = kwds.get('fillvalue')
          def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
              yield counter()         # yields the fillvalue, or raises IndexError
          fillers = repeat(fillvalue)
          iters = [chain(it, sentinel(), fillers) for it in args]
          try:
              for tup in izip(*iters):
                  yield tup
          except IndexError:
              pass

   If one of the iterables is potentially infinite, then the :func:`izip_longest`
   function should be wrapped with something that limits the number of calls (for
   example :func:`islice` or :func:`takewhile`).

   .. versionadded:: 2.6


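A minimal sketch of the padding behaviour, using hypothetical inputs. The
fallback import is an assumption for Python 3, where the function is named
``zip_longest``.

```python
try:
    from itertools import izip_longest                   # Python 2.6+
except ImportError:
    from itertools import zip_longest as izip_longest    # assumption: Python 3

# The shorter input is padded with fillvalue until the longest is exhausted.
padded = list(izip_longest('abc', [1], fillvalue='-'))
# padded == [('a', 1), ('b', '-'), ('c', '-')]

# Combined with the repeated-iterator idiom, this clusters a series into
# n-length groups without dropping the incomplete final group.
groups = list(izip_longest(*[iter('abcdefg')] * 3, fillvalue='x'))
# groups == [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')]
```
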
.. function:: repeat(object[, times])

   Make an iterator that returns object over and over again. Runs indefinitely
   unless the times argument is specified. Used as argument to :func:`imap` for
   invariant parameters to the called function. Also used with :func:`izip` to
   create an invariant part of a tuple record. Equivalent to::

      def repeat(object, times=None):
          if times is None:
              while True:
                  yield object
          else:
              for i in xrange(times):
                  yield object


.. function:: starmap(function, iterable)

   Make an iterator that computes the function using argument tuples obtained from
   the iterable. Used instead of :func:`imap` when argument parameters are already
   grouped in tuples from a single iterable (the data has been "pre-zipped"). The
   difference between :func:`imap` and :func:`starmap` parallels the distinction
   between ``function(a,b)`` and ``function(*c)``. Equivalent to::

      def starmap(function, iterable):
          iterable = iter(iterable)
          while True:
              yield function(*iterable.next())


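The ``function(a,b)`` versus ``function(*c)`` distinction can be sketched with
the built-in ``pow``. The data is hypothetical, and the fallback to ``map`` is
an assumption for Python 3.

```python
from itertools import starmap

try:
    from itertools import imap      # Python 2
except ImportError:
    imap = map                      # assumption: Python 3

# "Pre-zipped" data: each element is already a complete argument tuple.
pairs = [(2, 5), (3, 2), (10, 3)]
powers_star = list(starmap(pow, pairs))      # pow(2, 5), pow(3, 2), pow(10, 3)

# The same computation with the arguments held in separate iterables.
bases = [2, 3, 10]
exponents = [5, 2, 3]
powers_map = list(imap(pow, bases, exponents))
# powers_star == powers_map == [32, 9, 1000]
```
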
.. function:: takewhile(predicate, iterable)

   Make an iterator that returns elements from the iterable as long as the
   predicate is true. Equivalent to::

      def takewhile(predicate, iterable):
          for x in iterable:
              if predicate(x):
                  yield x
              else:
                  break


.. function:: tee(iterable[, n=2])

   Return n independent iterators from a single iterable. The case where ``n==2``
   is equivalent to::

      def tee(iterable):
          def gen(next, data={}):
              for i in count():
                  if i in data:
                      yield data.pop(i)
                  else:
                      data[i] = next()
                      yield data[i]
          it = iter(iterable)
          return gen(it.next), gen(it.next)

   Note, once :func:`tee` has made a split, the original iterable should not be
   used anywhere else; otherwise, the iterable could get advanced without the tee
   objects being informed.

   Note, this member of the toolkit may require significant auxiliary storage
   (depending on how much temporary data needs to be stored). In general, if one
   iterator is going to use most or all of the data before the other iterator, it
   is faster to use :func:`list` instead of :func:`tee`.

   .. versionadded:: 2.4


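A small sketch of two :func:`tee` iterators advancing independently over one
hypothetical source. It uses the built-in ``next()`` function, which assumes
Python 2.6 or later.

```python
from itertools import tee

source = iter([1, 2, 3, 4])
a, b = tee(source)

# a runs ahead; tee() buffers the values that b has not consumed yet.
first_two = [next(a), next(a)]
b_all = list(b)          # b still sees the whole stream
a_rest = list(a)         # a continues from where it left off
# first_two == [1, 2], b_all == [1, 2, 3, 4], a_rest == [3, 4]
```

After the split, ``source`` is deliberately left untouched, in line with the
first note above.
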
.. _itertools-example:

Examples
--------

The following examples show common uses for each tool and demonstrate ways they
can be combined. ::

   >>> amounts = [120.15, 764.05, 823.14]
   >>> for checknum, amount in izip(count(1200), amounts):
   ...     print 'Check %d is for $%.2f' % (checknum, amount)
   ...
   Check 1200 is for $120.15
   Check 1201 is for $764.05
   Check 1202 is for $823.14

   >>> import operator
   >>> for cube in imap(operator.pow, xrange(1,5), repeat(3)):
   ...     print cube
   ...
   1
   8
   27
   64

   >>> reportlines = ['EuroPython', 'Roster', '', 'alex', '', 'laura',
   ...                '', 'martin', '', 'walter', '', 'mark']
   >>> for name in islice(reportlines, 3, None, 2):
   ...     print name.title()
   ...
   Alex
   Laura
   Martin
   Walter
   Mark

   # Show a dictionary sorted and grouped by value
   >>> from operator import itemgetter
   >>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3)
   >>> di = sorted(d.iteritems(), key=itemgetter(1))
   >>> for k, g in groupby(di, key=itemgetter(1)):
   ...     print k, map(itemgetter(0), g)
   ...
   1 ['a', 'c', 'e']
   2 ['b', 'd', 'f']
   3 ['g']

   # Find runs of consecutive numbers using groupby. The key to the solution
   # is differencing with a range so that consecutive numbers all appear in
   # same group.
   >>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
   >>> for k, g in groupby(enumerate(data), lambda (i,x):i-x):
   ...     print map(operator.itemgetter(1), g)
   ...
   [1]
   [4, 5, 6]
   [10]
   [15, 16, 17, 18]
   [22]
   [25, 26, 27, 28]



.. _itertools-recipes:

Recipes
-------

This section shows recipes for creating an extended toolset using the existing
itertools as building blocks.

The extended tools offer the same high performance as the underlying toolset.
The superior memory performance is kept by processing elements one at a time
rather than bringing the whole iterable into memory all at once. Code volume is
kept small by linking the tools together in a functional style which helps
eliminate temporary variables. High speed is retained by preferring
"vectorized" building blocks over the use of for-loops and :term:`generator`\s
which incur interpreter overhead. ::

   def take(n, seq):
       return list(islice(seq, n))

   def enumerate(iterable):
       return izip(count(), iterable)

   def tabulate(function):
       "Return function(0), function(1), ..."
       return imap(function, count())

   def iteritems(mapping):
       return izip(mapping.iterkeys(), mapping.itervalues())

   def nth(iterable, n):
       "Returns the nth item or raise StopIteration"
       return islice(iterable, n, None).next()

   def all(seq, pred=None):
       "Returns True if pred(x) is true for every element in the iterable"
       for elem in ifilterfalse(pred, seq):
           return False
       return True

   def any(seq, pred=None):
       "Returns True if pred(x) is true for at least one element in the iterable"
       for elem in ifilter(pred, seq):
           return True
       return False

   def no(seq, pred=None):
       "Returns True if pred(x) is false for every element in the iterable"
       for elem in ifilter(pred, seq):
           return False
       return True

   def quantify(seq, pred=None):
       "Count how many times the predicate is true in the sequence"
       return sum(imap(pred, seq))

   def padnone(seq):
       """Returns the sequence elements and then returns None indefinitely.

       Useful for emulating the behavior of the built-in map() function.
       """
       return chain(seq, repeat(None))

   def ncycles(seq, n):
       "Returns the sequence elements n times"
       return chain(*repeat(seq, n))

   def dotproduct(vec1, vec2):
       return sum(imap(operator.mul, vec1, vec2))

   def flatten(listOfLists):
       return list(chain(*listOfLists))

   def repeatfunc(func, times=None, *args):
       """Repeat calls to func with specified arguments.

       Example:  repeatfunc(random.random)
       """
       if times is None:
           return starmap(func, repeat(args))
       else:
           return starmap(func, repeat(args, times))

   def pairwise(iterable):
       "s -> (s0,s1), (s1,s2), (s2, s3), ..."
       a, b = tee(iterable)
       try:
           b.next()
       except StopIteration:
           pass
       return izip(a, b)

   def grouper(n, iterable, padvalue=None):
       "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
       return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
