:mod:`itertools` --- Functions creating iterators for efficient looping
=======================================================================

.. module:: itertools
   :synopsis: Functions creating iterators for efficient looping.
.. moduleauthor:: Raymond Hettinger <python@rcn.com>
.. sectionauthor:: Raymond Hettinger <python@rcn.com>


.. versionadded:: 2.3

This module implements a number of :term:`iterator` building blocks inspired by
constructs from the Haskell and SML programming languages. Each has been recast
in a form suitable for Python.

The module standardizes a core set of fast, memory efficient tools that are
useful by themselves or in combination. Standardization helps avoid the
readability and reliability problems which arise when many different individuals
create their own slightly varying implementations, each with their own quirks
and naming conventions.

The tools are designed to combine readily with one another. This makes it easy
to construct more specialized tools succinctly and efficiently in pure Python.

For instance, SML provides a tabulation tool: ``tabulate(f)`` which produces a
sequence ``f(0), f(1), ...``. This toolbox provides :func:`imap` and
:func:`count` which can be combined to form ``imap(f, count())`` and produce an
equivalent result.

Likewise, the functional tools are designed to work well with the high-speed
functions provided by the :mod:`operator` module.

The module author welcomes suggestions for other basic building blocks to be
added to future versions of the module.

Whether cast in pure Python form or compiled code, tools that use iterators are
generally more memory efficient (and often faster) than their list-based
counterparts. Adopting the principles of just-in-time manufacturing, they create
data when and where needed instead of consuming memory with the computer
equivalent of "inventory".

The performance advantage of iterators becomes more acute as the number of
elements increases -- at some point, lists grow large enough to severely impact
memory cache performance and start running slowly.


.. seealso::

   The Standard ML Basis Library, `The Standard ML Basis Library
   <http://www.standardml.org/Basis/>`_.

   Haskell, A Purely Functional Language, `Definition of Haskell and the Standard
   Libraries <http://www.haskell.org/definition/>`_.


.. _itertools-functions:

Itertool functions
------------------

The following module functions all construct and return iterators. Some provide
streams of infinite length, so they should only be accessed by functions or
loops that truncate the stream.


.. function:: chain(*iterables)

   Make an iterator that returns elements from the first iterable until it is
   exhausted, then proceeds to the next iterable, until all of the iterables are
   exhausted. Used for treating consecutive sequences as a single sequence.
   Equivalent to::

      def chain(*iterables):
          for it in iterables:
              for element in it:
                  yield element
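
As a small illustration of treating consecutive sequences as one, the sketch
below chains a list, a tuple and a string. The variable names are made up for
the example and are not part of the module.

```python
from itertools import chain

# Three separate inputs of different types.
header = ['id', 'name']
body = ('alice', 'bob')
footer = 'xy'            # a string iterates character by character

# chain() walks each input in turn without building a combined list first;
# list() is only used here to make the result visible.
combined = list(chain(header, body, footer))
```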


.. function:: count([n])

   Make an iterator that returns consecutive integers starting with n. If not
   specified, n defaults to zero. Often used as an argument to :func:`imap` to
   generate consecutive data points. Also used with :func:`izip` to add sequence
   numbers. Equivalent to::

      def count(n=0):
          while True:
              yield n
              n += 1


.. function:: cycle(iterable)

   Make an iterator returning elements from the iterable and saving a copy of each.
   When the iterable is exhausted, return elements from the saved copy. Repeats
   indefinitely. Equivalent to::

      def cycle(iterable):
          saved = []
          for element in iterable:
              yield element
              saved.append(element)
          while saved:
              for element in saved:
                  yield element

   Note, this member of the toolkit may require significant auxiliary storage
   (depending on the length of the iterable).
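
Because a cycle over a non-empty iterable never ends, it is normally paired
with something that truncates it. A minimal sketch, using ``islice`` as the
truncating tool and an invented ``colors`` list:

```python
from itertools import cycle, islice

colors = ['red', 'green', 'blue']

# cycle() alone would loop forever; islice() stops it after seven items.
first_seven = list(islice(cycle(colors), 7))
```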


.. function:: dropwhile(predicate, iterable)

   Make an iterator that drops elements from the iterable as long as the predicate
   is true; afterwards, returns every element. Note, the iterator does not produce
   any output until the predicate first becomes false, so it may have a lengthy
   start-up time. Equivalent to::

      def dropwhile(predicate, iterable):
          iterable = iter(iterable)
          for x in iterable:
              if not predicate(x):
                  yield x
                  break
          for x in iterable:
              yield x
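
The following sketch shows that only the leading run of matching elements is
dropped. The sample ``lines`` and the ``is_comment`` helper are hypothetical.

```python
from itertools import dropwhile

lines = ['# generated file', '# do not edit', 'alpha', '# not a header', 'beta']

def is_comment(line):
    return line.startswith('#')

# Once the predicate has returned false it is no longer consulted, so the
# later comment line is passed through unchanged.
payload = list(dropwhile(is_comment, lines))
```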


.. function:: groupby(iterable[, key])

   Make an iterator that returns consecutive keys and groups from the iterable.
   The key is a function computing a key value for each element. If key is not
   specified or is ``None``, it defaults to an identity function that returns
   the element unchanged. Generally, the iterable needs to already be sorted on
   the same key function.

   The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix. It
   generates a break or new group every time the value of the key function changes
   (which is why it is usually necessary to have sorted the data using the same key
   function). That behavior differs from SQL's GROUP BY which aggregates common
   elements regardless of their input order.

   The returned group is itself an iterator that shares the underlying iterable
   with :func:`groupby`. Because the source is shared, when the :func:`groupby`
   object is advanced, the previous group is no longer visible. So, if that data
   is needed later, it should be stored as a list::

      groups = []
      uniquekeys = []
      data = sorted(data, key=keyfunc)
      for k, g in groupby(data, keyfunc):
          groups.append(list(g))      # Store group iterator as a list
          uniquekeys.append(k)

   :func:`groupby` is equivalent to::

      class groupby(object):
          def __init__(self, iterable, key=None):
              if key is None:
                  key = lambda x: x
              self.keyfunc = key
              self.it = iter(iterable)
              self.tgtkey = self.currkey = self.currvalue = xrange(0)
          def __iter__(self):
              return self
          def next(self):
              while self.currkey == self.tgtkey:
                  self.currvalue = self.it.next() # Exit on StopIteration
                  self.currkey = self.keyfunc(self.currvalue)
              self.tgtkey = self.currkey
              return (self.currkey, self._grouper(self.tgtkey))
          def _grouper(self, tgtkey):
              while self.currkey == tgtkey:
                  yield self.currvalue
                  self.currvalue = self.it.next() # Exit on StopIteration
                  self.currkey = self.keyfunc(self.currvalue)

   .. versionadded:: 2.4
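
To make the ``uniq``-like behavior concrete, the sketch below groups the same
data twice, once as given and once after sorting on the key. The word list and
the ``first_letter`` helper are invented for the example.

```python
from itertools import groupby

words = ['apple', 'avocado', 'banana', 'apricot', 'blueberry']

def first_letter(word):
    return word[0]

# Unsorted input: a new group starts every time the key changes, so the
# key 'a' appears twice.
unsorted_groups = [(k, list(g)) for k, g in groupby(words, first_letter)]

# Sorting on the same key first yields one group per distinct key.
by_letter = sorted(words, key=first_letter)
sorted_groups = [(k, list(g)) for k, g in groupby(by_letter, first_letter)]
```

Each group is converted with ``list(g)`` inside the comprehension, before
:func:`groupby` advances, for the reason given above.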


.. function:: ifilter(predicate, iterable)

   Make an iterator that filters elements from iterable returning only those for
   which the predicate is ``True``. If predicate is ``None``, return the items
   that are true. Equivalent to::

      def ifilter(predicate, iterable):
          if predicate is None:
              predicate = bool
          for x in iterable:
              if predicate(x):
                  yield x


.. function:: ifilterfalse(predicate, iterable)

   Make an iterator that filters elements from iterable returning only those for
   which the predicate is ``False``. If predicate is ``None``, return the items
   that are false. Equivalent to::

      def ifilterfalse(predicate, iterable):
          if predicate is None:
              predicate = bool
          for x in iterable:
              if not predicate(x):
                  yield x
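
A short sketch of the two filters side by side. The ``try``/``except`` import
is an assumption added so the example also runs on Python 3, where these tools
are spelled ``filter`` and ``itertools.filterfalse``; ``is_even`` and the
sample data are hypothetical.

```python
try:
    from itertools import ifilter, ifilterfalse      # Python 2 names
except ImportError:
    # Assumption: on Python 3 the lazy built-in filter() and
    # itertools.filterfalse() play the same roles.
    from itertools import filterfalse as ifilterfalse
    ifilter = filter

values = [0, 1, 2, 3, 4, 5, 6]

def is_even(x):
    return x % 2 == 0

evens = list(ifilter(is_even, values))
odds = list(ifilterfalse(is_even, values))

# With a predicate of None, the elements themselves are tested for truth.
mixed = [0, 1, '', 'a', None, [], [0]]
truthy = list(ifilter(None, mixed))
falsy = list(ifilterfalse(None, mixed))
```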


.. function:: imap(function, *iterables)

   Make an iterator that computes the function using arguments from each of the
   iterables. If function is set to ``None``, then :func:`imap` returns the
   arguments as a tuple. Like :func:`map` but stops when the shortest iterable is
   exhausted instead of filling in ``None`` for shorter iterables. The reason for
   the difference is that infinite iterator arguments are typically an error for
   :func:`map` (because the output is fully evaluated) but represent a common and
   useful way of supplying arguments to :func:`imap`. Equivalent to::

      def imap(function, *iterables):
          iterables = map(iter, iterables)
          while True:
              args = [i.next() for i in iterables]
              if function is None:
                  yield tuple(args)
              else:
                  yield function(*args)
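
The sketch below illustrates both points: an infinite argument is safe because
values are computed on demand, and iteration ends with the shortest input. The
fallback import is an assumption so the example also runs on Python 3, whose
built-in ``map`` is lazy.

```python
from itertools import count, islice
try:
    from itertools import imap       # Python 2
except ImportError:
    imap = map                       # assumption: Python 3's map() is lazy like imap()

# An infinite input is fine because nothing is computed until requested.
squares = imap(lambda n: n * n, count())
first_five = list(islice(squares, 5))

# With several iterables, iteration stops at the shortest one.
sums = list(imap(lambda a, b: a + b, [1, 2, 3], [10, 20]))
```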


.. function:: islice(iterable, [start,] stop [, step])

   Make an iterator that returns selected elements from the iterable. If start is
   non-zero, then elements from the iterable are skipped until start is reached.
   Afterward, elements are returned consecutively unless step is set higher than
   one which results in items being skipped. If stop is ``None``, then iteration
   continues until the iterator is exhausted, if at all; otherwise, it stops at the
   specified position. Unlike regular slicing, :func:`islice` does not support
   negative values for start, stop, or step. Can be used to extract related
   fields from data where the internal structure has been flattened (for example, a
   multi-line report may list a name field on every third line). Equivalent to::

      def islice(iterable, *args):
          s = slice(*args)
          it = iter(xrange(s.start or 0, s.stop or sys.maxint, s.step or 1))
          nexti = it.next()
          for i, element in enumerate(iterable):
              if i == nexti:
                  yield element
                  nexti = it.next()

   If start is ``None``, then iteration starts at zero. If step is ``None``,
   then the step defaults to one.

   .. versionchanged:: 2.5
      accept ``None`` values for default start and step.
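
As a sketch of the flattened-report use case, assume a hypothetical report in
which every record takes three lines (name, role, blank):

```python
from itertools import islice

# Each record occupies three consecutive lines.
report = ['alice', 'admin', '',
          'bob', 'editor', '',
          'carol', 'viewer', '']

names = list(islice(report, 0, None, 3))   # start=0, no stop, every third line
roles = list(islice(report, 1, None, 3))   # same stride, offset by one
first_two = list(islice(report, 2))        # a single argument is the stop
```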


.. function:: izip(*iterables)

   Make an iterator that aggregates elements from each of the iterables. Like
   :func:`zip` except that it returns an iterator instead of a list. Used for
   lock-step iteration over several iterables at a time. Equivalent to::

      def izip(*iterables):
          iterables = map(iter, iterables)
          while iterables:
              result = [it.next() for it in iterables]
              yield tuple(result)

   .. versionchanged:: 2.4
      When no iterables are specified, returns a zero length iterator instead of
      raising a :exc:`TypeError` exception.

   Note, the left-to-right evaluation order of the iterables is guaranteed. This
   makes possible an idiom for clustering a data series into n-length groups using
   ``izip(*[iter(s)]*n)``. For data that doesn't fit n-length groups exactly, the
   last tuple can be pre-padded with fill values using ``izip(*[chain(s,
   [None]*(n-1))]*n)``.

   Note, when :func:`izip` is used with unequal length inputs, subsequent
   iteration over the longer iterables cannot reliably be continued after
   :func:`izip` terminates. Potentially, up to one entry will be missing from
   each of the left-over iterables. This occurs because a value is fetched from
   each iterator in turn, but the process ends when one of the iterators
   terminates. This leaves the last fetched values in limbo (they cannot be
   returned in a final, incomplete tuple and they cannot be pushed back into
   the iterator for retrieval with ``it.next()``). In general, :func:`izip`
   should only be used with unequal length inputs when you don't care about
   trailing, unmatched values from the longer iterables.
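
The two notes on :func:`izip` can be seen in a short sketch: the clustering
idiom, and the value that is lost from the longer input. The fallback import
is an assumption so the example also runs on Python 3, whose built-in ``zip``
is lazy.

```python
try:
    from itertools import izip       # Python 2
except ImportError:
    izip = zip                       # assumption: Python 3's zip() is lazy like izip()

# Clustering: the *same* iterator object is passed n times, so each output
# tuple pulls n consecutive items from it, left to right.
s = 'abcdefgh'
n = 3
clusters = list(izip(*[iter(s)] * n))   # the trailing 'g', 'h' are dropped

# Unequal lengths: one value has already been fetched from the longer
# iterator by the time the shorter one is found to be exhausted.
longer = iter([1, 2, 3, 4, 5])
shorter = iter('ab')
pairs = list(izip(longer, shorter))
leftover = list(longer)                 # the value 3 is gone
```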


.. function:: izip_longest(*iterables[, fillvalue])

   Make an iterator that aggregates elements from each of the iterables. If the
   iterables are of uneven length, missing values are filled-in with fillvalue.
   Iteration continues until the longest iterable is exhausted. Equivalent to::

      def izip_longest(*args, **kwds):
          fillvalue = kwds.get('fillvalue')
          def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
              yield counter()         # yields the fillvalue, or raises IndexError
          fillers = repeat(fillvalue)
          iters = [chain(it, sentinel(), fillers) for it in args]
          try:
              for tup in izip(*iters):
                  yield tup
          except IndexError:
              pass

   If one of the iterables is potentially infinite, then the :func:`izip_longest`
   function should be wrapped with something that limits the number of calls (for
   example :func:`islice` or :func:`takewhile`).

   .. versionadded:: 2.6
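
A minimal sketch of the fill behavior with invented sample data. The fallback
import assumes Python 3, where the function is named ``zip_longest``.

```python
try:
    from itertools import izip_longest                  # Python 2.6+
except ImportError:
    from itertools import zip_longest as izip_longest   # assumption: Python 3 name

names = ['alice', 'bob', 'carol']
scores = [90, 85]

padded = list(izip_longest(names, scores))                   # default fill is None
padded_zero = list(izip_longest(names, scores, fillvalue=0))
```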


.. function:: repeat(object[, times])

   Make an iterator that returns object over and over again. Runs indefinitely
   unless the times argument is specified. Used as argument to :func:`imap` for
   invariant parameters to the called function. Also used with :func:`izip` to
   create an invariant part of a tuple record. Equivalent to::

      def repeat(object, times=None):
          if times is None:
              while True:
                  yield object
          else:
              for i in xrange(times):
                  yield object


.. function:: starmap(function, iterable)

   Make an iterator that computes the function using argument tuples obtained from
   the iterable. Used instead of :func:`imap` when argument parameters are already
   grouped in tuples from a single iterable (the data has been "pre-zipped"). The
   difference between :func:`imap` and :func:`starmap` parallels the distinction
   between ``function(a,b)`` and ``function(*c)``. Equivalent to::

      def starmap(function, iterable):
          iterable = iter(iterable)
          while True:
              yield function(*iterable.next())
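
The following sketch applies :func:`starmap` to data that is already grouped
into tuples, and combines it with :func:`repeat`. The sample ``pairs`` are
made up for the example.

```python
import operator
from itertools import repeat, starmap

# The arguments are already grouped into tuples ("pre-zipped").
pairs = [(2, 5), (3, 2), (10, 3)]
powers = list(starmap(operator.pow, pairs))   # pow(2, 5), pow(3, 2), pow(10, 3)

# repeat() with a count supplies the same argument tuple several times.
same = list(starmap(operator.add, repeat((1, 2), 3)))
```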


.. function:: takewhile(predicate, iterable)

   Make an iterator that returns elements from the iterable as long as the
   predicate is true. Equivalent to::

      def takewhile(predicate, iterable):
          for x in iterable:
              if predicate(x):
                  yield x
              else:
                  break
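
A brief sketch of :func:`takewhile` as a way to truncate an infinite stream,
and of the fact that it stops at the first failing element. The threshold and
sample data are arbitrary.

```python
from itertools import count, takewhile

# count() is infinite; takewhile() ends the stream at the first failure.
small_squares = list(takewhile(lambda sq: sq < 50, (n * n for n in count(1))))

# Elements after the first failing one are never yielded, even if they
# would satisfy the predicate.
prefix = list(takewhile(lambda x: x < 3, [1, 2, 5, 1, 2]))
```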


.. function:: tee(iterable[, n=2])

   Return n independent iterators from a single iterable. The case where ``n==2``
   is equivalent to::

      def tee(iterable):
          def gen(next, data={}, cnt=[0]):
              for i in count():
                  if i == cnt[0]:
                      item = data[i] = next()
                      cnt[0] += 1
                  else:
                      item = data.pop(i)
                  yield item
          it = iter(iterable)
          return (gen(it.next), gen(it.next))

   Note, once :func:`tee` has made a split, the original iterable should not be
   used anywhere else; otherwise, the iterable could get advanced without the tee
   objects being informed.

   Note, this member of the toolkit may require significant auxiliary storage
   (depending on how much temporary data needs to be stored). In general, if one
   iterator is going to use most or all of the data before the other iterator, it
   is faster to use :func:`list` instead of :func:`tee`.

   .. versionadded:: 2.4
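
The sketch below shows two tee iterators advancing independently over one
source. It uses the built-in ``next()``, which is an assumption that the
interpreter is Python 2.6 or later; on older versions ``a.next()`` would be
used instead.

```python
from itertools import tee

source = iter([10, 20, 30, 40])
a, b = tee(source)           # from here on, leave ``source`` alone

# Advance ``a`` by two; those items are buffered until ``b`` consumes them.
head = [next(a), next(a)]
all_from_b = list(b)
rest_of_a = list(a)
```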


.. _itertools-example:

Examples
--------

The following examples show common uses for each tool and demonstrate ways they
can be combined. ::

   >>> amounts = [120.15, 764.05, 823.14]
   >>> for checknum, amount in izip(count(1200), amounts):
   ...     print 'Check %d is for $%.2f' % (checknum, amount)
   ...
   Check 1200 is for $120.15
   Check 1201 is for $764.05
   Check 1202 is for $823.14

   >>> import operator
   >>> for cube in imap(operator.pow, xrange(1,5), repeat(3)):
   ...    print cube
   ...
   1
   8
   27
   64

   >>> reportlines = ['EuroPython', 'Roster', '', 'alex', '', 'laura',
   ...                '', 'martin', '', 'walter', '', 'mark']
   >>> for name in islice(reportlines, 3, None, 2):
   ...    print name.title()
   ...
   Alex
   Laura
   Martin
   Walter
   Mark

   # Show a dictionary sorted and grouped by value
   >>> from operator import itemgetter
   >>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3)
   >>> di = sorted(d.iteritems(), key=itemgetter(1))
   >>> for k, g in groupby(di, key=itemgetter(1)):
   ...     print k, map(itemgetter(0), g)
   ...
   1 ['a', 'c', 'e']
   2 ['b', 'd', 'f']
   3 ['g']

   # Find runs of consecutive numbers using groupby. The key to the solution
   # is differencing with a range so that consecutive numbers all appear in
   # same group.
   >>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
   >>> for k, g in groupby(enumerate(data), lambda (i,x):i-x):
   ...     print map(operator.itemgetter(1), g)
   ...
   [1]
   [4, 5, 6]
   [10]
   [15, 16, 17, 18]
   [22]
   [25, 26, 27, 28]



.. _itertools-recipes:

Recipes
-------

This section shows recipes for creating an extended toolset using the existing
itertools as building blocks.

The extended tools offer the same high performance as the underlying toolset.
The superior memory performance is kept by processing elements one at a time
rather than bringing the whole iterable into memory all at once. Code volume is
kept small by linking the tools together in a functional style which helps
eliminate temporary variables. High speed is retained by preferring
"vectorized" building blocks over the use of for-loops and :term:`generator`\s
which incur interpreter overhead. ::

   def take(n, seq):
       return list(islice(seq, n))

   def enumerate(iterable):
       return izip(count(), iterable)

   def tabulate(function):
       "Return function(0), function(1), ..."
       return imap(function, count())

   def iteritems(mapping):
       return izip(mapping.iterkeys(), mapping.itervalues())

   def nth(iterable, n):
       "Returns the nth item or raise StopIteration"
       return islice(iterable, n, None).next()

   def all(seq, pred=None):
       "Returns True if pred(x) is true for every element in the iterable"
       for elem in ifilterfalse(pred, seq):
           return False
       return True

   def any(seq, pred=None):
       "Returns True if pred(x) is true for at least one element in the iterable"
       for elem in ifilter(pred, seq):
           return True
       return False

   def no(seq, pred=None):
       "Returns True if pred(x) is false for every element in the iterable"
       for elem in ifilter(pred, seq):
           return False
       return True

   def quantify(seq, pred=None):
       "Count how many times the predicate is true in the sequence"
       return sum(imap(pred, seq))

   def padnone(seq):
       """Returns the sequence elements and then returns None indefinitely.

       Useful for emulating the behavior of the built-in map() function.
       """
       return chain(seq, repeat(None))

   def ncycles(seq, n):
       "Returns the sequence elements n times"
       return chain(*repeat(seq, n))

   def dotproduct(vec1, vec2):
       return sum(imap(operator.mul, vec1, vec2))

   def flatten(listOfLists):
       return list(chain(*listOfLists))

   def repeatfunc(func, times=None, *args):
       """Repeat calls to func with specified arguments.

       Example:  repeatfunc(random.random)
       """
       if times is None:
           return starmap(func, repeat(args))
       else:
           return starmap(func, repeat(args, times))

   def pairwise(iterable):
       "s -> (s0,s1), (s1,s2), (s2, s3), ..."
       a, b = tee(iterable)
       try:
           b.next()
       except StopIteration:
           pass
       return izip(a, b)

   def grouper(n, iterable, padvalue=None):
       "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
       return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
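
As a usage sketch, two of the recipes are restated below in a form that can be
run on its own. Two details are assumptions rather than part of the recipes:
the fallback import for Python 3, and the use of the built-in ``next()`` with a
default (available from Python 2.6) in place of the ``try``/``except`` around
``b.next()``.

```python
from itertools import chain, repeat, tee
try:
    from itertools import izip       # Python 2
except ImportError:
    izip = zip                       # assumption: Python 3's zip() is lazy like izip()

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)                    # advance b by one; tolerate empty input
    return izip(a, b)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n - 1))] * n)

# Differences between neighbouring elements.
deltas = [b - a for a, b in pairwise([1, 4, 9, 16])]

# Fixed-size groups, with the last one padded.
groups = list(grouper(3, 'abcdefg', 'x'))
```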