blob: ca2b1ff221d1712db856847e1048f3e07be8f324 [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001
2:mod:`itertools` --- Functions creating iterators for efficient looping
3=======================================================================
4
5.. module:: itertools
6 :synopsis: Functions creating iterators for efficient looping.
7.. moduleauthor:: Raymond Hettinger <python@rcn.com>
8.. sectionauthor:: Raymond Hettinger <python@rcn.com>
9
10
Georg Brandl9afde1c2007-11-01 20:32:30 +000011This module implements a number of :term:`iterator` building blocks inspired by
Georg Brandl116aa622007-08-15 14:28:22 +000012constructs from the Haskell and SML programming languages. Each has been recast
13in a form suitable for Python.
14
15The module standardizes a core set of fast, memory efficient tools that are
16useful by themselves or in combination. Standardization helps avoid the
17readability and reliability problems which arise when many different individuals
18create their own slightly varying implementations, each with their own quirks
19and naming conventions.
20
21The tools are designed to combine readily with one another. This makes it easy
22to construct more specialized tools succinctly and efficiently in pure Python.
23
24For instance, SML provides a tabulation tool: ``tabulate(f)`` which produces a
25sequence ``f(0), f(1), ...``. This toolbox provides :func:`imap` and
26:func:`count` which can be combined to form ``imap(f, count())`` and produce an
27equivalent result.
28
29Likewise, the functional tools are designed to work well with the high-speed
30functions provided by the :mod:`operator` module.
31
32The module author welcomes suggestions for other basic building blocks to be
33added to future versions of the module.
34
35Whether cast in pure python form or compiled code, tools that use iterators are
36more memory efficient (and faster) than their list based counterparts. Adopting
37the principles of just-in-time manufacturing, they create data when and where
38needed instead of consuming memory with the computer equivalent of "inventory".
39
40The performance advantage of iterators becomes more acute as the number of
41elements increases -- at some point, lists grow large enough to severely impact
42memory cache performance and start running slowly.
43
44
45.. seealso::
46
47 The Standard ML Basis Library, `The Standard ML Basis Library
48 <http://www.standardml.org/Basis/>`_.
49
50 Haskell, A Purely Functional Language, `Definition of Haskell and the Standard
51 Libraries <http://www.haskell.org/definition/>`_.
52
53
54.. _itertools-functions:
55
56Itertool functions
57------------------
58
59The following module functions all construct and return iterators. Some provide
60streams of infinite length, so they should only be accessed by functions or
61loops that truncate the stream.
62
63
64.. function:: chain(*iterables)
65
66 Make an iterator that returns elements from the first iterable until it is
67 exhausted, then proceeds to the next iterable, until all of the iterables are
68 exhausted. Used for treating consecutive sequences as a single sequence.
69 Equivalent to::
70
71 def chain(*iterables):
72 for it in iterables:
73 for element in it:
74 yield element
75
76
77.. function:: count([n])
78
79 Make an iterator that returns consecutive integers starting with *n*. If not
Georg Brandl9afde1c2007-11-01 20:32:30 +000080 specified *n* defaults to zero. Often used as an argument to :func:`imap` to
81 generate consecutive data points. Also, used with :func:`izip` to add sequence
82 numbers. Equivalent to::
Georg Brandl116aa622007-08-15 14:28:22 +000083
84 def count(n=0):
85 while True:
86 yield n
87 n += 1
88
Georg Brandl116aa622007-08-15 14:28:22 +000089
90.. function:: cycle(iterable)
91
92 Make an iterator returning elements from the iterable and saving a copy of each.
93 When the iterable is exhausted, return elements from the saved copy. Repeats
94 indefinitely. Equivalent to::
95
96 def cycle(iterable):
97 saved = []
98 for element in iterable:
99 yield element
100 saved.append(element)
101 while saved:
102 for element in saved:
103 yield element
104
105 Note, this member of the toolkit may require significant auxiliary storage
106 (depending on the length of the iterable).
107
108
109.. function:: dropwhile(predicate, iterable)
110
111 Make an iterator that drops elements from the iterable as long as the predicate
112 is true; afterwards, returns every element. Note, the iterator does not produce
113 *any* output until the predicate first becomes false, so it may have a lengthy
114 start-up time. Equivalent to::
115
116 def dropwhile(predicate, iterable):
117 iterable = iter(iterable)
118 for x in iterable:
119 if not predicate(x):
120 yield x
121 break
122 for x in iterable:
123 yield x
124
125
126.. function:: groupby(iterable[, key])
127
128 Make an iterator that returns consecutive keys and groups from the *iterable*.
129 The *key* is a function computing a key value for each element. If not
130 specified or is ``None``, *key* defaults to an identity function and returns
131 the element unchanged. Generally, the iterable needs to already be sorted on
132 the same key function.
133
134 The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix. It
135 generates a break or new group every time the value of the key function changes
136 (which is why it is usually necessary to have sorted the data using the same key
137 function). That behavior differs from SQL's GROUP BY which aggregates common
138 elements regardless of their input order.
139
140 The returned group is itself an iterator that shares the underlying iterable
141 with :func:`groupby`. Because the source is shared, when the :func:`groupby`
142 object is advanced, the previous group is no longer visible. So, if that data
143 is needed later, it should be stored as a list::
144
145 groups = []
146 uniquekeys = []
147 data = sorted(data, key=keyfunc)
148 for k, g in groupby(data, keyfunc):
149 groups.append(list(g)) # Store group iterator as a list
150 uniquekeys.append(k)
151
152 :func:`groupby` is equivalent to::
153
154 class groupby(object):
155 def __init__(self, iterable, key=None):
156 if key is None:
157 key = lambda x: x
158 self.keyfunc = key
159 self.it = iter(iterable)
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000160 self.tgtkey = self.currkey = self.currvalue = object()
Georg Brandl116aa622007-08-15 14:28:22 +0000161 def __iter__(self):
162 return self
163 def __next__(self):
164 while self.currkey == self.tgtkey:
165 self.currvalue = next(self.it) # Exit on StopIteration
166 self.currkey = self.keyfunc(self.currvalue)
167 self.tgtkey = self.currkey
168 return (self.currkey, self._grouper(self.tgtkey))
169 def _grouper(self, tgtkey):
170 while self.currkey == tgtkey:
171 yield self.currvalue
172 self.currvalue = next(self.it) # Exit on StopIteration
173 self.currkey = self.keyfunc(self.currvalue)
174
Georg Brandl116aa622007-08-15 14:28:22 +0000175
176.. function:: ifilter(predicate, iterable)
177
178 Make an iterator that filters elements from iterable returning only those for
179 which the predicate is ``True``. If *predicate* is ``None``, return the items
180 that are true. Equivalent to::
181
182 def ifilter(predicate, iterable):
183 if predicate is None:
184 predicate = bool
185 for x in iterable:
186 if predicate(x):
187 yield x
188
189
190.. function:: ifilterfalse(predicate, iterable)
191
192 Make an iterator that filters elements from iterable returning only those for
193 which the predicate is ``False``. If *predicate* is ``None``, return the items
194 that are false. Equivalent to::
195
196 def ifilterfalse(predicate, iterable):
197 if predicate is None:
198 predicate = bool
199 for x in iterable:
200 if not predicate(x):
201 yield x
202
203
204.. function:: imap(function, *iterables)
205
206 Make an iterator that computes the function using arguments from each of the
207 iterables. If *function* is set to ``None``, then :func:`imap` returns the
208 arguments as a tuple. Like :func:`map` but stops when the shortest iterable is
209 exhausted instead of filling in ``None`` for shorter iterables. The reason for
210 the difference is that infinite iterator arguments are typically an error for
211 :func:`map` (because the output is fully evaluated) but represent a common and
212 useful way of supplying arguments to :func:`imap`. Equivalent to::
213
214 def imap(function, *iterables):
215 iterables = map(iter, iterables)
216 while True:
217 args = [next(i) for i in iterables]
218 if function is None:
219 yield tuple(args)
220 else:
221 yield function(*args)
222
223
224.. function:: islice(iterable, [start,] stop [, step])
225
226 Make an iterator that returns selected elements from the iterable. If *start* is
227 non-zero, then elements from the iterable are skipped until start is reached.
228 Afterward, elements are returned consecutively unless *step* is set higher than
229 one which results in items being skipped. If *stop* is ``None``, then iteration
230 continues until the iterator is exhausted, if at all; otherwise, it stops at the
231 specified position. Unlike regular slicing, :func:`islice` does not support
232 negative values for *start*, *stop*, or *step*. Can be used to extract related
233 fields from data where the internal structure has been flattened (for example, a
234 multi-line report may list a name field on every third line). Equivalent to::
235
236 def islice(iterable, *args):
237 s = slice(*args)
Christian Heimesa37d4c62007-12-04 23:02:19 +0000238 it = iter(range(s.start or 0, s.stop or sys.maxsize, s.step or 1))
Georg Brandl116aa622007-08-15 14:28:22 +0000239 nexti = next(it)
240 for i, element in enumerate(iterable):
241 if i == nexti:
242 yield element
243 nexti = next(it)
244
245 If *start* is ``None``, then iteration starts at zero. If *step* is ``None``,
246 then the step defaults to one.
247
Georg Brandl116aa622007-08-15 14:28:22 +0000248
249.. function:: izip(*iterables)
250
251 Make an iterator that aggregates elements from each of the iterables. Like
252 :func:`zip` except that it returns an iterator instead of a list. Used for
253 lock-step iteration over several iterables at a time. Equivalent to::
254
255 def izip(*iterables):
256 iterables = map(iter, iterables)
257 while iterables:
258 result = [next(it) for it in iterables]
259 yield tuple(result)
260
Georg Brandl55ac8f02007-09-01 13:51:09 +0000261 When no iterables are specified, return a zero length iterator.
Georg Brandl116aa622007-08-15 14:28:22 +0000262
263 Note, the left-to-right evaluation order of the iterables is guaranteed. This
264 makes possible an idiom for clustering a data series into n-length groups using
265 ``izip(*[iter(s)]*n)``. For data that doesn't fit n-length groups exactly, the
266 last tuple can be pre-padded with fill values using ``izip(*[chain(s,
267 [None]*(n-1))]*n)``.
268
269 Note, when :func:`izip` is used with unequal length inputs, subsequent
270 iteration over the longer iterables cannot reliably be continued after
271 :func:`izip` terminates. Potentially, up to one entry will be missing from
272 each of the left-over iterables. This occurs because a value is fetched from
273 each iterator in- turn, but the process ends when one of the iterators
274 terminates. This leaves the last fetched values in limbo (they cannot be
275 returned in a final, incomplete tuple and they are cannot be pushed back into
276 the iterator for retrieval with ``next(it)``). In general, :func:`izip`
277 should only be used with unequal length inputs when you don't care about
278 trailing, unmatched values from the longer iterables.
279
280
281.. function:: izip_longest(*iterables[, fillvalue])
282
283 Make an iterator that aggregates elements from each of the iterables. If the
284 iterables are of uneven length, missing values are filled-in with *fillvalue*.
285 Iteration continues until the longest iterable is exhausted. Equivalent to::
286
287 def izip_longest(*args, **kwds):
288 fillvalue = kwds.get('fillvalue')
289 def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
290 yield counter() # yields the fillvalue, or raises IndexError
291 fillers = repeat(fillvalue)
292 iters = [chain(it, sentinel(), fillers) for it in args]
293 try:
294 for tup in izip(*iters):
295 yield tup
296 except IndexError:
297 pass
298
299 If one of the iterables is potentially infinite, then the :func:`izip_longest`
300 function should be wrapped with something that limits the number of calls (for
301 example :func:`islice` or :func:`takewhile`).
302
Georg Brandl116aa622007-08-15 14:28:22 +0000303
304.. function:: repeat(object[, times])
305
306 Make an iterator that returns *object* over and over again. Runs indefinitely
307 unless the *times* argument is specified. Used as argument to :func:`imap` for
308 invariant parameters to the called function. Also used with :func:`izip` to
309 create an invariant part of a tuple record. Equivalent to::
310
311 def repeat(object, times=None):
312 if times is None:
313 while True:
314 yield object
315 else:
316 for i in range(times):
317 yield object
318
319
320.. function:: starmap(function, iterable)
321
Christian Heimes679db4a2008-01-18 09:56:22 +0000322 Make an iterator that computes the function using arguments obtained from
Georg Brandl116aa622007-08-15 14:28:22 +0000323 the iterable. Used instead of :func:`imap` when argument parameters are already
324 grouped in tuples from a single iterable (the data has been "pre-zipped"). The
325 difference between :func:`imap` and :func:`starmap` parallels the distinction
326 between ``function(a,b)`` and ``function(*c)``. Equivalent to::
327
328 def starmap(function, iterable):
Christian Heimes679db4a2008-01-18 09:56:22 +0000329 for args in iterable:
330 yield function(*args)
331
332 .. versionchanged:: 2.6
333 Previously, :func:`starmap` required the function arguments to be tuples.
334 Now, any iterable is allowed.
Georg Brandl116aa622007-08-15 14:28:22 +0000335
336
337.. function:: takewhile(predicate, iterable)
338
339 Make an iterator that returns elements from the iterable as long as the
340 predicate is true. Equivalent to::
341
342 def takewhile(predicate, iterable):
343 for x in iterable:
344 if predicate(x):
345 yield x
346 else:
347 break
348
349
350.. function:: tee(iterable[, n=2])
351
352 Return *n* independent iterators from a single iterable. The case where ``n==2``
353 is equivalent to::
354
355 def tee(iterable):
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000356 def gen(next, data={}):
Georg Brandl116aa622007-08-15 14:28:22 +0000357 for i in count():
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000358 if i in data:
359 yield data.pop(i)
Georg Brandl116aa622007-08-15 14:28:22 +0000360 else:
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000361 data[i] = next()
362 yield data[i]
Georg Brandl116aa622007-08-15 14:28:22 +0000363 it = iter(iterable)
364 return (gen(it.__next__), gen(it.__next__))
365
366 Note, once :func:`tee` has made a split, the original *iterable* should not be
367 used anywhere else; otherwise, the *iterable* could get advanced without the tee
368 objects being informed.
369
370 Note, this member of the toolkit may require significant auxiliary storage
371 (depending on how much temporary data needs to be stored). In general, if one
372 iterator is going to use most or all of the data before the other iterator, it
373 is faster to use :func:`list` instead of :func:`tee`.
374
Georg Brandl116aa622007-08-15 14:28:22 +0000375
376.. _itertools-example:
377
378Examples
379--------
380
381The following examples show common uses for each tool and demonstrate ways they
382can be combined. ::
383
384 >>> amounts = [120.15, 764.05, 823.14]
385 >>> for checknum, amount in izip(count(1200), amounts):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000386 ... print('Check %d is for $%.2f' % (checknum, amount))
Georg Brandl116aa622007-08-15 14:28:22 +0000387 ...
388 Check 1200 is for $120.15
389 Check 1201 is for $764.05
390 Check 1202 is for $823.14
391
392 >>> import operator
393 >>> for cube in imap(operator.pow, range(1,5), repeat(3)):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000394 ... print(cube)
Georg Brandl116aa622007-08-15 14:28:22 +0000395 ...
396 1
397 8
398 27
399 64
400
401 >>> reportlines = ['EuroPython', 'Roster', '', 'alex', '', 'laura',
402 ... '', 'martin', '', 'walter', '', 'mark']
403 >>> for name in islice(reportlines, 3, None, 2):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000404 ... print(name.title())
Georg Brandl116aa622007-08-15 14:28:22 +0000405 ...
406 Alex
407 Laura
408 Martin
409 Walter
410 Mark
411
412 # Show a dictionary sorted and grouped by value
413 >>> from operator import itemgetter
414 >>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3)
Fred Drake2e748782007-09-04 17:33:11 +0000415 >>> di = sorted(d.items(), key=itemgetter(1))
Georg Brandl116aa622007-08-15 14:28:22 +0000416 >>> for k, g in groupby(di, key=itemgetter(1)):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000417 ... print(k, map(itemgetter(0), g))
Georg Brandl116aa622007-08-15 14:28:22 +0000418 ...
419 1 ['a', 'c', 'e']
420 2 ['b', 'd', 'f']
421 3 ['g']
422
423 # Find runs of consecutive numbers using groupby. The key to the solution
424 # is differencing with a range so that consecutive numbers all appear in
425 # same group.
426 >>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
427 >>> for k, g in groupby(enumerate(data), lambda t:t[0]-t[1]):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000428 ... print(map(operator.itemgetter(1), g))
Georg Brandl116aa622007-08-15 14:28:22 +0000429 ...
430 [1]
431 [4, 5, 6]
432 [10]
433 [15, 16, 17, 18]
434 [22]
435 [25, 26, 27, 28]
436
437
438
439.. _itertools-recipes:
440
441Recipes
442-------
443
444This section shows recipes for creating an extended toolset using the existing
445itertools as building blocks.
446
447The extended tools offer the same high performance as the underlying toolset.
448The superior memory performance is kept by processing elements one at a time
449rather than bringing the whole iterable into memory all at once. Code volume is
450kept small by linking the tools together in a functional style which helps
451eliminate temporary variables. High speed is retained by preferring
Georg Brandl9afde1c2007-11-01 20:32:30 +0000452"vectorized" building blocks over the use of for-loops and :term:`generator`\s
453which incur interpreter overhead. ::
Georg Brandl116aa622007-08-15 14:28:22 +0000454
455 def take(n, seq):
456 return list(islice(seq, n))
457
458 def enumerate(iterable):
459 return izip(count(), iterable)
460
461 def tabulate(function):
462 "Return function(0), function(1), ..."
463 return imap(function, count())
464
Georg Brandl116aa622007-08-15 14:28:22 +0000465 def nth(iterable, n):
466 "Returns the nth item or raise StopIteration"
467 return islice(iterable, n, None).next()
468
469 def all(seq, pred=None):
470 "Returns True if pred(x) is true for every element in the iterable"
471 for elem in ifilterfalse(pred, seq):
472 return False
473 return True
474
475 def any(seq, pred=None):
476 "Returns True if pred(x) is true for at least one element in the iterable"
477 for elem in ifilter(pred, seq):
478 return True
479 return False
480
481 def no(seq, pred=None):
482 "Returns True if pred(x) is false for every element in the iterable"
483 for elem in ifilter(pred, seq):
484 return False
485 return True
486
487 def quantify(seq, pred=None):
488 "Count how many times the predicate is true in the sequence"
489 return sum(imap(pred, seq))
490
491 def padnone(seq):
492 """Returns the sequence elements and then returns None indefinitely.
493
494 Useful for emulating the behavior of the built-in map() function.
495 """
496 return chain(seq, repeat(None))
497
498 def ncycles(seq, n):
499 "Returns the sequence elements n times"
500 return chain(*repeat(seq, n))
501
502 def dotproduct(vec1, vec2):
503 return sum(imap(operator.mul, vec1, vec2))
504
505 def flatten(listOfLists):
506 return list(chain(*listOfLists))
507
508 def repeatfunc(func, times=None, *args):
509 """Repeat calls to func with specified arguments.
510
511 Example: repeatfunc(random.random)
512 """
513 if times is None:
514 return starmap(func, repeat(args))
515 else:
516 return starmap(func, repeat(args, times))
517
518 def pairwise(iterable):
519 "s -> (s0,s1), (s1,s2), (s2, s3), ..."
520 a, b = tee(iterable)
521 next(b, None)
522 return izip(a, b)
523
524 def grouper(n, iterable, padvalue=None):
525 "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
526 return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
527
528
529