blob: 73abb89d651cc9fa13a364a6a6387ab2736fe331 [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001
2:mod:`itertools` --- Functions creating iterators for efficient looping
3=======================================================================
4
5.. module:: itertools
6 :synopsis: Functions creating iterators for efficient looping.
7.. moduleauthor:: Raymond Hettinger <python@rcn.com>
8.. sectionauthor:: Raymond Hettinger <python@rcn.com>
9
10
Georg Brandl9afde1c2007-11-01 20:32:30 +000011This module implements a number of :term:`iterator` building blocks inspired by
Georg Brandl116aa622007-08-15 14:28:22 +000012constructs from the Haskell and SML programming languages. Each has been recast
13in a form suitable for Python.
14
15The module standardizes a core set of fast, memory efficient tools that are
16useful by themselves or in combination. Standardization helps avoid the
17readability and reliability problems which arise when many different individuals
18create their own slightly varying implementations, each with their own quirks
19and naming conventions.
20
21The tools are designed to combine readily with one another. This makes it easy
22to construct more specialized tools succinctly and efficiently in pure Python.
23
24For instance, SML provides a tabulation tool: ``tabulate(f)`` which produces a
25sequence ``f(0), f(1), ...``. This toolbox provides :func:`imap` and
26:func:`count` which can be combined to form ``imap(f, count())`` and produce an
27equivalent result.
28
29Likewise, the functional tools are designed to work well with the high-speed
30functions provided by the :mod:`operator` module.
31
32The module author welcomes suggestions for other basic building blocks to be
33added to future versions of the module.
34
35Whether cast in pure python form or compiled code, tools that use iterators are
36more memory efficient (and faster) than their list based counterparts. Adopting
37the principles of just-in-time manufacturing, they create data when and where
38needed instead of consuming memory with the computer equivalent of "inventory".
39
40The performance advantage of iterators becomes more acute as the number of
41elements increases -- at some point, lists grow large enough to severely impact
42memory cache performance and start running slowly.
43
44
45.. seealso::
46
47 The Standard ML Basis Library, `The Standard ML Basis Library
48 <http://www.standardml.org/Basis/>`_.
49
50 Haskell, A Purely Functional Language, `Definition of Haskell and the Standard
51 Libraries <http://www.haskell.org/definition/>`_.
52
53
54.. _itertools-functions:
55
56Itertool functions
57------------------
58
59The following module functions all construct and return iterators. Some provide
60streams of infinite length, so they should only be accessed by functions or
61loops that truncate the stream.
62
63
64.. function:: chain(*iterables)
65
66 Make an iterator that returns elements from the first iterable until it is
67 exhausted, then proceeds to the next iterable, until all of the iterables are
68 exhausted. Used for treating consecutive sequences as a single sequence.
69 Equivalent to::
70
71 def chain(*iterables):
72 for it in iterables:
73 for element in it:
74 yield element
75
76
Christian Heimes836baa52008-02-26 08:18:30 +000077.. function:: combinations(iterable, r)
78
79 Return successive *r* length combinations of elements in the *iterable*.
80
81 Combinations are emitted in a lexicographic sort order. So, if the
82 input *iterable* is sorted, the combination tuples will be produced
83 in sorted order.
84
85 Elements are treated as unique based on their position, not on their
86 value. So if the input elements are unique, there will be no repeat
87 values within a single combination.
88
89 Each result tuple is ordered to match the input order. So, every
90 combination is a subsequence of the input *iterable*.
91
92 Example: ``combinations(range(4), 3) --> (0,1,2), (0,1,3), (0,2,3), (1,2,3)``
93
94 Equivalent to::
95
96 def combinations(iterable, r):
97 pool = tuple(iterable)
Christian Heimes380f7f22008-02-28 11:19:05 +000098 n = len(pool)
99 assert 0 <= r <= n
100 vec = range(r)
101 yield tuple(pool[i] for i in vec)
102 while 1:
103 for i in reversed(range(r)):
104 if vec[i] != i + n - r:
Christian Heimes836baa52008-02-26 08:18:30 +0000105 break
Christian Heimes380f7f22008-02-28 11:19:05 +0000106 else:
107 return
108 vec[i] += 1
109 for j in range(i+1, r):
110 vec[j] = vec[j-1] + 1
111 yield tuple(pool[i] for i in vec)
Christian Heimes836baa52008-02-26 08:18:30 +0000112
113 .. versionadded:: 2.6
114
Georg Brandl116aa622007-08-15 14:28:22 +0000115.. function:: count([n])
116
117 Make an iterator that returns consecutive integers starting with *n*. If not
Georg Brandl9afde1c2007-11-01 20:32:30 +0000118 specified *n* defaults to zero. Often used as an argument to :func:`imap` to
119 generate consecutive data points. Also, used with :func:`izip` to add sequence
120 numbers. Equivalent to::
Georg Brandl116aa622007-08-15 14:28:22 +0000121
122 def count(n=0):
123 while True:
124 yield n
125 n += 1
126
Georg Brandl116aa622007-08-15 14:28:22 +0000127
128.. function:: cycle(iterable)
129
130 Make an iterator returning elements from the iterable and saving a copy of each.
131 When the iterable is exhausted, return elements from the saved copy. Repeats
132 indefinitely. Equivalent to::
133
134 def cycle(iterable):
135 saved = []
136 for element in iterable:
137 yield element
138 saved.append(element)
139 while saved:
140 for element in saved:
141 yield element
142
143 Note, this member of the toolkit may require significant auxiliary storage
144 (depending on the length of the iterable).
145
146
147.. function:: dropwhile(predicate, iterable)
148
149 Make an iterator that drops elements from the iterable as long as the predicate
150 is true; afterwards, returns every element. Note, the iterator does not produce
151 *any* output until the predicate first becomes false, so it may have a lengthy
152 start-up time. Equivalent to::
153
154 def dropwhile(predicate, iterable):
155 iterable = iter(iterable)
156 for x in iterable:
157 if not predicate(x):
158 yield x
159 break
160 for x in iterable:
161 yield x
162
163
164.. function:: groupby(iterable[, key])
165
166 Make an iterator that returns consecutive keys and groups from the *iterable*.
167 The *key* is a function computing a key value for each element. If not
168 specified or is ``None``, *key* defaults to an identity function and returns
169 the element unchanged. Generally, the iterable needs to already be sorted on
170 the same key function.
171
172 The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix. It
173 generates a break or new group every time the value of the key function changes
174 (which is why it is usually necessary to have sorted the data using the same key
175 function). That behavior differs from SQL's GROUP BY which aggregates common
176 elements regardless of their input order.
177
178 The returned group is itself an iterator that shares the underlying iterable
179 with :func:`groupby`. Because the source is shared, when the :func:`groupby`
180 object is advanced, the previous group is no longer visible. So, if that data
181 is needed later, it should be stored as a list::
182
183 groups = []
184 uniquekeys = []
185 data = sorted(data, key=keyfunc)
186 for k, g in groupby(data, keyfunc):
187 groups.append(list(g)) # Store group iterator as a list
188 uniquekeys.append(k)
189
190 :func:`groupby` is equivalent to::
191
192 class groupby(object):
193 def __init__(self, iterable, key=None):
194 if key is None:
195 key = lambda x: x
196 self.keyfunc = key
197 self.it = iter(iterable)
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000198 self.tgtkey = self.currkey = self.currvalue = object()
Georg Brandl116aa622007-08-15 14:28:22 +0000199 def __iter__(self):
200 return self
201 def __next__(self):
202 while self.currkey == self.tgtkey:
203 self.currvalue = next(self.it) # Exit on StopIteration
204 self.currkey = self.keyfunc(self.currvalue)
205 self.tgtkey = self.currkey
206 return (self.currkey, self._grouper(self.tgtkey))
207 def _grouper(self, tgtkey):
208 while self.currkey == tgtkey:
209 yield self.currvalue
210 self.currvalue = next(self.it) # Exit on StopIteration
211 self.currkey = self.keyfunc(self.currvalue)
212
Georg Brandl116aa622007-08-15 14:28:22 +0000213
214.. function:: ifilter(predicate, iterable)
215
216 Make an iterator that filters elements from iterable returning only those for
217 which the predicate is ``True``. If *predicate* is ``None``, return the items
Georg Brandlf6945182008-02-01 11:56:49 +0000218 that are true. This function is the same as the built-in :func:`filter`
219 function. Equivalent to::
Georg Brandl116aa622007-08-15 14:28:22 +0000220
221 def ifilter(predicate, iterable):
222 if predicate is None:
223 predicate = bool
224 for x in iterable:
225 if predicate(x):
226 yield x
227
228
229.. function:: ifilterfalse(predicate, iterable)
230
231 Make an iterator that filters elements from iterable returning only those for
232 which the predicate is ``False``. If *predicate* is ``None``, return the items
233 that are false. Equivalent to::
234
235 def ifilterfalse(predicate, iterable):
236 if predicate is None:
237 predicate = bool
238 for x in iterable:
239 if not predicate(x):
240 yield x
241
242
243.. function:: imap(function, *iterables)
244
245 Make an iterator that computes the function using arguments from each of the
Georg Brandlf6945182008-02-01 11:56:49 +0000246 iterables. This function is the same as the built-in :func:`map` function.
247 Equivalent to::
Georg Brandl116aa622007-08-15 14:28:22 +0000248
249 def imap(function, *iterables):
Raymond Hettinger1dfde1d2008-01-22 23:25:35 +0000250 iterables = [iter(it) for it in iterables)
Georg Brandl116aa622007-08-15 14:28:22 +0000251 while True:
Raymond Hettinger1dfde1d2008-01-22 23:25:35 +0000252 args = [next(it) for it in iterables]
Christian Heimes1af737c2008-01-23 08:24:23 +0000253 if function is None:
254 yield tuple(args)
255 else:
256 yield function(*args)
Georg Brandl116aa622007-08-15 14:28:22 +0000257
258
259.. function:: islice(iterable, [start,] stop [, step])
260
261 Make an iterator that returns selected elements from the iterable. If *start* is
262 non-zero, then elements from the iterable are skipped until start is reached.
263 Afterward, elements are returned consecutively unless *step* is set higher than
264 one which results in items being skipped. If *stop* is ``None``, then iteration
265 continues until the iterator is exhausted, if at all; otherwise, it stops at the
266 specified position. Unlike regular slicing, :func:`islice` does not support
267 negative values for *start*, *stop*, or *step*. Can be used to extract related
268 fields from data where the internal structure has been flattened (for example, a
269 multi-line report may list a name field on every third line). Equivalent to::
270
271 def islice(iterable, *args):
272 s = slice(*args)
Georg Brandlf6945182008-02-01 11:56:49 +0000273 it = range(s.start or 0, s.stop or sys.maxsize, s.step or 1)
Georg Brandl116aa622007-08-15 14:28:22 +0000274 nexti = next(it)
275 for i, element in enumerate(iterable):
276 if i == nexti:
277 yield element
278 nexti = next(it)
279
280 If *start* is ``None``, then iteration starts at zero. If *step* is ``None``,
281 then the step defaults to one.
282
Georg Brandl116aa622007-08-15 14:28:22 +0000283
284.. function:: izip(*iterables)
285
286 Make an iterator that aggregates elements from each of the iterables. Like
287 :func:`zip` except that it returns an iterator instead of a list. Used for
288 lock-step iteration over several iterables at a time. Equivalent to::
289
290 def izip(*iterables):
291 iterables = map(iter, iterables)
292 while iterables:
293 result = [next(it) for it in iterables]
294 yield tuple(result)
295
Georg Brandl55ac8f02007-09-01 13:51:09 +0000296 When no iterables are specified, return a zero length iterator.
Georg Brandl116aa622007-08-15 14:28:22 +0000297
Christian Heimes1af737c2008-01-23 08:24:23 +0000298 The left-to-right evaluation order of the iterables is guaranteed. This
299 makes possible an idiom for clustering a data series into n-length groups
300 using ``izip(*[iter(s)]*n)``.
Georg Brandl116aa622007-08-15 14:28:22 +0000301
Christian Heimes1af737c2008-01-23 08:24:23 +0000302 :func:`izip` should only be used with unequal length inputs when you don't
303 care about trailing, unmatched values from the longer iterables. If those
304 values are important, use :func:`izip_longest` instead.
Georg Brandl116aa622007-08-15 14:28:22 +0000305
306
307.. function:: izip_longest(*iterables[, fillvalue])
308
309 Make an iterator that aggregates elements from each of the iterables. If the
310 iterables are of uneven length, missing values are filled-in with *fillvalue*.
311 Iteration continues until the longest iterable is exhausted. Equivalent to::
312
313 def izip_longest(*args, **kwds):
314 fillvalue = kwds.get('fillvalue')
315 def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
316 yield counter() # yields the fillvalue, or raises IndexError
317 fillers = repeat(fillvalue)
318 iters = [chain(it, sentinel(), fillers) for it in args]
319 try:
320 for tup in izip(*iters):
321 yield tup
322 except IndexError:
323 pass
324
325 If one of the iterables is potentially infinite, then the :func:`izip_longest`
326 function should be wrapped with something that limits the number of calls (for
327 example :func:`islice` or :func:`takewhile`).
328
Georg Brandl116aa622007-08-15 14:28:22 +0000329
Christian Heimes9e7f1d22008-02-28 12:27:11 +0000330.. function:: product(*iterables[, repeat])
Christian Heimes90c3d9b2008-02-23 13:18:03 +0000331
332 Cartesian product of input iterables.
333
334 Equivalent to nested for-loops in a generator expression. For example,
335 ``product(A, B)`` returns the same as ``((x,y) for x in A for y in B)``.
336
337 The leftmost iterators are in the outermost for-loop, so the output tuples
338 cycle in a manner similar to an odometer (with the rightmost element
Christian Heimes836baa52008-02-26 08:18:30 +0000339 changing on every iteration). This results in a lexicographic ordering
340 so that if the inputs iterables are sorted, the product tuples are emitted
341 in sorted order.
Christian Heimes90c3d9b2008-02-23 13:18:03 +0000342
Christian Heimes9e7f1d22008-02-28 12:27:11 +0000343 To compute the product of an iterable with itself, specify the number of
344 repetitions with the optional *repeat* keyword argument. For example,
345 ``product(A, repeat=4)`` means the same as ``product(A, A, A, A)``.
346
Christian Heimes836baa52008-02-26 08:18:30 +0000347 Equivalent to the following except that the actual implementation does not
348 build-up intermediate results in memory::
Christian Heimes90c3d9b2008-02-23 13:18:03 +0000349
Christian Heimes9e7f1d22008-02-28 12:27:11 +0000350 def product(*args, **kwds):
351 pools = map(tuple, args) * kwds.get('repeat', 1)
Christian Heimes90c3d9b2008-02-23 13:18:03 +0000352 if pools:
353 result = [[]]
354 for pool in pools:
355 result = [x+[y] for x in result for y in pool]
356 for prod in result:
357 yield tuple(prod)
358
359
Georg Brandl116aa622007-08-15 14:28:22 +0000360.. function:: repeat(object[, times])
361
362 Make an iterator that returns *object* over and over again. Runs indefinitely
363 unless the *times* argument is specified. Used as argument to :func:`imap` for
364 invariant parameters to the called function. Also used with :func:`izip` to
365 create an invariant part of a tuple record. Equivalent to::
366
367 def repeat(object, times=None):
368 if times is None:
369 while True:
370 yield object
371 else:
372 for i in range(times):
373 yield object
374
375
376.. function:: starmap(function, iterable)
377
Christian Heimes679db4a2008-01-18 09:56:22 +0000378 Make an iterator that computes the function using arguments obtained from
Georg Brandl116aa622007-08-15 14:28:22 +0000379 the iterable. Used instead of :func:`imap` when argument parameters are already
380 grouped in tuples from a single iterable (the data has been "pre-zipped"). The
381 difference between :func:`imap` and :func:`starmap` parallels the distinction
382 between ``function(a,b)`` and ``function(*c)``. Equivalent to::
383
384 def starmap(function, iterable):
Christian Heimes679db4a2008-01-18 09:56:22 +0000385 for args in iterable:
386 yield function(*args)
387
388 .. versionchanged:: 2.6
389 Previously, :func:`starmap` required the function arguments to be tuples.
390 Now, any iterable is allowed.
Georg Brandl116aa622007-08-15 14:28:22 +0000391
392
393.. function:: takewhile(predicate, iterable)
394
395 Make an iterator that returns elements from the iterable as long as the
396 predicate is true. Equivalent to::
397
398 def takewhile(predicate, iterable):
399 for x in iterable:
400 if predicate(x):
401 yield x
402 else:
403 break
404
405
406.. function:: tee(iterable[, n=2])
407
408 Return *n* independent iterators from a single iterable. The case where ``n==2``
409 is equivalent to::
410
411 def tee(iterable):
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000412 def gen(next, data={}):
Georg Brandl116aa622007-08-15 14:28:22 +0000413 for i in count():
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000414 if i in data:
415 yield data.pop(i)
Georg Brandl116aa622007-08-15 14:28:22 +0000416 else:
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000417 data[i] = next()
418 yield data[i]
Georg Brandl116aa622007-08-15 14:28:22 +0000419 it = iter(iterable)
420 return (gen(it.__next__), gen(it.__next__))
421
422 Note, once :func:`tee` has made a split, the original *iterable* should not be
423 used anywhere else; otherwise, the *iterable* could get advanced without the tee
424 objects being informed.
425
426 Note, this member of the toolkit may require significant auxiliary storage
427 (depending on how much temporary data needs to be stored). In general, if one
428 iterator is going to use most or all of the data before the other iterator, it
429 is faster to use :func:`list` instead of :func:`tee`.
430
Georg Brandl116aa622007-08-15 14:28:22 +0000431
432.. _itertools-example:
433
434Examples
435--------
436
437The following examples show common uses for each tool and demonstrate ways they
438can be combined. ::
439
440 >>> amounts = [120.15, 764.05, 823.14]
441 >>> for checknum, amount in izip(count(1200), amounts):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000442 ... print('Check %d is for $%.2f' % (checknum, amount))
Georg Brandl116aa622007-08-15 14:28:22 +0000443 ...
444 Check 1200 is for $120.15
445 Check 1201 is for $764.05
446 Check 1202 is for $823.14
447
448 >>> import operator
449 >>> for cube in imap(operator.pow, range(1,5), repeat(3)):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000450 ... print(cube)
Georg Brandl116aa622007-08-15 14:28:22 +0000451 ...
452 1
453 8
454 27
455 64
456
457 >>> reportlines = ['EuroPython', 'Roster', '', 'alex', '', 'laura',
458 ... '', 'martin', '', 'walter', '', 'mark']
459 >>> for name in islice(reportlines, 3, None, 2):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000460 ... print(name.title())
Georg Brandl116aa622007-08-15 14:28:22 +0000461 ...
462 Alex
463 Laura
464 Martin
465 Walter
466 Mark
467
468 # Show a dictionary sorted and grouped by value
469 >>> from operator import itemgetter
470 >>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3)
Fred Drake2e748782007-09-04 17:33:11 +0000471 >>> di = sorted(d.items(), key=itemgetter(1))
Georg Brandl116aa622007-08-15 14:28:22 +0000472 >>> for k, g in groupby(di, key=itemgetter(1)):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000473 ... print(k, map(itemgetter(0), g))
Georg Brandl116aa622007-08-15 14:28:22 +0000474 ...
475 1 ['a', 'c', 'e']
476 2 ['b', 'd', 'f']
477 3 ['g']
478
479 # Find runs of consecutive numbers using groupby. The key to the solution
480 # is differencing with a range so that consecutive numbers all appear in
481 # same group.
482 >>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
483 >>> for k, g in groupby(enumerate(data), lambda t:t[0]-t[1]):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000484 ... print(map(operator.itemgetter(1), g))
Georg Brandl116aa622007-08-15 14:28:22 +0000485 ...
486 [1]
487 [4, 5, 6]
488 [10]
489 [15, 16, 17, 18]
490 [22]
491 [25, 26, 27, 28]
492
493
494
495.. _itertools-recipes:
496
497Recipes
498-------
499
500This section shows recipes for creating an extended toolset using the existing
501itertools as building blocks.
502
503The extended tools offer the same high performance as the underlying toolset.
504The superior memory performance is kept by processing elements one at a time
505rather than bringing the whole iterable into memory all at once. Code volume is
506kept small by linking the tools together in a functional style which helps
507eliminate temporary variables. High speed is retained by preferring
Georg Brandl9afde1c2007-11-01 20:32:30 +0000508"vectorized" building blocks over the use of for-loops and :term:`generator`\s
509which incur interpreter overhead. ::
Georg Brandl116aa622007-08-15 14:28:22 +0000510
511 def take(n, seq):
512 return list(islice(seq, n))
513
514 def enumerate(iterable):
515 return izip(count(), iterable)
516
517 def tabulate(function):
518 "Return function(0), function(1), ..."
519 return imap(function, count())
520
Georg Brandl116aa622007-08-15 14:28:22 +0000521 def nth(iterable, n):
522 "Returns the nth item or raise StopIteration"
523 return islice(iterable, n, None).next()
524
525 def all(seq, pred=None):
526 "Returns True if pred(x) is true for every element in the iterable"
527 for elem in ifilterfalse(pred, seq):
528 return False
529 return True
530
531 def any(seq, pred=None):
532 "Returns True if pred(x) is true for at least one element in the iterable"
533 for elem in ifilter(pred, seq):
534 return True
535 return False
536
537 def no(seq, pred=None):
538 "Returns True if pred(x) is false for every element in the iterable"
539 for elem in ifilter(pred, seq):
540 return False
541 return True
542
543 def quantify(seq, pred=None):
544 "Count how many times the predicate is true in the sequence"
545 return sum(imap(pred, seq))
546
547 def padnone(seq):
548 """Returns the sequence elements and then returns None indefinitely.
549
550 Useful for emulating the behavior of the built-in map() function.
551 """
552 return chain(seq, repeat(None))
553
554 def ncycles(seq, n):
555 "Returns the sequence elements n times"
556 return chain(*repeat(seq, n))
557
558 def dotproduct(vec1, vec2):
559 return sum(imap(operator.mul, vec1, vec2))
560
561 def flatten(listOfLists):
562 return list(chain(*listOfLists))
563
564 def repeatfunc(func, times=None, *args):
565 """Repeat calls to func with specified arguments.
566
567 Example: repeatfunc(random.random)
568 """
569 if times is None:
570 return starmap(func, repeat(args))
571 else:
572 return starmap(func, repeat(args, times))
573
574 def pairwise(iterable):
575 "s -> (s0,s1), (s1,s2), (s2, s3), ..."
576 a, b = tee(iterable)
577 next(b, None)
578 return izip(a, b)
579
580 def grouper(n, iterable, padvalue=None):
581 "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
582 return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
583
Christian Heimes7b3ce6a2008-01-31 14:31:45 +0000584 def roundrobin(*iterables):
585 "roundrobin('abc', 'd', 'ef') --> 'a', 'd', 'e', 'b', 'f', 'c'"
586 # Recipe contributed by George Sakkis
587 pending = len(iterables)
588 nexts = cycle(iter(it).next for it in iterables)
589 while pending:
590 try:
591 for next in nexts:
592 yield next()
593 except StopIteration:
594 pending -= 1
595 nexts = cycle(islice(nexts, pending))
Georg Brandl116aa622007-08-15 14:28:22 +0000596
Christian Heimes90c3d9b2008-02-23 13:18:03 +0000597 def powerset(iterable):
598 "powerset('ab') --> set([]), set(['b']), set(['a']), set(['a', 'b'])"
599 skip = object()
600 for t in product(*izip(repeat(skip), iterable)):
601 yield set(e for e in t if e is not skip)
602