blob: d04f33b2e7b1548ecb3cb6c0ce224934daf423f2 [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001
2:mod:`itertools` --- Functions creating iterators for efficient looping
3=======================================================================
4
5.. module:: itertools
6 :synopsis: Functions creating iterators for efficient looping.
7.. moduleauthor:: Raymond Hettinger <python@rcn.com>
8.. sectionauthor:: Raymond Hettinger <python@rcn.com>
9
10
Georg Brandl9afde1c2007-11-01 20:32:30 +000011This module implements a number of :term:`iterator` building blocks inspired by
Georg Brandl116aa622007-08-15 14:28:22 +000012constructs from the Haskell and SML programming languages. Each has been recast
13in a form suitable for Python.
14
15The module standardizes a core set of fast, memory efficient tools that are
16useful by themselves or in combination. Standardization helps avoid the
17readability and reliability problems which arise when many different individuals
18create their own slightly varying implementations, each with their own quirks
19and naming conventions.
20
21The tools are designed to combine readily with one another. This makes it easy
22to construct more specialized tools succinctly and efficiently in pure Python.
23
24For instance, SML provides a tabulation tool: ``tabulate(f)`` which produces a
25sequence ``f(0), f(1), ...``. This toolbox provides :func:`imap` and
26:func:`count` which can be combined to form ``imap(f, count())`` and produce an
27equivalent result.
28
29Likewise, the functional tools are designed to work well with the high-speed
30functions provided by the :mod:`operator` module.
31
32The module author welcomes suggestions for other basic building blocks to be
33added to future versions of the module.
34
35Whether cast in pure python form or compiled code, tools that use iterators are
36more memory efficient (and faster) than their list based counterparts. Adopting
37the principles of just-in-time manufacturing, they create data when and where
38needed instead of consuming memory with the computer equivalent of "inventory".
39
40The performance advantage of iterators becomes more acute as the number of
41elements increases -- at some point, lists grow large enough to severely impact
42memory cache performance and start running slowly.
43
44
45.. seealso::
46
47 The Standard ML Basis Library, `The Standard ML Basis Library
48 <http://www.standardml.org/Basis/>`_.
49
50 Haskell, A Purely Functional Language, `Definition of Haskell and the Standard
51 Libraries <http://www.haskell.org/definition/>`_.
52
53
54.. _itertools-functions:
55
56Itertool functions
57------------------
58
59The following module functions all construct and return iterators. Some provide
60streams of infinite length, so they should only be accessed by functions or
61loops that truncate the stream.
62
63
64.. function:: chain(*iterables)
65
66 Make an iterator that returns elements from the first iterable until it is
67 exhausted, then proceeds to the next iterable, until all of the iterables are
68 exhausted. Used for treating consecutive sequences as a single sequence.
69 Equivalent to::
70
71 def chain(*iterables):
72 for it in iterables:
73 for element in it:
74 yield element
75
76
Christian Heimes836baa52008-02-26 08:18:30 +000077.. function:: combinations(iterable, r)
78
79 Return successive *r* length combinations of elements in the *iterable*.
80
81 Combinations are emitted in a lexicographic sort order. So, if the
82 input *iterable* is sorted, the combination tuples will be produced
83 in sorted order.
84
85 Elements are treated as unique based on their position, not on their
86 value. So if the input elements are unique, there will be no repeat
87 values within a single combination.
88
89 Each result tuple is ordered to match the input order. So, every
90 combination is a subsequence of the input *iterable*.
91
92 Example: ``combinations(range(4), 3) --> (0,1,2), (0,1,3), (0,2,3), (1,2,3)``
93
94 Equivalent to::
95
96 def combinations(iterable, r):
97 pool = tuple(iterable)
98 if pool:
99 n = len(pool)
100 vec = range(r)
101 yield tuple(pool[i] for i in vec)
102 while 1:
103 for i in reversed(range(r)):
104 if vec[i] == i + n-r:
105 continue
106 vec[i] += 1
107 for j in range(i+1, r):
108 vec[j] = vec[j-1] + 1
109 yield tuple(pool[i] for i in vec)
110 break
111 else:
112 return
113
114 .. versionadded:: 2.6
115
Georg Brandl116aa622007-08-15 14:28:22 +0000116.. function:: count([n])
117
118 Make an iterator that returns consecutive integers starting with *n*. If not
Georg Brandl9afde1c2007-11-01 20:32:30 +0000119 specified *n* defaults to zero. Often used as an argument to :func:`imap` to
120 generate consecutive data points. Also, used with :func:`izip` to add sequence
121 numbers. Equivalent to::
Georg Brandl116aa622007-08-15 14:28:22 +0000122
123 def count(n=0):
124 while True:
125 yield n
126 n += 1
127
Georg Brandl116aa622007-08-15 14:28:22 +0000128
129.. function:: cycle(iterable)
130
131 Make an iterator returning elements from the iterable and saving a copy of each.
132 When the iterable is exhausted, return elements from the saved copy. Repeats
133 indefinitely. Equivalent to::
134
135 def cycle(iterable):
136 saved = []
137 for element in iterable:
138 yield element
139 saved.append(element)
140 while saved:
141 for element in saved:
142 yield element
143
144 Note, this member of the toolkit may require significant auxiliary storage
145 (depending on the length of the iterable).
146
147
148.. function:: dropwhile(predicate, iterable)
149
150 Make an iterator that drops elements from the iterable as long as the predicate
151 is true; afterwards, returns every element. Note, the iterator does not produce
152 *any* output until the predicate first becomes false, so it may have a lengthy
153 start-up time. Equivalent to::
154
155 def dropwhile(predicate, iterable):
156 iterable = iter(iterable)
157 for x in iterable:
158 if not predicate(x):
159 yield x
160 break
161 for x in iterable:
162 yield x
163
164
165.. function:: groupby(iterable[, key])
166
167 Make an iterator that returns consecutive keys and groups from the *iterable*.
168 The *key* is a function computing a key value for each element. If not
169 specified or is ``None``, *key* defaults to an identity function and returns
170 the element unchanged. Generally, the iterable needs to already be sorted on
171 the same key function.
172
173 The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix. It
174 generates a break or new group every time the value of the key function changes
175 (which is why it is usually necessary to have sorted the data using the same key
176 function). That behavior differs from SQL's GROUP BY which aggregates common
177 elements regardless of their input order.
178
179 The returned group is itself an iterator that shares the underlying iterable
180 with :func:`groupby`. Because the source is shared, when the :func:`groupby`
181 object is advanced, the previous group is no longer visible. So, if that data
182 is needed later, it should be stored as a list::
183
184 groups = []
185 uniquekeys = []
186 data = sorted(data, key=keyfunc)
187 for k, g in groupby(data, keyfunc):
188 groups.append(list(g)) # Store group iterator as a list
189 uniquekeys.append(k)
190
191 :func:`groupby` is equivalent to::
192
193 class groupby(object):
194 def __init__(self, iterable, key=None):
195 if key is None:
196 key = lambda x: x
197 self.keyfunc = key
198 self.it = iter(iterable)
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000199 self.tgtkey = self.currkey = self.currvalue = object()
Georg Brandl116aa622007-08-15 14:28:22 +0000200 def __iter__(self):
201 return self
202 def __next__(self):
203 while self.currkey == self.tgtkey:
204 self.currvalue = next(self.it) # Exit on StopIteration
205 self.currkey = self.keyfunc(self.currvalue)
206 self.tgtkey = self.currkey
207 return (self.currkey, self._grouper(self.tgtkey))
208 def _grouper(self, tgtkey):
209 while self.currkey == tgtkey:
210 yield self.currvalue
211 self.currvalue = next(self.it) # Exit on StopIteration
212 self.currkey = self.keyfunc(self.currvalue)
213
Georg Brandl116aa622007-08-15 14:28:22 +0000214
215.. function:: ifilter(predicate, iterable)
216
217 Make an iterator that filters elements from iterable returning only those for
218 which the predicate is ``True``. If *predicate* is ``None``, return the items
Georg Brandlf6945182008-02-01 11:56:49 +0000219 that are true. This function is the same as the built-in :func:`filter`
220 function. Equivalent to::
Georg Brandl116aa622007-08-15 14:28:22 +0000221
222 def ifilter(predicate, iterable):
223 if predicate is None:
224 predicate = bool
225 for x in iterable:
226 if predicate(x):
227 yield x
228
229
230.. function:: ifilterfalse(predicate, iterable)
231
232 Make an iterator that filters elements from iterable returning only those for
233 which the predicate is ``False``. If *predicate* is ``None``, return the items
234 that are false. Equivalent to::
235
236 def ifilterfalse(predicate, iterable):
237 if predicate is None:
238 predicate = bool
239 for x in iterable:
240 if not predicate(x):
241 yield x
242
243
244.. function:: imap(function, *iterables)
245
246 Make an iterator that computes the function using arguments from each of the
Georg Brandlf6945182008-02-01 11:56:49 +0000247 iterables. This function is the same as the built-in :func:`map` function.
248 Equivalent to::
Georg Brandl116aa622007-08-15 14:28:22 +0000249
250 def imap(function, *iterables):
Raymond Hettinger1dfde1d2008-01-22 23:25:35 +0000251 iterables = [iter(it) for it in iterables)
Georg Brandl116aa622007-08-15 14:28:22 +0000252 while True:
Raymond Hettinger1dfde1d2008-01-22 23:25:35 +0000253 args = [next(it) for it in iterables]
Christian Heimes1af737c2008-01-23 08:24:23 +0000254 if function is None:
255 yield tuple(args)
256 else:
257 yield function(*args)
Georg Brandl116aa622007-08-15 14:28:22 +0000258
259
260.. function:: islice(iterable, [start,] stop [, step])
261
262 Make an iterator that returns selected elements from the iterable. If *start* is
263 non-zero, then elements from the iterable are skipped until start is reached.
264 Afterward, elements are returned consecutively unless *step* is set higher than
265 one which results in items being skipped. If *stop* is ``None``, then iteration
266 continues until the iterator is exhausted, if at all; otherwise, it stops at the
267 specified position. Unlike regular slicing, :func:`islice` does not support
268 negative values for *start*, *stop*, or *step*. Can be used to extract related
269 fields from data where the internal structure has been flattened (for example, a
270 multi-line report may list a name field on every third line). Equivalent to::
271
272 def islice(iterable, *args):
273 s = slice(*args)
Georg Brandlf6945182008-02-01 11:56:49 +0000274 it = range(s.start or 0, s.stop or sys.maxsize, s.step or 1)
Georg Brandl116aa622007-08-15 14:28:22 +0000275 nexti = next(it)
276 for i, element in enumerate(iterable):
277 if i == nexti:
278 yield element
279 nexti = next(it)
280
281 If *start* is ``None``, then iteration starts at zero. If *step* is ``None``,
282 then the step defaults to one.
283
Georg Brandl116aa622007-08-15 14:28:22 +0000284
285.. function:: izip(*iterables)
286
287 Make an iterator that aggregates elements from each of the iterables. Like
288 :func:`zip` except that it returns an iterator instead of a list. Used for
289 lock-step iteration over several iterables at a time. Equivalent to::
290
291 def izip(*iterables):
292 iterables = map(iter, iterables)
293 while iterables:
294 result = [next(it) for it in iterables]
295 yield tuple(result)
296
Georg Brandl55ac8f02007-09-01 13:51:09 +0000297 When no iterables are specified, return a zero length iterator.
Georg Brandl116aa622007-08-15 14:28:22 +0000298
Christian Heimes1af737c2008-01-23 08:24:23 +0000299 The left-to-right evaluation order of the iterables is guaranteed. This
300 makes possible an idiom for clustering a data series into n-length groups
301 using ``izip(*[iter(s)]*n)``.
Georg Brandl116aa622007-08-15 14:28:22 +0000302
Christian Heimes1af737c2008-01-23 08:24:23 +0000303 :func:`izip` should only be used with unequal length inputs when you don't
304 care about trailing, unmatched values from the longer iterables. If those
305 values are important, use :func:`izip_longest` instead.
Georg Brandl116aa622007-08-15 14:28:22 +0000306
307
308.. function:: izip_longest(*iterables[, fillvalue])
309
310 Make an iterator that aggregates elements from each of the iterables. If the
311 iterables are of uneven length, missing values are filled-in with *fillvalue*.
312 Iteration continues until the longest iterable is exhausted. Equivalent to::
313
314 def izip_longest(*args, **kwds):
315 fillvalue = kwds.get('fillvalue')
316 def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
317 yield counter() # yields the fillvalue, or raises IndexError
318 fillers = repeat(fillvalue)
319 iters = [chain(it, sentinel(), fillers) for it in args]
320 try:
321 for tup in izip(*iters):
322 yield tup
323 except IndexError:
324 pass
325
326 If one of the iterables is potentially infinite, then the :func:`izip_longest`
327 function should be wrapped with something that limits the number of calls (for
328 example :func:`islice` or :func:`takewhile`).
329
Georg Brandl116aa622007-08-15 14:28:22 +0000330
Christian Heimes90c3d9b2008-02-23 13:18:03 +0000331.. function:: product(*iterables)
332
333 Cartesian product of input iterables.
334
335 Equivalent to nested for-loops in a generator expression. For example,
336 ``product(A, B)`` returns the same as ``((x,y) for x in A for y in B)``.
337
338 The leftmost iterators are in the outermost for-loop, so the output tuples
339 cycle in a manner similar to an odometer (with the rightmost element
Christian Heimes836baa52008-02-26 08:18:30 +0000340 changing on every iteration). This results in a lexicographic ordering
341 so that if the inputs iterables are sorted, the product tuples are emitted
342 in sorted order.
Christian Heimes90c3d9b2008-02-23 13:18:03 +0000343
Christian Heimes836baa52008-02-26 08:18:30 +0000344 Equivalent to the following except that the actual implementation does not
345 build-up intermediate results in memory::
Christian Heimes90c3d9b2008-02-23 13:18:03 +0000346
347 def product(*args):
348 pools = map(tuple, args)
349 if pools:
350 result = [[]]
351 for pool in pools:
352 result = [x+[y] for x in result for y in pool]
353 for prod in result:
354 yield tuple(prod)
355
356
Georg Brandl116aa622007-08-15 14:28:22 +0000357.. function:: repeat(object[, times])
358
359 Make an iterator that returns *object* over and over again. Runs indefinitely
360 unless the *times* argument is specified. Used as argument to :func:`imap` for
361 invariant parameters to the called function. Also used with :func:`izip` to
362 create an invariant part of a tuple record. Equivalent to::
363
364 def repeat(object, times=None):
365 if times is None:
366 while True:
367 yield object
368 else:
369 for i in range(times):
370 yield object
371
372
373.. function:: starmap(function, iterable)
374
Christian Heimes679db4a2008-01-18 09:56:22 +0000375 Make an iterator that computes the function using arguments obtained from
Georg Brandl116aa622007-08-15 14:28:22 +0000376 the iterable. Used instead of :func:`imap` when argument parameters are already
377 grouped in tuples from a single iterable (the data has been "pre-zipped"). The
378 difference between :func:`imap` and :func:`starmap` parallels the distinction
379 between ``function(a,b)`` and ``function(*c)``. Equivalent to::
380
381 def starmap(function, iterable):
Christian Heimes679db4a2008-01-18 09:56:22 +0000382 for args in iterable:
383 yield function(*args)
384
385 .. versionchanged:: 2.6
386 Previously, :func:`starmap` required the function arguments to be tuples.
387 Now, any iterable is allowed.
Georg Brandl116aa622007-08-15 14:28:22 +0000388
389
390.. function:: takewhile(predicate, iterable)
391
392 Make an iterator that returns elements from the iterable as long as the
393 predicate is true. Equivalent to::
394
395 def takewhile(predicate, iterable):
396 for x in iterable:
397 if predicate(x):
398 yield x
399 else:
400 break
401
402
403.. function:: tee(iterable[, n=2])
404
405 Return *n* independent iterators from a single iterable. The case where ``n==2``
406 is equivalent to::
407
408 def tee(iterable):
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000409 def gen(next, data={}):
Georg Brandl116aa622007-08-15 14:28:22 +0000410 for i in count():
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000411 if i in data:
412 yield data.pop(i)
Georg Brandl116aa622007-08-15 14:28:22 +0000413 else:
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000414 data[i] = next()
415 yield data[i]
Georg Brandl116aa622007-08-15 14:28:22 +0000416 it = iter(iterable)
417 return (gen(it.__next__), gen(it.__next__))
418
419 Note, once :func:`tee` has made a split, the original *iterable* should not be
420 used anywhere else; otherwise, the *iterable* could get advanced without the tee
421 objects being informed.
422
423 Note, this member of the toolkit may require significant auxiliary storage
424 (depending on how much temporary data needs to be stored). In general, if one
425 iterator is going to use most or all of the data before the other iterator, it
426 is faster to use :func:`list` instead of :func:`tee`.
427
Georg Brandl116aa622007-08-15 14:28:22 +0000428
429.. _itertools-example:
430
431Examples
432--------
433
434The following examples show common uses for each tool and demonstrate ways they
435can be combined. ::
436
437 >>> amounts = [120.15, 764.05, 823.14]
438 >>> for checknum, amount in izip(count(1200), amounts):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000439 ... print('Check %d is for $%.2f' % (checknum, amount))
Georg Brandl116aa622007-08-15 14:28:22 +0000440 ...
441 Check 1200 is for $120.15
442 Check 1201 is for $764.05
443 Check 1202 is for $823.14
444
445 >>> import operator
446 >>> for cube in imap(operator.pow, range(1,5), repeat(3)):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000447 ... print(cube)
Georg Brandl116aa622007-08-15 14:28:22 +0000448 ...
449 1
450 8
451 27
452 64
453
454 >>> reportlines = ['EuroPython', 'Roster', '', 'alex', '', 'laura',
455 ... '', 'martin', '', 'walter', '', 'mark']
456 >>> for name in islice(reportlines, 3, None, 2):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000457 ... print(name.title())
Georg Brandl116aa622007-08-15 14:28:22 +0000458 ...
459 Alex
460 Laura
461 Martin
462 Walter
463 Mark
464
465 # Show a dictionary sorted and grouped by value
466 >>> from operator import itemgetter
467 >>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3)
Fred Drake2e748782007-09-04 17:33:11 +0000468 >>> di = sorted(d.items(), key=itemgetter(1))
Georg Brandl116aa622007-08-15 14:28:22 +0000469 >>> for k, g in groupby(di, key=itemgetter(1)):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000470 ... print(k, map(itemgetter(0), g))
Georg Brandl116aa622007-08-15 14:28:22 +0000471 ...
472 1 ['a', 'c', 'e']
473 2 ['b', 'd', 'f']
474 3 ['g']
475
476 # Find runs of consecutive numbers using groupby. The key to the solution
477 # is differencing with a range so that consecutive numbers all appear in
478 # same group.
479 >>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
480 >>> for k, g in groupby(enumerate(data), lambda t:t[0]-t[1]):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000481 ... print(map(operator.itemgetter(1), g))
Georg Brandl116aa622007-08-15 14:28:22 +0000482 ...
483 [1]
484 [4, 5, 6]
485 [10]
486 [15, 16, 17, 18]
487 [22]
488 [25, 26, 27, 28]
489
490
491
492.. _itertools-recipes:
493
494Recipes
495-------
496
497This section shows recipes for creating an extended toolset using the existing
498itertools as building blocks.
499
500The extended tools offer the same high performance as the underlying toolset.
501The superior memory performance is kept by processing elements one at a time
502rather than bringing the whole iterable into memory all at once. Code volume is
503kept small by linking the tools together in a functional style which helps
504eliminate temporary variables. High speed is retained by preferring
Georg Brandl9afde1c2007-11-01 20:32:30 +0000505"vectorized" building blocks over the use of for-loops and :term:`generator`\s
506which incur interpreter overhead. ::
Georg Brandl116aa622007-08-15 14:28:22 +0000507
508 def take(n, seq):
509 return list(islice(seq, n))
510
511 def enumerate(iterable):
512 return izip(count(), iterable)
513
514 def tabulate(function):
515 "Return function(0), function(1), ..."
516 return imap(function, count())
517
Georg Brandl116aa622007-08-15 14:28:22 +0000518 def nth(iterable, n):
519 "Returns the nth item or raise StopIteration"
520 return islice(iterable, n, None).next()
521
522 def all(seq, pred=None):
523 "Returns True if pred(x) is true for every element in the iterable"
524 for elem in ifilterfalse(pred, seq):
525 return False
526 return True
527
528 def any(seq, pred=None):
529 "Returns True if pred(x) is true for at least one element in the iterable"
530 for elem in ifilter(pred, seq):
531 return True
532 return False
533
534 def no(seq, pred=None):
535 "Returns True if pred(x) is false for every element in the iterable"
536 for elem in ifilter(pred, seq):
537 return False
538 return True
539
540 def quantify(seq, pred=None):
541 "Count how many times the predicate is true in the sequence"
542 return sum(imap(pred, seq))
543
544 def padnone(seq):
545 """Returns the sequence elements and then returns None indefinitely.
546
547 Useful for emulating the behavior of the built-in map() function.
548 """
549 return chain(seq, repeat(None))
550
551 def ncycles(seq, n):
552 "Returns the sequence elements n times"
553 return chain(*repeat(seq, n))
554
555 def dotproduct(vec1, vec2):
556 return sum(imap(operator.mul, vec1, vec2))
557
558 def flatten(listOfLists):
559 return list(chain(*listOfLists))
560
561 def repeatfunc(func, times=None, *args):
562 """Repeat calls to func with specified arguments.
563
564 Example: repeatfunc(random.random)
565 """
566 if times is None:
567 return starmap(func, repeat(args))
568 else:
569 return starmap(func, repeat(args, times))
570
571 def pairwise(iterable):
572 "s -> (s0,s1), (s1,s2), (s2, s3), ..."
573 a, b = tee(iterable)
574 next(b, None)
575 return izip(a, b)
576
577 def grouper(n, iterable, padvalue=None):
578 "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
579 return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
580
Christian Heimes7b3ce6a2008-01-31 14:31:45 +0000581 def roundrobin(*iterables):
582 "roundrobin('abc', 'd', 'ef') --> 'a', 'd', 'e', 'b', 'f', 'c'"
583 # Recipe contributed by George Sakkis
584 pending = len(iterables)
585 nexts = cycle(iter(it).next for it in iterables)
586 while pending:
587 try:
588 for next in nexts:
589 yield next()
590 except StopIteration:
591 pending -= 1
592 nexts = cycle(islice(nexts, pending))
Georg Brandl116aa622007-08-15 14:28:22 +0000593
Christian Heimes90c3d9b2008-02-23 13:18:03 +0000594 def powerset(iterable):
595 "powerset('ab') --> set([]), set(['b']), set(['a']), set(['a', 'b'])"
596 skip = object()
597 for t in product(*izip(repeat(skip), iterable)):
598 yield set(e for e in t if e is not skip)
599