blob: af73b570ad445f03a3e466be8293eaadd6ea15ca [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001
2:mod:`itertools` --- Functions creating iterators for efficient looping
3=======================================================================
4
5.. module:: itertools
6 :synopsis: Functions creating iterators for efficient looping.
7.. moduleauthor:: Raymond Hettinger <python@rcn.com>
8.. sectionauthor:: Raymond Hettinger <python@rcn.com>
9
10
Georg Brandl9afde1c2007-11-01 20:32:30 +000011This module implements a number of :term:`iterator` building blocks inspired by
Georg Brandl116aa622007-08-15 14:28:22 +000012constructs from the Haskell and SML programming languages. Each has been recast
13in a form suitable for Python.
14
15The module standardizes a core set of fast, memory efficient tools that are
16useful by themselves or in combination. Standardization helps avoid the
17readability and reliability problems which arise when many different individuals
18create their own slightly varying implementations, each with their own quirks
19and naming conventions.
20
21The tools are designed to combine readily with one another. This makes it easy
22to construct more specialized tools succinctly and efficiently in pure Python.
23
24For instance, SML provides a tabulation tool: ``tabulate(f)`` which produces a
25sequence ``f(0), f(1), ...``. This toolbox provides :func:`imap` and
26:func:`count` which can be combined to form ``imap(f, count())`` and produce an
27equivalent result.
28
29Likewise, the functional tools are designed to work well with the high-speed
30functions provided by the :mod:`operator` module.
31
32The module author welcomes suggestions for other basic building blocks to be
33added to future versions of the module.
34
35Whether cast in pure python form or compiled code, tools that use iterators are
36more memory efficient (and faster) than their list based counterparts. Adopting
37the principles of just-in-time manufacturing, they create data when and where
38needed instead of consuming memory with the computer equivalent of "inventory".
39
40The performance advantage of iterators becomes more acute as the number of
41elements increases -- at some point, lists grow large enough to severely impact
42memory cache performance and start running slowly.
43
44
45.. seealso::
46
47 The Standard ML Basis Library, `The Standard ML Basis Library
48 <http://www.standardml.org/Basis/>`_.
49
50 Haskell, A Purely Functional Language, `Definition of Haskell and the Standard
51 Libraries <http://www.haskell.org/definition/>`_.
52
53
54.. _itertools-functions:
55
56Itertool functions
57------------------
58
59The following module functions all construct and return iterators. Some provide
60streams of infinite length, so they should only be accessed by functions or
61loops that truncate the stream.
62
63
64.. function:: chain(*iterables)
65
66 Make an iterator that returns elements from the first iterable until it is
67 exhausted, then proceeds to the next iterable, until all of the iterables are
68 exhausted. Used for treating consecutive sequences as a single sequence.
69 Equivalent to::
70
71 def chain(*iterables):
72 for it in iterables:
73 for element in it:
74 yield element
75
76
77.. function:: count([n])
78
79 Make an iterator that returns consecutive integers starting with *n*. If not
Georg Brandl9afde1c2007-11-01 20:32:30 +000080 specified *n* defaults to zero. Often used as an argument to :func:`imap` to
81 generate consecutive data points. Also, used with :func:`izip` to add sequence
82 numbers. Equivalent to::
Georg Brandl116aa622007-08-15 14:28:22 +000083
84 def count(n=0):
85 while True:
86 yield n
87 n += 1
88
Georg Brandl116aa622007-08-15 14:28:22 +000089
90.. function:: cycle(iterable)
91
92 Make an iterator returning elements from the iterable and saving a copy of each.
93 When the iterable is exhausted, return elements from the saved copy. Repeats
94 indefinitely. Equivalent to::
95
96 def cycle(iterable):
97 saved = []
98 for element in iterable:
99 yield element
100 saved.append(element)
101 while saved:
102 for element in saved:
103 yield element
104
105 Note, this member of the toolkit may require significant auxiliary storage
106 (depending on the length of the iterable).
107
108
109.. function:: dropwhile(predicate, iterable)
110
111 Make an iterator that drops elements from the iterable as long as the predicate
112 is true; afterwards, returns every element. Note, the iterator does not produce
113 *any* output until the predicate first becomes false, so it may have a lengthy
114 start-up time. Equivalent to::
115
116 def dropwhile(predicate, iterable):
117 iterable = iter(iterable)
118 for x in iterable:
119 if not predicate(x):
120 yield x
121 break
122 for x in iterable:
123 yield x
124
125
126.. function:: groupby(iterable[, key])
127
128 Make an iterator that returns consecutive keys and groups from the *iterable*.
129 The *key* is a function computing a key value for each element. If not
130 specified or is ``None``, *key* defaults to an identity function and returns
131 the element unchanged. Generally, the iterable needs to already be sorted on
132 the same key function.
133
134 The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix. It
135 generates a break or new group every time the value of the key function changes
136 (which is why it is usually necessary to have sorted the data using the same key
137 function). That behavior differs from SQL's GROUP BY which aggregates common
138 elements regardless of their input order.
139
140 The returned group is itself an iterator that shares the underlying iterable
141 with :func:`groupby`. Because the source is shared, when the :func:`groupby`
142 object is advanced, the previous group is no longer visible. So, if that data
143 is needed later, it should be stored as a list::
144
145 groups = []
146 uniquekeys = []
147 data = sorted(data, key=keyfunc)
148 for k, g in groupby(data, keyfunc):
149 groups.append(list(g)) # Store group iterator as a list
150 uniquekeys.append(k)
151
152 :func:`groupby` is equivalent to::
153
154 class groupby(object):
155 def __init__(self, iterable, key=None):
156 if key is None:
157 key = lambda x: x
158 self.keyfunc = key
159 self.it = iter(iterable)
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000160 self.tgtkey = self.currkey = self.currvalue = object()
Georg Brandl116aa622007-08-15 14:28:22 +0000161 def __iter__(self):
162 return self
163 def __next__(self):
164 while self.currkey == self.tgtkey:
165 self.currvalue = next(self.it) # Exit on StopIteration
166 self.currkey = self.keyfunc(self.currvalue)
167 self.tgtkey = self.currkey
168 return (self.currkey, self._grouper(self.tgtkey))
169 def _grouper(self, tgtkey):
170 while self.currkey == tgtkey:
171 yield self.currvalue
172 self.currvalue = next(self.it) # Exit on StopIteration
173 self.currkey = self.keyfunc(self.currvalue)
174
Georg Brandl116aa622007-08-15 14:28:22 +0000175
176.. function:: ifilter(predicate, iterable)
177
178 Make an iterator that filters elements from iterable returning only those for
179 which the predicate is ``True``. If *predicate* is ``None``, return the items
Georg Brandlf6945182008-02-01 11:56:49 +0000180 that are true. This function is the same as the built-in :func:`filter`
181 function. Equivalent to::
Georg Brandl116aa622007-08-15 14:28:22 +0000182
183 def ifilter(predicate, iterable):
184 if predicate is None:
185 predicate = bool
186 for x in iterable:
187 if predicate(x):
188 yield x
189
190
191.. function:: ifilterfalse(predicate, iterable)
192
193 Make an iterator that filters elements from iterable returning only those for
194 which the predicate is ``False``. If *predicate* is ``None``, return the items
195 that are false. Equivalent to::
196
197 def ifilterfalse(predicate, iterable):
198 if predicate is None:
199 predicate = bool
200 for x in iterable:
201 if not predicate(x):
202 yield x
203
204
205.. function:: imap(function, *iterables)
206
207 Make an iterator that computes the function using arguments from each of the
Georg Brandlf6945182008-02-01 11:56:49 +0000208 iterables. This function is the same as the built-in :func:`map` function.
209 Equivalent to::
Georg Brandl116aa622007-08-15 14:28:22 +0000210
211 def imap(function, *iterables):
Raymond Hettinger1dfde1d2008-01-22 23:25:35 +0000212 iterables = [iter(it) for it in iterables)
Georg Brandl116aa622007-08-15 14:28:22 +0000213 while True:
Raymond Hettinger1dfde1d2008-01-22 23:25:35 +0000214 args = [next(it) for it in iterables]
Christian Heimes1af737c2008-01-23 08:24:23 +0000215 if function is None:
216 yield tuple(args)
217 else:
218 yield function(*args)
Georg Brandl116aa622007-08-15 14:28:22 +0000219
220
221.. function:: islice(iterable, [start,] stop [, step])
222
223 Make an iterator that returns selected elements from the iterable. If *start* is
224 non-zero, then elements from the iterable are skipped until start is reached.
225 Afterward, elements are returned consecutively unless *step* is set higher than
226 one which results in items being skipped. If *stop* is ``None``, then iteration
227 continues until the iterator is exhausted, if at all; otherwise, it stops at the
228 specified position. Unlike regular slicing, :func:`islice` does not support
229 negative values for *start*, *stop*, or *step*. Can be used to extract related
230 fields from data where the internal structure has been flattened (for example, a
231 multi-line report may list a name field on every third line). Equivalent to::
232
233 def islice(iterable, *args):
234 s = slice(*args)
Georg Brandlf6945182008-02-01 11:56:49 +0000235 it = range(s.start or 0, s.stop or sys.maxsize, s.step or 1)
Georg Brandl116aa622007-08-15 14:28:22 +0000236 nexti = next(it)
237 for i, element in enumerate(iterable):
238 if i == nexti:
239 yield element
240 nexti = next(it)
241
242 If *start* is ``None``, then iteration starts at zero. If *step* is ``None``,
243 then the step defaults to one.
244
Georg Brandl116aa622007-08-15 14:28:22 +0000245
246.. function:: izip(*iterables)
247
248 Make an iterator that aggregates elements from each of the iterables. Like
249 :func:`zip` except that it returns an iterator instead of a list. Used for
250 lock-step iteration over several iterables at a time. Equivalent to::
251
252 def izip(*iterables):
253 iterables = map(iter, iterables)
254 while iterables:
255 result = [next(it) for it in iterables]
256 yield tuple(result)
257
Georg Brandl55ac8f02007-09-01 13:51:09 +0000258 When no iterables are specified, return a zero length iterator.
Georg Brandl116aa622007-08-15 14:28:22 +0000259
Christian Heimes1af737c2008-01-23 08:24:23 +0000260 The left-to-right evaluation order of the iterables is guaranteed. This
261 makes possible an idiom for clustering a data series into n-length groups
262 using ``izip(*[iter(s)]*n)``.
Georg Brandl116aa622007-08-15 14:28:22 +0000263
Christian Heimes1af737c2008-01-23 08:24:23 +0000264 :func:`izip` should only be used with unequal length inputs when you don't
265 care about trailing, unmatched values from the longer iterables. If those
266 values are important, use :func:`izip_longest` instead.
Georg Brandl116aa622007-08-15 14:28:22 +0000267
268
269.. function:: izip_longest(*iterables[, fillvalue])
270
271 Make an iterator that aggregates elements from each of the iterables. If the
272 iterables are of uneven length, missing values are filled-in with *fillvalue*.
273 Iteration continues until the longest iterable is exhausted. Equivalent to::
274
275 def izip_longest(*args, **kwds):
276 fillvalue = kwds.get('fillvalue')
277 def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
278 yield counter() # yields the fillvalue, or raises IndexError
279 fillers = repeat(fillvalue)
280 iters = [chain(it, sentinel(), fillers) for it in args]
281 try:
282 for tup in izip(*iters):
283 yield tup
284 except IndexError:
285 pass
286
287 If one of the iterables is potentially infinite, then the :func:`izip_longest`
288 function should be wrapped with something that limits the number of calls (for
289 example :func:`islice` or :func:`takewhile`).
290
Georg Brandl116aa622007-08-15 14:28:22 +0000291
Christian Heimes90c3d9b2008-02-23 13:18:03 +0000292.. function:: product(*iterables)
293
294 Cartesian product of input iterables.
295
296 Equivalent to nested for-loops in a generator expression. For example,
297 ``product(A, B)`` returns the same as ``((x,y) for x in A for y in B)``.
298
299 The leftmost iterators are in the outermost for-loop, so the output tuples
300 cycle in a manner similar to an odometer (with the rightmost element
301 changing on every iteration).
302
303 Equivalent to (but without building the entire result in memory)::
304
305 def product(*args):
306 pools = map(tuple, args)
307 if pools:
308 result = [[]]
309 for pool in pools:
310 result = [x+[y] for x in result for y in pool]
311 for prod in result:
312 yield tuple(prod)
313
314
Georg Brandl116aa622007-08-15 14:28:22 +0000315.. function:: repeat(object[, times])
316
317 Make an iterator that returns *object* over and over again. Runs indefinitely
318 unless the *times* argument is specified. Used as argument to :func:`imap` for
319 invariant parameters to the called function. Also used with :func:`izip` to
320 create an invariant part of a tuple record. Equivalent to::
321
322 def repeat(object, times=None):
323 if times is None:
324 while True:
325 yield object
326 else:
327 for i in range(times):
328 yield object
329
330
331.. function:: starmap(function, iterable)
332
Christian Heimes679db4a2008-01-18 09:56:22 +0000333 Make an iterator that computes the function using arguments obtained from
Georg Brandl116aa622007-08-15 14:28:22 +0000334 the iterable. Used instead of :func:`imap` when argument parameters are already
335 grouped in tuples from a single iterable (the data has been "pre-zipped"). The
336 difference between :func:`imap` and :func:`starmap` parallels the distinction
337 between ``function(a,b)`` and ``function(*c)``. Equivalent to::
338
339 def starmap(function, iterable):
Christian Heimes679db4a2008-01-18 09:56:22 +0000340 for args in iterable:
341 yield function(*args)
342
343 .. versionchanged:: 2.6
344 Previously, :func:`starmap` required the function arguments to be tuples.
345 Now, any iterable is allowed.
Georg Brandl116aa622007-08-15 14:28:22 +0000346
347
348.. function:: takewhile(predicate, iterable)
349
350 Make an iterator that returns elements from the iterable as long as the
351 predicate is true. Equivalent to::
352
353 def takewhile(predicate, iterable):
354 for x in iterable:
355 if predicate(x):
356 yield x
357 else:
358 break
359
360
361.. function:: tee(iterable[, n=2])
362
363 Return *n* independent iterators from a single iterable. The case where ``n==2``
364 is equivalent to::
365
366 def tee(iterable):
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000367 def gen(next, data={}):
Georg Brandl116aa622007-08-15 14:28:22 +0000368 for i in count():
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000369 if i in data:
370 yield data.pop(i)
Georg Brandl116aa622007-08-15 14:28:22 +0000371 else:
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000372 data[i] = next()
373 yield data[i]
Georg Brandl116aa622007-08-15 14:28:22 +0000374 it = iter(iterable)
375 return (gen(it.__next__), gen(it.__next__))
376
377 Note, once :func:`tee` has made a split, the original *iterable* should not be
378 used anywhere else; otherwise, the *iterable* could get advanced without the tee
379 objects being informed.
380
381 Note, this member of the toolkit may require significant auxiliary storage
382 (depending on how much temporary data needs to be stored). In general, if one
383 iterator is going to use most or all of the data before the other iterator, it
384 is faster to use :func:`list` instead of :func:`tee`.
385
Georg Brandl116aa622007-08-15 14:28:22 +0000386
387.. _itertools-example:
388
389Examples
390--------
391
392The following examples show common uses for each tool and demonstrate ways they
393can be combined. ::
394
395 >>> amounts = [120.15, 764.05, 823.14]
396 >>> for checknum, amount in izip(count(1200), amounts):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000397 ... print('Check %d is for $%.2f' % (checknum, amount))
Georg Brandl116aa622007-08-15 14:28:22 +0000398 ...
399 Check 1200 is for $120.15
400 Check 1201 is for $764.05
401 Check 1202 is for $823.14
402
403 >>> import operator
404 >>> for cube in imap(operator.pow, range(1,5), repeat(3)):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000405 ... print(cube)
Georg Brandl116aa622007-08-15 14:28:22 +0000406 ...
407 1
408 8
409 27
410 64
411
412 >>> reportlines = ['EuroPython', 'Roster', '', 'alex', '', 'laura',
413 ... '', 'martin', '', 'walter', '', 'mark']
414 >>> for name in islice(reportlines, 3, None, 2):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000415 ... print(name.title())
Georg Brandl116aa622007-08-15 14:28:22 +0000416 ...
417 Alex
418 Laura
419 Martin
420 Walter
421 Mark
422
423 # Show a dictionary sorted and grouped by value
424 >>> from operator import itemgetter
425 >>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3)
Fred Drake2e748782007-09-04 17:33:11 +0000426 >>> di = sorted(d.items(), key=itemgetter(1))
Georg Brandl116aa622007-08-15 14:28:22 +0000427 >>> for k, g in groupby(di, key=itemgetter(1)):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000428 ... print(k, map(itemgetter(0), g))
Georg Brandl116aa622007-08-15 14:28:22 +0000429 ...
430 1 ['a', 'c', 'e']
431 2 ['b', 'd', 'f']
432 3 ['g']
433
434 # Find runs of consecutive numbers using groupby. The key to the solution
435 # is differencing with a range so that consecutive numbers all appear in
436 # same group.
437 >>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
438 >>> for k, g in groupby(enumerate(data), lambda t:t[0]-t[1]):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000439 ... print(map(operator.itemgetter(1), g))
Georg Brandl116aa622007-08-15 14:28:22 +0000440 ...
441 [1]
442 [4, 5, 6]
443 [10]
444 [15, 16, 17, 18]
445 [22]
446 [25, 26, 27, 28]
447
448
449
450.. _itertools-recipes:
451
452Recipes
453-------
454
455This section shows recipes for creating an extended toolset using the existing
456itertools as building blocks.
457
458The extended tools offer the same high performance as the underlying toolset.
459The superior memory performance is kept by processing elements one at a time
460rather than bringing the whole iterable into memory all at once. Code volume is
461kept small by linking the tools together in a functional style which helps
462eliminate temporary variables. High speed is retained by preferring
Georg Brandl9afde1c2007-11-01 20:32:30 +0000463"vectorized" building blocks over the use of for-loops and :term:`generator`\s
464which incur interpreter overhead. ::
Georg Brandl116aa622007-08-15 14:28:22 +0000465
466 def take(n, seq):
467 return list(islice(seq, n))
468
469 def enumerate(iterable):
470 return izip(count(), iterable)
471
472 def tabulate(function):
473 "Return function(0), function(1), ..."
474 return imap(function, count())
475
Georg Brandl116aa622007-08-15 14:28:22 +0000476 def nth(iterable, n):
477 "Returns the nth item or raise StopIteration"
478 return islice(iterable, n, None).next()
479
480 def all(seq, pred=None):
481 "Returns True if pred(x) is true for every element in the iterable"
482 for elem in ifilterfalse(pred, seq):
483 return False
484 return True
485
486 def any(seq, pred=None):
487 "Returns True if pred(x) is true for at least one element in the iterable"
488 for elem in ifilter(pred, seq):
489 return True
490 return False
491
492 def no(seq, pred=None):
493 "Returns True if pred(x) is false for every element in the iterable"
494 for elem in ifilter(pred, seq):
495 return False
496 return True
497
498 def quantify(seq, pred=None):
499 "Count how many times the predicate is true in the sequence"
500 return sum(imap(pred, seq))
501
502 def padnone(seq):
503 """Returns the sequence elements and then returns None indefinitely.
504
505 Useful for emulating the behavior of the built-in map() function.
506 """
507 return chain(seq, repeat(None))
508
509 def ncycles(seq, n):
510 "Returns the sequence elements n times"
511 return chain(*repeat(seq, n))
512
513 def dotproduct(vec1, vec2):
514 return sum(imap(operator.mul, vec1, vec2))
515
516 def flatten(listOfLists):
517 return list(chain(*listOfLists))
518
519 def repeatfunc(func, times=None, *args):
520 """Repeat calls to func with specified arguments.
521
522 Example: repeatfunc(random.random)
523 """
524 if times is None:
525 return starmap(func, repeat(args))
526 else:
527 return starmap(func, repeat(args, times))
528
529 def pairwise(iterable):
530 "s -> (s0,s1), (s1,s2), (s2, s3), ..."
531 a, b = tee(iterable)
532 next(b, None)
533 return izip(a, b)
534
535 def grouper(n, iterable, padvalue=None):
536 "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
537 return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
538
Christian Heimes7b3ce6a2008-01-31 14:31:45 +0000539 def roundrobin(*iterables):
540 "roundrobin('abc', 'd', 'ef') --> 'a', 'd', 'e', 'b', 'f', 'c'"
541 # Recipe contributed by George Sakkis
542 pending = len(iterables)
543 nexts = cycle(iter(it).next for it in iterables)
544 while pending:
545 try:
546 for next in nexts:
547 yield next()
548 except StopIteration:
549 pending -= 1
550 nexts = cycle(islice(nexts, pending))
Georg Brandl116aa622007-08-15 14:28:22 +0000551
Christian Heimes90c3d9b2008-02-23 13:18:03 +0000552 def powerset(iterable):
553 "powerset('ab') --> set([]), set(['b']), set(['a']), set(['a', 'b'])"
554 skip = object()
555 for t in product(*izip(repeat(skip), iterable)):
556 yield set(e for e in t if e is not skip)
557