blob: d9a2b331e0bbb15a835ccd4ab9a80a5a3d3ec702 [file] [log] [blame]
Georg Brandl8ec7f652007-08-15 14:28:01 +00001
2:mod:`itertools` --- Functions creating iterators for efficient looping
3=======================================================================
4
5.. module:: itertools
6 :synopsis: Functions creating iterators for efficient looping.
7.. moduleauthor:: Raymond Hettinger <python@rcn.com>
8.. sectionauthor:: Raymond Hettinger <python@rcn.com>
9
10
11.. versionadded:: 2.3
12
Georg Brandle7a09902007-10-21 12:10:28 +000013This module implements a number of :term:`iterator` building blocks inspired by
Georg Brandl8ec7f652007-08-15 14:28:01 +000014constructs from the Haskell and SML programming languages. Each has been recast
15in a form suitable for Python.
16
17The module standardizes a core set of fast, memory efficient tools that are
18useful by themselves or in combination. Standardization helps avoid the
19readability and reliability problems which arise when many different individuals
20create their own slightly varying implementations, each with their own quirks
21and naming conventions.
22
23The tools are designed to combine readily with one another. This makes it easy
24to construct more specialized tools succinctly and efficiently in pure Python.
25
26For instance, SML provides a tabulation tool: ``tabulate(f)`` which produces a
27sequence ``f(0), f(1), ...``. This toolbox provides :func:`imap` and
28:func:`count` which can be combined to form ``imap(f, count())`` and produce an
29equivalent result.
30
31Likewise, the functional tools are designed to work well with the high-speed
32functions provided by the :mod:`operator` module.
33
34The module author welcomes suggestions for other basic building blocks to be
35added to future versions of the module.
36
37Whether cast in pure python form or compiled code, tools that use iterators are
38more memory efficient (and faster) than their list based counterparts. Adopting
39the principles of just-in-time manufacturing, they create data when and where
40needed instead of consuming memory with the computer equivalent of "inventory".
41
42The performance advantage of iterators becomes more acute as the number of
43elements increases -- at some point, lists grow large enough to severely impact
44memory cache performance and start running slowly.
45
46
47.. seealso::
48
49 The Standard ML Basis Library, `The Standard ML Basis Library
50 <http://www.standardml.org/Basis/>`_.
51
52 Haskell, A Purely Functional Language, `Definition of Haskell and the Standard
53 Libraries <http://www.haskell.org/definition/>`_.
54
55
56.. _itertools-functions:
57
58Itertool functions
59------------------
60
61The following module functions all construct and return iterators. Some provide
62streams of infinite length, so they should only be accessed by functions or
63loops that truncate the stream.
64
65
66.. function:: chain(*iterables)
67
68 Make an iterator that returns elements from the first iterable until it is
69 exhausted, then proceeds to the next iterable, until all of the iterables are
70 exhausted. Used for treating consecutive sequences as a single sequence.
71 Equivalent to::
72
73 def chain(*iterables):
74 for it in iterables:
75 for element in it:
76 yield element
77
78
Raymond Hettinger3fa41d52008-02-26 02:46:54 +000079.. function:: combinations(iterable, r)
80
81 Return successive *r* length combinations of elements in the *iterable*.
82
83 Combinations are emitted in a lexicographic sort order. So, if the
84 input *iterable* is sorted, the combination tuples will be produced
85 in sorted order.
86
87 Elements are treated as unique based on their position, not on their
88 value. So if the input elements are unique, there will be no repeat
89 values within a single combination.
90
91 Each result tuple is ordered to match the input order. So, every
92 combination is a subsequence of the input *iterable*.
93
94 Example: ``combinations(range(4), 3) --> (0,1,2), (0,1,3), (0,2,3), (1,2,3)``
95
96 Equivalent to::
97
98 def combinations(iterable, r):
99 pool = tuple(iterable)
Raymond Hettinger93e804d2008-02-26 23:40:50 +0000100 n = len(pool)
101 assert 0 <= r <= n
102 vec = range(r)
103 yield tuple(pool[i] for i in vec)
104 while 1:
105 for i in reversed(range(r)):
Raymond Hettingerc1052892008-02-27 01:44:34 +0000106 if vec[i] != i + n - r:
107 break
Raymond Hettinger93e804d2008-02-26 23:40:50 +0000108 else:
109 return
Raymond Hettingerc1052892008-02-27 01:44:34 +0000110 vec[i] += 1
111 for j in range(i+1, r):
112 vec[j] = vec[j-1] + 1
113 yield tuple(pool[i] for i in vec)
Raymond Hettinger3fa41d52008-02-26 02:46:54 +0000114
115 .. versionadded:: 2.6
116
Georg Brandl8ec7f652007-08-15 14:28:01 +0000117.. function:: count([n])
118
119 Make an iterator that returns consecutive integers starting with *n*. If not
Raymond Hettinger50e90e22007-10-04 00:20:27 +0000120 specified *n* defaults to zero. Often used as an argument to :func:`imap` to
121 generate consecutive data points. Also, used with :func:`izip` to add sequence
122 numbers. Equivalent to::
Georg Brandl8ec7f652007-08-15 14:28:01 +0000123
124 def count(n=0):
125 while True:
126 yield n
127 n += 1
128
Georg Brandl8ec7f652007-08-15 14:28:01 +0000129
130.. function:: cycle(iterable)
131
132 Make an iterator returning elements from the iterable and saving a copy of each.
133 When the iterable is exhausted, return elements from the saved copy. Repeats
134 indefinitely. Equivalent to::
135
136 def cycle(iterable):
137 saved = []
138 for element in iterable:
139 yield element
140 saved.append(element)
141 while saved:
142 for element in saved:
143 yield element
144
145 Note, this member of the toolkit may require significant auxiliary storage
146 (depending on the length of the iterable).
147
148
149.. function:: dropwhile(predicate, iterable)
150
151 Make an iterator that drops elements from the iterable as long as the predicate
152 is true; afterwards, returns every element. Note, the iterator does not produce
153 *any* output until the predicate first becomes false, so it may have a lengthy
154 start-up time. Equivalent to::
155
156 def dropwhile(predicate, iterable):
157 iterable = iter(iterable)
158 for x in iterable:
159 if not predicate(x):
160 yield x
161 break
162 for x in iterable:
163 yield x
164
165
166.. function:: groupby(iterable[, key])
167
168 Make an iterator that returns consecutive keys and groups from the *iterable*.
169 The *key* is a function computing a key value for each element. If not
170 specified or is ``None``, *key* defaults to an identity function and returns
171 the element unchanged. Generally, the iterable needs to already be sorted on
172 the same key function.
173
174 The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix. It
175 generates a break or new group every time the value of the key function changes
176 (which is why it is usually necessary to have sorted the data using the same key
177 function). That behavior differs from SQL's GROUP BY which aggregates common
178 elements regardless of their input order.
179
180 The returned group is itself an iterator that shares the underlying iterable
181 with :func:`groupby`. Because the source is shared, when the :func:`groupby`
182 object is advanced, the previous group is no longer visible. So, if that data
183 is needed later, it should be stored as a list::
184
185 groups = []
186 uniquekeys = []
187 data = sorted(data, key=keyfunc)
188 for k, g in groupby(data, keyfunc):
189 groups.append(list(g)) # Store group iterator as a list
190 uniquekeys.append(k)
191
192 :func:`groupby` is equivalent to::
193
194 class groupby(object):
195 def __init__(self, iterable, key=None):
196 if key is None:
197 key = lambda x: x
198 self.keyfunc = key
199 self.it = iter(iterable)
Raymond Hettinger81a885a2007-12-29 22:16:24 +0000200 self.tgtkey = self.currkey = self.currvalue = object()
Georg Brandl8ec7f652007-08-15 14:28:01 +0000201 def __iter__(self):
202 return self
203 def next(self):
204 while self.currkey == self.tgtkey:
205 self.currvalue = self.it.next() # Exit on StopIteration
206 self.currkey = self.keyfunc(self.currvalue)
207 self.tgtkey = self.currkey
208 return (self.currkey, self._grouper(self.tgtkey))
209 def _grouper(self, tgtkey):
210 while self.currkey == tgtkey:
211 yield self.currvalue
212 self.currvalue = self.it.next() # Exit on StopIteration
213 self.currkey = self.keyfunc(self.currvalue)
214
215 .. versionadded:: 2.4
216
217
218.. function:: ifilter(predicate, iterable)
219
220 Make an iterator that filters elements from iterable returning only those for
221 which the predicate is ``True``. If *predicate* is ``None``, return the items
222 that are true. Equivalent to::
223
224 def ifilter(predicate, iterable):
225 if predicate is None:
226 predicate = bool
227 for x in iterable:
228 if predicate(x):
229 yield x
230
231
232.. function:: ifilterfalse(predicate, iterable)
233
234 Make an iterator that filters elements from iterable returning only those for
235 which the predicate is ``False``. If *predicate* is ``None``, return the items
236 that are false. Equivalent to::
237
238 def ifilterfalse(predicate, iterable):
239 if predicate is None:
240 predicate = bool
241 for x in iterable:
242 if not predicate(x):
243 yield x
244
245
246.. function:: imap(function, *iterables)
247
248 Make an iterator that computes the function using arguments from each of the
249 iterables. If *function* is set to ``None``, then :func:`imap` returns the
250 arguments as a tuple. Like :func:`map` but stops when the shortest iterable is
251 exhausted instead of filling in ``None`` for shorter iterables. The reason for
252 the difference is that infinite iterator arguments are typically an error for
253 :func:`map` (because the output is fully evaluated) but represent a common and
254 useful way of supplying arguments to :func:`imap`. Equivalent to::
255
256 def imap(function, *iterables):
257 iterables = map(iter, iterables)
258 while True:
Raymond Hettinger2dec48d2008-01-22 22:09:26 +0000259 args = [it.next() for it in iterables]
Georg Brandl8ec7f652007-08-15 14:28:01 +0000260 if function is None:
261 yield tuple(args)
262 else:
263 yield function(*args)
264
265
266.. function:: islice(iterable, [start,] stop [, step])
267
268 Make an iterator that returns selected elements from the iterable. If *start* is
269 non-zero, then elements from the iterable are skipped until start is reached.
270 Afterward, elements are returned consecutively unless *step* is set higher than
271 one which results in items being skipped. If *stop* is ``None``, then iteration
272 continues until the iterator is exhausted, if at all; otherwise, it stops at the
273 specified position. Unlike regular slicing, :func:`islice` does not support
274 negative values for *start*, *stop*, or *step*. Can be used to extract related
275 fields from data where the internal structure has been flattened (for example, a
276 multi-line report may list a name field on every third line). Equivalent to::
277
278 def islice(iterable, *args):
279 s = slice(*args)
280 it = iter(xrange(s.start or 0, s.stop or sys.maxint, s.step or 1))
281 nexti = it.next()
282 for i, element in enumerate(iterable):
283 if i == nexti:
284 yield element
285 nexti = it.next()
286
287 If *start* is ``None``, then iteration starts at zero. If *step* is ``None``,
288 then the step defaults to one.
289
290 .. versionchanged:: 2.5
291 accept ``None`` values for default *start* and *step*.
292
293
294.. function:: izip(*iterables)
295
296 Make an iterator that aggregates elements from each of the iterables. Like
297 :func:`zip` except that it returns an iterator instead of a list. Used for
298 lock-step iteration over several iterables at a time. Equivalent to::
299
300 def izip(*iterables):
301 iterables = map(iter, iterables)
302 while iterables:
303 result = [it.next() for it in iterables]
304 yield tuple(result)
305
306 .. versionchanged:: 2.4
307 When no iterables are specified, returns a zero length iterator instead of
308 raising a :exc:`TypeError` exception.
309
Raymond Hettinger48c62932008-01-22 19:51:41 +0000310 The left-to-right evaluation order of the iterables is guaranteed. This
311 makes possible an idiom for clustering a data series into n-length groups
312 using ``izip(*[iter(s)]*n)``.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000313
Raymond Hettinger48c62932008-01-22 19:51:41 +0000314 :func:`izip` should only be used with unequal length inputs when you don't
315 care about trailing, unmatched values from the longer iterables. If those
316 values are important, use :func:`izip_longest` instead.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000317
318
319.. function:: izip_longest(*iterables[, fillvalue])
320
321 Make an iterator that aggregates elements from each of the iterables. If the
322 iterables are of uneven length, missing values are filled-in with *fillvalue*.
323 Iteration continues until the longest iterable is exhausted. Equivalent to::
324
325 def izip_longest(*args, **kwds):
326 fillvalue = kwds.get('fillvalue')
327 def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
328 yield counter() # yields the fillvalue, or raises IndexError
329 fillers = repeat(fillvalue)
330 iters = [chain(it, sentinel(), fillers) for it in args]
331 try:
332 for tup in izip(*iters):
333 yield tup
334 except IndexError:
335 pass
336
337 If one of the iterables is potentially infinite, then the :func:`izip_longest`
338 function should be wrapped with something that limits the number of calls (for
339 example :func:`islice` or :func:`takewhile`).
340
341 .. versionadded:: 2.6
342
Raymond Hettinger18750ab2008-02-28 09:23:48 +0000343.. function:: product(*iterables[, repeat])
Raymond Hettingerc5705a82008-02-22 19:50:06 +0000344
345 Cartesian product of input iterables.
346
347 Equivalent to nested for-loops in a generator expression. For example,
348 ``product(A, B)`` returns the same as ``((x,y) for x in A for y in B)``.
349
350 The leftmost iterators are in the outermost for-loop, so the output tuples
351 cycle in a manner similar to an odometer (with the rightmost element
Raymond Hettinger3fa41d52008-02-26 02:46:54 +0000352 changing on every iteration). This results in a lexicographic ordering
353 so that if the inputs iterables are sorted, the product tuples are emitted
354 in sorted order.
Raymond Hettingerc5705a82008-02-22 19:50:06 +0000355
Raymond Hettinger18750ab2008-02-28 09:23:48 +0000356 To compute the product of an iterable with itself, specify the number of
357 repetitions with the optional *repeat* keyword argument. For example,
358 ``product(A, repeat=4)`` means the same as ``product(A, A, A, A)``.
359
Raymond Hettinger3fa41d52008-02-26 02:46:54 +0000360 Equivalent to the following except that the actual implementation does not
361 build-up intermediate results in memory::
Raymond Hettingerc5705a82008-02-22 19:50:06 +0000362
Raymond Hettinger18750ab2008-02-28 09:23:48 +0000363 def product(*args, **kwds):
364 pools = map(tuple, args) * kwds.get('repeat', 1)
Raymond Hettingerc5705a82008-02-22 19:50:06 +0000365 if pools:
366 result = [[]]
367 for pool in pools:
368 result = [x+[y] for x in result for y in pool]
369 for prod in result:
370 yield tuple(prod)
371
372 .. versionadded:: 2.6
Georg Brandl8ec7f652007-08-15 14:28:01 +0000373
374.. function:: repeat(object[, times])
375
376 Make an iterator that returns *object* over and over again. Runs indefinitely
377 unless the *times* argument is specified. Used as argument to :func:`imap` for
378 invariant parameters to the called function. Also used with :func:`izip` to
379 create an invariant part of a tuple record. Equivalent to::
380
381 def repeat(object, times=None):
382 if times is None:
383 while True:
384 yield object
385 else:
386 for i in xrange(times):
387 yield object
388
389
390.. function:: starmap(function, iterable)
391
Raymond Hettinger47317092008-01-17 03:02:14 +0000392 Make an iterator that computes the function using arguments obtained from
Georg Brandl8ec7f652007-08-15 14:28:01 +0000393 the iterable. Used instead of :func:`imap` when argument parameters are already
394 grouped in tuples from a single iterable (the data has been "pre-zipped"). The
395 difference between :func:`imap` and :func:`starmap` parallels the distinction
396 between ``function(a,b)`` and ``function(*c)``. Equivalent to::
397
398 def starmap(function, iterable):
Raymond Hettinger47317092008-01-17 03:02:14 +0000399 for args in iterable:
400 yield function(*args)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000401
Raymond Hettinger47317092008-01-17 03:02:14 +0000402 .. versionchanged:: 2.6
403 Previously, :func:`starmap` required the function arguments to be tuples.
404 Now, any iterable is allowed.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000405
406.. function:: takewhile(predicate, iterable)
407
408 Make an iterator that returns elements from the iterable as long as the
409 predicate is true. Equivalent to::
410
411 def takewhile(predicate, iterable):
412 for x in iterable:
413 if predicate(x):
414 yield x
415 else:
416 break
417
418
419.. function:: tee(iterable[, n=2])
420
421 Return *n* independent iterators from a single iterable. The case where ``n==2``
422 is equivalent to::
423
424 def tee(iterable):
Raymond Hettinger5d332bb2007-12-29 22:09:34 +0000425 def gen(next, data={}):
Georg Brandl8ec7f652007-08-15 14:28:01 +0000426 for i in count():
Raymond Hettinger5d332bb2007-12-29 22:09:34 +0000427 if i in data:
428 yield data.pop(i)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000429 else:
Raymond Hettinger5d332bb2007-12-29 22:09:34 +0000430 data[i] = next()
431 yield data[i]
Georg Brandl8ec7f652007-08-15 14:28:01 +0000432 it = iter(iterable)
Raymond Hettinger5d332bb2007-12-29 22:09:34 +0000433 return gen(it.next), gen(it.next)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000434
435 Note, once :func:`tee` has made a split, the original *iterable* should not be
436 used anywhere else; otherwise, the *iterable* could get advanced without the tee
437 objects being informed.
438
439 Note, this member of the toolkit may require significant auxiliary storage
440 (depending on how much temporary data needs to be stored). In general, if one
441 iterator is going to use most or all of the data before the other iterator, it
442 is faster to use :func:`list` instead of :func:`tee`.
443
444 .. versionadded:: 2.4
445
446
447.. _itertools-example:
448
449Examples
450--------
451
452The following examples show common uses for each tool and demonstrate ways they
453can be combined. ::
454
455 >>> amounts = [120.15, 764.05, 823.14]
456 >>> for checknum, amount in izip(count(1200), amounts):
457 ... print 'Check %d is for $%.2f' % (checknum, amount)
458 ...
459 Check 1200 is for $120.15
460 Check 1201 is for $764.05
461 Check 1202 is for $823.14
462
463 >>> import operator
464 >>> for cube in imap(operator.pow, xrange(1,5), repeat(3)):
465 ... print cube
466 ...
467 1
468 8
469 27
470 64
471
Georg Brandl8ec7f652007-08-15 14:28:01 +0000472 # Show a dictionary sorted and grouped by value
473 >>> from operator import itemgetter
474 >>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3)
475 >>> di = sorted(d.iteritems(), key=itemgetter(1))
476 >>> for k, g in groupby(di, key=itemgetter(1)):
477 ... print k, map(itemgetter(0), g)
478 ...
479 1 ['a', 'c', 'e']
480 2 ['b', 'd', 'f']
481 3 ['g']
482
483 # Find runs of consecutive numbers using groupby. The key to the solution
484 # is differencing with a range so that consecutive numbers all appear in
485 # same group.
486 >>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
487 >>> for k, g in groupby(enumerate(data), lambda (i,x):i-x):
488 ... print map(operator.itemgetter(1), g)
489 ...
490 [1]
491 [4, 5, 6]
492 [10]
493 [15, 16, 17, 18]
494 [22]
495 [25, 26, 27, 28]
496
497
498
499.. _itertools-recipes:
500
501Recipes
502-------
503
504This section shows recipes for creating an extended toolset using the existing
505itertools as building blocks.
506
507The extended tools offer the same high performance as the underlying toolset.
508The superior memory performance is kept by processing elements one at a time
509rather than bringing the whole iterable into memory all at once. Code volume is
510kept small by linking the tools together in a functional style which helps
511eliminate temporary variables. High speed is retained by preferring
Georg Brandlcf3fb252007-10-21 10:52:38 +0000512"vectorized" building blocks over the use of for-loops and :term:`generator`\s
513which incur interpreter overhead. ::
Georg Brandl8ec7f652007-08-15 14:28:01 +0000514
515 def take(n, seq):
516 return list(islice(seq, n))
517
518 def enumerate(iterable):
519 return izip(count(), iterable)
520
521 def tabulate(function):
522 "Return function(0), function(1), ..."
523 return imap(function, count())
524
525 def iteritems(mapping):
526 return izip(mapping.iterkeys(), mapping.itervalues())
527
528 def nth(iterable, n):
529 "Returns the nth item or raise StopIteration"
530 return islice(iterable, n, None).next()
531
532 def all(seq, pred=None):
533 "Returns True if pred(x) is true for every element in the iterable"
534 for elem in ifilterfalse(pred, seq):
535 return False
536 return True
537
538 def any(seq, pred=None):
539 "Returns True if pred(x) is true for at least one element in the iterable"
540 for elem in ifilter(pred, seq):
541 return True
542 return False
543
544 def no(seq, pred=None):
545 "Returns True if pred(x) is false for every element in the iterable"
546 for elem in ifilter(pred, seq):
547 return False
548 return True
549
550 def quantify(seq, pred=None):
551 "Count how many times the predicate is true in the sequence"
552 return sum(imap(pred, seq))
553
554 def padnone(seq):
555 """Returns the sequence elements and then returns None indefinitely.
556
557 Useful for emulating the behavior of the built-in map() function.
558 """
559 return chain(seq, repeat(None))
560
561 def ncycles(seq, n):
562 "Returns the sequence elements n times"
563 return chain(*repeat(seq, n))
564
565 def dotproduct(vec1, vec2):
566 return sum(imap(operator.mul, vec1, vec2))
567
568 def flatten(listOfLists):
569 return list(chain(*listOfLists))
570
571 def repeatfunc(func, times=None, *args):
572 """Repeat calls to func with specified arguments.
573
574 Example: repeatfunc(random.random)
575 """
576 if times is None:
577 return starmap(func, repeat(args))
578 else:
579 return starmap(func, repeat(args, times))
580
581 def pairwise(iterable):
582 "s -> (s0,s1), (s1,s2), (s2, s3), ..."
583 a, b = tee(iterable)
584 try:
585 b.next()
586 except StopIteration:
587 pass
588 return izip(a, b)
589
590 def grouper(n, iterable, padvalue=None):
591 "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
592 return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
593
Raymond Hettingera44327a2008-01-30 22:17:31 +0000594 def roundrobin(*iterables):
595 "roundrobin('abc', 'd', 'ef') --> 'a', 'd', 'e', 'b', 'f', 'c'"
596 # Recipe contributed by George Sakkis
597 pending = len(iterables)
598 nexts = cycle(iter(it).next for it in iterables)
599 while pending:
600 try:
601 for next in nexts:
602 yield next()
603 except StopIteration:
604 pending -= 1
605 nexts = cycle(islice(nexts, pending))
Georg Brandl8ec7f652007-08-15 14:28:01 +0000606
Raymond Hettinger7832d4d2008-02-23 10:04:15 +0000607 def powerset(iterable):
608 "powerset('ab') --> set([]), set(['b']), set(['a']), set(['a', 'b'])"
609 skip = object()
610 for t in product(*izip(repeat(skip), iterable)):
611 yield set(e for e in t if e is not skip)
612