blob: 6dc19a17a094504aac352a855ab5d8c84c13ef7b [file] [log] [blame]
Georg Brandl8ec7f652007-08-15 14:28:01 +00001
2:mod:`itertools` --- Functions creating iterators for efficient looping
3=======================================================================
4
5.. module:: itertools
6 :synopsis: Functions creating iterators for efficient looping.
7.. moduleauthor:: Raymond Hettinger <python@rcn.com>
8.. sectionauthor:: Raymond Hettinger <python@rcn.com>
9
10
11.. versionadded:: 2.3
12
Georg Brandle7a09902007-10-21 12:10:28 +000013This module implements a number of :term:`iterator` building blocks inspired by
Georg Brandl8ec7f652007-08-15 14:28:01 +000014constructs from the Haskell and SML programming languages. Each has been recast
15in a form suitable for Python.
16
17The module standardizes a core set of fast, memory efficient tools that are
18useful by themselves or in combination. Standardization helps avoid the
19readability and reliability problems which arise when many different individuals
20create their own slightly varying implementations, each with their own quirks
21and naming conventions.
22
23The tools are designed to combine readily with one another. This makes it easy
24to construct more specialized tools succinctly and efficiently in pure Python.
25
26For instance, SML provides a tabulation tool: ``tabulate(f)`` which produces a
27sequence ``f(0), f(1), ...``. This toolbox provides :func:`imap` and
28:func:`count` which can be combined to form ``imap(f, count())`` and produce an
29equivalent result.
30
31Likewise, the functional tools are designed to work well with the high-speed
32functions provided by the :mod:`operator` module.
33
34The module author welcomes suggestions for other basic building blocks to be
35added to future versions of the module.
36
37Whether cast in pure python form or compiled code, tools that use iterators are
38more memory efficient (and faster) than their list based counterparts. Adopting
39the principles of just-in-time manufacturing, they create data when and where
40needed instead of consuming memory with the computer equivalent of "inventory".
41
42The performance advantage of iterators becomes more acute as the number of
43elements increases -- at some point, lists grow large enough to severely impact
44memory cache performance and start running slowly.
45
46
47.. seealso::
48
49 The Standard ML Basis Library, `The Standard ML Basis Library
50 <http://www.standardml.org/Basis/>`_.
51
52 Haskell, A Purely Functional Language, `Definition of Haskell and the Standard
53 Libraries <http://www.haskell.org/definition/>`_.
54
55
56.. _itertools-functions:
57
58Itertool functions
59------------------
60
61The following module functions all construct and return iterators. Some provide
62streams of infinite length, so they should only be accessed by functions or
63loops that truncate the stream.
64
65
66.. function:: chain(*iterables)
67
68 Make an iterator that returns elements from the first iterable until it is
69 exhausted, then proceeds to the next iterable, until all of the iterables are
70 exhausted. Used for treating consecutive sequences as a single sequence.
71 Equivalent to::
72
73 def chain(*iterables):
74 for it in iterables:
75 for element in it:
76 yield element
77
78
Raymond Hettinger3fa41d52008-02-26 02:46:54 +000079.. function:: combinations(iterable, r)
80
81 Return successive *r* length combinations of elements in the *iterable*.
82
83 Combinations are emitted in a lexicographic sort order. So, if the
84 input *iterable* is sorted, the combination tuples will be produced
85 in sorted order.
86
87 Elements are treated as unique based on their position, not on their
88 value. So if the input elements are unique, there will be no repeat
89 values within a single combination.
90
91 Each result tuple is ordered to match the input order. So, every
92 combination is a subsequence of the input *iterable*.
93
94 Example: ``combinations(range(4), 3) --> (0,1,2), (0,1,3), (0,2,3), (1,2,3)``
95
96 Equivalent to::
97
98 def combinations(iterable, r):
99 pool = tuple(iterable)
Raymond Hettinger93e804d2008-02-26 23:40:50 +0000100 n = len(pool)
101 assert 0 <= r <= n
102 vec = range(r)
103 yield tuple(pool[i] for i in vec)
104 while 1:
105 for i in reversed(range(r)):
106 if vec[i] == i + n-r:
107 continue
108 vec[i] += 1
109 for j in range(i+1, r):
110 vec[j] = vec[j-1] + 1
111 yield tuple(pool[i] for i in vec)
112 break
113 else:
114 return
Raymond Hettinger3fa41d52008-02-26 02:46:54 +0000115
116 .. versionadded:: 2.6
117
Georg Brandl8ec7f652007-08-15 14:28:01 +0000118.. function:: count([n])
119
120 Make an iterator that returns consecutive integers starting with *n*. If not
Raymond Hettinger50e90e22007-10-04 00:20:27 +0000121 specified *n* defaults to zero. Often used as an argument to :func:`imap` to
122 generate consecutive data points. Also, used with :func:`izip` to add sequence
123 numbers. Equivalent to::
Georg Brandl8ec7f652007-08-15 14:28:01 +0000124
125 def count(n=0):
126 while True:
127 yield n
128 n += 1
129
Georg Brandl8ec7f652007-08-15 14:28:01 +0000130
131.. function:: cycle(iterable)
132
133 Make an iterator returning elements from the iterable and saving a copy of each.
134 When the iterable is exhausted, return elements from the saved copy. Repeats
135 indefinitely. Equivalent to::
136
137 def cycle(iterable):
138 saved = []
139 for element in iterable:
140 yield element
141 saved.append(element)
142 while saved:
143 for element in saved:
144 yield element
145
146 Note, this member of the toolkit may require significant auxiliary storage
147 (depending on the length of the iterable).
148
149
150.. function:: dropwhile(predicate, iterable)
151
152 Make an iterator that drops elements from the iterable as long as the predicate
153 is true; afterwards, returns every element. Note, the iterator does not produce
154 *any* output until the predicate first becomes false, so it may have a lengthy
155 start-up time. Equivalent to::
156
157 def dropwhile(predicate, iterable):
158 iterable = iter(iterable)
159 for x in iterable:
160 if not predicate(x):
161 yield x
162 break
163 for x in iterable:
164 yield x
165
166
167.. function:: groupby(iterable[, key])
168
169 Make an iterator that returns consecutive keys and groups from the *iterable*.
170 The *key* is a function computing a key value for each element. If not
171 specified or is ``None``, *key* defaults to an identity function and returns
172 the element unchanged. Generally, the iterable needs to already be sorted on
173 the same key function.
174
175 The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix. It
176 generates a break or new group every time the value of the key function changes
177 (which is why it is usually necessary to have sorted the data using the same key
178 function). That behavior differs from SQL's GROUP BY which aggregates common
179 elements regardless of their input order.
180
181 The returned group is itself an iterator that shares the underlying iterable
182 with :func:`groupby`. Because the source is shared, when the :func:`groupby`
183 object is advanced, the previous group is no longer visible. So, if that data
184 is needed later, it should be stored as a list::
185
186 groups = []
187 uniquekeys = []
188 data = sorted(data, key=keyfunc)
189 for k, g in groupby(data, keyfunc):
190 groups.append(list(g)) # Store group iterator as a list
191 uniquekeys.append(k)
192
193 :func:`groupby` is equivalent to::
194
195 class groupby(object):
196 def __init__(self, iterable, key=None):
197 if key is None:
198 key = lambda x: x
199 self.keyfunc = key
200 self.it = iter(iterable)
Raymond Hettinger81a885a2007-12-29 22:16:24 +0000201 self.tgtkey = self.currkey = self.currvalue = object()
Georg Brandl8ec7f652007-08-15 14:28:01 +0000202 def __iter__(self):
203 return self
204 def next(self):
205 while self.currkey == self.tgtkey:
206 self.currvalue = self.it.next() # Exit on StopIteration
207 self.currkey = self.keyfunc(self.currvalue)
208 self.tgtkey = self.currkey
209 return (self.currkey, self._grouper(self.tgtkey))
210 def _grouper(self, tgtkey):
211 while self.currkey == tgtkey:
212 yield self.currvalue
213 self.currvalue = self.it.next() # Exit on StopIteration
214 self.currkey = self.keyfunc(self.currvalue)
215
216 .. versionadded:: 2.4
217
218
219.. function:: ifilter(predicate, iterable)
220
221 Make an iterator that filters elements from iterable returning only those for
222 which the predicate is ``True``. If *predicate* is ``None``, return the items
223 that are true. Equivalent to::
224
225 def ifilter(predicate, iterable):
226 if predicate is None:
227 predicate = bool
228 for x in iterable:
229 if predicate(x):
230 yield x
231
232
233.. function:: ifilterfalse(predicate, iterable)
234
235 Make an iterator that filters elements from iterable returning only those for
236 which the predicate is ``False``. If *predicate* is ``None``, return the items
237 that are false. Equivalent to::
238
239 def ifilterfalse(predicate, iterable):
240 if predicate is None:
241 predicate = bool
242 for x in iterable:
243 if not predicate(x):
244 yield x
245
246
247.. function:: imap(function, *iterables)
248
249 Make an iterator that computes the function using arguments from each of the
250 iterables. If *function* is set to ``None``, then :func:`imap` returns the
251 arguments as a tuple. Like :func:`map` but stops when the shortest iterable is
252 exhausted instead of filling in ``None`` for shorter iterables. The reason for
253 the difference is that infinite iterator arguments are typically an error for
254 :func:`map` (because the output is fully evaluated) but represent a common and
255 useful way of supplying arguments to :func:`imap`. Equivalent to::
256
257 def imap(function, *iterables):
258 iterables = map(iter, iterables)
259 while True:
Raymond Hettinger2dec48d2008-01-22 22:09:26 +0000260 args = [it.next() for it in iterables]
Georg Brandl8ec7f652007-08-15 14:28:01 +0000261 if function is None:
262 yield tuple(args)
263 else:
264 yield function(*args)
265
266
267.. function:: islice(iterable, [start,] stop [, step])
268
269 Make an iterator that returns selected elements from the iterable. If *start* is
270 non-zero, then elements from the iterable are skipped until start is reached.
271 Afterward, elements are returned consecutively unless *step* is set higher than
272 one which results in items being skipped. If *stop* is ``None``, then iteration
273 continues until the iterator is exhausted, if at all; otherwise, it stops at the
274 specified position. Unlike regular slicing, :func:`islice` does not support
275 negative values for *start*, *stop*, or *step*. Can be used to extract related
276 fields from data where the internal structure has been flattened (for example, a
277 multi-line report may list a name field on every third line). Equivalent to::
278
279 def islice(iterable, *args):
280 s = slice(*args)
281 it = iter(xrange(s.start or 0, s.stop or sys.maxint, s.step or 1))
282 nexti = it.next()
283 for i, element in enumerate(iterable):
284 if i == nexti:
285 yield element
286 nexti = it.next()
287
288 If *start* is ``None``, then iteration starts at zero. If *step* is ``None``,
289 then the step defaults to one.
290
291 .. versionchanged:: 2.5
292 accept ``None`` values for default *start* and *step*.
293
294
295.. function:: izip(*iterables)
296
297 Make an iterator that aggregates elements from each of the iterables. Like
298 :func:`zip` except that it returns an iterator instead of a list. Used for
299 lock-step iteration over several iterables at a time. Equivalent to::
300
301 def izip(*iterables):
302 iterables = map(iter, iterables)
303 while iterables:
304 result = [it.next() for it in iterables]
305 yield tuple(result)
306
307 .. versionchanged:: 2.4
308 When no iterables are specified, returns a zero length iterator instead of
309 raising a :exc:`TypeError` exception.
310
Raymond Hettinger48c62932008-01-22 19:51:41 +0000311 The left-to-right evaluation order of the iterables is guaranteed. This
312 makes possible an idiom for clustering a data series into n-length groups
313 using ``izip(*[iter(s)]*n)``.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000314
Raymond Hettinger48c62932008-01-22 19:51:41 +0000315 :func:`izip` should only be used with unequal length inputs when you don't
316 care about trailing, unmatched values from the longer iterables. If those
317 values are important, use :func:`izip_longest` instead.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000318
319
320.. function:: izip_longest(*iterables[, fillvalue])
321
322 Make an iterator that aggregates elements from each of the iterables. If the
323 iterables are of uneven length, missing values are filled-in with *fillvalue*.
324 Iteration continues until the longest iterable is exhausted. Equivalent to::
325
326 def izip_longest(*args, **kwds):
327 fillvalue = kwds.get('fillvalue')
328 def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
329 yield counter() # yields the fillvalue, or raises IndexError
330 fillers = repeat(fillvalue)
331 iters = [chain(it, sentinel(), fillers) for it in args]
332 try:
333 for tup in izip(*iters):
334 yield tup
335 except IndexError:
336 pass
337
338 If one of the iterables is potentially infinite, then the :func:`izip_longest`
339 function should be wrapped with something that limits the number of calls (for
340 example :func:`islice` or :func:`takewhile`).
341
342 .. versionadded:: 2.6
343
Raymond Hettingerc5705a82008-02-22 19:50:06 +0000344.. function:: product(*iterables)
345
346 Cartesian product of input iterables.
347
348 Equivalent to nested for-loops in a generator expression. For example,
349 ``product(A, B)`` returns the same as ``((x,y) for x in A for y in B)``.
350
351 The leftmost iterators are in the outermost for-loop, so the output tuples
352 cycle in a manner similar to an odometer (with the rightmost element
Raymond Hettinger3fa41d52008-02-26 02:46:54 +0000353 changing on every iteration). This results in a lexicographic ordering
354 so that if the inputs iterables are sorted, the product tuples are emitted
355 in sorted order.
Raymond Hettingerc5705a82008-02-22 19:50:06 +0000356
Raymond Hettinger3fa41d52008-02-26 02:46:54 +0000357 Equivalent to the following except that the actual implementation does not
358 build-up intermediate results in memory::
Raymond Hettingerc5705a82008-02-22 19:50:06 +0000359
360 def product(*args):
361 pools = map(tuple, args)
362 if pools:
363 result = [[]]
364 for pool in pools:
365 result = [x+[y] for x in result for y in pool]
366 for prod in result:
367 yield tuple(prod)
368
369 .. versionadded:: 2.6
Georg Brandl8ec7f652007-08-15 14:28:01 +0000370
371.. function:: repeat(object[, times])
372
373 Make an iterator that returns *object* over and over again. Runs indefinitely
374 unless the *times* argument is specified. Used as argument to :func:`imap` for
375 invariant parameters to the called function. Also used with :func:`izip` to
376 create an invariant part of a tuple record. Equivalent to::
377
378 def repeat(object, times=None):
379 if times is None:
380 while True:
381 yield object
382 else:
383 for i in xrange(times):
384 yield object
385
386
387.. function:: starmap(function, iterable)
388
Raymond Hettinger47317092008-01-17 03:02:14 +0000389 Make an iterator that computes the function using arguments obtained from
Georg Brandl8ec7f652007-08-15 14:28:01 +0000390 the iterable. Used instead of :func:`imap` when argument parameters are already
391 grouped in tuples from a single iterable (the data has been "pre-zipped"). The
392 difference between :func:`imap` and :func:`starmap` parallels the distinction
393 between ``function(a,b)`` and ``function(*c)``. Equivalent to::
394
395 def starmap(function, iterable):
Raymond Hettinger47317092008-01-17 03:02:14 +0000396 for args in iterable:
397 yield function(*args)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000398
Raymond Hettinger47317092008-01-17 03:02:14 +0000399 .. versionchanged:: 2.6
400 Previously, :func:`starmap` required the function arguments to be tuples.
401 Now, any iterable is allowed.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000402
403.. function:: takewhile(predicate, iterable)
404
405 Make an iterator that returns elements from the iterable as long as the
406 predicate is true. Equivalent to::
407
408 def takewhile(predicate, iterable):
409 for x in iterable:
410 if predicate(x):
411 yield x
412 else:
413 break
414
415
416.. function:: tee(iterable[, n=2])
417
418 Return *n* independent iterators from a single iterable. The case where ``n==2``
419 is equivalent to::
420
421 def tee(iterable):
Raymond Hettinger5d332bb2007-12-29 22:09:34 +0000422 def gen(next, data={}):
Georg Brandl8ec7f652007-08-15 14:28:01 +0000423 for i in count():
Raymond Hettinger5d332bb2007-12-29 22:09:34 +0000424 if i in data:
425 yield data.pop(i)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000426 else:
Raymond Hettinger5d332bb2007-12-29 22:09:34 +0000427 data[i] = next()
428 yield data[i]
Georg Brandl8ec7f652007-08-15 14:28:01 +0000429 it = iter(iterable)
Raymond Hettinger5d332bb2007-12-29 22:09:34 +0000430 return gen(it.next), gen(it.next)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000431
432 Note, once :func:`tee` has made a split, the original *iterable* should not be
433 used anywhere else; otherwise, the *iterable* could get advanced without the tee
434 objects being informed.
435
436 Note, this member of the toolkit may require significant auxiliary storage
437 (depending on how much temporary data needs to be stored). In general, if one
438 iterator is going to use most or all of the data before the other iterator, it
439 is faster to use :func:`list` instead of :func:`tee`.
440
441 .. versionadded:: 2.4
442
443
444.. _itertools-example:
445
446Examples
447--------
448
449The following examples show common uses for each tool and demonstrate ways they
450can be combined. ::
451
452 >>> amounts = [120.15, 764.05, 823.14]
453 >>> for checknum, amount in izip(count(1200), amounts):
454 ... print 'Check %d is for $%.2f' % (checknum, amount)
455 ...
456 Check 1200 is for $120.15
457 Check 1201 is for $764.05
458 Check 1202 is for $823.14
459
460 >>> import operator
461 >>> for cube in imap(operator.pow, xrange(1,5), repeat(3)):
462 ... print cube
463 ...
464 1
465 8
466 27
467 64
468
Georg Brandl8ec7f652007-08-15 14:28:01 +0000469 # Show a dictionary sorted and grouped by value
470 >>> from operator import itemgetter
471 >>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3)
472 >>> di = sorted(d.iteritems(), key=itemgetter(1))
473 >>> for k, g in groupby(di, key=itemgetter(1)):
474 ... print k, map(itemgetter(0), g)
475 ...
476 1 ['a', 'c', 'e']
477 2 ['b', 'd', 'f']
478 3 ['g']
479
480 # Find runs of consecutive numbers using groupby. The key to the solution
481 # is differencing with a range so that consecutive numbers all appear in
482 # same group.
483 >>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
484 >>> for k, g in groupby(enumerate(data), lambda (i,x):i-x):
485 ... print map(operator.itemgetter(1), g)
486 ...
487 [1]
488 [4, 5, 6]
489 [10]
490 [15, 16, 17, 18]
491 [22]
492 [25, 26, 27, 28]
493
494
495
496.. _itertools-recipes:
497
498Recipes
499-------
500
501This section shows recipes for creating an extended toolset using the existing
502itertools as building blocks.
503
504The extended tools offer the same high performance as the underlying toolset.
505The superior memory performance is kept by processing elements one at a time
506rather than bringing the whole iterable into memory all at once. Code volume is
507kept small by linking the tools together in a functional style which helps
508eliminate temporary variables. High speed is retained by preferring
Georg Brandlcf3fb252007-10-21 10:52:38 +0000509"vectorized" building blocks over the use of for-loops and :term:`generator`\s
510which incur interpreter overhead. ::
Georg Brandl8ec7f652007-08-15 14:28:01 +0000511
512 def take(n, seq):
513 return list(islice(seq, n))
514
515 def enumerate(iterable):
516 return izip(count(), iterable)
517
518 def tabulate(function):
519 "Return function(0), function(1), ..."
520 return imap(function, count())
521
522 def iteritems(mapping):
523 return izip(mapping.iterkeys(), mapping.itervalues())
524
525 def nth(iterable, n):
526 "Returns the nth item or raise StopIteration"
527 return islice(iterable, n, None).next()
528
529 def all(seq, pred=None):
530 "Returns True if pred(x) is true for every element in the iterable"
531 for elem in ifilterfalse(pred, seq):
532 return False
533 return True
534
535 def any(seq, pred=None):
536 "Returns True if pred(x) is true for at least one element in the iterable"
537 for elem in ifilter(pred, seq):
538 return True
539 return False
540
541 def no(seq, pred=None):
542 "Returns True if pred(x) is false for every element in the iterable"
543 for elem in ifilter(pred, seq):
544 return False
545 return True
546
547 def quantify(seq, pred=None):
548 "Count how many times the predicate is true in the sequence"
549 return sum(imap(pred, seq))
550
551 def padnone(seq):
552 """Returns the sequence elements and then returns None indefinitely.
553
554 Useful for emulating the behavior of the built-in map() function.
555 """
556 return chain(seq, repeat(None))
557
558 def ncycles(seq, n):
559 "Returns the sequence elements n times"
560 return chain(*repeat(seq, n))
561
562 def dotproduct(vec1, vec2):
563 return sum(imap(operator.mul, vec1, vec2))
564
565 def flatten(listOfLists):
566 return list(chain(*listOfLists))
567
568 def repeatfunc(func, times=None, *args):
569 """Repeat calls to func with specified arguments.
570
571 Example: repeatfunc(random.random)
572 """
573 if times is None:
574 return starmap(func, repeat(args))
575 else:
576 return starmap(func, repeat(args, times))
577
578 def pairwise(iterable):
579 "s -> (s0,s1), (s1,s2), (s2, s3), ..."
580 a, b = tee(iterable)
581 try:
582 b.next()
583 except StopIteration:
584 pass
585 return izip(a, b)
586
587 def grouper(n, iterable, padvalue=None):
588 "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
589 return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
590
Raymond Hettingera44327a2008-01-30 22:17:31 +0000591 def roundrobin(*iterables):
592 "roundrobin('abc', 'd', 'ef') --> 'a', 'd', 'e', 'b', 'f', 'c'"
593 # Recipe contributed by George Sakkis
594 pending = len(iterables)
595 nexts = cycle(iter(it).next for it in iterables)
596 while pending:
597 try:
598 for next in nexts:
599 yield next()
600 except StopIteration:
601 pending -= 1
602 nexts = cycle(islice(nexts, pending))
Georg Brandl8ec7f652007-08-15 14:28:01 +0000603
Raymond Hettinger7832d4d2008-02-23 10:04:15 +0000604 def powerset(iterable):
605 "powerset('ab') --> set([]), set(['b']), set(['a']), set(['a', 'b'])"
606 skip = object()
607 for t in product(*izip(repeat(skip), iterable)):
608 yield set(e for e in t if e is not skip)
609