blob: 5cc3e08fa826afb934b6f855640c25ecff13d912 [file] [log] [blame]
Georg Brandl8ec7f652007-08-15 14:28:01 +00001
2:mod:`itertools` --- Functions creating iterators for efficient looping
3=======================================================================
4
5.. module:: itertools
6 :synopsis: Functions creating iterators for efficient looping.
7.. moduleauthor:: Raymond Hettinger <python@rcn.com>
8.. sectionauthor:: Raymond Hettinger <python@rcn.com>
9
10
11.. versionadded:: 2.3
12
Georg Brandle7a09902007-10-21 12:10:28 +000013This module implements a number of :term:`iterator` building blocks inspired by
Georg Brandl8ec7f652007-08-15 14:28:01 +000014constructs from the Haskell and SML programming languages. Each has been recast
15in a form suitable for Python.
16
17The module standardizes a core set of fast, memory efficient tools that are
18useful by themselves or in combination. Standardization helps avoid the
19readability and reliability problems which arise when many different individuals
20create their own slightly varying implementations, each with their own quirks
21and naming conventions.
22
23The tools are designed to combine readily with one another. This makes it easy
24to construct more specialized tools succinctly and efficiently in pure Python.
25
26For instance, SML provides a tabulation tool: ``tabulate(f)`` which produces a
27sequence ``f(0), f(1), ...``. This toolbox provides :func:`imap` and
28:func:`count` which can be combined to form ``imap(f, count())`` and produce an
29equivalent result.
30
31Likewise, the functional tools are designed to work well with the high-speed
32functions provided by the :mod:`operator` module.
33
34The module author welcomes suggestions for other basic building blocks to be
35added to future versions of the module.
36
37Whether cast in pure python form or compiled code, tools that use iterators are
38more memory efficient (and faster) than their list based counterparts. Adopting
39the principles of just-in-time manufacturing, they create data when and where
40needed instead of consuming memory with the computer equivalent of "inventory".
41
42The performance advantage of iterators becomes more acute as the number of
43elements increases -- at some point, lists grow large enough to severely impact
44memory cache performance and start running slowly.
45
46
47.. seealso::
48
49 The Standard ML Basis Library, `The Standard ML Basis Library
50 <http://www.standardml.org/Basis/>`_.
51
52 Haskell, A Purely Functional Language, `Definition of Haskell and the Standard
53 Libraries <http://www.haskell.org/definition/>`_.
54
55
56.. _itertools-functions:
57
58Itertool functions
59------------------
60
61The following module functions all construct and return iterators. Some provide
62streams of infinite length, so they should only be accessed by functions or
63loops that truncate the stream.
64
65
66.. function:: chain(*iterables)
67
68 Make an iterator that returns elements from the first iterable until it is
69 exhausted, then proceeds to the next iterable, until all of the iterables are
70 exhausted. Used for treating consecutive sequences as a single sequence.
71 Equivalent to::
72
73 def chain(*iterables):
74 for it in iterables:
75 for element in it:
76 yield element
77
78
79.. function:: count([n])
80
81 Make an iterator that returns consecutive integers starting with *n*. If not
Raymond Hettinger50e90e22007-10-04 00:20:27 +000082 specified *n* defaults to zero. Often used as an argument to :func:`imap` to
83 generate consecutive data points. Also, used with :func:`izip` to add sequence
84 numbers. Equivalent to::
Georg Brandl8ec7f652007-08-15 14:28:01 +000085
86 def count(n=0):
87 while True:
88 yield n
89 n += 1
90
Georg Brandl8ec7f652007-08-15 14:28:01 +000091
92.. function:: cycle(iterable)
93
94 Make an iterator returning elements from the iterable and saving a copy of each.
95 When the iterable is exhausted, return elements from the saved copy. Repeats
96 indefinitely. Equivalent to::
97
98 def cycle(iterable):
99 saved = []
100 for element in iterable:
101 yield element
102 saved.append(element)
103 while saved:
104 for element in saved:
105 yield element
106
107 Note, this member of the toolkit may require significant auxiliary storage
108 (depending on the length of the iterable).
109
110
111.. function:: dropwhile(predicate, iterable)
112
113 Make an iterator that drops elements from the iterable as long as the predicate
114 is true; afterwards, returns every element. Note, the iterator does not produce
115 *any* output until the predicate first becomes false, so it may have a lengthy
116 start-up time. Equivalent to::
117
118 def dropwhile(predicate, iterable):
119 iterable = iter(iterable)
120 for x in iterable:
121 if not predicate(x):
122 yield x
123 break
124 for x in iterable:
125 yield x
126
127
128.. function:: groupby(iterable[, key])
129
130 Make an iterator that returns consecutive keys and groups from the *iterable*.
131 The *key* is a function computing a key value for each element. If not
132 specified or is ``None``, *key* defaults to an identity function and returns
133 the element unchanged. Generally, the iterable needs to already be sorted on
134 the same key function.
135
136 The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix. It
137 generates a break or new group every time the value of the key function changes
138 (which is why it is usually necessary to have sorted the data using the same key
139 function). That behavior differs from SQL's GROUP BY which aggregates common
140 elements regardless of their input order.
141
142 The returned group is itself an iterator that shares the underlying iterable
143 with :func:`groupby`. Because the source is shared, when the :func:`groupby`
144 object is advanced, the previous group is no longer visible. So, if that data
145 is needed later, it should be stored as a list::
146
147 groups = []
148 uniquekeys = []
149 data = sorted(data, key=keyfunc)
150 for k, g in groupby(data, keyfunc):
151 groups.append(list(g)) # Store group iterator as a list
152 uniquekeys.append(k)
153
154 :func:`groupby` is equivalent to::
155
156 class groupby(object):
157 def __init__(self, iterable, key=None):
158 if key is None:
159 key = lambda x: x
160 self.keyfunc = key
161 self.it = iter(iterable)
Raymond Hettinger81a885a2007-12-29 22:16:24 +0000162 self.tgtkey = self.currkey = self.currvalue = object()
Georg Brandl8ec7f652007-08-15 14:28:01 +0000163 def __iter__(self):
164 return self
165 def next(self):
166 while self.currkey == self.tgtkey:
167 self.currvalue = self.it.next() # Exit on StopIteration
168 self.currkey = self.keyfunc(self.currvalue)
169 self.tgtkey = self.currkey
170 return (self.currkey, self._grouper(self.tgtkey))
171 def _grouper(self, tgtkey):
172 while self.currkey == tgtkey:
173 yield self.currvalue
174 self.currvalue = self.it.next() # Exit on StopIteration
175 self.currkey = self.keyfunc(self.currvalue)
176
177 .. versionadded:: 2.4
178
179
180.. function:: ifilter(predicate, iterable)
181
182 Make an iterator that filters elements from iterable returning only those for
183 which the predicate is ``True``. If *predicate* is ``None``, return the items
184 that are true. Equivalent to::
185
186 def ifilter(predicate, iterable):
187 if predicate is None:
188 predicate = bool
189 for x in iterable:
190 if predicate(x):
191 yield x
192
193
194.. function:: ifilterfalse(predicate, iterable)
195
196 Make an iterator that filters elements from iterable returning only those for
197 which the predicate is ``False``. If *predicate* is ``None``, return the items
198 that are false. Equivalent to::
199
200 def ifilterfalse(predicate, iterable):
201 if predicate is None:
202 predicate = bool
203 for x in iterable:
204 if not predicate(x):
205 yield x
206
207
208.. function:: imap(function, *iterables)
209
210 Make an iterator that computes the function using arguments from each of the
211 iterables. If *function* is set to ``None``, then :func:`imap` returns the
212 arguments as a tuple. Like :func:`map` but stops when the shortest iterable is
213 exhausted instead of filling in ``None`` for shorter iterables. The reason for
214 the difference is that infinite iterator arguments are typically an error for
215 :func:`map` (because the output is fully evaluated) but represent a common and
216 useful way of supplying arguments to :func:`imap`. Equivalent to::
217
218 def imap(function, *iterables):
219 iterables = map(iter, iterables)
220 while True:
Raymond Hettinger2dec48d2008-01-22 22:09:26 +0000221 args = [it.next() for it in iterables]
Georg Brandl8ec7f652007-08-15 14:28:01 +0000222 if function is None:
223 yield tuple(args)
224 else:
225 yield function(*args)
226
227
228.. function:: islice(iterable, [start,] stop [, step])
229
230 Make an iterator that returns selected elements from the iterable. If *start* is
231 non-zero, then elements from the iterable are skipped until start is reached.
232 Afterward, elements are returned consecutively unless *step* is set higher than
233 one which results in items being skipped. If *stop* is ``None``, then iteration
234 continues until the iterator is exhausted, if at all; otherwise, it stops at the
235 specified position. Unlike regular slicing, :func:`islice` does not support
236 negative values for *start*, *stop*, or *step*. Can be used to extract related
237 fields from data where the internal structure has been flattened (for example, a
238 multi-line report may list a name field on every third line). Equivalent to::
239
240 def islice(iterable, *args):
241 s = slice(*args)
242 it = iter(xrange(s.start or 0, s.stop or sys.maxint, s.step or 1))
243 nexti = it.next()
244 for i, element in enumerate(iterable):
245 if i == nexti:
246 yield element
247 nexti = it.next()
248
249 If *start* is ``None``, then iteration starts at zero. If *step* is ``None``,
250 then the step defaults to one.
251
252 .. versionchanged:: 2.5
253 accept ``None`` values for default *start* and *step*.
254
255
256.. function:: izip(*iterables)
257
258 Make an iterator that aggregates elements from each of the iterables. Like
259 :func:`zip` except that it returns an iterator instead of a list. Used for
260 lock-step iteration over several iterables at a time. Equivalent to::
261
262 def izip(*iterables):
263 iterables = map(iter, iterables)
264 while iterables:
265 result = [it.next() for it in iterables]
266 yield tuple(result)
267
268 .. versionchanged:: 2.4
269 When no iterables are specified, returns a zero length iterator instead of
270 raising a :exc:`TypeError` exception.
271
Raymond Hettinger48c62932008-01-22 19:51:41 +0000272 The left-to-right evaluation order of the iterables is guaranteed. This
273 makes possible an idiom for clustering a data series into n-length groups
274 using ``izip(*[iter(s)]*n)``.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000275
Raymond Hettinger48c62932008-01-22 19:51:41 +0000276 :func:`izip` should only be used with unequal length inputs when you don't
277 care about trailing, unmatched values from the longer iterables. If those
278 values are important, use :func:`izip_longest` instead.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000279
280
281.. function:: izip_longest(*iterables[, fillvalue])
282
283 Make an iterator that aggregates elements from each of the iterables. If the
284 iterables are of uneven length, missing values are filled-in with *fillvalue*.
285 Iteration continues until the longest iterable is exhausted. Equivalent to::
286
287 def izip_longest(*args, **kwds):
288 fillvalue = kwds.get('fillvalue')
289 def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
290 yield counter() # yields the fillvalue, or raises IndexError
291 fillers = repeat(fillvalue)
292 iters = [chain(it, sentinel(), fillers) for it in args]
293 try:
294 for tup in izip(*iters):
295 yield tup
296 except IndexError:
297 pass
298
299 If one of the iterables is potentially infinite, then the :func:`izip_longest`
300 function should be wrapped with something that limits the number of calls (for
301 example :func:`islice` or :func:`takewhile`).
302
303 .. versionadded:: 2.6
304
Raymond Hettingerc5705a82008-02-22 19:50:06 +0000305.. function:: product(*iterables)
306
307 Cartesian product of input iterables.
308
309 Equivalent to nested for-loops in a generator expression. For example,
310 ``product(A, B)`` returns the same as ``((x,y) for x in A for y in B)``.
311
312 The leftmost iterators are in the outermost for-loop, so the output tuples
313 cycle in a manner similar to an odometer (with the rightmost element
314 changing on every iteration).
315
316 Equivalent to (but without building the entire result in memory)::
317
318 def product(*args):
319 pools = map(tuple, args)
320 if pools:
321 result = [[]]
322 for pool in pools:
323 result = [x+[y] for x in result for y in pool]
324 for prod in result:
325 yield tuple(prod)
326
327 .. versionadded:: 2.6
Georg Brandl8ec7f652007-08-15 14:28:01 +0000328
329.. function:: repeat(object[, times])
330
331 Make an iterator that returns *object* over and over again. Runs indefinitely
332 unless the *times* argument is specified. Used as argument to :func:`imap` for
333 invariant parameters to the called function. Also used with :func:`izip` to
334 create an invariant part of a tuple record. Equivalent to::
335
336 def repeat(object, times=None):
337 if times is None:
338 while True:
339 yield object
340 else:
341 for i in xrange(times):
342 yield object
343
344
345.. function:: starmap(function, iterable)
346
Raymond Hettinger47317092008-01-17 03:02:14 +0000347 Make an iterator that computes the function using arguments obtained from
Georg Brandl8ec7f652007-08-15 14:28:01 +0000348 the iterable. Used instead of :func:`imap` when argument parameters are already
349 grouped in tuples from a single iterable (the data has been "pre-zipped"). The
350 difference between :func:`imap` and :func:`starmap` parallels the distinction
351 between ``function(a,b)`` and ``function(*c)``. Equivalent to::
352
353 def starmap(function, iterable):
Raymond Hettinger47317092008-01-17 03:02:14 +0000354 for args in iterable:
355 yield function(*args)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000356
Raymond Hettinger47317092008-01-17 03:02:14 +0000357 .. versionchanged:: 2.6
358 Previously, :func:`starmap` required the function arguments to be tuples.
359 Now, any iterable is allowed.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000360
361.. function:: takewhile(predicate, iterable)
362
363 Make an iterator that returns elements from the iterable as long as the
364 predicate is true. Equivalent to::
365
366 def takewhile(predicate, iterable):
367 for x in iterable:
368 if predicate(x):
369 yield x
370 else:
371 break
372
373
374.. function:: tee(iterable[, n=2])
375
376 Return *n* independent iterators from a single iterable. The case where ``n==2``
377 is equivalent to::
378
379 def tee(iterable):
Raymond Hettinger5d332bb2007-12-29 22:09:34 +0000380 def gen(next, data={}):
Georg Brandl8ec7f652007-08-15 14:28:01 +0000381 for i in count():
Raymond Hettinger5d332bb2007-12-29 22:09:34 +0000382 if i in data:
383 yield data.pop(i)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000384 else:
Raymond Hettinger5d332bb2007-12-29 22:09:34 +0000385 data[i] = next()
386 yield data[i]
Georg Brandl8ec7f652007-08-15 14:28:01 +0000387 it = iter(iterable)
Raymond Hettinger5d332bb2007-12-29 22:09:34 +0000388 return gen(it.next), gen(it.next)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000389
390 Note, once :func:`tee` has made a split, the original *iterable* should not be
391 used anywhere else; otherwise, the *iterable* could get advanced without the tee
392 objects being informed.
393
394 Note, this member of the toolkit may require significant auxiliary storage
395 (depending on how much temporary data needs to be stored). In general, if one
396 iterator is going to use most or all of the data before the other iterator, it
397 is faster to use :func:`list` instead of :func:`tee`.
398
399 .. versionadded:: 2.4
400
401
402.. _itertools-example:
403
404Examples
405--------
406
407The following examples show common uses for each tool and demonstrate ways they
408can be combined. ::
409
410 >>> amounts = [120.15, 764.05, 823.14]
411 >>> for checknum, amount in izip(count(1200), amounts):
412 ... print 'Check %d is for $%.2f' % (checknum, amount)
413 ...
414 Check 1200 is for $120.15
415 Check 1201 is for $764.05
416 Check 1202 is for $823.14
417
418 >>> import operator
419 >>> for cube in imap(operator.pow, xrange(1,5), repeat(3)):
420 ... print cube
421 ...
422 1
423 8
424 27
425 64
426
Georg Brandl8ec7f652007-08-15 14:28:01 +0000427 # Show a dictionary sorted and grouped by value
428 >>> from operator import itemgetter
429 >>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3)
430 >>> di = sorted(d.iteritems(), key=itemgetter(1))
431 >>> for k, g in groupby(di, key=itemgetter(1)):
432 ... print k, map(itemgetter(0), g)
433 ...
434 1 ['a', 'c', 'e']
435 2 ['b', 'd', 'f']
436 3 ['g']
437
438 # Find runs of consecutive numbers using groupby. The key to the solution
439 # is differencing with a range so that consecutive numbers all appear in
440 # same group.
441 >>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
442 >>> for k, g in groupby(enumerate(data), lambda (i,x):i-x):
443 ... print map(operator.itemgetter(1), g)
444 ...
445 [1]
446 [4, 5, 6]
447 [10]
448 [15, 16, 17, 18]
449 [22]
450 [25, 26, 27, 28]
451
452
453
454.. _itertools-recipes:
455
456Recipes
457-------
458
459This section shows recipes for creating an extended toolset using the existing
460itertools as building blocks.
461
462The extended tools offer the same high performance as the underlying toolset.
463The superior memory performance is kept by processing elements one at a time
464rather than bringing the whole iterable into memory all at once. Code volume is
465kept small by linking the tools together in a functional style which helps
466eliminate temporary variables. High speed is retained by preferring
Georg Brandlcf3fb252007-10-21 10:52:38 +0000467"vectorized" building blocks over the use of for-loops and :term:`generator`\s
468which incur interpreter overhead. ::
Georg Brandl8ec7f652007-08-15 14:28:01 +0000469
470 def take(n, seq):
471 return list(islice(seq, n))
472
473 def enumerate(iterable):
474 return izip(count(), iterable)
475
476 def tabulate(function):
477 "Return function(0), function(1), ..."
478 return imap(function, count())
479
480 def iteritems(mapping):
481 return izip(mapping.iterkeys(), mapping.itervalues())
482
483 def nth(iterable, n):
484 "Returns the nth item or raise StopIteration"
485 return islice(iterable, n, None).next()
486
487 def all(seq, pred=None):
488 "Returns True if pred(x) is true for every element in the iterable"
489 for elem in ifilterfalse(pred, seq):
490 return False
491 return True
492
493 def any(seq, pred=None):
494 "Returns True if pred(x) is true for at least one element in the iterable"
495 for elem in ifilter(pred, seq):
496 return True
497 return False
498
499 def no(seq, pred=None):
500 "Returns True if pred(x) is false for every element in the iterable"
501 for elem in ifilter(pred, seq):
502 return False
503 return True
504
505 def quantify(seq, pred=None):
506 "Count how many times the predicate is true in the sequence"
507 return sum(imap(pred, seq))
508
509 def padnone(seq):
510 """Returns the sequence elements and then returns None indefinitely.
511
512 Useful for emulating the behavior of the built-in map() function.
513 """
514 return chain(seq, repeat(None))
515
516 def ncycles(seq, n):
517 "Returns the sequence elements n times"
518 return chain(*repeat(seq, n))
519
520 def dotproduct(vec1, vec2):
521 return sum(imap(operator.mul, vec1, vec2))
522
523 def flatten(listOfLists):
524 return list(chain(*listOfLists))
525
526 def repeatfunc(func, times=None, *args):
527 """Repeat calls to func with specified arguments.
528
529 Example: repeatfunc(random.random)
530 """
531 if times is None:
532 return starmap(func, repeat(args))
533 else:
534 return starmap(func, repeat(args, times))
535
536 def pairwise(iterable):
537 "s -> (s0,s1), (s1,s2), (s2, s3), ..."
538 a, b = tee(iterable)
539 try:
540 b.next()
541 except StopIteration:
542 pass
543 return izip(a, b)
544
545 def grouper(n, iterable, padvalue=None):
546 "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
547 return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
548
Raymond Hettingera44327a2008-01-30 22:17:31 +0000549 def roundrobin(*iterables):
550 "roundrobin('abc', 'd', 'ef') --> 'a', 'd', 'e', 'b', 'f', 'c'"
551 # Recipe contributed by George Sakkis
552 pending = len(iterables)
553 nexts = cycle(iter(it).next for it in iterables)
554 while pending:
555 try:
556 for next in nexts:
557 yield next()
558 except StopIteration:
559 pending -= 1
560 nexts = cycle(islice(nexts, pending))
Georg Brandl8ec7f652007-08-15 14:28:01 +0000561
Raymond Hettinger7832d4d2008-02-23 10:04:15 +0000562 def powerset(iterable):
563 "powerset('ab') --> set([]), set(['b']), set(['a']), set(['a', 'b'])"
564 skip = object()
565 for t in product(*izip(repeat(skip), iterable)):
566 yield set(e for e in t if e is not skip)
567