blob: d6e3291af955b5f5b96f9f91135b4219f6649310 [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001
2:mod:`itertools` --- Functions creating iterators for efficient looping
3=======================================================================
4
5.. module:: itertools
6 :synopsis: Functions creating iterators for efficient looping.
7.. moduleauthor:: Raymond Hettinger <python@rcn.com>
8.. sectionauthor:: Raymond Hettinger <python@rcn.com>
9
10
Georg Brandl116aa622007-08-15 14:28:22 +000011This module implements a number of iterator building blocks inspired by
12constructs from the Haskell and SML programming languages. Each has been recast
13in a form suitable for Python.
14
15The module standardizes a core set of fast, memory efficient tools that are
16useful by themselves or in combination. Standardization helps avoid the
17readability and reliability problems which arise when many different individuals
18create their own slightly varying implementations, each with their own quirks
19and naming conventions.
20
21The tools are designed to combine readily with one another. This makes it easy
22to construct more specialized tools succinctly and efficiently in pure Python.
23
24For instance, SML provides a tabulation tool: ``tabulate(f)`` which produces a
25sequence ``f(0), f(1), ...``. This toolbox provides :func:`imap` and
26:func:`count` which can be combined to form ``imap(f, count())`` and produce an
27equivalent result.
28
29Likewise, the functional tools are designed to work well with the high-speed
30functions provided by the :mod:`operator` module.
31
32The module author welcomes suggestions for other basic building blocks to be
33added to future versions of the module.
34
35Whether cast in pure python form or compiled code, tools that use iterators are
36more memory efficient (and faster) than their list based counterparts. Adopting
37the principles of just-in-time manufacturing, they create data when and where
38needed instead of consuming memory with the computer equivalent of "inventory".
39
40The performance advantage of iterators becomes more acute as the number of
41elements increases -- at some point, lists grow large enough to severely impact
42memory cache performance and start running slowly.
43
44
45.. seealso::
46
47 The Standard ML Basis Library, `The Standard ML Basis Library
48 <http://www.standardml.org/Basis/>`_.
49
50 Haskell, A Purely Functional Language, `Definition of Haskell and the Standard
51 Libraries <http://www.haskell.org/definition/>`_.
52
53
54.. _itertools-functions:
55
56Itertool functions
57------------------
58
59The following module functions all construct and return iterators. Some provide
60streams of infinite length, so they should only be accessed by functions or
61loops that truncate the stream.
62
63
64.. function:: chain(*iterables)
65
66 Make an iterator that returns elements from the first iterable until it is
67 exhausted, then proceeds to the next iterable, until all of the iterables are
68 exhausted. Used for treating consecutive sequences as a single sequence.
69 Equivalent to::
70
71 def chain(*iterables):
72 for it in iterables:
73 for element in it:
74 yield element
75
76
77.. function:: count([n])
78
79 Make an iterator that returns consecutive integers starting with *n*. If not
80 specified *n* defaults to zero. Does not currently support python long
81 integers. Often used as an argument to :func:`imap` to generate consecutive
82 data points. Also, used with :func:`izip` to add sequence numbers. Equivalent
83 to::
84
85 def count(n=0):
86 while True:
87 yield n
88 n += 1
89
90 Note, :func:`count` does not check for overflow and will return negative numbers
91 after exceeding ``sys.maxint``. This behavior may change in the future.
92
93
94.. function:: cycle(iterable)
95
96 Make an iterator returning elements from the iterable and saving a copy of each.
97 When the iterable is exhausted, return elements from the saved copy. Repeats
98 indefinitely. Equivalent to::
99
100 def cycle(iterable):
101 saved = []
102 for element in iterable:
103 yield element
104 saved.append(element)
105 while saved:
106 for element in saved:
107 yield element
108
109 Note, this member of the toolkit may require significant auxiliary storage
110 (depending on the length of the iterable).
111
112
113.. function:: dropwhile(predicate, iterable)
114
115 Make an iterator that drops elements from the iterable as long as the predicate
116 is true; afterwards, returns every element. Note, the iterator does not produce
117 *any* output until the predicate first becomes false, so it may have a lengthy
118 start-up time. Equivalent to::
119
120 def dropwhile(predicate, iterable):
121 iterable = iter(iterable)
122 for x in iterable:
123 if not predicate(x):
124 yield x
125 break
126 for x in iterable:
127 yield x
128
129
130.. function:: groupby(iterable[, key])
131
132 Make an iterator that returns consecutive keys and groups from the *iterable*.
133 The *key* is a function computing a key value for each element. If not
134 specified or is ``None``, *key* defaults to an identity function and returns
135 the element unchanged. Generally, the iterable needs to already be sorted on
136 the same key function.
137
138 The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix. It
139 generates a break or new group every time the value of the key function changes
140 (which is why it is usually necessary to have sorted the data using the same key
141 function). That behavior differs from SQL's GROUP BY which aggregates common
142 elements regardless of their input order.
143
144 The returned group is itself an iterator that shares the underlying iterable
145 with :func:`groupby`. Because the source is shared, when the :func:`groupby`
146 object is advanced, the previous group is no longer visible. So, if that data
147 is needed later, it should be stored as a list::
148
149 groups = []
150 uniquekeys = []
151 data = sorted(data, key=keyfunc)
152 for k, g in groupby(data, keyfunc):
153 groups.append(list(g)) # Store group iterator as a list
154 uniquekeys.append(k)
155
156 :func:`groupby` is equivalent to::
157
158 class groupby(object):
159 def __init__(self, iterable, key=None):
160 if key is None:
161 key = lambda x: x
162 self.keyfunc = key
163 self.it = iter(iterable)
164 self.tgtkey = self.currkey = self.currvalue = []
165 def __iter__(self):
166 return self
167 def __next__(self):
168 while self.currkey == self.tgtkey:
169 self.currvalue = next(self.it) # Exit on StopIteration
170 self.currkey = self.keyfunc(self.currvalue)
171 self.tgtkey = self.currkey
172 return (self.currkey, self._grouper(self.tgtkey))
173 def _grouper(self, tgtkey):
174 while self.currkey == tgtkey:
175 yield self.currvalue
176 self.currvalue = next(self.it) # Exit on StopIteration
177 self.currkey = self.keyfunc(self.currvalue)
178
Georg Brandl116aa622007-08-15 14:28:22 +0000179
180.. function:: ifilter(predicate, iterable)
181
182 Make an iterator that filters elements from iterable returning only those for
183 which the predicate is ``True``. If *predicate* is ``None``, return the items
184 that are true. Equivalent to::
185
186 def ifilter(predicate, iterable):
187 if predicate is None:
188 predicate = bool
189 for x in iterable:
190 if predicate(x):
191 yield x
192
193
194.. function:: ifilterfalse(predicate, iterable)
195
196 Make an iterator that filters elements from iterable returning only those for
197 which the predicate is ``False``. If *predicate* is ``None``, return the items
198 that are false. Equivalent to::
199
200 def ifilterfalse(predicate, iterable):
201 if predicate is None:
202 predicate = bool
203 for x in iterable:
204 if not predicate(x):
205 yield x
206
207
208.. function:: imap(function, *iterables)
209
210 Make an iterator that computes the function using arguments from each of the
211 iterables. If *function* is set to ``None``, then :func:`imap` returns the
212 arguments as a tuple. Like :func:`map` but stops when the shortest iterable is
213 exhausted instead of filling in ``None`` for shorter iterables. The reason for
214 the difference is that infinite iterator arguments are typically an error for
215 :func:`map` (because the output is fully evaluated) but represent a common and
216 useful way of supplying arguments to :func:`imap`. Equivalent to::
217
218 def imap(function, *iterables):
219 iterables = map(iter, iterables)
220 while True:
221 args = [next(i) for i in iterables]
222 if function is None:
223 yield tuple(args)
224 else:
225 yield function(*args)
226
227
228.. function:: islice(iterable, [start,] stop [, step])
229
230 Make an iterator that returns selected elements from the iterable. If *start* is
231 non-zero, then elements from the iterable are skipped until start is reached.
232 Afterward, elements are returned consecutively unless *step* is set higher than
233 one which results in items being skipped. If *stop* is ``None``, then iteration
234 continues until the iterator is exhausted, if at all; otherwise, it stops at the
235 specified position. Unlike regular slicing, :func:`islice` does not support
236 negative values for *start*, *stop*, or *step*. Can be used to extract related
237 fields from data where the internal structure has been flattened (for example, a
238 multi-line report may list a name field on every third line). Equivalent to::
239
240 def islice(iterable, *args):
241 s = slice(*args)
242 it = iter(range(s.start or 0, s.stop or sys.maxint, s.step or 1))
243 nexti = next(it)
244 for i, element in enumerate(iterable):
245 if i == nexti:
246 yield element
247 nexti = next(it)
248
249 If *start* is ``None``, then iteration starts at zero. If *step* is ``None``,
250 then the step defaults to one.
251
Georg Brandl116aa622007-08-15 14:28:22 +0000252
253.. function:: izip(*iterables)
254
255 Make an iterator that aggregates elements from each of the iterables. Like
256 :func:`zip` except that it returns an iterator instead of a list. Used for
257 lock-step iteration over several iterables at a time. Equivalent to::
258
259 def izip(*iterables):
260 iterables = map(iter, iterables)
261 while iterables:
262 result = [next(it) for it in iterables]
263 yield tuple(result)
264
Georg Brandl55ac8f02007-09-01 13:51:09 +0000265 When no iterables are specified, return a zero length iterator.
Georg Brandl116aa622007-08-15 14:28:22 +0000266
267 Note, the left-to-right evaluation order of the iterables is guaranteed. This
268 makes possible an idiom for clustering a data series into n-length groups using
269 ``izip(*[iter(s)]*n)``. For data that doesn't fit n-length groups exactly, the
270 last tuple can be pre-padded with fill values using ``izip(*[chain(s,
271 [None]*(n-1))]*n)``.
272
273 Note, when :func:`izip` is used with unequal length inputs, subsequent
274 iteration over the longer iterables cannot reliably be continued after
275 :func:`izip` terminates. Potentially, up to one entry will be missing from
276 each of the left-over iterables. This occurs because a value is fetched from
277 each iterator in- turn, but the process ends when one of the iterators
278 terminates. This leaves the last fetched values in limbo (they cannot be
279 returned in a final, incomplete tuple and they are cannot be pushed back into
280 the iterator for retrieval with ``next(it)``). In general, :func:`izip`
281 should only be used with unequal length inputs when you don't care about
282 trailing, unmatched values from the longer iterables.
283
284
285.. function:: izip_longest(*iterables[, fillvalue])
286
287 Make an iterator that aggregates elements from each of the iterables. If the
288 iterables are of uneven length, missing values are filled-in with *fillvalue*.
289 Iteration continues until the longest iterable is exhausted. Equivalent to::
290
291 def izip_longest(*args, **kwds):
292 fillvalue = kwds.get('fillvalue')
293 def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
294 yield counter() # yields the fillvalue, or raises IndexError
295 fillers = repeat(fillvalue)
296 iters = [chain(it, sentinel(), fillers) for it in args]
297 try:
298 for tup in izip(*iters):
299 yield tup
300 except IndexError:
301 pass
302
303 If one of the iterables is potentially infinite, then the :func:`izip_longest`
304 function should be wrapped with something that limits the number of calls (for
305 example :func:`islice` or :func:`takewhile`).
306
Georg Brandl116aa622007-08-15 14:28:22 +0000307
308.. function:: repeat(object[, times])
309
310 Make an iterator that returns *object* over and over again. Runs indefinitely
311 unless the *times* argument is specified. Used as argument to :func:`imap` for
312 invariant parameters to the called function. Also used with :func:`izip` to
313 create an invariant part of a tuple record. Equivalent to::
314
315 def repeat(object, times=None):
316 if times is None:
317 while True:
318 yield object
319 else:
320 for i in range(times):
321 yield object
322
323
324.. function:: starmap(function, iterable)
325
326 Make an iterator that computes the function using arguments tuples obtained from
327 the iterable. Used instead of :func:`imap` when argument parameters are already
328 grouped in tuples from a single iterable (the data has been "pre-zipped"). The
329 difference between :func:`imap` and :func:`starmap` parallels the distinction
330 between ``function(a,b)`` and ``function(*c)``. Equivalent to::
331
332 def starmap(function, iterable):
333 iterable = iter(iterable)
334 while True:
335 yield function(*next(iterable))
336
337
338.. function:: takewhile(predicate, iterable)
339
340 Make an iterator that returns elements from the iterable as long as the
341 predicate is true. Equivalent to::
342
343 def takewhile(predicate, iterable):
344 for x in iterable:
345 if predicate(x):
346 yield x
347 else:
348 break
349
350
351.. function:: tee(iterable[, n=2])
352
353 Return *n* independent iterators from a single iterable. The case where ``n==2``
354 is equivalent to::
355
356 def tee(iterable):
357 def gen(next, data={}, cnt=[0]):
358 for i in count():
359 if i == cnt[0]:
360 item = data[i] = next()
361 cnt[0] += 1
362 else:
363 item = data.pop(i)
364 yield item
365 it = iter(iterable)
366 return (gen(it.__next__), gen(it.__next__))
367
368 Note, once :func:`tee` has made a split, the original *iterable* should not be
369 used anywhere else; otherwise, the *iterable* could get advanced without the tee
370 objects being informed.
371
372 Note, this member of the toolkit may require significant auxiliary storage
373 (depending on how much temporary data needs to be stored). In general, if one
374 iterator is going to use most or all of the data before the other iterator, it
375 is faster to use :func:`list` instead of :func:`tee`.
376
Georg Brandl116aa622007-08-15 14:28:22 +0000377
378.. _itertools-example:
379
380Examples
381--------
382
383The following examples show common uses for each tool and demonstrate ways they
384can be combined. ::
385
386 >>> amounts = [120.15, 764.05, 823.14]
387 >>> for checknum, amount in izip(count(1200), amounts):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000388 ... print('Check %d is for $%.2f' % (checknum, amount))
Georg Brandl116aa622007-08-15 14:28:22 +0000389 ...
390 Check 1200 is for $120.15
391 Check 1201 is for $764.05
392 Check 1202 is for $823.14
393
394 >>> import operator
395 >>> for cube in imap(operator.pow, range(1,5), repeat(3)):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000396 ... print(cube)
Georg Brandl116aa622007-08-15 14:28:22 +0000397 ...
398 1
399 8
400 27
401 64
402
403 >>> reportlines = ['EuroPython', 'Roster', '', 'alex', '', 'laura',
404 ... '', 'martin', '', 'walter', '', 'mark']
405 >>> for name in islice(reportlines, 3, None, 2):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000406 ... print(name.title())
Georg Brandl116aa622007-08-15 14:28:22 +0000407 ...
408 Alex
409 Laura
410 Martin
411 Walter
412 Mark
413
414 # Show a dictionary sorted and grouped by value
415 >>> from operator import itemgetter
416 >>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3)
417 >>> di = sorted(d.iteritems(), key=itemgetter(1))
418 >>> for k, g in groupby(di, key=itemgetter(1)):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000419 ... print(k, map(itemgetter(0), g))
Georg Brandl116aa622007-08-15 14:28:22 +0000420 ...
421 1 ['a', 'c', 'e']
422 2 ['b', 'd', 'f']
423 3 ['g']
424
425 # Find runs of consecutive numbers using groupby. The key to the solution
426 # is differencing with a range so that consecutive numbers all appear in
427 # same group.
428 >>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
429 >>> for k, g in groupby(enumerate(data), lambda t:t[0]-t[1]):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000430 ... print(map(operator.itemgetter(1), g))
Georg Brandl116aa622007-08-15 14:28:22 +0000431 ...
432 [1]
433 [4, 5, 6]
434 [10]
435 [15, 16, 17, 18]
436 [22]
437 [25, 26, 27, 28]
438
439
440
441.. _itertools-recipes:
442
443Recipes
444-------
445
446This section shows recipes for creating an extended toolset using the existing
447itertools as building blocks.
448
449The extended tools offer the same high performance as the underlying toolset.
450The superior memory performance is kept by processing elements one at a time
451rather than bringing the whole iterable into memory all at once. Code volume is
452kept small by linking the tools together in a functional style which helps
453eliminate temporary variables. High speed is retained by preferring
454"vectorized" building blocks over the use of for-loops and generators which
455incur interpreter overhead. ::
456
457 def take(n, seq):
458 return list(islice(seq, n))
459
460 def enumerate(iterable):
461 return izip(count(), iterable)
462
463 def tabulate(function):
464 "Return function(0), function(1), ..."
465 return imap(function, count())
466
467 def iteritems(mapping):
468 return izip(mapping.iterkeys(), mapping.itervalues())
469
470 def nth(iterable, n):
471 "Returns the nth item or raise StopIteration"
472 return islice(iterable, n, None).next()
473
474 def all(seq, pred=None):
475 "Returns True if pred(x) is true for every element in the iterable"
476 for elem in ifilterfalse(pred, seq):
477 return False
478 return True
479
480 def any(seq, pred=None):
481 "Returns True if pred(x) is true for at least one element in the iterable"
482 for elem in ifilter(pred, seq):
483 return True
484 return False
485
486 def no(seq, pred=None):
487 "Returns True if pred(x) is false for every element in the iterable"
488 for elem in ifilter(pred, seq):
489 return False
490 return True
491
492 def quantify(seq, pred=None):
493 "Count how many times the predicate is true in the sequence"
494 return sum(imap(pred, seq))
495
496 def padnone(seq):
497 """Returns the sequence elements and then returns None indefinitely.
498
499 Useful for emulating the behavior of the built-in map() function.
500 """
501 return chain(seq, repeat(None))
502
503 def ncycles(seq, n):
504 "Returns the sequence elements n times"
505 return chain(*repeat(seq, n))
506
507 def dotproduct(vec1, vec2):
508 return sum(imap(operator.mul, vec1, vec2))
509
510 def flatten(listOfLists):
511 return list(chain(*listOfLists))
512
513 def repeatfunc(func, times=None, *args):
514 """Repeat calls to func with specified arguments.
515
516 Example: repeatfunc(random.random)
517 """
518 if times is None:
519 return starmap(func, repeat(args))
520 else:
521 return starmap(func, repeat(args, times))
522
523 def pairwise(iterable):
524 "s -> (s0,s1), (s1,s2), (s2, s3), ..."
525 a, b = tee(iterable)
526 next(b, None)
527 return izip(a, b)
528
529 def grouper(n, iterable, padvalue=None):
530 "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
531 return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
532
533
534