blob: 9f9cb24864941316f1c2883a50a4f62c9cbb8af1 [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001
2:mod:`itertools` --- Functions creating iterators for efficient looping
3=======================================================================
4
5.. module:: itertools
6 :synopsis: Functions creating iterators for efficient looping.
7.. moduleauthor:: Raymond Hettinger <python@rcn.com>
8.. sectionauthor:: Raymond Hettinger <python@rcn.com>
9
10
11.. versionadded:: 2.3
12
13This module implements a number of iterator building blocks inspired by
14constructs from the Haskell and SML programming languages. Each has been recast
15in a form suitable for Python.
16
17The module standardizes a core set of fast, memory efficient tools that are
18useful by themselves or in combination. Standardization helps avoid the
19readability and reliability problems which arise when many different individuals
20create their own slightly varying implementations, each with their own quirks
21and naming conventions.
22
23The tools are designed to combine readily with one another. This makes it easy
24to construct more specialized tools succinctly and efficiently in pure Python.
25
26For instance, SML provides a tabulation tool: ``tabulate(f)`` which produces a
27sequence ``f(0), f(1), ...``. This toolbox provides :func:`imap` and
28:func:`count` which can be combined to form ``imap(f, count())`` and produce an
29equivalent result.
30
31Likewise, the functional tools are designed to work well with the high-speed
32functions provided by the :mod:`operator` module.
33
34The module author welcomes suggestions for other basic building blocks to be
35added to future versions of the module.
36
37Whether cast in pure python form or compiled code, tools that use iterators are
38more memory efficient (and faster) than their list based counterparts. Adopting
39the principles of just-in-time manufacturing, they create data when and where
40needed instead of consuming memory with the computer equivalent of "inventory".
41
42The performance advantage of iterators becomes more acute as the number of
43elements increases -- at some point, lists grow large enough to severely impact
44memory cache performance and start running slowly.
45
46
47.. seealso::
48
49 The Standard ML Basis Library, `The Standard ML Basis Library
50 <http://www.standardml.org/Basis/>`_.
51
52 Haskell, A Purely Functional Language, `Definition of Haskell and the Standard
53 Libraries <http://www.haskell.org/definition/>`_.
54
55
56.. _itertools-functions:
57
58Itertool functions
59------------------
60
61The following module functions all construct and return iterators. Some provide
62streams of infinite length, so they should only be accessed by functions or
63loops that truncate the stream.
64
65
66.. function:: chain(*iterables)
67
68 Make an iterator that returns elements from the first iterable until it is
69 exhausted, then proceeds to the next iterable, until all of the iterables are
70 exhausted. Used for treating consecutive sequences as a single sequence.
71 Equivalent to::
72
73 def chain(*iterables):
74 for it in iterables:
75 for element in it:
76 yield element
77
78
79.. function:: count([n])
80
81 Make an iterator that returns consecutive integers starting with *n*. If not
82 specified *n* defaults to zero. Does not currently support python long
83 integers. Often used as an argument to :func:`imap` to generate consecutive
84 data points. Also, used with :func:`izip` to add sequence numbers. Equivalent
85 to::
86
87 def count(n=0):
88 while True:
89 yield n
90 n += 1
91
92 Note, :func:`count` does not check for overflow and will return negative numbers
93 after exceeding ``sys.maxint``. This behavior may change in the future.
94
95
96.. function:: cycle(iterable)
97
98 Make an iterator returning elements from the iterable and saving a copy of each.
99 When the iterable is exhausted, return elements from the saved copy. Repeats
100 indefinitely. Equivalent to::
101
102 def cycle(iterable):
103 saved = []
104 for element in iterable:
105 yield element
106 saved.append(element)
107 while saved:
108 for element in saved:
109 yield element
110
111 Note, this member of the toolkit may require significant auxiliary storage
112 (depending on the length of the iterable).
113
114
115.. function:: dropwhile(predicate, iterable)
116
117 Make an iterator that drops elements from the iterable as long as the predicate
118 is true; afterwards, returns every element. Note, the iterator does not produce
119 *any* output until the predicate first becomes false, so it may have a lengthy
120 start-up time. Equivalent to::
121
122 def dropwhile(predicate, iterable):
123 iterable = iter(iterable)
124 for x in iterable:
125 if not predicate(x):
126 yield x
127 break
128 for x in iterable:
129 yield x
130
131
132.. function:: groupby(iterable[, key])
133
134 Make an iterator that returns consecutive keys and groups from the *iterable*.
135 The *key* is a function computing a key value for each element. If not
136 specified or is ``None``, *key* defaults to an identity function and returns
137 the element unchanged. Generally, the iterable needs to already be sorted on
138 the same key function.
139
140 The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix. It
141 generates a break or new group every time the value of the key function changes
142 (which is why it is usually necessary to have sorted the data using the same key
143 function). That behavior differs from SQL's GROUP BY which aggregates common
144 elements regardless of their input order.
145
146 The returned group is itself an iterator that shares the underlying iterable
147 with :func:`groupby`. Because the source is shared, when the :func:`groupby`
148 object is advanced, the previous group is no longer visible. So, if that data
149 is needed later, it should be stored as a list::
150
151 groups = []
152 uniquekeys = []
153 data = sorted(data, key=keyfunc)
154 for k, g in groupby(data, keyfunc):
155 groups.append(list(g)) # Store group iterator as a list
156 uniquekeys.append(k)
157
158 :func:`groupby` is equivalent to::
159
160 class groupby(object):
161 def __init__(self, iterable, key=None):
162 if key is None:
163 key = lambda x: x
164 self.keyfunc = key
165 self.it = iter(iterable)
166 self.tgtkey = self.currkey = self.currvalue = []
167 def __iter__(self):
168 return self
169 def __next__(self):
170 while self.currkey == self.tgtkey:
171 self.currvalue = next(self.it) # Exit on StopIteration
172 self.currkey = self.keyfunc(self.currvalue)
173 self.tgtkey = self.currkey
174 return (self.currkey, self._grouper(self.tgtkey))
175 def _grouper(self, tgtkey):
176 while self.currkey == tgtkey:
177 yield self.currvalue
178 self.currvalue = next(self.it) # Exit on StopIteration
179 self.currkey = self.keyfunc(self.currvalue)
180
181 .. versionadded:: 2.4
182
183
184.. function:: ifilter(predicate, iterable)
185
186 Make an iterator that filters elements from iterable returning only those for
187 which the predicate is ``True``. If *predicate* is ``None``, return the items
188 that are true. Equivalent to::
189
190 def ifilter(predicate, iterable):
191 if predicate is None:
192 predicate = bool
193 for x in iterable:
194 if predicate(x):
195 yield x
196
197
198.. function:: ifilterfalse(predicate, iterable)
199
200 Make an iterator that filters elements from iterable returning only those for
201 which the predicate is ``False``. If *predicate* is ``None``, return the items
202 that are false. Equivalent to::
203
204 def ifilterfalse(predicate, iterable):
205 if predicate is None:
206 predicate = bool
207 for x in iterable:
208 if not predicate(x):
209 yield x
210
211
212.. function:: imap(function, *iterables)
213
214 Make an iterator that computes the function using arguments from each of the
215 iterables. If *function* is set to ``None``, then :func:`imap` returns the
216 arguments as a tuple. Like :func:`map` but stops when the shortest iterable is
217 exhausted instead of filling in ``None`` for shorter iterables. The reason for
218 the difference is that infinite iterator arguments are typically an error for
219 :func:`map` (because the output is fully evaluated) but represent a common and
220 useful way of supplying arguments to :func:`imap`. Equivalent to::
221
222 def imap(function, *iterables):
223 iterables = map(iter, iterables)
224 while True:
225 args = [next(i) for i in iterables]
226 if function is None:
227 yield tuple(args)
228 else:
229 yield function(*args)
230
231
232.. function:: islice(iterable, [start,] stop [, step])
233
234 Make an iterator that returns selected elements from the iterable. If *start* is
235 non-zero, then elements from the iterable are skipped until start is reached.
236 Afterward, elements are returned consecutively unless *step* is set higher than
237 one which results in items being skipped. If *stop* is ``None``, then iteration
238 continues until the iterator is exhausted, if at all; otherwise, it stops at the
239 specified position. Unlike regular slicing, :func:`islice` does not support
240 negative values for *start*, *stop*, or *step*. Can be used to extract related
241 fields from data where the internal structure has been flattened (for example, a
242 multi-line report may list a name field on every third line). Equivalent to::
243
244 def islice(iterable, *args):
245 s = slice(*args)
246 it = iter(range(s.start or 0, s.stop or sys.maxint, s.step or 1))
247 nexti = next(it)
248 for i, element in enumerate(iterable):
249 if i == nexti:
250 yield element
251 nexti = next(it)
252
253 If *start* is ``None``, then iteration starts at zero. If *step* is ``None``,
254 then the step defaults to one.
255
256 .. versionchanged:: 2.5
257 accept ``None`` values for default *start* and *step*.
258
259
260.. function:: izip(*iterables)
261
262 Make an iterator that aggregates elements from each of the iterables. Like
263 :func:`zip` except that it returns an iterator instead of a list. Used for
264 lock-step iteration over several iterables at a time. Equivalent to::
265
266 def izip(*iterables):
267 iterables = map(iter, iterables)
268 while iterables:
269 result = [next(it) for it in iterables]
270 yield tuple(result)
271
272 .. versionchanged:: 2.4
273 When no iterables are specified, returns a zero length iterator instead of
274 raising a :exc:`TypeError` exception.
275
276 Note, the left-to-right evaluation order of the iterables is guaranteed. This
277 makes possible an idiom for clustering a data series into n-length groups using
278 ``izip(*[iter(s)]*n)``. For data that doesn't fit n-length groups exactly, the
279 last tuple can be pre-padded with fill values using ``izip(*[chain(s,
280 [None]*(n-1))]*n)``.
281
282 Note, when :func:`izip` is used with unequal length inputs, subsequent
283 iteration over the longer iterables cannot reliably be continued after
284 :func:`izip` terminates. Potentially, up to one entry will be missing from
285 each of the left-over iterables. This occurs because a value is fetched from
286 each iterator in- turn, but the process ends when one of the iterators
287 terminates. This leaves the last fetched values in limbo (they cannot be
288 returned in a final, incomplete tuple and they are cannot be pushed back into
289 the iterator for retrieval with ``next(it)``). In general, :func:`izip`
290 should only be used with unequal length inputs when you don't care about
291 trailing, unmatched values from the longer iterables.
292
293
294.. function:: izip_longest(*iterables[, fillvalue])
295
296 Make an iterator that aggregates elements from each of the iterables. If the
297 iterables are of uneven length, missing values are filled-in with *fillvalue*.
298 Iteration continues until the longest iterable is exhausted. Equivalent to::
299
300 def izip_longest(*args, **kwds):
301 fillvalue = kwds.get('fillvalue')
302 def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
303 yield counter() # yields the fillvalue, or raises IndexError
304 fillers = repeat(fillvalue)
305 iters = [chain(it, sentinel(), fillers) for it in args]
306 try:
307 for tup in izip(*iters):
308 yield tup
309 except IndexError:
310 pass
311
312 If one of the iterables is potentially infinite, then the :func:`izip_longest`
313 function should be wrapped with something that limits the number of calls (for
314 example :func:`islice` or :func:`takewhile`).
315
316 .. versionadded:: 2.6
317
318
319.. function:: repeat(object[, times])
320
321 Make an iterator that returns *object* over and over again. Runs indefinitely
322 unless the *times* argument is specified. Used as argument to :func:`imap` for
323 invariant parameters to the called function. Also used with :func:`izip` to
324 create an invariant part of a tuple record. Equivalent to::
325
326 def repeat(object, times=None):
327 if times is None:
328 while True:
329 yield object
330 else:
331 for i in range(times):
332 yield object
333
334
335.. function:: starmap(function, iterable)
336
337 Make an iterator that computes the function using arguments tuples obtained from
338 the iterable. Used instead of :func:`imap` when argument parameters are already
339 grouped in tuples from a single iterable (the data has been "pre-zipped"). The
340 difference between :func:`imap` and :func:`starmap` parallels the distinction
341 between ``function(a,b)`` and ``function(*c)``. Equivalent to::
342
343 def starmap(function, iterable):
344 iterable = iter(iterable)
345 while True:
346 yield function(*next(iterable))
347
348
349.. function:: takewhile(predicate, iterable)
350
351 Make an iterator that returns elements from the iterable as long as the
352 predicate is true. Equivalent to::
353
354 def takewhile(predicate, iterable):
355 for x in iterable:
356 if predicate(x):
357 yield x
358 else:
359 break
360
361
362.. function:: tee(iterable[, n=2])
363
364 Return *n* independent iterators from a single iterable. The case where ``n==2``
365 is equivalent to::
366
367 def tee(iterable):
368 def gen(next, data={}, cnt=[0]):
369 for i in count():
370 if i == cnt[0]:
371 item = data[i] = next()
372 cnt[0] += 1
373 else:
374 item = data.pop(i)
375 yield item
376 it = iter(iterable)
377 return (gen(it.__next__), gen(it.__next__))
378
379 Note, once :func:`tee` has made a split, the original *iterable* should not be
380 used anywhere else; otherwise, the *iterable* could get advanced without the tee
381 objects being informed.
382
383 Note, this member of the toolkit may require significant auxiliary storage
384 (depending on how much temporary data needs to be stored). In general, if one
385 iterator is going to use most or all of the data before the other iterator, it
386 is faster to use :func:`list` instead of :func:`tee`.
387
388 .. versionadded:: 2.4
389
390
391.. _itertools-example:
392
393Examples
394--------
395
396The following examples show common uses for each tool and demonstrate ways they
397can be combined. ::
398
399 >>> amounts = [120.15, 764.05, 823.14]
400 >>> for checknum, amount in izip(count(1200), amounts):
401 ... print 'Check %d is for $%.2f' % (checknum, amount)
402 ...
403 Check 1200 is for $120.15
404 Check 1201 is for $764.05
405 Check 1202 is for $823.14
406
407 >>> import operator
408 >>> for cube in imap(operator.pow, range(1,5), repeat(3)):
409 ... print cube
410 ...
411 1
412 8
413 27
414 64
415
416 >>> reportlines = ['EuroPython', 'Roster', '', 'alex', '', 'laura',
417 ... '', 'martin', '', 'walter', '', 'mark']
418 >>> for name in islice(reportlines, 3, None, 2):
419 ... print name.title()
420 ...
421 Alex
422 Laura
423 Martin
424 Walter
425 Mark
426
427 # Show a dictionary sorted and grouped by value
428 >>> from operator import itemgetter
429 >>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3)
430 >>> di = sorted(d.iteritems(), key=itemgetter(1))
431 >>> for k, g in groupby(di, key=itemgetter(1)):
432 ... print k, map(itemgetter(0), g)
433 ...
434 1 ['a', 'c', 'e']
435 2 ['b', 'd', 'f']
436 3 ['g']
437
438 # Find runs of consecutive numbers using groupby. The key to the solution
439 # is differencing with a range so that consecutive numbers all appear in
440 # same group.
441 >>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
442 >>> for k, g in groupby(enumerate(data), lambda t:t[0]-t[1]):
443 ... print map(operator.itemgetter(1), g)
444 ...
445 [1]
446 [4, 5, 6]
447 [10]
448 [15, 16, 17, 18]
449 [22]
450 [25, 26, 27, 28]
451
452
453
454.. _itertools-recipes:
455
456Recipes
457-------
458
459This section shows recipes for creating an extended toolset using the existing
460itertools as building blocks.
461
462The extended tools offer the same high performance as the underlying toolset.
463The superior memory performance is kept by processing elements one at a time
464rather than bringing the whole iterable into memory all at once. Code volume is
465kept small by linking the tools together in a functional style which helps
466eliminate temporary variables. High speed is retained by preferring
467"vectorized" building blocks over the use of for-loops and generators which
468incur interpreter overhead. ::
469
470 def take(n, seq):
471 return list(islice(seq, n))
472
473 def enumerate(iterable):
474 return izip(count(), iterable)
475
476 def tabulate(function):
477 "Return function(0), function(1), ..."
478 return imap(function, count())
479
480 def iteritems(mapping):
481 return izip(mapping.iterkeys(), mapping.itervalues())
482
483 def nth(iterable, n):
484 "Returns the nth item or raise StopIteration"
485 return islice(iterable, n, None).next()
486
487 def all(seq, pred=None):
488 "Returns True if pred(x) is true for every element in the iterable"
489 for elem in ifilterfalse(pred, seq):
490 return False
491 return True
492
493 def any(seq, pred=None):
494 "Returns True if pred(x) is true for at least one element in the iterable"
495 for elem in ifilter(pred, seq):
496 return True
497 return False
498
499 def no(seq, pred=None):
500 "Returns True if pred(x) is false for every element in the iterable"
501 for elem in ifilter(pred, seq):
502 return False
503 return True
504
505 def quantify(seq, pred=None):
506 "Count how many times the predicate is true in the sequence"
507 return sum(imap(pred, seq))
508
509 def padnone(seq):
510 """Returns the sequence elements and then returns None indefinitely.
511
512 Useful for emulating the behavior of the built-in map() function.
513 """
514 return chain(seq, repeat(None))
515
516 def ncycles(seq, n):
517 "Returns the sequence elements n times"
518 return chain(*repeat(seq, n))
519
520 def dotproduct(vec1, vec2):
521 return sum(imap(operator.mul, vec1, vec2))
522
523 def flatten(listOfLists):
524 return list(chain(*listOfLists))
525
526 def repeatfunc(func, times=None, *args):
527 """Repeat calls to func with specified arguments.
528
529 Example: repeatfunc(random.random)
530 """
531 if times is None:
532 return starmap(func, repeat(args))
533 else:
534 return starmap(func, repeat(args, times))
535
536 def pairwise(iterable):
537 "s -> (s0,s1), (s1,s2), (s2, s3), ..."
538 a, b = tee(iterable)
539 next(b, None)
540 return izip(a, b)
541
542 def grouper(n, iterable, padvalue=None):
543 "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
544 return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
545
546
547