blob: 50d4b04116d1fde0b202c6f1c20a1889359897c3 [file] [log] [blame]
Georg Brandl8ec7f652007-08-15 14:28:01 +00001********************************
2 Functional Programming HOWTO
3********************************
4
5:Author: \A. M. Kuchling
6:Release: 0.30
7
8(This is a first draft. Please send comments/error reports/suggestions to
Andrew M. Kuchlingdf2d7452007-12-14 22:41:18 +00009amk@amk.ca.)
Georg Brandl8ec7f652007-08-15 14:28:01 +000010
11In this document, we'll take a tour of Python's features suitable for
12implementing programs in a functional style. After an introduction to the
13concepts of functional programming, we'll look at language features such as
Georg Brandle7a09902007-10-21 12:10:28 +000014:term:`iterator`\s and :term:`generator`\s and relevant library modules such as
Georg Brandlcf3fb252007-10-21 10:52:38 +000015:mod:`itertools` and :mod:`functools`.
Georg Brandl8ec7f652007-08-15 14:28:01 +000016
17
18Introduction
19============
20
21This section explains the basic concept of functional programming; if you're
22just interested in learning about Python language features, skip to the next
23section.
24
25Programming languages support decomposing problems in several different ways:
26
27* Most programming languages are **procedural**: programs are lists of
28 instructions that tell the computer what to do with the program's input. C,
29 Pascal, and even Unix shells are procedural languages.
30
31* In **declarative** languages, you write a specification that describes the
32 problem to be solved, and the language implementation figures out how to
33 perform the computation efficiently. SQL is the declarative language you're
34 most likely to be familiar with; a SQL query describes the data set you want
35 to retrieve, and the SQL engine decides whether to scan tables or use indexes,
36 which subclauses should be performed first, etc.
37
38* **Object-oriented** programs manipulate collections of objects. Objects have
39 internal state and support methods that query or modify this internal state in
40 some way. Smalltalk and Java are object-oriented languages. C++ and Python
41 are languages that support object-oriented programming, but don't force the
42 use of object-oriented features.
43
44* **Functional** programming decomposes a problem into a set of functions.
45 Ideally, functions only take inputs and produce outputs, and don't have any
46 internal state that affects the output produced for a given input. Well-known
47 functional languages include the ML family (Standard ML, OCaml, and other
48 variants) and Haskell.
49
50The designers of some computer languages have chosen one approach to programming
51that's emphasized. This often makes it difficult to write programs that use a
52different approach. Other languages are multi-paradigm languages that support
53several different approaches. Lisp, C++, and Python are multi-paradigm; you can
54write programs or libraries that are largely procedural, object-oriented, or
55functional in all of these languages. In a large program, different sections
56might be written using different approaches; the GUI might be object-oriented
57while the processing logic is procedural or functional, for example.
58
59In a functional program, input flows through a set of functions. Each function
60operates on its input and produces some output. Functional style frowns upon
61functions with side effects that modify internal state or make other changes
62that aren't visible in the function's return value. Functions that have no side
63effects at all are called **purely functional**. Avoiding side effects means
64not using data structures that get updated as a program runs; every function's
65output must only depend on its input.
66
67Some languages are very strict about purity and don't even have assignment
68statements such as ``a=3`` or ``c = a + b``, but it's difficult to avoid all
69side effects. Printing to the screen or writing to a disk file are side
70effects, for example. For example, in Python a ``print`` statement or a
71``time.sleep(1)`` both return no useful value; they're only called for their
72side effects of sending some text to the screen or pausing execution for a
73second.
74
75Python programs written in functional style usually won't go to the extreme of
76avoiding all I/O or all assignments; instead, they'll provide a
77functional-appearing interface but will use non-functional features internally.
78For example, the implementation of a function will still use assignments to
79local variables, but won't modify global variables or have other side effects.
80
81Functional programming can be considered the opposite of object-oriented
82programming. Objects are little capsules containing some internal state along
83with a collection of method calls that let you modify this state, and programs
84consist of making the right set of state changes. Functional programming wants
85to avoid state changes as much as possible and works with data flowing between
86functions. In Python you might combine the two approaches by writing functions
87that take and return instances representing objects in your application (e-mail
88messages, transactions, etc.).
89
90Functional design may seem like an odd constraint to work under. Why should you
91avoid objects and side effects? There are theoretical and practical advantages
92to the functional style:
93
94* Formal provability.
95* Modularity.
96* Composability.
97* Ease of debugging and testing.
98
99Formal provability
100------------------
101
102A theoretical benefit is that it's easier to construct a mathematical proof that
103a functional program is correct.
104
105For a long time researchers have been interested in finding ways to
106mathematically prove programs correct. This is different from testing a program
107on numerous inputs and concluding that its output is usually correct, or reading
108a program's source code and concluding that the code looks right; the goal is
109instead a rigorous proof that a program produces the right result for all
110possible inputs.
111
112The technique used to prove programs correct is to write down **invariants**,
113properties of the input data and of the program's variables that are always
114true. For each line of code, you then show that if invariants X and Y are true
115**before** the line is executed, the slightly different invariants X' and Y' are
116true **after** the line is executed. This continues until you reach the end of
117the program, at which point the invariants should match the desired conditions
118on the program's output.
119
120Functional programming's avoidance of assignments arose because assignments are
121difficult to handle with this technique; assignments can break invariants that
122were true before the assignment without producing any new invariants that can be
123propagated onward.
124
125Unfortunately, proving programs correct is largely impractical and not relevant
126to Python software. Even trivial programs require proofs that are several pages
127long; the proof of correctness for a moderately complicated program would be
128enormous, and few or none of the programs you use daily (the Python interpreter,
129your XML parser, your web browser) could be proven correct. Even if you wrote
130down or generated a proof, there would then be the question of verifying the
131proof; maybe there's an error in it, and you wrongly believe you've proved the
132program correct.
133
134Modularity
135----------
136
137A more practical benefit of functional programming is that it forces you to
138break apart your problem into small pieces. Programs are more modular as a
139result. It's easier to specify and write a small function that does one thing
140than a large function that performs a complicated transformation. Small
141functions are also easier to read and to check for errors.
142
143
144Ease of debugging and testing
145-----------------------------
146
147Testing and debugging a functional-style program is easier.
148
149Debugging is simplified because functions are generally small and clearly
150specified. When a program doesn't work, each function is an interface point
151where you can check that the data are correct. You can look at the intermediate
152inputs and outputs to quickly isolate the function that's responsible for a bug.
153
154Testing is easier because each function is a potential subject for a unit test.
155Functions don't depend on system state that needs to be replicated before
156running a test; instead you only have to synthesize the right input and then
157check that the output matches expectations.
158
159
160
161Composability
162-------------
163
164As you work on a functional-style program, you'll write a number of functions
165with varying inputs and outputs. Some of these functions will be unavoidably
166specialized to a particular application, but others will be useful in a wide
167variety of programs. For example, a function that takes a directory path and
168returns all the XML files in the directory, or a function that takes a filename
169and returns its contents, can be applied to many different situations.
170
171Over time you'll form a personal library of utilities. Often you'll assemble
172new programs by arranging existing functions in a new configuration and writing
173a few functions specialized for the current task.
174
175
176
177Iterators
178=========
179
180I'll start by looking at a Python language feature that's an important
181foundation for writing functional-style programs: iterators.
182
183An iterator is an object representing a stream of data; this object returns the
184data one element at a time. A Python iterator must support a method called
185``next()`` that takes no arguments and always returns the next element of the
186stream. If there are no more elements in the stream, ``next()`` must raise the
187``StopIteration`` exception. Iterators don't have to be finite, though; it's
188perfectly reasonable to write an iterator that produces an infinite stream of
189data.
190
191The built-in :func:`iter` function takes an arbitrary object and tries to return
192an iterator that will return the object's contents or elements, raising
193:exc:`TypeError` if the object doesn't support iteration. Several of Python's
194built-in data types support iteration, the most common being lists and
195dictionaries. An object is called an **iterable** object if you can get an
196iterator for it.
197
198You can experiment with the iteration interface manually::
199
200 >>> L = [1,2,3]
201 >>> it = iter(L)
202 >>> print it
203 <iterator object at 0x8116870>
204 >>> it.next()
205 1
206 >>> it.next()
207 2
208 >>> it.next()
209 3
210 >>> it.next()
211 Traceback (most recent call last):
212 File "<stdin>", line 1, in ?
213 StopIteration
214 >>>
215
216Python expects iterable objects in several different contexts, the most
217important being the ``for`` statement. In the statement ``for X in Y``, Y must
218be an iterator or some object for which ``iter()`` can create an iterator.
219These two statements are equivalent::
220
221 for i in iter(obj):
222 print i
223
224 for i in obj:
225 print i
226
227Iterators can be materialized as lists or tuples by using the :func:`list` or
228:func:`tuple` constructor functions::
229
230 >>> L = [1,2,3]
231 >>> iterator = iter(L)
232 >>> t = tuple(iterator)
233 >>> t
234 (1, 2, 3)
235
236Sequence unpacking also supports iterators: if you know an iterator will return
237N elements, you can unpack them into an N-tuple::
238
239 >>> L = [1,2,3]
240 >>> iterator = iter(L)
241 >>> a,b,c = iterator
242 >>> a,b,c
243 (1, 2, 3)
244
245Built-in functions such as :func:`max` and :func:`min` can take a single
246iterator argument and will return the largest or smallest element. The ``"in"``
247and ``"not in"`` operators also support iterators: ``X in iterator`` is true if
248X is found in the stream returned by the iterator. You'll run into obvious
249problems if the iterator is infinite; ``max()``, ``min()``, and ``"not in"``
250will never return, and if the element X never appears in the stream, the
251``"in"`` operator won't return either.
252
253Note that you can only go forward in an iterator; there's no way to get the
254previous element, reset the iterator, or make a copy of it. Iterator objects
255can optionally provide these additional capabilities, but the iterator protocol
256only specifies the ``next()`` method. Functions may therefore consume all of
257the iterator's output, and if you need to do something different with the same
258stream, you'll have to create a new iterator.
259
260
261
262Data Types That Support Iterators
263---------------------------------
264
265We've already seen how lists and tuples support iterators. In fact, any Python
266sequence type, such as strings, will automatically support creation of an
267iterator.
268
269Calling :func:`iter` on a dictionary returns an iterator that will loop over the
270dictionary's keys::
271
272 >>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
273 ... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
274 >>> for key in m:
275 ... print key, m[key]
276 Mar 3
277 Feb 2
278 Aug 8
279 Sep 9
280 May 5
281 Jun 6
282 Jul 7
283 Jan 1
284 Apr 4
285 Nov 11
286 Dec 12
287 Oct 10
288
289Note that the order is essentially random, because it's based on the hash
290ordering of the objects in the dictionary.
291
292Applying ``iter()`` to a dictionary always loops over the keys, but dictionaries
293have methods that return other iterators. If you want to iterate over keys,
294values, or key/value pairs, you can explicitly call the ``iterkeys()``,
295``itervalues()``, or ``iteritems()`` methods to get an appropriate iterator.
296
297The :func:`dict` constructor can accept an iterator that returns a finite stream
298of ``(key, value)`` tuples::
299
300 >>> L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')]
301 >>> dict(iter(L))
302 {'Italy': 'Rome', 'US': 'Washington DC', 'France': 'Paris'}
303
304Files also support iteration by calling the ``readline()`` method until there
305are no more lines in the file. This means you can read each line of a file like
306this::
307
308 for line in file:
309 # do something for each line
310 ...
311
312Sets can take their contents from an iterable and let you iterate over the set's
313elements::
314
315 S = set((2, 3, 5, 7, 11, 13))
316 for i in S:
317 print i
318
319
320
321Generator expressions and list comprehensions
322=============================================
323
324Two common operations on an iterator's output are 1) performing some operation
325for every element, 2) selecting a subset of elements that meet some condition.
326For example, given a list of strings, you might want to strip off trailing
327whitespace from each line or extract all the strings containing a given
328substring.
329
330List comprehensions and generator expressions (short form: "listcomps" and
331"genexps") are a concise notation for such operations, borrowed from the
332functional programming language Haskell (http://www.haskell.org). You can strip
333all the whitespace from a stream of strings with the following code::
334
335 line_list = [' line 1\n', 'line 2 \n', ...]
336
337 # Generator expression -- returns iterator
338 stripped_iter = (line.strip() for line in line_list)
339
340 # List comprehension -- returns list
341 stripped_list = [line.strip() for line in line_list]
342
343You can select only certain elements by adding an ``"if"`` condition::
344
345 stripped_list = [line.strip() for line in line_list
346 if line != ""]
347
348With a list comprehension, you get back a Python list; ``stripped_list`` is a
349list containing the resulting lines, not an iterator. Generator expressions
350return an iterator that computes the values as necessary, not needing to
351materialize all the values at once. This means that list comprehensions aren't
352useful if you're working with iterators that return an infinite stream or a very
353large amount of data. Generator expressions are preferable in these situations.
354
355Generator expressions are surrounded by parentheses ("()") and list
356comprehensions are surrounded by square brackets ("[]"). Generator expressions
357have the form::
358
359 ( expression for expr in sequence1
360 if condition1
361 for expr2 in sequence2
362 if condition2
363 for expr3 in sequence3 ...
364 if condition3
365 for exprN in sequenceN
366 if conditionN )
367
368Again, for a list comprehension only the outside brackets are different (square
369brackets instead of parentheses).
370
371The elements of the generated output will be the successive values of
372``expression``. The ``if`` clauses are all optional; if present, ``expression``
373is only evaluated and added to the result when ``condition`` is true.
374
375Generator expressions always have to be written inside parentheses, but the
376parentheses signalling a function call also count. If you want to create an
377iterator that will be immediately passed to a function you can write::
378
379 obj_total = sum(obj.count for obj in list_all_objects())
380
381The ``for...in`` clauses contain the sequences to be iterated over. The
382sequences do not have to be the same length, because they are iterated over from
383left to right, **not** in parallel. For each element in ``sequence1``,
384``sequence2`` is looped over from the beginning. ``sequence3`` is then looped
385over for each resulting pair of elements from ``sequence1`` and ``sequence2``.
386
387To put it another way, a list comprehension or generator expression is
388equivalent to the following Python code::
389
390 for expr1 in sequence1:
391 if not (condition1):
392 continue # Skip this element
393 for expr2 in sequence2:
394 if not (condition2):
395 continue # Skip this element
396 ...
397 for exprN in sequenceN:
398 if not (conditionN):
399 continue # Skip this element
400
401 # Output the value of
402 # the expression.
403
404This means that when there are multiple ``for...in`` clauses but no ``if``
405clauses, the length of the resulting output will be equal to the product of the
406lengths of all the sequences. If you have two lists of length 3, the output
407list is 9 elements long::
408
409 seq1 = 'abc'
410 seq2 = (1,2,3)
411 >>> [ (x,y) for x in seq1 for y in seq2]
412 [('a', 1), ('a', 2), ('a', 3),
413 ('b', 1), ('b', 2), ('b', 3),
414 ('c', 1), ('c', 2), ('c', 3)]
415
416To avoid introducing an ambiguity into Python's grammar, if ``expression`` is
417creating a tuple, it must be surrounded with parentheses. The first list
418comprehension below is a syntax error, while the second one is correct::
419
420 # Syntax error
421 [ x,y for x in seq1 for y in seq2]
422 # Correct
423 [ (x,y) for x in seq1 for y in seq2]
424
425
426Generators
427==========
428
429Generators are a special class of functions that simplify the task of writing
430iterators. Regular functions compute a value and return it, but generators
431return an iterator that returns a stream of values.
432
433You're doubtless familiar with how regular function calls work in Python or C.
434When you call a function, it gets a private namespace where its local variables
435are created. When the function reaches a ``return`` statement, the local
436variables are destroyed and the value is returned to the caller. A later call
437to the same function creates a new private namespace and a fresh set of local
438variables. But, what if the local variables weren't thrown away on exiting a
439function? What if you could later resume the function where it left off? This
440is what generators provide; they can be thought of as resumable functions.
441
442Here's the simplest example of a generator function::
443
444 def generate_ints(N):
445 for i in range(N):
446 yield i
447
448Any function containing a ``yield`` keyword is a generator function; this is
Georg Brandl5e52db02007-10-21 10:45:46 +0000449detected by Python's :term:`bytecode` compiler which compiles the function
450specially as a result.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000451
452When you call a generator function, it doesn't return a single value; instead it
453returns a generator object that supports the iterator protocol. On executing
454the ``yield`` expression, the generator outputs the value of ``i``, similar to a
455``return`` statement. The big difference between ``yield`` and a ``return``
456statement is that on reaching a ``yield`` the generator's state of execution is
457suspended and local variables are preserved. On the next call to the
458generator's ``.next()`` method, the function will resume executing.
459
460Here's a sample usage of the ``generate_ints()`` generator::
461
462 >>> gen = generate_ints(3)
463 >>> gen
464 <generator object at 0x8117f90>
465 >>> gen.next()
466 0
467 >>> gen.next()
468 1
469 >>> gen.next()
470 2
471 >>> gen.next()
472 Traceback (most recent call last):
473 File "stdin", line 1, in ?
474 File "stdin", line 2, in generate_ints
475 StopIteration
476
477You could equally write ``for i in generate_ints(5)``, or ``a,b,c =
478generate_ints(3)``.
479
480Inside a generator function, the ``return`` statement can only be used without a
481value, and signals the end of the procession of values; after executing a
482``return`` the generator cannot return any further values. ``return`` with a
483value, such as ``return 5``, is a syntax error inside a generator function. The
484end of the generator's results can also be indicated by raising
485``StopIteration`` manually, or by just letting the flow of execution fall off
486the bottom of the function.
487
488You could achieve the effect of generators manually by writing your own class
489and storing all the local variables of the generator as instance variables. For
490example, returning a list of integers could be done by setting ``self.count`` to
4910, and having the ``next()`` method increment ``self.count`` and return it.
492However, for a moderately complicated generator, writing a corresponding class
493can be much messier.
494
495The test suite included with Python's library, ``test_generators.py``, contains
496a number of more interesting examples. Here's one generator that implements an
497in-order traversal of a tree using generators recursively.
498
499::
500
501 # A recursive generator that generates Tree leaves in in-order.
502 def inorder(t):
503 if t:
504 for x in inorder(t.left):
505 yield x
506
507 yield t.label
508
509 for x in inorder(t.right):
510 yield x
511
512Two other examples in ``test_generators.py`` produce solutions for the N-Queens
513problem (placing N queens on an NxN chess board so that no queen threatens
514another) and the Knight's Tour (finding a route that takes a knight to every
515square of an NxN chessboard without visiting any square twice).
516
517
518
519Passing values into a generator
520-------------------------------
521
522In Python 2.4 and earlier, generators only produced output. Once a generator's
523code was invoked to create an iterator, there was no way to pass any new
524information into the function when its execution is resumed. You could hack
525together this ability by making the generator look at a global variable or by
526passing in some mutable object that callers then modify, but these approaches
527are messy.
528
529In Python 2.5 there's a simple way to pass values into a generator.
530:keyword:`yield` became an expression, returning a value that can be assigned to
531a variable or otherwise operated on::
532
533 val = (yield i)
534
535I recommend that you **always** put parentheses around a ``yield`` expression
536when you're doing something with the returned value, as in the above example.
537The parentheses aren't always necessary, but it's easier to always add them
538instead of having to remember when they're needed.
539
540(PEP 342 explains the exact rules, which are that a ``yield``-expression must
541always be parenthesized except when it occurs at the top-level expression on the
542right-hand side of an assignment. This means you can write ``val = yield i``
543but have to use parentheses when there's an operation, as in ``val = (yield i)
544+ 12``.)
545
546Values are sent into a generator by calling its ``send(value)`` method. This
547method resumes the generator's code and the ``yield`` expression returns the
548specified value. If the regular ``next()`` method is called, the ``yield``
549returns ``None``.
550
551Here's a simple counter that increments by 1 and allows changing the value of
552the internal counter.
553
554::
555
556 def counter (maximum):
557 i = 0
558 while i < maximum:
559 val = (yield i)
560 # If value provided, change counter
561 if val is not None:
562 i = val
563 else:
564 i += 1
565
566And here's an example of changing the counter:
567
568 >>> it = counter(10)
569 >>> print it.next()
570 0
571 >>> print it.next()
572 1
573 >>> print it.send(8)
574 8
575 >>> print it.next()
576 9
577 >>> print it.next()
578 Traceback (most recent call last):
579 File ``t.py'', line 15, in ?
580 print it.next()
581 StopIteration
582
583Because ``yield`` will often be returning ``None``, you should always check for
584this case. Don't just use its value in expressions unless you're sure that the
585``send()`` method will be the only method used resume your generator function.
586
587In addition to ``send()``, there are two other new methods on generators:
588
589* ``throw(type, value=None, traceback=None)`` is used to raise an exception
590 inside the generator; the exception is raised by the ``yield`` expression
591 where the generator's execution is paused.
592
593* ``close()`` raises a :exc:`GeneratorExit` exception inside the generator to
594 terminate the iteration. On receiving this exception, the generator's code
595 must either raise :exc:`GeneratorExit` or :exc:`StopIteration`; catching the
596 exception and doing anything else is illegal and will trigger a
597 :exc:`RuntimeError`. ``close()`` will also be called by Python's garbage
598 collector when the generator is garbage-collected.
599
600 If you need to run cleanup code when a :exc:`GeneratorExit` occurs, I suggest
601 using a ``try: ... finally:`` suite instead of catching :exc:`GeneratorExit`.
602
603The cumulative effect of these changes is to turn generators from one-way
604producers of information into both producers and consumers.
605
606Generators also become **coroutines**, a more generalized form of subroutines.
607Subroutines are entered at one point and exited at another point (the top of the
608function, and a ``return`` statement), but coroutines can be entered, exited,
609and resumed at many different points (the ``yield`` statements).
610
611
612Built-in functions
613==================
614
615Let's look in more detail at built-in functions often used with iterators.
616
617Two Python's built-in functions, :func:`map` and :func:`filter`, are somewhat
618obsolete; they duplicate the features of list comprehensions but return actual
619lists instead of iterators.
620
621``map(f, iterA, iterB, ...)`` returns a list containing ``f(iterA[0], iterB[0]),
622f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``.
623
624::
625
626 def upper(s):
627 return s.upper()
628 map(upper, ['sentence', 'fragment']) =>
629 ['SENTENCE', 'FRAGMENT']
630
631 [upper(s) for s in ['sentence', 'fragment']] =>
632 ['SENTENCE', 'FRAGMENT']
633
634As shown above, you can achieve the same effect with a list comprehension. The
635:func:`itertools.imap` function does the same thing but can handle infinite
636iterators; it'll be discussed later, in the section on the :mod:`itertools` module.
637
638``filter(predicate, iter)`` returns a list that contains all the sequence
639elements that meet a certain condition, and is similarly duplicated by list
640comprehensions. A **predicate** is a function that returns the truth value of
641some condition; for use with :func:`filter`, the predicate must take a single
642value.
643
644::
645
646 def is_even(x):
647 return (x % 2) == 0
648
649 filter(is_even, range(10)) =>
650 [0, 2, 4, 6, 8]
651
652This can also be written as a list comprehension::
653
654 >>> [x for x in range(10) if is_even(x)]
655 [0, 2, 4, 6, 8]
656
657:func:`filter` also has a counterpart in the :mod:`itertools` module,
658:func:`itertools.ifilter`, that returns an iterator and can therefore handle
659infinite sequences just as :func:`itertools.imap` can.
660
661``reduce(func, iter, [initial_value])`` doesn't have a counterpart in the
662:mod:`itertools` module because it cumulatively performs an operation on all the
663iterable's elements and therefore can't be applied to infinite iterables.
664``func`` must be a function that takes two elements and returns a single value.
665:func:`reduce` takes the first two elements A and B returned by the iterator and
666calculates ``func(A, B)``. It then requests the third element, C, calculates
667``func(func(A, B), C)``, combines this result with the fourth element returned,
668and continues until the iterable is exhausted. If the iterable returns no
669values at all, a :exc:`TypeError` exception is raised. If the initial value is
670supplied, it's used as a starting point and ``func(initial_value, A)`` is the
671first calculation.
672
673::
674
675 import operator
676 reduce(operator.concat, ['A', 'BB', 'C']) =>
677 'ABBC'
678 reduce(operator.concat, []) =>
679 TypeError: reduce() of empty sequence with no initial value
680 reduce(operator.mul, [1,2,3], 1) =>
681 6
682 reduce(operator.mul, [], 1) =>
683 1
684
685If you use :func:`operator.add` with :func:`reduce`, you'll add up all the
686elements of the iterable. This case is so common that there's a special
687built-in called :func:`sum` to compute it::
688
689 reduce(operator.add, [1,2,3,4], 0) =>
690 10
691 sum([1,2,3,4]) =>
692 10
693 sum([]) =>
694 0
695
696For many uses of :func:`reduce`, though, it can be clearer to just write the
697obvious :keyword:`for` loop::
698
699 # Instead of:
700 product = reduce(operator.mul, [1,2,3], 1)
701
702 # You can write:
703 product = 1
704 for i in [1,2,3]:
705 product *= i
706
707
708``enumerate(iter)`` counts off the elements in the iterable, returning 2-tuples
709containing the count and each element.
710
711::
712
713 enumerate(['subject', 'verb', 'object']) =>
714 (0, 'subject'), (1, 'verb'), (2, 'object')
715
716:func:`enumerate` is often used when looping through a list and recording the
717indexes at which certain conditions are met::
718
719 f = open('data.txt', 'r')
720 for i, line in enumerate(f):
721 if line.strip() == '':
722 print 'Blank line at line #%i' % i
723
724``sorted(iterable, [cmp=None], [key=None], [reverse=False)`` collects all the
725elements of the iterable into a list, sorts the list, and returns the sorted
726result. The ``cmp``, ``key``, and ``reverse`` arguments are passed through to
727the constructed list's ``.sort()`` method.
728
729::
730
731 import random
732 # Generate 8 random numbers between [0, 10000)
733 rand_list = random.sample(range(10000), 8)
734 rand_list =>
735 [769, 7953, 9828, 6431, 8442, 9878, 6213, 2207]
736 sorted(rand_list) =>
737 [769, 2207, 6213, 6431, 7953, 8442, 9828, 9878]
738 sorted(rand_list, reverse=True) =>
739 [9878, 9828, 8442, 7953, 6431, 6213, 2207, 769]
740
741(For a more detailed discussion of sorting, see the Sorting mini-HOWTO in the
742Python wiki at http://wiki.python.org/moin/HowTo/Sorting.)
743
744The ``any(iter)`` and ``all(iter)`` built-ins look at the truth values of an
745iterable's contents. :func:`any` returns True if any element in the iterable is
746a true value, and :func:`all` returns True if all of the elements are true
747values::
748
749 any([0,1,0]) =>
750 True
751 any([0,0,0]) =>
752 False
753 any([1,1,1]) =>
754 True
755 all([0,1,0]) =>
756 False
757 all([0,0,0]) =>
758 False
759 all([1,1,1]) =>
760 True
761
762
763Small functions and the lambda expression
764=========================================
765
766When writing functional-style programs, you'll often need little functions that
767act as predicates or that combine elements in some way.
768
769If there's a Python built-in or a module function that's suitable, you don't
770need to define a new function at all::
771
772 stripped_lines = [line.strip() for line in lines]
773 existing_files = filter(os.path.exists, file_list)
774
775If the function you need doesn't exist, you need to write it. One way to write
776small functions is to use the ``lambda`` statement. ``lambda`` takes a number
777of parameters and an expression combining these parameters, and creates a small
778function that returns the value of the expression::
779
780 lowercase = lambda x: x.lower()
781
782 print_assign = lambda name, value: name + '=' + str(value)
783
784 adder = lambda x, y: x+y
785
786An alternative is to just use the ``def`` statement and define a function in the
787usual way::
788
789 def lowercase(x):
790 return x.lower()
791
792 def print_assign(name, value):
793 return name + '=' + str(value)
794
795 def adder(x,y):
796 return x + y
797
798Which alternative is preferable? That's a style question; my usual course is to
799avoid using ``lambda``.
800
801One reason for my preference is that ``lambda`` is quite limited in the
802functions it can define. The result has to be computable as a single
803expression, which means you can't have multiway ``if... elif... else``
804comparisons or ``try... except`` statements. If you try to do too much in a
805``lambda`` statement, you'll end up with an overly complicated expression that's
806hard to read. Quick, what's the following code doing?
807
808::
809
810 total = reduce(lambda a, b: (0, a[1] + b[1]), items)[1]
811
812You can figure it out, but it takes time to disentangle the expression to figure
813out what's going on. Using a short nested ``def`` statements makes things a
814little bit better::
815
816 def combine (a, b):
817 return 0, a[1] + b[1]
818
819 total = reduce(combine, items)[1]
820
821But it would be best of all if I had simply used a ``for`` loop::
822
823 total = 0
824 for a, b in items:
825 total += b
826
827Or the :func:`sum` built-in and a generator expression::
828
829 total = sum(b for a,b in items)
830
831Many uses of :func:`reduce` are clearer when written as ``for`` loops.
832
833Fredrik Lundh once suggested the following set of rules for refactoring uses of
834``lambda``:
835
8361) Write a lambda function.
8372) Write a comment explaining what the heck that lambda does.
8383) Study the comment for a while, and think of a name that captures the essence
839 of the comment.
8404) Convert the lambda to a def statement, using that name.
8415) Remove the comment.
842
843I really like these rules, but you're free to disagree that this lambda-free
844style is better.
845
846
847The itertools module
848====================
849
850The :mod:`itertools` module contains a number of commonly-used iterators as well
851as functions for combining several iterators. This section will introduce the
852module's contents by showing small examples.
853
854The module's functions fall into a few broad classes:
855
856* Functions that create a new iterator based on an existing iterator.
857* Functions for treating an iterator's elements as function arguments.
858* Functions for selecting portions of an iterator's output.
859* A function for grouping an iterator's output.
860
861Creating new iterators
862----------------------
863
864``itertools.count(n)`` returns an infinite stream of integers, increasing by 1
865each time. You can optionally supply the starting number, which defaults to 0::
866
867 itertools.count() =>
868 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
869 itertools.count(10) =>
870 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
871
872``itertools.cycle(iter)`` saves a copy of the contents of a provided iterable
873and returns a new iterator that returns its elements from first to last. The
874new iterator will repeat these elements infinitely.
875
876::
877
878 itertools.cycle([1,2,3,4,5]) =>
879 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...
880
881``itertools.repeat(elem, [n])`` returns the provided element ``n`` times, or
882returns the element endlessly if ``n`` is not provided.
883
884::
885
886 itertools.repeat('abc') =>
887 abc, abc, abc, abc, abc, abc, abc, abc, abc, abc, ...
888 itertools.repeat('abc', 5) =>
889 abc, abc, abc, abc, abc
890
891``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of iterables as
892input, and returns all the elements of the first iterator, then all the elements
893of the second, and so on, until all of the iterables have been exhausted.
894
895::
896
897 itertools.chain(['a', 'b', 'c'], (1, 2, 3)) =>
898 a, b, c, 1, 2, 3
899
900``itertools.izip(iterA, iterB, ...)`` takes one element from each iterable and
901returns them in a tuple::
902
903 itertools.izip(['a', 'b', 'c'], (1, 2, 3)) =>
904 ('a', 1), ('b', 2), ('c', 3)
905
906It's similiar to the built-in :func:`zip` function, but doesn't construct an
907in-memory list and exhaust all the input iterators before returning; instead
908tuples are constructed and returned only if they're requested. (The technical
909term for this behaviour is `lazy evaluation
910<http://en.wikipedia.org/wiki/Lazy_evaluation>`__.)
911
912This iterator is intended to be used with iterables that are all of the same
913length. If the iterables are of different lengths, the resulting stream will be
914the same length as the shortest iterable.
915
916::
917
918 itertools.izip(['a', 'b'], (1, 2, 3)) =>
919 ('a', 1), ('b', 2)
920
921You should avoid doing this, though, because an element may be taken from the
922longer iterators and discarded. This means you can't go on to use the iterators
923further because you risk skipping a discarded element.
924
925``itertools.islice(iter, [start], stop, [step])`` returns a stream that's a
926slice of the iterator. With a single ``stop`` argument, it will return the
927first ``stop`` elements. If you supply a starting index, you'll get
928``stop-start`` elements, and if you supply a value for ``step``, elements will
929be skipped accordingly. Unlike Python's string and list slicing, you can't use
930negative values for ``start``, ``stop``, or ``step``.
931
932::
933
934 itertools.islice(range(10), 8) =>
935 0, 1, 2, 3, 4, 5, 6, 7
936 itertools.islice(range(10), 2, 8) =>
937 2, 3, 4, 5, 6, 7
938 itertools.islice(range(10), 2, 8, 2) =>
939 2, 4, 6
940
941``itertools.tee(iter, [n])`` replicates an iterator; it returns ``n``
942independent iterators that will all return the contents of the source iterator.
943If you don't supply a value for ``n``, the default is 2. Replicating iterators
944requires saving some of the contents of the source iterator, so this can consume
945significant memory if the iterator is large and one of the new iterators is
946consumed more than the others.
947
948::
949
950 itertools.tee( itertools.count() ) =>
951 iterA, iterB
952
953 where iterA ->
954 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
955
956 and iterB ->
957 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
958
959
960Calling functions on elements
961-----------------------------
962
963Two functions are used for calling other functions on the contents of an
964iterable.
965
966``itertools.imap(f, iterA, iterB, ...)`` returns a stream containing
967``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``::
968
969 itertools.imap(operator.add, [5, 6, 5], [1, 2, 3]) =>
970 6, 8, 8
971
972The ``operator`` module contains a set of functions corresponding to Python's
973operators. Some examples are ``operator.add(a, b)`` (adds two values),
974``operator.ne(a, b)`` (same as ``a!=b``), and ``operator.attrgetter('id')``
975(returns a callable that fetches the ``"id"`` attribute).
976
977``itertools.starmap(func, iter)`` assumes that the iterable will return a stream
978of tuples, and calls ``f()`` using these tuples as the arguments::
979
980 itertools.starmap(os.path.join,
981 [('/usr', 'bin', 'java'), ('/bin', 'python'),
982 ('/usr', 'bin', 'perl'),('/usr', 'bin', 'ruby')])
983 =>
984 /usr/bin/java, /bin/python, /usr/bin/perl, /usr/bin/ruby
985
986
987Selecting elements
988------------------
989
990Another group of functions chooses a subset of an iterator's elements based on a
991predicate.
992
993``itertools.ifilter(predicate, iter)`` returns all the elements for which the
994predicate returns true::
995
996 def is_even(x):
997 return (x % 2) == 0
998
999 itertools.ifilter(is_even, itertools.count()) =>
1000 0, 2, 4, 6, 8, 10, 12, 14, ...
1001
1002``itertools.ifilterfalse(predicate, iter)`` is the opposite, returning all
1003elements for which the predicate returns false::
1004
1005 itertools.ifilterfalse(is_even, itertools.count()) =>
1006 1, 3, 5, 7, 9, 11, 13, 15, ...
1007
1008``itertools.takewhile(predicate, iter)`` returns elements for as long as the
1009predicate returns true. Once the predicate returns false, the iterator will
1010signal the end of its results.
1011
1012::
1013
1014 def less_than_10(x):
1015 return (x < 10)
1016
1017 itertools.takewhile(less_than_10, itertools.count()) =>
1018 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
1019
1020 itertools.takewhile(is_even, itertools.count()) =>
1021 0
1022
1023``itertools.dropwhile(predicate, iter)`` discards elements while the predicate
1024returns true, and then returns the rest of the iterable's results.
1025
1026::
1027
1028 itertools.dropwhile(less_than_10, itertools.count()) =>
1029 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
1030
1031 itertools.dropwhile(is_even, itertools.count()) =>
1032 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
1033
1034
1035Grouping elements
1036-----------------
1037
1038The last function I'll discuss, ``itertools.groupby(iter, key_func=None)``, is
1039the most complicated. ``key_func(elem)`` is a function that can compute a key
1040value for each element returned by the iterable. If you don't supply a key
1041function, the key is simply each element itself.
1042
1043``groupby()`` collects all the consecutive elements from the underlying iterable
1044that have the same key value, and returns a stream of 2-tuples containing a key
1045value and an iterator for the elements with that key.
1046
1047::
1048
1049 city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'),
1050 ('Anchorage', 'AK'), ('Nome', 'AK'),
1051 ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ'),
1052 ...
1053 ]
1054
1055 def get_state ((city, state)):
1056 return state
1057
1058 itertools.groupby(city_list, get_state) =>
1059 ('AL', iterator-1),
1060 ('AK', iterator-2),
1061 ('AZ', iterator-3), ...
1062
1063 where
1064 iterator-1 =>
1065 ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL')
1066 iterator-2 =>
1067 ('Anchorage', 'AK'), ('Nome', 'AK')
1068 iterator-3 =>
1069 ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')
1070
1071``groupby()`` assumes that the underlying iterable's contents will already be
1072sorted based on the key. Note that the returned iterators also use the
1073underlying iterable, so you have to consume the results of iterator-1 before
1074requesting iterator-2 and its corresponding key.
1075
1076
1077The functools module
1078====================
1079
1080The :mod:`functools` module in Python 2.5 contains some higher-order functions.
1081A **higher-order function** takes one or more functions as input and returns a
1082new function. The most useful tool in this module is the
1083:func:`functools.partial` function.
1084
1085For programs written in a functional style, you'll sometimes want to construct
1086variants of existing functions that have some of the parameters filled in.
1087Consider a Python function ``f(a, b, c)``; you may wish to create a new function
1088``g(b, c)`` that's equivalent to ``f(1, b, c)``; you're filling in a value for
1089one of ``f()``'s parameters. This is called "partial function application".
1090
1091The constructor for ``partial`` takes the arguments ``(function, arg1, arg2,
1092... kwarg1=value1, kwarg2=value2)``. The resulting object is callable, so you
1093can just call it to invoke ``function`` with the filled-in arguments.
1094
1095Here's a small but realistic example::
1096
1097 import functools
1098
1099 def log (message, subsystem):
1100 "Write the contents of 'message' to the specified subsystem."
1101 print '%s: %s' % (subsystem, message)
1102 ...
1103
1104 server_log = functools.partial(log, subsystem='server')
1105 server_log('Unable to open socket')
1106
1107
1108The operator module
1109-------------------
1110
1111The :mod:`operator` module was mentioned earlier. It contains a set of
1112functions corresponding to Python's operators. These functions are often useful
1113in functional-style code because they save you from writing trivial functions
1114that perform a single operation.
1115
1116Some of the functions in this module are:
1117
1118* Math operations: ``add()``, ``sub()``, ``mul()``, ``div()``, ``floordiv()``,
1119 ``abs()``, ...
1120* Logical operations: ``not_()``, ``truth()``.
1121* Bitwise operations: ``and_()``, ``or_()``, ``invert()``.
1122* Comparisons: ``eq()``, ``ne()``, ``lt()``, ``le()``, ``gt()``, and ``ge()``.
1123* Object identity: ``is_()``, ``is_not()``.
1124
1125Consult the operator module's documentation for a complete list.
1126
1127
1128
1129The functional module
1130---------------------
1131
1132Collin Winter's `functional module <http://oakwinter.com/code/functional/>`__
1133provides a number of more advanced tools for functional programming. It also
1134reimplements several Python built-ins, trying to make them more intuitive to
1135those used to functional programming in other languages.
1136
1137This section contains an introduction to some of the most important functions in
1138``functional``; full documentation can be found at `the project's website
1139<http://oakwinter.com/code/functional/documentation/>`__.
1140
1141``compose(outer, inner, unpack=False)``
1142
1143The ``compose()`` function implements function composition. In other words, it
1144returns a wrapper around the ``outer`` and ``inner`` callables, such that the
1145return value from ``inner`` is fed directly to ``outer``. That is,
1146
1147::
1148
1149 >>> def add(a, b):
1150 ... return a + b
1151 ...
1152 >>> def double(a):
1153 ... return 2 * a
1154 ...
1155 >>> compose(double, add)(5, 6)
1156 22
1157
1158is equivalent to
1159
1160::
1161
1162 >>> double(add(5, 6))
1163 22
1164
1165The ``unpack`` keyword is provided to work around the fact that Python functions
1166are not always `fully curried <http://en.wikipedia.org/wiki/Currying>`__. By
1167default, it is expected that the ``inner`` function will return a single object
1168and that the ``outer`` function will take a single argument. Setting the
1169``unpack`` argument causes ``compose`` to expect a tuple from ``inner`` which
1170will be expanded before being passed to ``outer``. Put simply,
1171
1172::
1173
1174 compose(f, g)(5, 6)
1175
1176is equivalent to::
1177
1178 f(g(5, 6))
1179
1180while
1181
1182::
1183
1184 compose(f, g, unpack=True)(5, 6)
1185
1186is equivalent to::
1187
1188 f(*g(5, 6))
1189
1190Even though ``compose()`` only accepts two functions, it's trivial to build up a
1191version that will compose any number of functions. We'll use ``reduce()``,
1192``compose()`` and ``partial()`` (the last of which is provided by both
1193``functional`` and ``functools``).
1194
1195::
1196
1197 from functional import compose, partial
1198
1199 multi_compose = partial(reduce, compose)
1200
1201
1202We can also use ``map()``, ``compose()`` and ``partial()`` to craft a version of
1203``"".join(...)`` that converts its arguments to string::
1204
1205 from functional import compose, partial
1206
1207 join = compose("".join, partial(map, str))
1208
1209
1210``flip(func)``
1211
1212``flip()`` wraps the callable in ``func`` and causes it to receive its
1213non-keyword arguments in reverse order.
1214
1215::
1216
1217 >>> def triple(a, b, c):
1218 ... return (a, b, c)
1219 ...
1220 >>> triple(5, 6, 7)
1221 (5, 6, 7)
1222 >>>
1223 >>> flipped_triple = flip(triple)
1224 >>> flipped_triple(5, 6, 7)
1225 (7, 6, 5)
1226
1227``foldl(func, start, iterable)``
1228
1229``foldl()`` takes a binary function, a starting value (usually some kind of
1230'zero'), and an iterable. The function is applied to the starting value and the
1231first element of the list, then the result of that and the second element of the
1232list, then the result of that and the third element of the list, and so on.
1233
1234This means that a call such as::
1235
1236 foldl(f, 0, [1, 2, 3])
1237
1238is equivalent to::
1239
1240 f(f(f(0, 1), 2), 3)
1241
1242
1243``foldl()`` is roughly equivalent to the following recursive function::
1244
1245 def foldl(func, start, seq):
1246 if len(seq) == 0:
1247 return start
1248
1249 return foldl(func, func(start, seq[0]), seq[1:])
1250
1251Speaking of equivalence, the above ``foldl`` call can be expressed in terms of
1252the built-in ``reduce`` like so::
1253
1254 reduce(f, [1, 2, 3], 0)
1255
1256
1257We can use ``foldl()``, ``operator.concat()`` and ``partial()`` to write a
1258cleaner, more aesthetically-pleasing version of Python's ``"".join(...)``
1259idiom::
1260
1261 from functional import foldl, partial
1262 from operator import concat
1263
1264 join = partial(foldl, concat, "")
1265
1266
1267Revision History and Acknowledgements
1268=====================================
1269
1270The author would like to thank the following people for offering suggestions,
1271corrections and assistance with various drafts of this article: Ian Bicking,
1272Nick Coghlan, Nick Efford, Raymond Hettinger, Jim Jewett, Mike Krell, Leandro
1273Lameiro, Jussi Salmela, Collin Winter, Blake Winton.
1274
1275Version 0.1: posted June 30 2006.
1276
1277Version 0.11: posted July 1 2006. Typo fixes.
1278
1279Version 0.2: posted July 10 2006. Merged genexp and listcomp sections into one.
1280Typo fixes.
1281
1282Version 0.21: Added more references suggested on the tutor mailing list.
1283
1284Version 0.30: Adds a section on the ``functional`` module written by Collin
1285Winter; adds short section on the operator module; a few other edits.
1286
1287
1288References
1289==========
1290
1291General
1292-------
1293
1294**Structure and Interpretation of Computer Programs**, by Harold Abelson and
1295Gerald Jay Sussman with Julie Sussman. Full text at
1296http://mitpress.mit.edu/sicp/. In this classic textbook of computer science,
1297chapters 2 and 3 discuss the use of sequences and streams to organize the data
1298flow inside a program. The book uses Scheme for its examples, but many of the
1299design approaches described in these chapters are applicable to functional-style
1300Python code.
1301
1302http://www.defmacro.org/ramblings/fp.html: A general introduction to functional
1303programming that uses Java examples and has a lengthy historical introduction.
1304
1305http://en.wikipedia.org/wiki/Functional_programming: General Wikipedia entry
1306describing functional programming.
1307
1308http://en.wikipedia.org/wiki/Coroutine: Entry for coroutines.
1309
1310http://en.wikipedia.org/wiki/Currying: Entry for the concept of currying.
1311
1312Python-specific
1313---------------
1314
1315http://gnosis.cx/TPiP/: The first chapter of David Mertz's book
1316:title-reference:`Text Processing in Python` discusses functional programming
1317for text processing, in the section titled "Utilizing Higher-Order Functions in
1318Text Processing".
1319
1320Mertz also wrote a 3-part series of articles on functional programming
1321for IBM's DeveloperWorks site; see
1322`part 1 <http://www-128.ibm.com/developerworks/library/l-prog.html>`__,
1323`part 2 <http://www-128.ibm.com/developerworks/library/l-prog2.html>`__, and
1324`part 3 <http://www-128.ibm.com/developerworks/linux/library/l-prog3.html>`__,
1325
1326
1327Python documentation
1328--------------------
1329
1330Documentation for the :mod:`itertools` module.
1331
1332Documentation for the :mod:`operator` module.
1333
1334:pep:`289`: "Generator Expressions"
1335
1336:pep:`342`: "Coroutines via Enhanced Generators" describes the new generator
1337features in Python 2.5.
1338
1339.. comment
1340
1341 Topics to place
1342 -----------------------------
1343
1344 XXX os.walk()
1345
1346 XXX Need a large example.
1347
1348 But will an example add much? I'll post a first draft and see
1349 what the comments say.
1350
1351.. comment
1352
1353 Original outline:
1354 Introduction
1355 Idea of FP
1356 Programs built out of functions
1357 Functions are strictly input-output, no internal state
1358 Opposed to OO programming, where objects have state
1359
1360 Why FP?
1361 Formal provability
1362 Assignment is difficult to reason about
1363 Not very relevant to Python
1364 Modularity
1365 Small functions that do one thing
1366 Debuggability:
1367 Easy to test due to lack of state
1368 Easy to verify output from intermediate steps
1369 Composability
1370 You assemble a toolbox of functions that can be mixed
1371
1372 Tackling a problem
1373 Need a significant example
1374
1375 Iterators
1376 Generators
1377 The itertools module
1378 List comprehensions
1379 Small functions and the lambda statement
1380 Built-in functions
1381 map
1382 filter
1383 reduce
1384
1385.. comment
1386
1387 Handy little function for printing part of an iterator -- used
1388 while writing this document.
1389
1390 import itertools
1391 def print_iter(it):
1392 slice = itertools.islice(it, 10)
1393 for elem in slice[:-1]:
1394 sys.stdout.write(str(elem))
1395 sys.stdout.write(', ')
1396 print elem[-1]
1397
1398