blob: 42aad5d45ee5e8518a4ce78c030417a1ee6feae3 [file] [log] [blame]
Andrew M. Kuchling22145072006-08-22 23:13:43 +00001Functional Programming HOWTO
2================================
3
Andrew M. Kuchlingf4dcd1d2006-11-08 13:35:34 +00004**Version 0.30**
Andrew M. Kuchling22145072006-08-22 23:13:43 +00005
6(This is a first draft. Please send comments/error
7reports/suggestions to amk@amk.ca. This URL is probably not going to
8be the final location of the document, so be careful about linking to
9it -- you may want to add a disclaimer.)
10
11In this document, we'll take a tour of Python's features suitable for
12implementing programs in a functional style. After an introduction to
13the concepts of functional programming, we'll look at language
14features such as iterators and generators and relevant library modules
15such as ``itertools`` and ``functools``.
16
17
18Introduction
19----------------------
20
21This section explains the basic concept of functional programming; if
22you're just interested in learning about Python language features,
23skip to the next section.
24
25Programming languages support decomposing problems in several different
26ways:
27
28* Most programming languages are **procedural**:
29 programs are lists of instructions that tell the computer what to
30 do with the program's input.
31 C, Pascal, and even Unix shells are procedural languages.
32
33* In **declarative** languages, you write a specification that describes
34 the problem to be solved, and the language implementation figures out
35 how to perform the computation efficiently. SQL is the declarative
36 language you're most likely to be familiar with; a SQL query describes
37 the data set you want to retrieve, and the SQL engine decides whether to
38 scan tables or use indexes, which subclauses should be performed first,
39 etc.
40
41* **Object-oriented** programs manipulate collections of objects.
42 Objects have internal state and support methods that query or modify
43 this internal state in some way. Smalltalk and Java are
44 object-oriented languages. C++ and Python are languages that
45 support object-oriented programming, but don't force the use
46 of object-oriented features.
47
48* **Functional** programming decomposes a problem into a set of functions.
49 Ideally, functions only take inputs and produce outputs, and don't have any
50 internal state that affects the output produced for a given input.
51 Well-known functional languages include the ML family (Standard ML,
52 OCaml, and other variants) and Haskell.
53
54The designers of some computer languages have chosen one approach to
55programming that's emphasized. This often makes it difficult to
56write programs that use a different approach. Other languages are
57multi-paradigm languages that support several different approaches. Lisp,
58C++, and Python are multi-paradigm; you can write programs or
59libraries that are largely procedural, object-oriented, or functional
60in all of these languages. In a large program, different sections
61might be written using different approaches; the GUI might be object-oriented
62while the processing logic is procedural or functional, for example.
63
64In a functional program, input flows through a set of functions. Each
65function operates on its input and produces some output. Functional
66style frowns upon functions with side effects that modify internal
67state or make other changes that aren't visible in the function's
68return value. Functions that have no side effects at all are
69called **purely functional**.
70Avoiding side effects means not using data structures
71that get updated as a program runs; every function's output
72must only depend on its input.
73
74Some languages are very strict about purity and don't even have
75assignment statements such as ``a=3`` or ``c = a + b``, but it's
76difficult to avoid all side effects. Printing to the screen or
77writing to a disk file are side effects, for example. For example, in
78Python a ``print`` statement or a ``time.sleep(1)`` both return no
79useful value; they're only called for their side effects of sending
80some text to the screen or pausing execution for a second.
81
82Python programs written in functional style usually won't go to the
83extreme of avoiding all I/O or all assignments; instead, they'll
84provide a functional-appearing interface but will use non-functional
85features internally. For example, the implementation of a function
86will still use assignments to local variables, but won't modify global
87variables or have other side effects.
88
89Functional programming can be considered the opposite of
90object-oriented programming. Objects are little capsules containing
91some internal state along with a collection of method calls that let
92you modify this state, and programs consist of making the right set of
93state changes. Functional programming wants to avoid state changes as
94much as possible and works with data flowing between functions. In
95Python you might combine the two approaches by writing functions that
96take and return instances representing objects in your application
97(e-mail messages, transactions, etc.).
98
99Functional design may seem like an odd constraint to work under. Why
100should you avoid objects and side effects? There are theoretical and
101practical advantages to the functional style:
102
103* Formal provability.
104* Modularity.
105* Composability.
106* Ease of debugging and testing.
107
108Formal provability
109''''''''''''''''''''''
110
111A theoretical benefit is that it's easier to construct a mathematical proof
112that a functional program is correct.
113
114For a long time researchers have been interested in finding ways to
115mathematically prove programs correct. This is different from testing
116a program on numerous inputs and concluding that its output is usually
117correct, or reading a program's source code and concluding that the
118code looks right; the goal is instead a rigorous proof that a program
119produces the right result for all possible inputs.
120
121The technique used to prove programs correct is to write down
122**invariants**, properties of the input data and of the program's
123variables that are always true. For each line of code, you then show
124that if invariants X and Y are true **before** the line is executed,
125the slightly different invariants X' and Y' are true **after**
126the line is executed. This continues until you reach the end of the
127program, at which point the invariants should match the desired
128conditions on the program's output.
129
130Functional programming's avoidance of assignments arose because
131assignments are difficult to handle with this technique;
132assignments can break invariants that were true before the assignment
133without producing any new invariants that can be propagated onward.
134
135Unfortunately, proving programs correct is largely impractical and not
136relevant to Python software. Even trivial programs require proofs that
137are several pages long; the proof of correctness for a moderately
138complicated program would be enormous, and few or none of the programs
139you use daily (the Python interpreter, your XML parser, your web
140browser) could be proven correct. Even if you wrote down or generated
141a proof, there would then be the question of verifying the proof;
142maybe there's an error in it, and you wrongly believe you've proved
143the program correct.
144
145Modularity
146''''''''''''''''''''''
147
148A more practical benefit of functional programming is that it forces
149you to break apart your problem into small pieces. Programs are more
150modular as a result. It's easier to specify and write a small
151function that does one thing than a large function that performs a
152complicated transformation. Small functions are also easier to read
153and to check for errors.
154
155
156Ease of debugging and testing
157''''''''''''''''''''''''''''''''''
158
159Testing and debugging a functional-style program is easier.
160
161Debugging is simplified because functions are generally small and
162clearly specified. When a program doesn't work, each function is an
163interface point where you can check that the data are correct. You
164can look at the intermediate inputs and outputs to quickly isolate the
165function that's responsible for a bug.
166
167Testing is easier because each function is a potential subject for a
168unit test. Functions don't depend on system state that needs to be
169replicated before running a test; instead you only have to synthesize
170the right input and then check that the output matches expectations.
171
172
173
174Composability
175''''''''''''''''''''''
176
177As you work on a functional-style program, you'll write a number of
178functions with varying inputs and outputs. Some of these functions
179will be unavoidably specialized to a particular application, but
180others will be useful in a wide variety of programs. For example, a
181function that takes a directory path and returns all the XML files in
182the directory, or a function that takes a filename and returns its
183contents, can be applied to many different situations.
184
185Over time you'll form a personal library of utilities. Often you'll
186assemble new programs by arranging existing functions in a new
187configuration and writing a few functions specialized for the current
188task.
189
190
191
192Iterators
193-----------------------
194
195I'll start by looking at a Python language feature that's an important
196foundation for writing functional-style programs: iterators.
197
198An iterator is an object representing a stream of data; this object
199returns the data one element at a time. A Python iterator must
200support a method called ``next()`` that takes no arguments and always
201returns the next element of the stream. If there are no more elements
202in the stream, ``next()`` must raise the ``StopIteration`` exception.
203Iterators don't have to be finite, though; it's perfectly reasonable
204to write an iterator that produces an infinite stream of data.
205
206The built-in ``iter()`` function takes an arbitrary object and tries
207to return an iterator that will return the object's contents or
208elements, raising ``TypeError`` if the object doesn't support
209iteration. Several of Python's built-in data types support iteration,
210the most common being lists and dictionaries. An object is called
211an **iterable** object if you can get an iterator for it.
212
213You can experiment with the iteration interface manually::
214
215 >>> L = [1,2,3]
216 >>> it = iter(L)
217 >>> print it
218 <iterator object at 0x8116870>
219 >>> it.next()
220 1
221 >>> it.next()
222 2
223 >>> it.next()
224 3
225 >>> it.next()
226 Traceback (most recent call last):
227 File "<stdin>", line 1, in ?
228 StopIteration
229 >>>
230
231Python expects iterable objects in several different contexts, the
232most important being the ``for`` statement. In the statement ``for X in Y``,
233Y must be an iterator or some object for which ``iter()`` can create
234an iterator. These two statements are equivalent::
235
236 for i in iter(obj):
237 print i
238
239 for i in obj:
240 print i
241
242Iterators can be materialized as lists or tuples by using the
243``list()`` or ``tuple()`` constructor functions::
244
245 >>> L = [1,2,3]
246 >>> iterator = iter(L)
247 >>> t = tuple(iterator)
248 >>> t
249 (1, 2, 3)
250
251Sequence unpacking also supports iterators: if you know an iterator
252will return N elements, you can unpack them into an N-tuple::
253
254 >>> L = [1,2,3]
255 >>> iterator = iter(L)
256 >>> a,b,c = iterator
257 >>> a,b,c
258 (1, 2, 3)
259
260Built-in functions such as ``max()`` and ``min()`` can take a single
261iterator argument and will return the largest or smallest element.
262The ``"in"`` and ``"not in"`` operators also support iterators: ``X in
263iterator`` is true if X is found in the stream returned by the
264iterator. You'll run into obvious problems if the iterator is
265infinite; ``max()``, ``min()``, and ``"not in"`` will never return, and
266if the element X never appears in the stream, the ``"in"`` operator
267won't return either.
268
269Note that you can only go forward in an iterator; there's no way to
270get the previous element, reset the iterator, or make a copy of it.
271Iterator objects can optionally provide these additional capabilities,
272but the iterator protocol only specifies the ``next()`` method.
273Functions may therefore consume all of the iterator's output, and if
274you need to do something different with the same stream, you'll have
275to create a new iterator.
276
277
278
279Data Types That Support Iterators
280'''''''''''''''''''''''''''''''''''
281
282We've already seen how lists and tuples support iterators. In fact,
283any Python sequence type, such as strings, will automatically support
284creation of an iterator.
285
286Calling ``iter()`` on a dictionary returns an iterator that will loop
287over the dictionary's keys::
288
289 >>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
290 ... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
291 >>> for key in m:
292 ... print key, m[key]
293 Mar 3
294 Feb 2
295 Aug 8
296 Sep 9
297 May 5
298 Jun 6
299 Jul 7
300 Jan 1
301 Apr 4
302 Nov 11
303 Dec 12
304 Oct 10
305
306Note that the order is essentially random, because it's based on the
307hash ordering of the objects in the dictionary.
308
309Applying ``iter()`` to a dictionary always loops over the keys, but
310dictionaries have methods that return other iterators. If you want to
311iterate over keys, values, or key/value pairs, you can explicitly call
312the ``iterkeys()``, ``itervalues()``, or ``iteritems()`` methods to
313get an appropriate iterator.
314
315The ``dict()`` constructor can accept an iterator that returns a
316finite stream of ``(key, value)`` tuples::
317
318 >>> L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')]
319 >>> dict(iter(L))
320 {'Italy': 'Rome', 'US': 'Washington DC', 'France': 'Paris'}
321
322Files also support iteration by calling the ``readline()``
323method until there are no more lines in the file. This means you can
324read each line of a file like this::
325
326 for line in file:
327 # do something for each line
328 ...
329
330Sets can take their contents from an iterable and let you iterate over
331the set's elements::
332
333 S = set((2, 3, 5, 7, 11, 13))
334 for i in S:
335 print i
336
337
338
339Generator expressions and list comprehensions
340----------------------------------------------------
341
342Two common operations on a stream are 1) performing some operation for
343every element, 2) selecting a subset of elements that meet some
344condition. For example, given a list of strings, you might want to
345strip off trailing whitespace from each line or extract all the
346strings containing a given substring.
347
348List comprehensions and generator expressions (short form: "listcomps"
349and "genexps") are a concise notation for such operations, borrowed
350from the functional programming language Haskell
351(http://www.haskell.org). You can strip all the whitespace from a
352stream of strings with the following code::
353
354 line_list = [' line 1\n', 'line 2 \n', ...]
355
356 # Generator expression -- returns iterator
357 stripped_iter = (line.strip() for line in line_list)
358
359 # List comprehension -- returns list
360 stripped_list = [line.strip() for line in line_list]
361
362You can select only certain elements by adding an ``"if"`` condition::
363
364 stripped_list = [line.strip() for line in line_list
365 if line != ""]
366
367With a list comprehension, you get back a Python list;
368``stripped_list`` is a list containing the resulting lines, not an
369iterator. Generator expressions return an iterator that computes the
370values as necessary, not needing to materialize all the values at
371once. This means that list comprehensions aren't useful if you're
372working with iterators that return an infinite stream or a very large
373amount of data. Generator expressions are preferable in these
374situations.
375
376Generator expressions are surrounded by parentheses ("()") and list
377comprehensions are surrounded by square brackets ("[]"). Generator
378expressions have the form::
379
380 ( expression for expr in sequence1
381 if condition1
382 for expr2 in sequence2
383 if condition2
384 for expr3 in sequence3 ...
385 if condition3
386 for exprN in sequenceN
387 if conditionN )
388
389Again, for a list comprehension only the outside brackets are
390different (square brackets instead of parentheses).
391
392The elements of the generated output will be the successive values of
393``expression``. The ``if`` clauses are all optional; if present,
394``expression`` is only evaluated and added to the result when
395``condition`` is true.
396
397Generator expressions always have to be written inside parentheses,
398but the parentheses signalling a function call also count. If you
399want to create an iterator that will be immediately passed to a
400function you can write::
401
402 obj_total = sum(obj.count for obj in list_all_objects())
403
404The ``for...in`` clauses contain the sequences to be iterated over.
405The sequences do not have to be the same length, because they are
406iterated over from left to right, **not** in parallel. For each
407element in ``sequence1``, ``sequence2`` is looped over from the
408beginning. ``sequence3`` is then looped over for each
409resulting pair of elements from ``sequence1`` and ``sequence2``.
410
411To put it another way, a list comprehension or generator expression is
412equivalent to the following Python code::
413
414 for expr1 in sequence1:
415 if not (condition1):
416 continue # Skip this element
417 for expr2 in sequence2:
418 if not (condition2):
419 continue # Skip this element
420 ...
421 for exprN in sequenceN:
422 if not (conditionN):
423 continue # Skip this element
424
425 # Output the value of
426 # the expression.
427
428This means that when there are multiple ``for...in`` clauses but no
429``if`` clauses, the length of the resulting output will be equal to
430the product of the lengths of all the sequences. If you have two
431lists of length 3, the output list is 9 elements long::
432
433 seq1 = 'abc'
434 seq2 = (1,2,3)
435 >>> [ (x,y) for x in seq1 for y in seq2]
436 [('a', 1), ('a', 2), ('a', 3),
437 ('b', 1), ('b', 2), ('b', 3),
438 ('c', 1), ('c', 2), ('c', 3)]
439
440To avoid introducing an ambiguity into Python's grammar, if
441``expression`` is creating a tuple, it must be surrounded with
442parentheses. The first list comprehension below is a syntax error,
443while the second one is correct::
444
445 # Syntax error
446 [ x,y for x in seq1 for y in seq2]
447 # Correct
448 [ (x,y) for x in seq1 for y in seq2]
449
450
451Generators
452-----------------------
453
454Generators are a special class of functions that simplify the task of
455writing iterators. Regular functions compute a value and return it,
456but generators return an iterator that returns a stream of values.
457
458You're doubtless familiar with how regular function calls work in
459Python or C. When you call a function, it gets a private namespace
460where its local variables are created. When the function reaches a
461``return`` statement, the local variables are destroyed and the
462value is returned to the caller. A later call to the same function
463creates a new private namespace and a fresh set of local
464variables. But, what if the local variables weren't thrown away on
465exiting a function? What if you could later resume the function where
466it left off? This is what generators provide; they can be thought of
467as resumable functions.
468
469Here's the simplest example of a generator function::
470
471 def generate_ints(N):
472 for i in range(N):
473 yield i
474
475Any function containing a ``yield`` keyword is a generator function;
476this is detected by Python's bytecode compiler which compiles the
477function specially as a result.
478
479When you call a generator function, it doesn't return a single value;
480instead it returns a generator object that supports the iterator
481protocol. On executing the ``yield`` expression, the generator
482outputs the value of ``i``, similar to a ``return``
483statement. The big difference between ``yield`` and a
484``return`` statement is that on reaching a ``yield`` the
485generator's state of execution is suspended and local variables are
486preserved. On the next call to the generator's ``.next()`` method,
487the function will resume executing.
488
489Here's a sample usage of the ``generate_ints()`` generator::
490
491 >>> gen = generate_ints(3)
492 >>> gen
493 <generator object at 0x8117f90>
494 >>> gen.next()
495 0
496 >>> gen.next()
497 1
498 >>> gen.next()
499 2
500 >>> gen.next()
501 Traceback (most recent call last):
502 File "stdin", line 1, in ?
503 File "stdin", line 2, in generate_ints
504 StopIteration
505
506You could equally write ``for i in generate_ints(5)``, or
507``a,b,c = generate_ints(3)``.
508
509Inside a generator function, the ``return`` statement can only be used
510without a value, and signals the end of the procession of values;
511after executing a ``return`` the generator cannot return any further
512values. ``return`` with a value, such as ``return 5``, is a syntax
513error inside a generator function. The end of the generator's results
514can also be indicated by raising ``StopIteration`` manually, or by
515just letting the flow of execution fall off the bottom of the
516function.
517
518You could achieve the effect of generators manually by writing your
519own class and storing all the local variables of the generator as
520instance variables. For example, returning a list of integers could
521be done by setting ``self.count`` to 0, and having the
522``next()`` method increment ``self.count`` and return it.
523However, for a moderately complicated generator, writing a
524corresponding class can be much messier.
525
526The test suite included with Python's library, ``test_generators.py``,
527contains a number of more interesting examples. Here's one generator
528that implements an in-order traversal of a tree using generators
529recursively.
530
531::
532
533 # A recursive generator that generates Tree leaves in in-order.
534 def inorder(t):
535 if t:
536 for x in inorder(t.left):
537 yield x
538
539 yield t.label
540
541 for x in inorder(t.right):
542 yield x
543
544Two other examples in ``test_generators.py`` produce
545solutions for the N-Queens problem (placing N queens on an NxN
546chess board so that no queen threatens another) and the Knight's Tour
547(finding a route that takes a knight to every square of an NxN chessboard
548without visiting any square twice).
549
550
551
552Passing values into a generator
553''''''''''''''''''''''''''''''''''''''''''''''
554
555In Python 2.4 and earlier, generators only produced output. Once a
556generator's code was invoked to create an iterator, there was no way to
557pass any new information into the function when its execution is
558resumed. You could hack together this ability by making the
559generator look at a global variable or by passing in some mutable object
560that callers then modify, but these approaches are messy.
561
562In Python 2.5 there's a simple way to pass values into a generator.
563``yield`` became an expression, returning a value that can be assigned
564to a variable or otherwise operated on::
565
566 val = (yield i)
567
568I recommend that you **always** put parentheses around a ``yield``
569expression when you're doing something with the returned value, as in
570the above example. The parentheses aren't always necessary, but it's
571easier to always add them instead of having to remember when they're
572needed.
573
574(PEP 342 explains the exact rules, which are that a
575``yield``-expression must always be parenthesized except when it
576occurs at the top-level expression on the right-hand side of an
577assignment. This means you can write ``val = yield i`` but have to
578use parentheses when there's an operation, as in ``val = (yield i)
579+ 12``.)
580
581Values are sent into a generator by calling its
582``send(value)`` method. This method resumes the
583generator's code and the ``yield`` expression returns the specified
584value. If the regular ``next()`` method is called, the
585``yield`` returns ``None``.
586
587Here's a simple counter that increments by 1 and allows changing the
588value of the internal counter.
589
590::
591
592 def counter (maximum):
593 i = 0
594 while i < maximum:
595 val = (yield i)
596 # If value provided, change counter
597 if val is not None:
598 i = val
599 else:
600 i += 1
601
602And here's an example of changing the counter:
603
604 >>> it = counter(10)
605 >>> print it.next()
606 0
607 >>> print it.next()
608 1
609 >>> print it.send(8)
610 8
611 >>> print it.next()
612 9
613 >>> print it.next()
614 Traceback (most recent call last):
615 File ``t.py'', line 15, in ?
616 print it.next()
617 StopIteration
618
619Because ``yield`` will often be returning ``None``, you
620should always check for this case. Don't just use its value in
621expressions unless you're sure that the ``send()`` method
622will be the only method used resume your generator function.
623
624In addition to ``send()``, there are two other new methods on
625generators:
626
627* ``throw(type, value=None, traceback=None)`` is used to raise an exception inside the
628 generator; the exception is raised by the ``yield`` expression
629 where the generator's execution is paused.
630
631* ``close()`` raises a ``GeneratorExit``
632 exception inside the generator to terminate the iteration.
633 On receiving this
634 exception, the generator's code must either raise
635 ``GeneratorExit`` or ``StopIteration``; catching the
636 exception and doing anything else is illegal and will trigger
637 a ``RuntimeError``. ``close()`` will also be called by
638 Python's garbage collector when the generator is garbage-collected.
639
640 If you need to run cleanup code when a ``GeneratorExit`` occurs,
641 I suggest using a ``try: ... finally:`` suite instead of
642 catching ``GeneratorExit``.
643
644The cumulative effect of these changes is to turn generators from
645one-way producers of information into both producers and consumers.
646
647Generators also become **coroutines**, a more generalized form of
648subroutines. Subroutines are entered at one point and exited at
649another point (the top of the function, and a ``return``
650statement), but coroutines can be entered, exited, and resumed at
651many different points (the ``yield`` statements).
652
653
654Built-in functions
655----------------------------------------------
656
657Let's look in more detail at built-in functions often used with iterators.
658
659Two Python's built-in functions, ``map()`` and ``filter()``, are
660somewhat obsolete; they duplicate the features of list comprehensions
661and return actual lists instead of iterators.
662
663``map(f, iterA, iterB, ...)`` returns a list containing ``f(iterA[0],
664iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``.
665
666::
667
668 def upper(s):
669 return s.upper()
670 map(upper, ['sentence', 'fragment']) =>
671 ['SENTENCE', 'FRAGMENT']
672
673 [upper(s) for s in ['sentence', 'fragment']] =>
674 ['SENTENCE', 'FRAGMENT']
675
676As shown above, you can achieve the same effect with a list
677comprehension. The ``itertools.imap()`` function does the same thing
678but can handle infinite iterators; it'll be discussed in the section on
679the ``itertools`` module.
680
681``filter(predicate, iter)`` returns a list
682that contains all the sequence elements that meet a certain condition,
683and is similarly duplicated by list comprehensions.
684A **predicate** is a function that returns the truth value of
685some condition; for use with ``filter()``, the predicate must take a
686single value.
687
688::
689
690 def is_even(x):
691 return (x % 2) == 0
692
693 filter(is_even, range(10)) =>
694 [0, 2, 4, 6, 8]
695
696This can also be written as a list comprehension::
697
698 >>> [x for x in range(10) if is_even(x)]
699 [0, 2, 4, 6, 8]
700
701``filter()`` also has a counterpart in the ``itertools`` module,
702``itertools.ifilter()``, that returns an iterator and
703can therefore handle infinite sequences just as ``itertools.imap()`` can.
704
705``reduce(func, iter, [initial_value])`` doesn't have a counterpart in
706the ``itertools`` module because it cumulatively performs an operation
707on all the iterable's elements and therefore can't be applied to
708infinite ones. ``func`` must be a function that takes two elements
709and returns a single value. ``reduce()`` takes the first two elements
710A and B returned by the iterator and calculates ``func(A, B)``. It
711then requests the third element, C, calculates ``func(func(A, B),
712C)``, combines this result with the fourth element returned, and
713continues until the iterable is exhausted. If the iterable returns no
714values at all, a ``TypeError`` exception is raised. If the initial
715value is supplied, it's used as a starting point and
716``func(initial_value, A)`` is the first calculation.
717
718::
719
720 import operator
721 reduce(operator.concat, ['A', 'BB', 'C']) =>
722 'ABBC'
723 reduce(operator.concat, []) =>
724 TypeError: reduce() of empty sequence with no initial value
725 reduce(operator.mul, [1,2,3], 1) =>
726 6
727 reduce(operator.mul, [], 1) =>
728 1
729
730If you use ``operator.add`` with ``reduce()``, you'll add up all the
731elements of the iterable. This case is so common that there's a special
732built-in called ``sum()`` to compute it::
733
734 reduce(operator.add, [1,2,3,4], 0) =>
735 10
736 sum([1,2,3,4]) =>
737 10
738 sum([]) =>
739 0
740
741For many uses of ``reduce()``, though, it can be clearer to just write
742the obvious ``for`` loop::
743
744 # Instead of:
745 product = reduce(operator.mul, [1,2,3], 1)
746
747 # You can write:
748 product = 1
749 for i in [1,2,3]:
750 product *= i
751
752
753``enumerate(iter)`` counts off the elements in the iterable, returning
7542-tuples containing the count and each element.
755
756::
757
758 enumerate(['subject', 'verb', 'object']) =>
759 (0, 'subject'), (1, 'verb'), (2, 'object')
760
761``enumerate()`` is often used when looping through a list
762and recording the indexes at which certain conditions are met::
763
764 f = open('data.txt', 'r')
765 for i, line in enumerate(f):
766 if line.strip() == '':
767 print 'Blank line at line #%i' % i
768
769``sorted(iterable, [cmp=None], [key=None], [reverse=False)``
770collects all the elements of the iterable into a list, sorts
771the list, and returns the sorted result. The ``cmp``, ``key``,
772and ``reverse`` arguments are passed through to the
773constructed list's ``.sort()`` method.
774
775::
776
777 import random
778 # Generate 8 random numbers between [0, 10000)
779 rand_list = random.sample(range(10000), 8)
780 rand_list =>
781 [769, 7953, 9828, 6431, 8442, 9878, 6213, 2207]
782 sorted(rand_list) =>
783 [769, 2207, 6213, 6431, 7953, 8442, 9828, 9878]
784 sorted(rand_list, reverse=True) =>
785 [9878, 9828, 8442, 7953, 6431, 6213, 2207, 769]
786
787(For a more detailed discussion of sorting, see the Sorting mini-HOWTO
788in the Python wiki at http://wiki.python.org/moin/HowTo/Sorting.)
789
790The ``any(iter)`` and ``all(iter)`` built-ins look at
791the truth values of an iterable's contents. ``any()`` returns
792True if any element in the iterable is a true value, and ``all()``
793returns True if all of the elements are true values::
794
795 any([0,1,0]) =>
796 True
797 any([0,0,0]) =>
798 False
799 any([1,1,1]) =>
800 True
801 all([0,1,0]) =>
802 False
803 all([0,0,0]) =>
804 False
805 all([1,1,1]) =>
806 True
807
808
809Small functions and the lambda statement
810----------------------------------------------
811
812When writing functional-style programs, you'll often need little
813functions that act as predicates or that combine elements in some way.
814
815If there's a Python built-in or a module function that's suitable, you
816don't need to define a new function at all::
817
818 stripped_lines = [line.strip() for line in lines]
819 existing_files = filter(os.path.exists, file_list)
820
821If the function you need doesn't exist, you need to write it. One way
822to write small functions is to use the ``lambda`` statement. ``lambda``
823takes a number of parameters and an expression combining these parameters,
824and creates a small function that returns the value of the expression:
825
826 lowercase = lambda x: x.lower()
827
828 print_assign = lambda name, value: name + '=' + str(value)
829
830 adder = lambda x, y: x+y
831
832An alternative is to just use the ``def`` statement and define a
833function in the usual way::
834
835 def lowercase(x):
836 return x.lower()
837
838 def print_assign(name, value):
839 return name + '=' + str(value)
840
841 def adder(x,y):
842 return x + y
843
844Which alternative is preferable? That's a style question; my usual
845view is to avoid using ``lambda``.
846
847``lambda`` is quite limited in the functions it can define. The
848result has to be computable as a single expression, which means you
849can't have multiway ``if... elif... else`` comparisons or
850``try... except`` statements. If you try to do too much in a
851``lambda`` statement, you'll end up with an overly complicated
852expression that's hard to read. Quick, what's the following code doing?
853
854::
855
856 total = reduce(lambda a, b: (0, a[1] + b[1]), items)[1]
857
858You can figure it out, but it takes time to disentangle the expression
859to figure out what's going on. Using a short nested
860``def`` statements makes things a little bit better::
861
862 def combine (a, b):
863 return 0, a[1] + b[1]
864
865 total = reduce(combine, items)[1]
866
867But it would be best of all if I had simply used a ``for`` loop::
868
869 total = 0
870 for a, b in items:
871 total += b
872
873Or the ``sum()`` built-in and a generator expression::
874
875 total = sum(b for a,b in items)
876
877Many uses of ``reduce()`` are clearer when written as ``for`` loops.
878
879Fredrik Lundh once suggested the following set of rules for refactoring
880uses of ``lambda``:
881
8821) Write a lambda function.
8832) Write a comment explaining what the heck that lambda does.
8843) Study the comment for a while, and think of a name that captures
885 the essence of the comment.
8864) Convert the lambda to a def statement, using that name.
8875) Remove the comment.
888
889I really like these rules, but you're free to disagree that this style
890is better.
891
892
893The itertools module
894-----------------------
895
896The ``itertools`` module contains a number of commonly-used iterators
897as well as functions for combining several iterators. This section
898will introduce the module's contents by showing small examples.
899
900``itertools.count(n)`` returns an infinite stream of
901integers, increasing by 1 each time. You can optionally supply the
902starting number, which defaults to 0::
903
904 itertools.count() =>
905 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
906 itertools.count(10) =>
907 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
908
909``itertools.cycle(iter)`` saves a copy of the contents of a provided
910iterable and returns a new iterator that returns its elements from
911first to last. The new iterator will repeat these elements infinitely.
912
913::
914
915 itertools.cycle([1,2,3,4,5]) =>
916 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...
917
918``itertools.repeat(elem, [n])`` returns the provided element ``n``
919times, or returns the element endlessly if ``n`` is not provided.
920
921::
922
923 itertools.repeat('abc') =>
924 abc, abc, abc, abc, abc, abc, abc, abc, abc, abc, ...
925 itertools.repeat('abc', 5) =>
926 abc, abc, abc, abc, abc
927
928``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of
929iterables as input, and returns all the elements of the first
930iterator, then all the elements of the second, and so on, until all of
931the iterables have been exhausted.
932
933::
934
935 itertools.chain(['a', 'b', 'c'], (1, 2, 3)) =>
936 a, b, c, 1, 2, 3
937
938``itertools.izip(iterA, iterB, ...)`` takes one element from each iterable
939and returns them in a tuple::
940
941 itertools.izip(['a', 'b', 'c'], (1, 2, 3)) =>
942 ('a', 1), ('b', 2), ('c', 3)
943
944This iterator is intended to be used with iterables that are all of
945the same length. If the iterables are of different lengths, the
946resulting stream will be the same length as the shortest iterable.
947
948::
949
950 itertools.izip(['a', 'b'], (1, 2, 3)) =>
951 ('a', 1), ('b', 2)
952
953You should avoid doing this, though, because an element may be taken
954from the longer iterators and discarded. This means you can't go on
955to use the iterators further because you risk skipping a discarded
956element.
957
958``itertools.islice(iter, [start], stop, [step])`` returns a stream
959that's a slice of the iterator. It can return the first ``stop``
960elements. If you supply a starting index, you'll get ``stop-start``
961elements, and if you supply a value for ``step` elements will be
962skipped accordingly. Unlike Python's string and list slicing, you
963can't use negative values for ``start``, ``stop``, or ``step``.
964
965::
966
967 itertools.islice(range(10), 8) =>
968 0, 1, 2, 3, 4, 5, 6, 7
969 itertools.islice(range(10), 2, 8) =>
970 2, 3, 4, 5, 6, 7
971 itertools.islice(range(10), 2, 8, 2) =>
972 2, 4, 6
973
974``itertools.tee(iter, [n])`` replicates an iterator; it returns ``n``
975independent iterators that will all return the contents of the source
976iterator. If you don't supply a value for ``n``, the default is 2.
977Replicating iterators requires saving some of the contents of the source
978iterator, so this can consume significant memory if the iterator is large
979and one of the new iterators is consumed more than the others.
980
981::
982
983 itertools.tee( itertools.count() ) =>
984 iterA, iterB
985
986 where iterA ->
987 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
988
989 and iterB ->
990 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
991
992
993Two functions are used for calling other functions on the contents of an
994iterable.
995
996``itertools.imap(f, iterA, iterB, ...)`` returns
997a stream containing ``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]),
998f(iterA[2], iterB[2]), ...``::
999
1000 itertools.imap(operator.add, [5, 6, 5], [1, 2, 3]) =>
1001 6, 8, 8
1002
1003The ``operator`` module contains a set of functions
1004corresponding to Python's operators. Some examples are
1005``operator.add(a, b)`` (adds two values),
1006``operator.ne(a, b)`` (same as ``a!=b``),
1007and
1008``operator.attrgetter('id')`` (returns a callable that
1009fetches the ``"id"`` attribute).
1010
1011``itertools.starmap(func, iter)`` assumes that the iterable will
1012return a stream of tuples, and calls ``f()`` using these tuples as the
1013arguments::
1014
1015 itertools.starmap(os.path.join,
1016 [('/usr', 'bin', 'java'), ('/bin', 'python'),
1017 ('/usr', 'bin', 'perl'),('/usr', 'bin', 'ruby')])
1018 =>
1019 /usr/bin/java, /bin/python, /usr/bin/perl, /usr/bin/ruby
1020
1021Another group of functions chooses a subset of an iterator's elements
1022based on a predicate.
1023
1024``itertools.ifilter(predicate, iter)`` returns all the elements for
1025which the predicate returns true::
1026
1027 def is_even(x):
1028 return (x % 2) == 0
1029
1030 itertools.ifilter(is_even, itertools.count()) =>
1031 0, 2, 4, 6, 8, 10, 12, 14, ...
1032
1033``itertools.ifilterfalse(predicate, iter)`` is the opposite,
1034returning all elements for which the predicate returns false::
1035
1036 itertools.ifilterfalse(is_even, itertools.count()) =>
1037 1, 3, 5, 7, 9, 11, 13, 15, ...
1038
1039``itertools.takewhile(predicate, iter)`` returns elements for as long
1040as the predicate returns true. Once the predicate returns false,
1041the iterator will signal the end of its results.
1042
1043::
1044
1045 def less_than_10(x):
1046 return (x < 10)
1047
1048 itertools.takewhile(less_than_10, itertools.count()) =>
1049 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
1050
1051 itertools.takewhile(is_even, itertools.count()) =>
1052 0
1053
1054``itertools.dropwhile(predicate, iter)`` discards elements while the
1055predicate returns true, and then returns the rest of the iterable's
1056results.
1057
1058::
1059
1060 itertools.dropwhile(less_than_10, itertools.count()) =>
1061 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
1062
1063 itertools.dropwhile(is_even, itertools.count()) =>
1064 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
1065
1066
1067The last function I'll discuss, ``itertools.groupby(iter,
1068key_func=None)``, is the most complicated. ``key_func(elem)`` is a
1069function that can compute a key value for each element returned by the
1070iterable. If you don't supply a key function, the key is simply each
1071element itself.
1072
1073``groupby()`` collects all the consecutive elements from the
1074underlying iterable that have the same key value, and returns a stream
1075of 2-tuples containing a key value and an iterator for the elements
1076with that key.
1077
1078::
1079
1080 city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'),
1081 ('Anchorage', 'AK'), ('Nome', 'AK'),
1082 ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ'),
1083 ...
1084 ]
1085
1086 def get_state ((city, state)):
1087 return state
1088
1089 itertools.groupby(city_list, get_state) =>
1090 ('AL', iterator-1),
1091 ('AK', iterator-2),
1092 ('AZ', iterator-3), ...
1093
1094 where
1095 iterator-1 =>
1096 ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL')
1097 iterator-2 =>
1098 ('Anchorage', 'AK'), ('Nome', 'AK')
1099 iterator-3 =>
1100 ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')
1101
1102``groupby()`` assumes that the underlying iterable's contents will
1103already be sorted based on the key. Note that the returned iterators
1104also use the underlying iterable, so you have to consume the results
1105of iterator-1 before requesting iterator-2 and its corresponding key.
1106
1107
1108The functools module
1109----------------------------------------------
1110
1111The ``functools`` module in Python 2.5 contains some higher-order
1112functions. A **higher-order function** takes functions as input and
1113returns new functions. The most useful tool in this module is the
1114``partial()`` function.
1115
1116For programs written in a functional style, you'll sometimes want to
1117construct variants of existing functions that have some of the
1118parameters filled in. Consider a Python function ``f(a, b, c)``; you
1119may wish to create a new function ``g(b, c)`` that was equivalent to
1120``f(1, b, c)``. This is called "partial function application".
1121
1122The constructor for ``partial`` takes the arguments ``(function, arg1,
1123arg2, ... kwarg1=value1, kwarg2=value2)``. The resulting object is
1124callable, so you can just call it to invoke ``function`` with the
1125filled-in arguments.
1126
1127Here's a small but realistic example::
1128
1129 import functools
1130
1131 def log (message, subsystem):
1132 "Write the contents of 'message' to the specified subsystem."
1133 print '%s: %s' % (subsystem, message)
1134 ...
1135
1136 server_log = functools.partial(log, subsystem='server')
1137 server_log('Unable to open socket')
1138
1139There are also third-party modules, such as Collin Winter's
1140`functional package <http://cheeseshop.python.org/pypi/functional>`__,
1141that are intended for use in functional-style programs.
1142
1143
Andrew M. Kuchlingf4dcd1d2006-11-08 13:35:34 +00001144The functional module
1145=====================
1146
1147Collin Winter's `functional module <http://oakwinter.com/code/functional/>`__
1148provides a number of more
1149advanced tools for functional programming. It also reimplements
1150several Python built-ins, trying to make them more intuitive to those
1151used to functional programming in other languages.
1152
1153This section contains an introduction to some of the most important
1154functions in ``functional``; full documentation can be found at `the
1155project's website <http://oakwinter.com/code/functional/documentation/>`__.
1156
1157``compose(outer, inner, unpack=False)``
1158
1159The ``compose()`` function implements function composition.
1160In other words, it returns a wrapper around the ``outer`` and ``inner`` callables, such
1161that the return value from ``inner`` is fed directly to ``outer``. That is,
1162
1163::
1164
1165 >>> def add(a, b):
1166 ... return a + b
1167 ...
1168 >>> def double(a):
1169 ... return 2 * a
1170 ...
1171 >>> compose(double, add)(5, 6)
1172 22
1173
1174is equivalent to
1175
1176::
1177
1178 >>> double(add(5, 6))
1179 22
1180
1181The ``unpack`` keyword is provided to work around the fact that Python functions are not always
1182`fully curried <http://en.wikipedia.org/wiki/Currying>`__.
1183By default, it is expected that the ``inner`` function will return a single object and that the ``outer``
1184function will take a single argument. Setting the ``unpack`` argument causes ``compose`` to expect a
1185tuple from ``inner`` which will be expanded before being passed to ``outer``. Put simply,
1186
1187::
1188
1189 compose(f, g)(5, 6)
1190
1191is equivalent to::
1192
1193 f(g(5, 6))
1194
1195while
1196
1197::
1198
1199 compose(f, g, unpack=True)(5, 6)
1200
1201is equivalent to::
1202
1203 f(*g(5, 6))
1204
1205Even though ``compose()`` only accepts two functions, it's trivial to
1206build up a version that will compose any number of functions. We'll
1207use ``reduce()``, ``compose()`` and ``partial()`` (the last of which
1208is provided by both ``functional`` and ``functools``).
1209
1210::
1211
1212 from functional import compose, partial
1213
1214 multi_compose = partial(reduce, compose)
1215
1216
1217We can also use ``map()``, ``compose()`` and ``partial()`` to craft a
1218version of ``"".join(...)`` that converts its arguments to string::
1219
1220 from functional import compose, partial
1221
1222 join = compose("".join, partial(map, str))
1223
1224
1225``flip(func)``
1226
1227``flip()`` wraps the callable in ``func`` and
1228causes it to receive its non-keyword arguments in reverse order.
1229
1230::
1231
1232 >>> def triple(a, b, c):
1233 ... return (a, b, c)
1234 ...
1235 >>> triple(5, 6, 7)
1236 (5, 6, 7)
1237 >>>
1238 >>> flipped_triple = flip(triple)
1239 >>> flipped_triple(5, 6, 7)
1240 (7, 6, 5)
1241
1242``foldl(func, start, iterable)``
1243
1244``foldl()`` takes a binary function, a starting value (usually some kind of 'zero'), and an iterable.
1245The function is applied to the starting value and the first element of the list, then the result of
1246that and the second element of the list, then the result of that and the third element of the list,
1247and so on.
1248
1249This means that a call such as::
1250
1251 foldl(f, 0, [1, 2, 3])
1252
1253is equivalent to::
1254
1255 f(f(f(0, 1), 2), 3)
1256
1257
1258``foldl()`` is roughly equivalent to the following recursive function::
1259
1260 def foldl(func, start, seq):
1261 if len(seq) == 0:
1262 return start
1263
1264 return foldl(func, func(start, seq[0]), seq[1:])
1265
1266Speaking of equivalence, the above ``foldl`` call can be expressed in terms of the built-in ``reduce`` like
1267so::
1268
1269 reduce(f, [1, 2, 3], 0)
1270
1271
1272We can use ``foldl()``, ``operator.concat()`` and ``partial()`` to
1273write a cleaner, more aesthetically-pleasing version of Python's
1274``"".join(...)`` idiom::
1275
1276 from functional import foldl, partial
1277 from operator import concat
1278
1279 join = partial(foldl, concat, "")
1280
1281
Andrew M. Kuchling22145072006-08-22 23:13:43 +00001282Revision History and Acknowledgements
1283------------------------------------------------
1284
1285The author would like to thank the following people for offering
1286suggestions, corrections and assistance with various drafts of this
1287article: Ian Bicking, Nick Coghlan, Nick Efford, Raymond Hettinger,
1288Jim Jewett, Mike Krell, Leandro Lameiro, Jussi Salmela,
1289Collin Winter, Blake Winton.
1290
1291Version 0.1: posted June 30 2006.
1292
1293Version 0.11: posted July 1 2006. Typo fixes.
1294
1295Version 0.2: posted July 10 2006. Merged genexp and listcomp
1296sections into one. Typo fixes.
1297
1298Version 0.21: Added more references suggested on the tutor mailing list.
1299
Andrew M. Kuchlingf4dcd1d2006-11-08 13:35:34 +00001300Version 0.30: Adds a section on the ``functional`` module written by
1301Collin Winter.
1302
Andrew M. Kuchling22145072006-08-22 23:13:43 +00001303
1304References
1305--------------------
1306
1307General
1308'''''''''''''''
1309
1310**Structure and Interpretation of Computer Programs**, by
1311Harold Abelson and Gerald Jay Sussman with Julie Sussman.
1312Full text at http://mitpress.mit.edu/sicp/.
1313In this classic textbook of computer science, chapters 2 and 3 discuss the
1314use of sequences and streams to organize the data flow inside a
1315program. The book uses Scheme for its examples, but many of the
1316design approaches described in these chapters are applicable to
1317functional-style Python code.
1318
1319http://www.defmacro.org/ramblings/fp.html: A general
1320introduction to functional programming that uses Java examples
1321and has a lengthy historical introduction.
1322
1323http://en.wikipedia.org/wiki/Functional_programming:
1324General Wikipedia entry describing functional programming.
1325
1326http://en.wikipedia.org/wiki/Coroutine:
1327Entry for coroutines.
1328
Andrew M. Kuchlingf4dcd1d2006-11-08 13:35:34 +00001329http://en.wikipedia.org/wiki/Currying:
1330Entry for the concept of currying.
Andrew M. Kuchling22145072006-08-22 23:13:43 +00001331
1332Python-specific
1333'''''''''''''''''''''''''''
1334
1335http://gnosis.cx/TPiP/:
1336The first chapter of David Mertz's book :title-reference:`Text Processing in Python`
1337discusses functional programming for text processing, in the section titled
1338"Utilizing Higher-Order Functions in Text Processing".
1339
1340Mertz also wrote a 3-part series of articles on functional programming
1341for IBM's DeveloperWorks site; see
1342`part 1 <http://www-128.ibm.com/developerworks/library/l-prog.html>`__,
1343`part 2 <http://www-128.ibm.com/developerworks/library/l-prog2.html>`__, and
1344`part 3 <http://www-128.ibm.com/developerworks/linux/library/l-prog3.html>`__,
1345
1346
1347Python documentation
1348'''''''''''''''''''''''''''
1349
1350http://docs.python.org/lib/module-itertools.html:
1351Documentation ``for the itertools`` module.
1352
1353http://docs.python.org/lib/module-operator.html:
1354Documentation ``for the operator`` module.
1355
1356http://www.python.org/dev/peps/pep-0289/:
1357PEP 289: "Generator Expressions"
1358
1359http://www.python.org/dev/peps/pep-0342/
1360PEP 342: "Coroutines via Enhanced Generators" describes the new generator
1361features in Python 2.5.
1362
1363.. comment
1364
1365 Topics to place
1366 -----------------------------
1367
1368 XXX os.walk()
1369
1370 XXX Need a large example.
1371
1372 But will an example add much? I'll post a first draft and see
1373 what the comments say.
1374
1375.. comment
1376
1377 Original outline:
1378 Introduction
1379 Idea of FP
1380 Programs built out of functions
1381 Functions are strictly input-output, no internal state
1382 Opposed to OO programming, where objects have state
1383
1384 Why FP?
1385 Formal provability
1386 Assignment is difficult to reason about
1387 Not very relevant to Python
1388 Modularity
1389 Small functions that do one thing
1390 Debuggability:
1391 Easy to test due to lack of state
1392 Easy to verify output from intermediate steps
1393 Composability
1394 You assemble a toolbox of functions that can be mixed
1395
1396 Tackling a problem
1397 Need a significant example
1398
1399 Iterators
1400 Generators
1401 The itertools module
1402 List comprehensions
1403 Small functions and the lambda statement
1404 Built-in functions
1405 map
1406 filter
1407 reduce
1408
1409.. comment
1410
1411 Handy little function for printing part of an iterator -- used
1412 while writing this document.
1413
1414 import itertools
1415 def print_iter(it):
1416 slice = itertools.islice(it, 10)
1417 for elem in slice[:-1]:
1418 sys.stdout.write(str(elem))
1419 sys.stdout.write(', ')
1420 print elem[-1]
1421
1422