********************************
  Functional Programming HOWTO
********************************

:Author: A. M. Kuchling
:Release: 0.31

In this document, we'll take a tour of Python's features suitable for
implementing programs in a functional style. After an introduction to the
concepts of functional programming, we'll look at language features such as
:term:`iterator`\s and :term:`generator`\s and relevant library modules such as
:mod:`itertools` and :mod:`functools`.


Introduction
============

This section explains the basic concept of functional programming; if you're
just interested in learning about Python language features, skip to the next
section.

Programming languages support decomposing problems in several different ways:

* Most programming languages are procedural: programs are lists of
  instructions that tell the computer what to do with the program's input. C,
  Pascal, and even Unix shells are procedural languages.

* In declarative languages, you write a specification that describes the
  problem to be solved, and the language implementation figures out how to
  perform the computation efficiently. SQL is the declarative language you're
  most likely to be familiar with; a SQL query describes the data set you want
  to retrieve, and the SQL engine decides whether to scan tables or use indexes,
  which subclauses should be performed first, etc.

* Object-oriented programs manipulate collections of objects. Objects have
  internal state and support methods that query or modify this internal state in
  some way. Smalltalk and Java are object-oriented languages. C++ and Python
  are languages that support object-oriented programming, but don't force the
  use of object-oriented features.

* Functional programming decomposes a problem into a set of functions.
  Ideally, functions only take inputs and produce outputs, and don't have any
  internal state that affects the output produced for a given input. Well-known
  functional languages include the ML family (Standard ML, OCaml, and other
  variants) and Haskell.

The designers of some computer languages choose to emphasize one
particular approach to programming. This often makes it difficult to
write programs that use a different approach. Other languages are
multi-paradigm languages that support several different approaches.
Lisp, C++, and Python are multi-paradigm; you can write programs or
libraries that are largely procedural, object-oriented, or functional
in all of these languages. In a large program, different sections
might be written using different approaches; the GUI might be
object-oriented while the processing logic is procedural or
functional, for example.

In a functional program, input flows through a set of functions. Each function
operates on its input and produces some output. Functional style discourages
functions with side effects that modify internal state or make other changes
that aren't visible in the function's return value. Functions that have no side
effects at all are called purely functional. Avoiding side effects means
not using data structures that get updated as a program runs; every function's
output must only depend on its input.

Some languages are very strict about purity and don't even have assignment
statements such as ``a=3`` or ``c = a + b``, but it's difficult to avoid all
side effects. Printing to the screen or writing to a disk file are side
effects, for example. In Python, a ``print`` statement or a
``time.sleep(1)`` call returns no useful value; each is only executed for its
side effect of sending some text to the screen or pausing execution for a
second.

Python programs written in functional style usually won't go to the extreme of
avoiding all I/O or all assignments; instead, they'll provide a
functional-appearing interface but will use non-functional features internally.
For example, the implementation of a function will still use assignments to
local variables, but won't modify global variables or have other side effects.
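
As a rough illustration of that distinction, the sketch below contrasts a
function that relies on a global list with one that only uses local
assignments. The names ``seen``, ``record_and_count`` and ``mean`` are invented
for this example and do not come from any library.

```python
# Impure: the result depends on, and changes, a module-level list.
seen = []

def record_and_count(item):
    seen.append(item)      # side effect: mutates global state
    return len(seen)

# Functional-appearing: local assignments are used internally, but
# nothing outside the function is modified, so the same input always
# produces the same output.
def mean(values):
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count
```

Calling ``record_and_count('a')`` twice gives two different answers, while
``mean([1, 2, 3])`` gives ``2.0`` every time.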

Functional programming can be considered the opposite of object-oriented
programming. Objects are little capsules containing some internal state along
with a collection of method calls that let you modify this state, and programs
consist of making the right set of state changes. Functional programming wants
to avoid state changes as much as possible and works with data flowing between
functions. In Python you might combine the two approaches by writing functions
that take and return instances representing objects in your application (e-mail
messages, transactions, etc.).

Functional design may seem like an odd constraint to work under. Why should you
avoid objects and side effects? There are theoretical and practical advantages
to the functional style:

* Formal provability.
* Modularity.
* Composability.
* Ease of debugging and testing.


Formal provability
------------------

A theoretical benefit is that it's easier to construct a mathematical proof that
a functional program is correct.

For a long time researchers have been interested in finding ways to
mathematically prove programs correct. This is different from testing a program
on numerous inputs and concluding that its output is usually correct, or reading
a program's source code and concluding that the code looks right; the goal is
instead a rigorous proof that a program produces the right result for all
possible inputs.

The technique used to prove programs correct is to write down invariants,
properties of the input data and of the program's variables that are always
true. For each line of code, you then show that if invariants X and Y are true
before the line is executed, the slightly different invariants X' and Y' are
true after the line is executed. This continues until you reach the end of
the program, at which point the invariants should match the desired conditions
on the program's output.

Functional programming's avoidance of assignments arose because assignments are
difficult to handle with this technique; assignments can break invariants that
were true before the assignment without producing any new invariants that can be
propagated onward.

Unfortunately, proving programs correct is largely impractical and not relevant
to Python software. Even trivial programs require proofs that are several pages
long; the proof of correctness for a moderately complicated program would be
enormous, and few or none of the programs you use daily (the Python interpreter,
your XML parser, your web browser) could be proven correct. Even if you wrote
down or generated a proof, there would then be the question of verifying the
proof; maybe there's an error in it, and you wrongly believe you've proved the
program correct.


Modularity
----------

A more practical benefit of functional programming is that it forces you to
break apart your problem into small pieces. Programs are more modular as a
result. It's easier to specify and write a small function that does one thing
than a large function that performs a complicated transformation. Small
functions are also easier to read and to check for errors.


Ease of debugging and testing
-----------------------------

Testing and debugging a functional-style program is easier.

Debugging is simplified because functions are generally small and clearly
specified. When a program doesn't work, each function is an interface point
where you can check that the data are correct. You can look at the intermediate
inputs and outputs to quickly isolate the function that's responsible for a bug.

Testing is easier because each function is a potential subject for a unit test.
Functions don't depend on system state that needs to be replicated before
running a test; instead you only have to synthesize the right input and then
check that the output matches expectations.


Composability
-------------

As you work on a functional-style program, you'll write a number of functions
with varying inputs and outputs. Some of these functions will be unavoidably
specialized to a particular application, but others will be useful in a wide
variety of programs. For example, a function that takes a directory path and
returns all the XML files in the directory, or a function that takes a filename
and returns its contents, can be applied to many different situations.

Over time you'll form a personal library of utilities. Often you'll assemble
new programs by arranging existing functions in a new configuration and writing
a few functions specialized for the current task.
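
The two utilities mentioned above might look something like the following
sketch. The function names ``xml_files``, ``read_contents`` and
``all_xml_contents`` are made up for illustration; only ``os.listdir()``,
``os.path.join()`` and ``open()`` are real library calls.

```python
import os

def xml_files(directory):
    """Return the paths of the .xml files directly inside 'directory'."""
    return [os.path.join(directory, name)
            for name in sorted(os.listdir(directory))
            if name.endswith('.xml')]

def read_contents(filename):
    """Return the full contents of 'filename' as a string."""
    f = open(filename)
    try:
        return f.read()
    finally:
        f.close()

def all_xml_contents(directory):
    """A task-specific function assembled from the two general ones."""
    return [read_contents(path) for path in xml_files(directory)]
```

Neither general-purpose function knows about the other, so each can be reused
or tested on its own; only ``all_xml_contents()`` is specific to one task.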


Iterators
=========

I'll start by looking at a Python language feature that's an important
foundation for writing functional-style programs: iterators.

An iterator is an object representing a stream of data; this object returns the
data one element at a time. A Python iterator must support a method called
``next()`` that takes no arguments and always returns the next element of the
stream. If there are no more elements in the stream, ``next()`` must raise the
``StopIteration`` exception. Iterators don't have to be finite, though; it's
perfectly reasonable to write an iterator that produces an infinite stream of
data.
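
A hand-written iterator following this protocol could be sketched as below.
``Countdown`` is a hypothetical class invented for this example. Two details go
beyond the paragraph above and are assumptions of the sketch: an ``__iter__()``
method returning ``self`` is added so the object also works in a ``for`` loop,
and ``__next__`` is aliased to ``next`` so the same code runs on Python 3,
which renamed the method.

```python
class Countdown(object):
    """Iterator producing start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator is its own iterator.
        return self

    def next(self):
        if self.current <= 0:
            # No more elements in the stream.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    __next__ = next   # name of the same method in Python 3
```

``list(Countdown(3))`` produces ``[3, 2, 1]``; removing the ``current <= 0``
check would turn this into an infinite stream.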

The built-in :func:`iter` function takes an arbitrary object and tries to return
an iterator that will return the object's contents or elements, raising
:exc:`TypeError` if the object doesn't support iteration. Several of Python's
built-in data types support iteration, the most common being lists and
dictionaries. An object is called an iterable object if you can get an
iterator for it.

You can experiment with the iteration interface manually:

    >>> L = [1,2,3]
    >>> it = iter(L)
    >>> print it
    <...iterator object at ...>
    >>> it.next()
    1
    >>> it.next()
    2
    >>> it.next()
    3
    >>> it.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    StopIteration
    >>>

Python expects iterable objects in several different contexts, the most
important being the ``for`` statement. In the statement ``for X in Y``, Y must
be an iterator or some object for which ``iter()`` can create an iterator.
These two statements are equivalent::

    for i in iter(obj):
        print i

    for i in obj:
        print i

Iterators can be materialized as lists or tuples by using the :func:`list` or
:func:`tuple` constructor functions:

    >>> L = [1,2,3]
    >>> iterator = iter(L)
    >>> t = tuple(iterator)
    >>> t
    (1, 2, 3)

Sequence unpacking also supports iterators: if you know an iterator will return
N elements, you can unpack them into an N-tuple:

    >>> L = [1,2,3]
    >>> iterator = iter(L)
    >>> a,b,c = iterator
    >>> a,b,c
    (1, 2, 3)

Built-in functions such as :func:`max` and :func:`min` can take a single
iterator argument and will return the largest or smallest element. The ``"in"``
and ``"not in"`` operators also support iterators: ``X in iterator`` is true if
X is found in the stream returned by the iterator. You'll run into obvious
problems if the iterator is infinite; ``max()``, ``min()``, and ``"not in"``
will never return, and if the element X never appears in the stream, the
``"in"`` operator won't return either.

Note that you can only go forward in an iterator; there's no way to get the
previous element, reset the iterator, or make a copy of it. Iterator objects
can optionally provide these additional capabilities, but the iterator protocol
only specifies the ``next()`` method. Functions may therefore consume all of
the iterator's output, and if you need to do something different with the same
stream, you'll have to create a new iterator.
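
The forward-only behaviour is easy to observe with a list iterator. This is a
small sketch; the variable names are arbitrary.

```python
L = [1, 2, 3, 4, 5]
it = iter(L)

found = 3 in it          # True; consumes 1, 2 and 3 while searching
rest = list(it)          # [4, 5]; only the unconsumed elements remain
again = list(it)         # []; the iterator is now exhausted

fresh = list(iter(L))    # [1, 2, 3, 4, 5]; a new iterator starts over
```

The ``in`` test stops as soon as it finds a match, which is why ``4`` and
``5`` are still available afterwards.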



Data Types That Support Iterators
---------------------------------

We've already seen how lists and tuples support iterators. In fact, any Python
sequence type, such as strings, will automatically support creation of an
iterator.

Calling :func:`iter` on a dictionary returns an iterator that will loop over the
dictionary's keys:

.. not a doctest since dict ordering varies across Pythons

::

    >>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
    ...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
    >>> for key in m:
    ...     print key, m[key]
    Mar 3
    Feb 2
    Aug 8
    Sep 9
    Apr 4
    Jun 6
    Jul 7
    Jan 1
    May 5
    Nov 11
    Dec 12
    Oct 10

Note that the order is essentially random, because it's based on the hash
ordering of the objects in the dictionary.

Applying ``iter()`` to a dictionary always loops over the keys, but dictionaries
have methods that return other iterators. If you want to iterate over keys,
values, or key/value pairs, you can explicitly call the ``iterkeys()``,
``itervalues()``, or ``iteritems()`` methods to get an appropriate iterator.

The :func:`dict` constructor can accept an iterator that returns a finite stream
of ``(key, value)`` tuples:

    >>> L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')]
    >>> dict(iter(L))
    {'Italy': 'Rome', 'US': 'Washington DC', 'France': 'Paris'}

Files also support iteration by calling the ``readline()`` method until there
are no more lines in the file. This means you can read each line of a file like
this::

    for line in file:
        # do something for each line
        ...

Sets can take their contents from an iterable and let you iterate over the set's
elements::

    S = set((2, 3, 5, 7, 11, 13))
    for i in S:
        print i



Generator expressions and list comprehensions
=============================================

Two common operations on an iterator's output are 1) performing some operation
for every element, 2) selecting a subset of elements that meet some condition.
For example, given a list of strings, you might want to strip off trailing
whitespace from each line or extract all the strings containing a given
substring.

List comprehensions and generator expressions (short form: "listcomps" and
"genexps") are a concise notation for such operations, borrowed from the
functional programming language Haskell (http://www.haskell.org/). You can strip
the leading and trailing whitespace from each string in a stream of strings
with the following code::

    line_list = [' line 1\n', 'line 2 \n', ...]

    # Generator expression -- returns iterator
    stripped_iter = (line.strip() for line in line_list)

    # List comprehension -- returns list
    stripped_list = [line.strip() for line in line_list]

You can select only certain elements by adding an ``"if"`` condition::

    stripped_list = [line.strip() for line in line_list
                     if line != ""]

With a list comprehension, you get back a Python list; ``stripped_list`` is a
list containing the resulting lines, not an iterator. Generator expressions
return an iterator that computes the values as necessary, not needing to
materialize all the values at once. This means that list comprehensions aren't
useful if you're working with iterators that return an infinite stream or a very
large amount of data. Generator expressions are preferable in these situations.
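
To make the difference concrete, here is a small sketch that applies a
generator expression to an infinite stream. It uses ``itertools.count()`` and
``itertools.islice()`` from the standard library; the variable names are
arbitrary.

```python
import itertools

# itertools.count() is an infinite iterator: 0, 1, 2, ...
squares = (n * n for n in itertools.count())

# Nothing has been computed yet; islice() pulls just five values.
first_five = list(itertools.islice(squares, 5))   # [0, 1, 4, 9, 16]

# The generator expression picks up where it left off.
sixth = next(squares)                             # 25
```

Writing the same thing with square brackets, ``[n * n for n in
itertools.count()]``, would try to build the whole list and never finish.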

Generator expressions are surrounded by parentheses ("()") and list
comprehensions are surrounded by square brackets ("[]"). Generator expressions
have the form::

    ( expression for expr1 in sequence1
                 if condition1
                 for expr2 in sequence2
                 if condition2
                 for expr3 in sequence3 ...
                 if condition3
                 for exprN in sequenceN
                 if conditionN )

Again, for a list comprehension only the outside brackets are different (square
brackets instead of parentheses).

The elements of the generated output will be the successive values of
``expression``. The ``if`` clauses are all optional; if present, ``expression``
is only evaluated and added to the result when ``condition`` is true.

Generator expressions always have to be written inside parentheses, but the
parentheses signalling a function call also count. If you want to create an
iterator that will be immediately passed to a function you can write::

    obj_total = sum(obj.count for obj in list_all_objects())

The ``for...in`` clauses contain the sequences to be iterated over. The
sequences do not have to be the same length, because they are iterated over from
left to right, not in parallel. For each element in ``sequence1``,
``sequence2`` is looped over from the beginning. ``sequence3`` is then looped
over for each resulting pair of elements from ``sequence1`` and ``sequence2``.

To put it another way, a list comprehension or generator expression is
equivalent to the following Python code::

    for expr1 in sequence1:
        if not (condition1):
            continue   # Skip this element
        for expr2 in sequence2:
            if not (condition2):
                continue   # Skip this element
            ...
            for exprN in sequenceN:
                if not (conditionN):
                    continue   # Skip this element

                # Output the value of
                # the expression.

This means that when there are multiple ``for...in`` clauses but no ``if``
clauses, the length of the resulting output will be equal to the product of the
lengths of all the sequences. If you have two lists of length 3, the output
list is 9 elements long:

.. doctest::
    :options: +NORMALIZE_WHITESPACE

    >>> seq1 = 'abc'
    >>> seq2 = (1,2,3)
    >>> [(x,y) for x in seq1 for y in seq2]
    [('a', 1), ('a', 2), ('a', 3),
     ('b', 1), ('b', 2), ('b', 3),
     ('c', 1), ('c', 2), ('c', 3)]
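
When ``if`` clauses are present, the equivalence with nested loops can be
checked directly. The following sketch adds two arbitrary conditions, chosen
only for illustration, to the same ``seq1`` and ``seq2``.

```python
seq1 = 'abc'
seq2 = (1, 2, 3)

# Comprehension with one condition after each ``for`` clause.
pairs = [(x, y) for x in seq1 if x != 'b'
                for y in seq2 if y > 1]

# The same computation written as explicit nested loops.
expanded = []
for x in seq1:
    if not (x != 'b'):
        continue   # Skip this element
    for y in seq2:
        if not (y > 1):
            continue   # Skip this element
        expanded.append((x, y))

# Both are [('a', 2), ('a', 3), ('c', 2), ('c', 3)]
```

Note that the condition on ``x`` is tested once per element of ``seq1``,
before ``seq2`` is looped over at all.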

To avoid introducing an ambiguity into Python's grammar, if ``expression`` is
creating a tuple, it must be surrounded with parentheses. The first list
comprehension below is a syntax error, while the second one is correct::

    # Syntax error
    [ x,y for x in seq1 for y in seq2]
    # Correct
    [ (x,y) for x in seq1 for y in seq2]


Generators
==========

Generators are a special class of functions that simplify the task of writing
iterators. Regular functions compute a value and return it, but generators
return an iterator that returns a stream of values.

You're doubtless familiar with how regular function calls work in Python or C.
When you call a function, it gets a private namespace where its local variables
are created. When the function reaches a ``return`` statement, the local
variables are destroyed and the value is returned to the caller. A later call
to the same function creates a new private namespace and a fresh set of local
variables. But, what if the local variables weren't thrown away on exiting a
function? What if you could later resume the function where it left off? This
is what generators provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function:

.. testcode::

    def generate_ints(N):
        for i in range(N):
            yield i

Any function containing a ``yield`` keyword is a generator function; this is
detected by Python's :term:`bytecode` compiler which compiles the function
specially as a result.

When you call a generator function, it doesn't return a single value; instead it
returns a generator object that supports the iterator protocol. On executing
the ``yield`` expression, the generator outputs the value of ``i``, similar to a
``return`` statement. The big difference between ``yield`` and a ``return``
statement is that on reaching a ``yield`` the generator's state of execution is
suspended and local variables are preserved. On the next call to the
generator's ``.next()`` method, the function will resume executing.

Here's a sample usage of the ``generate_ints()`` generator:

    >>> gen = generate_ints(3)
    >>> gen
    <generator object generate_ints at ...>
    >>> gen.next()
    0
    >>> gen.next()
    1
    >>> gen.next()
    2
    >>> gen.next()
    Traceback (most recent call last):
      File "stdin", line 1, in ?
      File "stdin", line 2, in generate_ints
    StopIteration

You could equally write ``for i in generate_ints(5)``, or ``a,b,c =
generate_ints(3)``.
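
One way to watch the suspend-and-resume behaviour is to have the generator
record what it is doing. The sketch below is a variant of ``generate_ints()``
that appends to a ``log`` list, which is added purely for illustration. It uses
the built-in ``next(gen)`` (Python 2.6 and later) instead of ``gen.next()`` so
that it also runs on Python 3.

```python
log = []

def generate_ints(N):
    log.append('started')
    for i in range(N):
        log.append('yielding %d' % i)
        yield i
    log.append('finished')

gen = generate_ints(2)
log.append('created')        # the body has not started running yet

first = next(gen)            # runs up to the first yield
log.append('got %d' % first)
second = next(gen)           # resumes inside the loop
log.append('got %d' % second)
remaining = list(gen)        # resumes once more, then falls off the end

# log is now:
# ['created', 'started', 'yielding 0', 'got 0',
#  'yielding 1', 'got 1', 'finished']
```

The ``'created'`` entry comes before ``'started'`` because calling the
generator function only builds the generator object; the body runs on demand.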

Inside a generator function, the ``return`` statement can only be used without a
value, and signals the end of the procession of values; after executing a
``return`` the generator cannot return any further values. ``return`` with a
value, such as ``return 5``, is a syntax error inside a generator function. The
end of the generator's results can also be indicated by raising
``StopIteration`` manually, or by just letting the flow of execution fall off
the bottom of the function.

You could achieve the effect of generators manually by writing your own class
and storing all the local variables of the generator as instance variables. For
example, returning a list of integers could be done by setting ``self.count`` to
0, and having the ``next()`` method increment ``self.count`` and return it.
However, for a moderately complicated generator, writing a corresponding class
can be much messier.
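
For comparison, a class-based counterpart of ``generate_ints()`` might look
like the sketch below. ``GenerateInts`` is a hypothetical name; the
``__iter__()`` method and the ``__next__`` alias are additions that make the
object usable in ``for`` loops and on Python 3.

```python
class GenerateInts(object):
    """Hand-written equivalent of the generate_ints(N) generator."""

    def __init__(self, N):
        self.N = N
        self.count = 0       # plays the role of the local variable i

    def __iter__(self):
        return self

    def next(self):
        if self.count >= self.N:
            raise StopIteration
        value = self.count
        self.count += 1
        return value

    __next__ = next          # name of the same method in Python 3
```

Even for this three-line generator, the class needs explicit bookkeeping for
state that the generator version keeps in ordinary local variables.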

The test suite included with Python's library, ``test_generators.py``, contains
a number of more interesting examples. Here's one generator that implements an
in-order traversal of a tree using generators recursively. ::

    # A recursive generator that generates Tree leaves in in-order.
    def inorder(t):
        if t:
            for x in inorder(t.left):
                yield x

            yield t.label

            for x in inorder(t.right):
                yield x
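
To try ``inorder()`` out, some tree type with ``left``, ``label`` and ``right``
attributes is needed. The ``Tree`` class below is a minimal stand-in written
for this example; it is not the class used in ``test_generators.py``.

```python
class Tree(object):
    """Hypothetical binary tree node; missing children are None."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

#       4
#      / \
#     2   5
#    / \
#   1   3
root = Tree(4, Tree(2, Tree(1), Tree(3)), Tree(5))
labels = list(inorder(root))   # [1, 2, 3, 4, 5]
```

The ``if t:`` test relies on ``None`` being false and ordinary instances being
true, so the recursion stops at missing children.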

Two other examples in ``test_generators.py`` produce solutions for the N-Queens
problem (placing N queens on an NxN chess board so that no queen threatens
another) and the Knight's Tour (finding a route that takes a knight to every
square of an NxN chessboard without visiting any square twice).



Passing values into a generator
-------------------------------

In Python 2.4 and earlier, generators only produced output. Once a generator's
code was invoked to create an iterator, there was no way to pass any new
information into the function when its execution was resumed. You could hack
together this ability by making the generator look at a global variable or by
passing in some mutable object that callers then modify, but these approaches
are messy.

In Python 2.5 there's a simple way to pass values into a generator.
:keyword:`yield` became an expression, returning a value that can be assigned to
a variable or otherwise operated on::

    val = (yield i)

I recommend that you always put parentheses around a ``yield`` expression
when you're doing something with the returned value, as in the above example.
The parentheses aren't always necessary, but it's easier to always add them
instead of having to remember when they're needed.

(PEP 342 explains the exact rules, which are that a ``yield``-expression must
always be parenthesized except when it occurs at the top-level expression on the
right-hand side of an assignment. This means you can write ``val = yield i``
but have to use parentheses when there's an operation, as in ``val = (yield i)
+ 12``.)

Values are sent into a generator by calling its ``send(value)`` method. This
method resumes the generator's code and the ``yield`` expression returns the
specified value. If the regular ``next()`` method is called, the ``yield``
returns ``None``.
				556
				557	Here's a simple counter that increments by 1 and allows changing the value of
				558	the internal counter.
				559
Georg Brandl	838b4b0	2008-03-22 13:07:06 +0000	[diff] [blame]	560	.. testcode::
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	561
				562	def counter (maximum):
				563	i = 0
				564	while i < maximum:
				565	val = (yield i)
				566	# If value provided, change counter
				567	if val is not None:
				568	i = val
				569	else:
				570	i += 1
				571
				572	And here's an example of changing the counter:
				573
    >>> it = counter(10)
    >>> print it.next()
    0
    >>> print it.next()
    1
    >>> print it.send(8)
    8
    >>> print it.next()
    9
    >>> print it.next()
    Traceback (most recent call last):
      File "t.py", line 15, in ?
        print it.next()
    StopIteration
				588
Because ``yield`` will often be returning ``None``, you should always check for
this case.  Don't just use its value in expressions unless you're sure that the
``send()`` method will be the only method used to resume your generator function.
				592
				593	In addition to ``send()``, there are two other new methods on generators:
				594
				595	* ``throw(type, value=None, traceback=None)`` is used to raise an exception
				596	inside the generator; the exception is raised by the ``yield`` expression
				597	where the generator's execution is paused.
				598
				599	* ``close()`` raises a :exc:`GeneratorExit` exception inside the generator to
				600	terminate the iteration. On receiving this exception, the generator's code
				601	must either raise :exc:`GeneratorExit` or :exc:`StopIteration`; catching the
				602	exception and doing anything else is illegal and will trigger a
				603	:exc:`RuntimeError`. ``close()`` will also be called by Python's garbage
				604	collector when the generator is garbage-collected.
				605
				606	If you need to run cleanup code when a :exc:`GeneratorExit` occurs, I suggest
				607	using a ``try: ... finally:`` suite instead of catching :exc:`GeneratorExit`.
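To make ``throw()`` and ``close()`` a little more concrete, here's a minimal
sketch.  The generator names and the ``log`` list are invented for this
illustration, and I've used the built-in ``next()`` function (available from
Python 2.6) instead of the ``.next()`` method so that the snippet should also
run on later Python versions.

```python
def numbered_lines(lines, log):
    """Yield (number, line) pairs; note in `log` when the generator is finalized."""
    try:
        for number, line in enumerate(lines):
            yield number, line
    finally:
        # Runs when the generator is exhausted, closed, or garbage-collected.
        log.append('cleanup ran')


def tolerant_counter():
    """Count upward, restarting from 0 when a ValueError is thrown in."""
    i = 0
    while True:
        try:
            yield i
            i += 1
        except ValueError:
            # The thrown exception surfaces at the paused yield above.
            i = 0


log = []
gen = numbered_lines(['alpha', 'beta', 'gamma'], log)
first = next(gen)        # (0, 'alpha')
gen.close()              # GeneratorExit is raised at the paused yield
closed_log = list(log)   # ['cleanup ran']

counter_gen = tolerant_counter()
values = [next(counter_gen), next(counter_gen), next(counter_gen)]  # [0, 1, 2]
restarted = counter_gen.throw(ValueError)  # handled inside; the next yield gives 0
```

Note that ``throw()`` returns the next value the generator yields, just as
``send()`` does, and that the ``finally:`` clause runs without the generator
ever having to catch :exc:`GeneratorExit` itself.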
				608
				609	The cumulative effect of these changes is to turn generators from one-way
				610	producers of information into both producers and consumers.
				611
				612	Generators also become coroutines, a more generalized form of subroutines.
				613	Subroutines are entered at one point and exited at another point (the top of the
				614	function, and a ``return`` statement), but coroutines can be entered, exited,
				615	and resumed at many different points (the ``yield`` statements).
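As a rough illustration of a generator acting as both consumer and producer,
the following sketch keeps a running average of the values sent into it.  The
name ``running_average`` is made up for this example; the built-in ``next()``
is used to advance the generator to its first ``yield`` before any value can
be sent.

```python
def running_average():
    """Consume numbers via send() and yield the average seen so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = (yield average)
        if value is None:
            # Resumed by next() rather than send(); nothing to add.
            continue
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)                    # advance to the first yield
after_one = avg.send(10)     # 10.0
after_two = avg.send(20)     # 15.0
unchanged = next(avg)        # 15.0, because next() sends None
```

The ``None`` check follows the advice given earlier: the generator can't
assume that ``send()`` is the only way it will be resumed.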
				616
				617
Built-in functions
==================
				620
				621	Let's look in more detail at built-in functions often used with iterators.
				622
Two of Python's built-in functions, :func:`map` and :func:`filter`, are somewhat
obsolete; they duplicate the features of list comprehensions but return actual
lists instead of iterators.

``map(f, iterA, iterB, ...)`` returns a list containing ``f(iterA[0], iterB[0]),
f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``.

    >>> def upper(s):
    ...     return s.upper()

    >>> map(upper, ['sentence', 'fragment'])
    ['SENTENCE', 'FRAGMENT']

    >>> [upper(s) for s in ['sentence', 'fragment']]
    ['SENTENCE', 'FRAGMENT']
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	638
				639	As shown above, you can achieve the same effect with a list comprehension. The
				640	:func:`itertools.imap` function does the same thing but can handle infinite
				641	iterators; it'll be discussed later, in the section on the :mod:`itertools` module.
				642
				643	``filter(predicate, iter)`` returns a list that contains all the sequence
				644	elements that meet a certain condition, and is similarly duplicated by list
				645	comprehensions. A predicate is a function that returns the truth value of
				646	some condition; for use with :func:`filter`, the predicate must take a single
				647	value.
				648
    >>> def is_even(x):
    ...     return (x % 2) == 0

    >>> filter(is_even, range(10))
    [0, 2, 4, 6, 8]

This can also be written as a list comprehension:

    >>> [x for x in range(10) if is_even(x)]
    [0, 2, 4, 6, 8]
				659
				660	:func:`filter` also has a counterpart in the :mod:`itertools` module,
				661	:func:`itertools.ifilter`, that returns an iterator and can therefore handle
				662	infinite sequences just as :func:`itertools.imap` can.
				663
				664	``reduce(func, iter, [initial_value])`` doesn't have a counterpart in the
				665	:mod:`itertools` module because it cumulatively performs an operation on all the
				666	iterable's elements and therefore can't be applied to infinite iterables.
				667	``func`` must be a function that takes two elements and returns a single value.
				668	:func:`reduce` takes the first two elements A and B returned by the iterator and
				669	calculates ``func(A, B)``. It then requests the third element, C, calculates
				670	``func(func(A, B), C)``, combines this result with the fourth element returned,
				671	and continues until the iterable is exhausted. If the iterable returns no
				672	values at all, a :exc:`TypeError` exception is raised. If the initial value is
				673	supplied, it's used as a starting point and ``func(initial_value, A)`` is the
				674	first calculation.
				675
    >>> import operator
    >>> reduce(operator.concat, ['A', 'BB', 'C'])
    'ABBC'
    >>> reduce(operator.concat, [])
    Traceback (most recent call last):
      ...
    TypeError: reduce() of empty sequence with no initial value
    >>> reduce(operator.mul, [1,2,3], 1)
    6
    >>> reduce(operator.mul, [], 1)
    1
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	687
If you use :func:`operator.add` with :func:`reduce`, you'll add up all the
elements of the iterable.  This case is so common that there's a special
built-in called :func:`sum` to compute it:

    >>> reduce(operator.add, [1,2,3,4], 0)
    10
    >>> sum([1,2,3,4])
    10
    >>> sum([])
    0
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	698
				699	For many uses of :func:`reduce`, though, it can be clearer to just write the
				700	obvious :keyword:`for` loop::
				701
    # Instead of:
    product = reduce(operator.mul, [1,2,3], 1)

    # You can write:
    product = 1
    for i in [1,2,3]:
        product *= i
				709
				710
				711	``enumerate(iter)`` counts off the elements in the iterable, returning 2-tuples
				712	containing the count and each element.
				713
    >>> for item in enumerate(['subject', 'verb', 'object']):
    ...     print item
    (0, 'subject')
    (1, 'verb')
    (2, 'object')
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	719
				720	:func:`enumerate` is often used when looping through a list and recording the
				721	indexes at which certain conditions are met::
				722
    f = open('data.txt', 'r')
    for i, line in enumerate(f):
        if line.strip() == '':
            print 'Blank line at line #%i' % i
				727
``sorted(iterable, [cmp=None], [key=None], [reverse=False])`` collects all the
elements of the iterable into a list, sorts the list, and returns the sorted
result.  The ``cmp``, ``key``, and ``reverse`` arguments are passed through to
the constructed list's ``.sort()`` method. ::

    >>> import random
    >>> # Generate 8 random numbers between [0, 10000)
    >>> rand_list = random.sample(range(10000), 8)
    >>> rand_list
    [769, 7953, 9828, 6431, 8442, 9878, 6213, 2207]
    >>> sorted(rand_list)
    [769, 2207, 6213, 6431, 7953, 8442, 9828, 9878]
    >>> sorted(rand_list, reverse=True)
    [9878, 9828, 8442, 7953, 6431, 6213, 2207, 769]
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	742
				743	(For a more detailed discussion of sorting, see the Sorting mini-HOWTO in the
				744	Python wiki at http://wiki.python.org/moin/HowTo/Sorting.)
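The ``sorted()`` example above doesn't exercise ``key``, so here's a short
sketch of it.  The sample data is invented for illustration.  I've left out
``cmp`` on purpose: ``key`` usually does the job, and later Python versions
no longer accept ``cmp``.

```python
import operator

words = ['banana', 'Cherry', 'apple', 'date']

# key is called once per element; the results decide the ordering.
by_length = sorted(words, key=len)
case_insensitive = sorted(words, key=str.lower)

# itemgetter(1) builds a function that fetches element 1 of each tuple.
scores = [('ann', 72), ('bob', 95), ('cy', 88)]
ranked = sorted(scores, key=operator.itemgetter(1), reverse=True)
```

Because the sort is stable, ``'banana'`` stays ahead of ``'Cherry'`` in
``by_length`` even though both have six characters.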
				745
				746	The ``any(iter)`` and ``all(iter)`` built-ins look at the truth values of an
				747	iterable's contents. :func:`any` returns True if any element in the iterable is
				748	a true value, and :func:`all` returns True if all of the elements are true
Georg Brandl	09a7fe6	2008-03-22 11:00:48 +0000	[diff] [blame]	749	values:
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	750
Georg Brandl	09a7fe6	2008-03-22 11:00:48 +0000	[diff] [blame]	751	>>> any([0,1,0])
				752	True
				753	>>> any([0,0,0])
				754	False
				755	>>> any([1,1,1])
				756	True
				757	>>> all([0,1,0])
				758	False
Georg Brandl	c62ef8b	2009-01-03 20:55:06 +0000	[diff] [blame]	759	>>> all([0,0,0])
Georg Brandl	09a7fe6	2008-03-22 11:00:48 +0000	[diff] [blame]	760	False
				761	>>> all([1,1,1])
				762	True
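In practice ``any()`` and ``all()`` are often fed a generator expression
rather than a list of ready-made truth values.  A small sketch, with made-up
sample numbers:

```python
numbers = [3, 8, 15, 4]

has_large = any(n > 10 for n in numbers)       # True: 15 qualifies
all_positive = all(n > 0 for n in numbers)     # True
all_even = all(n % 2 == 0 for n in numbers)    # False: 3 is odd

# The empty iterable is the edge case worth remembering.
any_of_nothing = any([])   # False
all_of_nothing = all([])   # True
```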
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	763
				764
Small functions and the lambda expression
=========================================
				767
				768	When writing functional-style programs, you'll often need little functions that
				769	act as predicates or that combine elements in some way.
				770
				771	If there's a Python built-in or a module function that's suitable, you don't
				772	need to define a new function at all::
				773
    stripped_lines = [line.strip() for line in lines]
    existing_files = filter(os.path.exists, file_list)
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	776
If the function you need doesn't exist, you need to write it.  One way to write
small functions is to use the ``lambda`` expression.  ``lambda`` takes a number
of parameters and an expression combining these parameters, and creates a small
function that returns the value of the expression::

    lowercase = lambda x: x.lower()

    print_assign = lambda name, value: name + '=' + str(value)

    adder = lambda x, y: x+y
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	787
				788	An alternative is to just use the ``def`` statement and define a function in the
				789	usual way::
				790
    def lowercase(x):
        return x.lower()

    def print_assign(name, value):
        return name + '=' + str(value)

    def adder(x,y):
        return x + y
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	799
				800	Which alternative is preferable? That's a style question; my usual course is to
				801	avoid using ``lambda``.
				802
One reason for my preference is that ``lambda`` is quite limited in the
functions it can define.  The result has to be computable as a single
expression, which means you can't have multiway ``if... elif... else``
comparisons or ``try... except`` statements.  If you try to do too much in a
``lambda`` expression, you'll end up with an overly complicated expression that's
hard to read.  Quick, what's the following code doing?
				809
				810	::
				811
				812	total = reduce(lambda a, b: (0, a[1] + b[1]), items)[1]
				813
You can figure it out, but it takes time to disentangle the expression to figure
out what's going on.  Using a short nested ``def`` statement makes things a
little bit better::

    def combine (a, b):
        return 0, a[1] + b[1]

    total = reduce(combine, items)[1]
				822
				823	But it would be best of all if I had simply used a ``for`` loop::
				824
    total = 0
    for a, b in items:
        total += b
				828
				829	Or the :func:`sum` built-in and a generator expression::
				830
				831	total = sum(b for a,b in items)
				832
				833	Many uses of :func:`reduce` are clearer when written as ``for`` loops.
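If you'd like to convince yourself that the four versions above really do
compute the same thing, here's a sketch that runs them side by side.  The
``items`` list is sample data I made up, and ``reduce`` is imported from
:mod:`functools` so that the snippet should run on later Python versions as
well as on Python 2.6.

```python
from functools import reduce  # also available as a built-in on Python 2

items = [('apples', 3), ('pears', 5), ('plums', 2)]

# 1. The hard-to-read lambda version.
total_lambda = reduce(lambda a, b: (0, a[1] + b[1]), items)[1]

# 2. The same logic with a named helper.
def combine(a, b):
    return 0, a[1] + b[1]

total_def = reduce(combine, items)[1]

# 3. A plain for loop.
total_loop = 0
for name, count in items:
    total_loop += count

# 4. sum() with a generator expression.
total_sum = sum(count for name, count in items)
```

The versions differ on empty input: with an empty ``items`` list the two
``reduce()`` calls raise :exc:`TypeError`, while the loop and ``sum()``
simply give 0.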
				834
				835	Fredrik Lundh once suggested the following set of rules for refactoring uses of
				836	``lambda``:
				837
				838	1) Write a lambda function.
				839	2) Write a comment explaining what the heck that lambda does.
				840	3) Study the comment for a while, and think of a name that captures the essence
				841	of the comment.
				842	4) Convert the lambda to a def statement, using that name.
				843	5) Remove the comment.
				844
I really like these rules, but you're free to disagree
about whether this lambda-free style is better.
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	847
				848
The itertools module
====================
				851
				852	The :mod:`itertools` module contains a number of commonly-used iterators as well
				853	as functions for combining several iterators. This section will introduce the
				854	module's contents by showing small examples.
				855
				856	The module's functions fall into a few broad classes:
				857
				858	* Functions that create a new iterator based on an existing iterator.
				859	* Functions for treating an iterator's elements as function arguments.
				860	* Functions for selecting portions of an iterator's output.
				861	* A function for grouping an iterator's output.
				862
Creating new iterators
----------------------
				865
``itertools.count(n)`` returns an infinite stream of integers, increasing by 1
each time.  You can optionally supply the starting number, which defaults to 0::

    itertools.count() =>
      0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
    itertools.count(10) =>
      10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...

``itertools.cycle(iter)`` saves a copy of the contents of a provided iterable
and returns a new iterator that returns its elements from first to last.  The
new iterator will repeat these elements infinitely. ::

    itertools.cycle([1,2,3,4,5]) =>
      1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...

``itertools.repeat(elem, [n])`` returns the provided element ``n`` times, or
returns the element endlessly if ``n`` is not provided. ::

    itertools.repeat('abc') =>
      abc, abc, abc, abc, abc, abc, abc, abc, abc, abc, ...
    itertools.repeat('abc', 5) =>
      abc, abc, abc, abc, abc

``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of iterables as
input, and returns all the elements of the first iterator, then all the elements
of the second, and so on, until all of the iterables have been exhausted. ::

    itertools.chain(['a', 'b', 'c'], (1, 2, 3)) =>
      a, b, c, 1, 2, 3
				895
				896	``itertools.izip(iterA, iterB, ...)`` takes one element from each iterable and
				897	returns them in a tuple::
				898
				899	itertools.izip(['a', 'b', 'c'], (1, 2, 3)) =>
				900	('a', 1), ('b', 2), ('c', 3)
				901
It's similar to the built-in :func:`zip` function, but doesn't construct an
in-memory list and exhaust all the input iterators before returning; instead
tuples are constructed and returned only if they're requested.  (The technical
term for this behaviour is `lazy evaluation
<http://en.wikipedia.org/wiki/Lazy_evaluation>`__.)

This iterator is intended to be used with iterables that are all of the same
length.  If the iterables are of different lengths, the resulting stream will be
the same length as the shortest iterable. ::

    itertools.izip(['a', 'b'], (1, 2, 3)) =>
      ('a', 1), ('b', 2)
				914
				915	You should avoid doing this, though, because an element may be taken from the
				916	longer iterators and discarded. This means you can't go on to use the iterators
				917	further because you risk skipping a discarded element.
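The discarded element is easiest to see when the longer iterator comes first.
The following sketch uses invented sample data; the ``try``/``except`` import
is only there so the snippet should also run on later Python versions, where
the built-in ``zip()`` is already lazy and ``izip`` no longer exists.

```python
try:
    from itertools import izip   # Python 2
except ImportError:
    izip = zip                   # later versions: built-in zip is lazy

numbers = iter([1, 2, 3, 4])
letters = iter(['a', 'b'])

pairs = list(izip(numbers, letters))

# izip fetched 3 from `numbers` before discovering that `letters`
# was exhausted, so 3 is gone and only 4 is left.
leftover = list(numbers)
```

Here ``pairs`` is ``[(1, 'a'), (2, 'b')]`` and ``leftover`` is ``[4]``: the
value 3 was consumed and then thrown away.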
				918
				919	``itertools.islice(iter, [start], stop, [step])`` returns a stream that's a
				920	slice of the iterator. With a single ``stop`` argument, it will return the
				921	first ``stop`` elements. If you supply a starting index, you'll get
				922	``stop-start`` elements, and if you supply a value for ``step``, elements will
				923	be skipped accordingly. Unlike Python's string and list slicing, you can't use
Georg Brandl	09a7fe6	2008-03-22 11:00:48 +0000	[diff] [blame]	924	negative values for ``start``, ``stop``, or ``step``. ::
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	925
				926	itertools.islice(range(10), 8) =>
				927	0, 1, 2, 3, 4, 5, 6, 7
				928	itertools.islice(range(10), 2, 8) =>
				929	2, 3, 4, 5, 6, 7
				930	itertools.islice(range(10), 2, 8, 2) =>
				931	2, 4, 6
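The ``=>`` notation used in these examples is only shorthand.  To actually see
the elements of an infinite iterator you have to cut it off somewhere, and
``islice()`` is a convenient way to do that.  A small sketch (the variable
names are arbitrary):

```python
import itertools

# Take a finite prefix of each infinite stream and turn it into a list.
first_counts = list(itertools.islice(itertools.count(10), 5))
cycled = list(itertools.islice(itertools.cycle([1, 2, 3]), 7))

# These two are finite already, so list() alone is enough.
repeated = list(itertools.repeat('abc', 3))
chained = list(itertools.chain(['a', 'b', 'c'], (1, 2, 3)))

# start, stop, and step work like slice indexes, minus the negative values.
stepped = list(itertools.islice(range(10), 2, 8, 2))
```

Calling ``list()`` directly on ``itertools.count()`` or ``itertools.cycle()``
would never finish, which is why the slice comes first.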
				932
				933	``itertools.tee(iter, [n])`` replicates an iterator; it returns ``n``
				934	independent iterators that will all return the contents of the source iterator.
				935	If you don't supply a value for ``n``, the default is 2. Replicating iterators
				936	requires saving some of the contents of the source iterator, so this can consume
				937	significant memory if the iterator is large and one of the new iterators is
Georg Brandl	09a7fe6	2008-03-22 11:00:48 +0000	[diff] [blame]	938	consumed more than the others. ::
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	939
				940	itertools.tee( itertools.count() ) =>
				941	iterA, iterB
				942
				943	where iterA ->
				944	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
				945
				946	and iterB ->
				947	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
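As a quick check that the two iterators returned by ``tee()`` really do
advance independently, consider this sketch (names chosen for illustration):

```python
import itertools

iter_a, iter_b = itertools.tee(itertools.count())

# Advance the two copies by different amounts.
from_a = [next(iter_a) for _ in range(3)]
from_b = [next(iter_b) for _ in range(5)]

# iter_a picks up where it left off, unaffected by iter_b.
next_from_a = next(iter_a)
```

Behind the scenes ``tee()`` has to remember the values that ``iter_b`` has
seen but ``iter_a`` hasn't, which is where the memory cost mentioned above
comes from.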
				948
				949
Calling functions on elements
-----------------------------
				952
				953	Two functions are used for calling other functions on the contents of an
				954	iterable.
				955
				956	``itertools.imap(f, iterA, iterB, ...)`` returns a stream containing
				957	``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``::
				958
				959	itertools.imap(operator.add, [5, 6, 5], [1, 2, 3]) =>
				960	6, 8, 8
				961
				962	The ``operator`` module contains a set of functions corresponding to Python's
				963	operators. Some examples are ``operator.add(a, b)`` (adds two values),
				964	``operator.ne(a, b)`` (same as ``a!=b``), and ``operator.attrgetter('id')``
				965	(returns a callable that fetches the ``"id"`` attribute).
				966
``itertools.starmap(func, iter)`` assumes that the iterable will return a stream
of tuples, and calls ``func()`` using these tuples as the arguments::

    itertools.starmap(os.path.join,
                      [('/usr', 'bin', 'java'), ('/bin', 'python'),
                       ('/usr', 'bin', 'perl'), ('/usr', 'bin', 'ruby')])
    =>
      /usr/bin/java, /bin/python, /usr/bin/perl, /usr/bin/ruby
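Here's a runnable variant of the ``starmap()`` example.  I've swapped in
:mod:`posixpath` for ``os.path`` so that the result uses forward slashes on
every platform; that substitution, and the ``pow`` example, are my additions.

```python
import itertools
import posixpath

path_parts = [('/usr', 'bin', 'java'), ('/bin', 'python')]

# Each tuple is unpacked into positional arguments: join('/usr', 'bin', 'java').
paths = list(itertools.starmap(posixpath.join, path_parts))

# The same idea with a two-argument built-in: pow(2, 3), pow(3, 2), pow(10, 0).
powers = list(itertools.starmap(pow, [(2, 3), (3, 2), (10, 0)]))

# starmap(func, tuples) yields the same values as this list comprehension.
same_powers = [pow(*args) for args in [(2, 3), (3, 2), (10, 0)]]
```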
				975
				976
Selecting elements
------------------
				979
				980	Another group of functions chooses a subset of an iterator's elements based on a
				981	predicate.
				982
				983	``itertools.ifilter(predicate, iter)`` returns all the elements for which the
				984	predicate returns true::
				985
    def is_even(x):
        return (x % 2) == 0

    itertools.ifilter(is_even, itertools.count()) =>
      0, 2, 4, 6, 8, 10, 12, 14, ...
				991
				992	``itertools.ifilterfalse(predicate, iter)`` is the opposite, returning all
				993	elements for which the predicate returns false::
				994
				995	itertools.ifilterfalse(is_even, itertools.count()) =>
				996	1, 3, 5, 7, 9, 11, 13, 15, ...
				997
				998	``itertools.takewhile(predicate, iter)`` returns elements for as long as the
				999	predicate returns true. Once the predicate returns false, the iterator will
				1000	signal the end of its results.
				1001
				1002	::
				1003
    def less_than_10(x):
        return (x < 10)

    itertools.takewhile(less_than_10, itertools.count()) =>
      0, 1, 2, 3, 4, 5, 6, 7, 8, 9

    itertools.takewhile(is_even, itertools.count()) =>
      0
				1012
				1013	``itertools.dropwhile(predicate, iter)`` discards elements while the predicate
				1014	returns true, and then returns the rest of the iterable's results.
				1015
				1016	::
				1017
				1018	itertools.dropwhile(less_than_10, itertools.count()) =>
				1019	10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
				1020
				1021	itertools.dropwhile(is_even, itertools.count()) =>
				1022	1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
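The difference between ``takewhile()`` and ``dropwhile()`` is easier to see
when the results are materialized.  In this sketch ``islice()`` bounds the
streams that would otherwise be infinite; the variable names are arbitrary.

```python
import itertools

def is_even(x):
    return (x % 2) == 0

def less_than_10(x):
    return x < 10

# takewhile stops for good at the first element that fails the predicate.
taken = list(itertools.takewhile(less_than_10, itertools.count()))
only_zero = list(itertools.takewhile(is_even, itertools.count()))

# dropwhile skips the leading run, then passes everything through untested.
dropped = list(itertools.islice(
    itertools.dropwhile(less_than_10, itertools.count()), 5))
after_first_odd = list(itertools.islice(
    itertools.dropwhile(is_even, itertools.count()), 5))
```

Notice that ``after_first_odd`` contains even numbers: once the predicate has
returned false for 1, ``dropwhile()`` stops consulting it.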
				1023
				1024
Grouping elements
-----------------
				1027
				1028	The last function I'll discuss, ``itertools.groupby(iter, key_func=None)``, is
				1029	the most complicated. ``key_func(elem)`` is a function that can compute a key
				1030	value for each element returned by the iterable. If you don't supply a key
				1031	function, the key is simply each element itself.
				1032
				1033	``groupby()`` collects all the consecutive elements from the underlying iterable
				1034	that have the same key value, and returns a stream of 2-tuples containing a key
				1035	value and an iterator for the elements with that key.
				1036
				1037	::
				1038
    city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'),
                 ('Anchorage', 'AK'), ('Nome', 'AK'),
                 ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ'),
                 ...
                ]

    def get_state ((city, state)):
        return state

    itertools.groupby(city_list, get_state) =>
      ('AL', iterator-1),
      ('AK', iterator-2),
      ('AZ', iterator-3), ...

    where
    iterator-1 =>
      ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL')
    iterator-2 =>
      ('Anchorage', 'AK'), ('Nome', 'AK')
    iterator-3 =>
      ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')
				1060
				1061	``groupby()`` assumes that the underlying iterable's contents will already be
				1062	sorted based on the key. Note that the returned iterators also use the
				1063	underlying iterable, so you have to consume the results of iterator-1 before
				1064	requesting iterator-2 and its corresponding key.
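The two caveats above (sort first, and consume each group before moving on)
suggest a common pattern, sketched below.  The shortened ``city_list`` is
sample data, and I've used ``operator.itemgetter(1)`` as the key function
because the tuple-unpacking parameter in ``get_state`` isn't accepted by
later Python versions.

```python
import itertools
import operator

city_list = [('Decatur', 'AL'), ('Anchorage', 'AK'),
             ('Huntsville', 'AL'), ('Nome', 'AK')]
get_state = operator.itemgetter(1)

# Without sorting, a key shows up once per consecutive run.
unsorted_keys = [key for key, group in itertools.groupby(city_list, get_state)]

# Sort by the same key, then copy each group into a list right away,
# before groupby() advances to the next key.
by_state = {}
for state, group in itertools.groupby(sorted(city_list, key=get_state),
                                      get_state):
    by_state[state] = [city for city, _ in group]
```

``unsorted_keys`` comes out as ``['AL', 'AK', 'AL', 'AK']``, while
``by_state`` maps each state to all of its cities.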
				1065
				1066
The functools module
====================
				1069
				1070	The :mod:`functools` module in Python 2.5 contains some higher-order functions.
				1071	A higher-order function takes one or more functions as input and returns a
				1072	new function. The most useful tool in this module is the
				1073	:func:`functools.partial` function.
				1074
				1075	For programs written in a functional style, you'll sometimes want to construct
				1076	variants of existing functions that have some of the parameters filled in.
				1077	Consider a Python function ``f(a, b, c)``; you may wish to create a new function
				1078	``g(b, c)`` that's equivalent to ``f(1, b, c)``; you're filling in a value for
				1079	one of ``f()``'s parameters. This is called "partial function application".
				1080
				1081	The constructor for ``partial`` takes the arguments ``(function, arg1, arg2,
				1082	... kwarg1=value1, kwarg2=value2)``. The resulting object is callable, so you
				1083	can just call it to invoke ``function`` with the filled-in arguments.
				1084
				1085	Here's a small but realistic example::
				1086
    import functools

    def log (message, subsystem):
        "Write the contents of 'message' to the specified subsystem."
        print '%s: %s' % (subsystem, message)
        ...

    server_log = functools.partial(log, subsystem='server')
    server_log('Unable to open socket')
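Here's a second sketch of ``partial()`` that returns values instead of
printing, which makes the effect easier to inspect.  The ``power`` function
and the names derived from it are invented for this example.

```python
import functools

def power(base, exponent):
    return base ** exponent

# Fill in a keyword argument...
square = functools.partial(power, exponent=2)
# ...or the leading positional argument.
two_to_the = functools.partial(power, 2)

squared = square(5)          # power(5, exponent=2)
big = two_to_the(10)         # power(2, 10)

# The partial object remembers what it wraps.
wrapped = square.func        # the original power function
frozen = square.keywords     # {'exponent': 2}
```

Positional arguments given to ``partial()`` are filled in from the left, so
``two_to_the(10)`` supplies the exponent, not the base.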
				1096
				1097
The operator module
-------------------
				1100
				1101	The :mod:`operator` module was mentioned earlier. It contains a set of
				1102	functions corresponding to Python's operators. These functions are often useful
				1103	in functional-style code because they save you from writing trivial functions
				1104	that perform a single operation.
				1105
				1106	Some of the functions in this module are:
				1107
				1108	* Math operations: ``add()``, ``sub()``, ``mul()``, ``div()``, ``floordiv()``,
				1109	``abs()``, ...
				1110	* Logical operations: ``not_()``, ``truth()``.
				1111	* Bitwise operations: ``and_()``, ``or_()``, ``invert()``.
				1112	* Comparisons: ``eq()``, ``ne()``, ``lt()``, ``le()``, ``gt()``, and ``ge()``.
				1113	* Object identity: ``is_()``, ``is_not()``.
				1114
				1115	Consult the operator module's documentation for a complete list.
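A few of the functions listed above, used where you might otherwise write a
one-line ``lambda``.  This is only a sketch with made-up inputs; I've left out
``div()`` because later Python versions don't provide it.

```python
import operator
from functools import reduce  # also a built-in on Python 2

product = reduce(operator.mul, [1, 2, 3, 4], 1)        # 1*1*2*3*4
sums = list(map(operator.add, [5, 6, 5], [1, 2, 3]))   # pairwise a + b
flags = list(map(operator.not_, [0, 1, '']))           # not 0, not 1, not ''
masked = operator.and_(12, 10)                         # 0b1100 & 0b1010
ordered = operator.lt(3, 5)                            # 3 < 5
same = operator.is_(None, None)                        # None is None
```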
				1116
				1117
				1118
The functional module
---------------------
				1121
				1122	Collin Winter's `functional module <http://oakwinter.com/code/functional/>`__
				1123	provides a number of more advanced tools for functional programming. It also
				1124	reimplements several Python built-ins, trying to make them more intuitive to
				1125	those used to functional programming in other languages.
				1126
				1127	This section contains an introduction to some of the most important functions in
				1128	``functional``; full documentation can be found at `the project's website
				1129	<http://oakwinter.com/code/functional/documentation/>`__.
				1130
				1131	``compose(outer, inner, unpack=False)``
				1132
				1133	The ``compose()`` function implements function composition. In other words, it
				1134	returns a wrapper around the ``outer`` and ``inner`` callables, such that the
return value from ``inner`` is fed directly to ``outer``.  That is, ::

    >>> def add(a, b):
    ...     return a + b
    ...
    >>> def double(a):
    ...     return 2 * a
    ...
    >>> compose(double, add)(5, 6)
    22

is equivalent to ::

    >>> double(add(5, 6))
    22
Georg Brandl	c62ef8b	2009-01-03 20:55:06 +0000	[diff] [blame]	1150
The ``unpack`` keyword is provided to work around the fact that Python functions
are not always `fully curried <http://en.wikipedia.org/wiki/Currying>`__.  By
default, it is expected that the ``inner`` function will return a single object
and that the ``outer`` function will take a single argument.  Setting the
``unpack`` argument causes ``compose`` to expect a tuple from ``inner`` which
will be expanded before being passed to ``outer``.  Put simply, ::

    compose(f, g)(5, 6)

is equivalent to::

    f(g(5, 6))

while ::

    compose(f, g, unpack=True)(5, 6)

is equivalent to::

    f(*g(5, 6))
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	1171
Even though ``compose()`` only accepts two functions, it's trivial to build up a
version that will compose any number of functions.  We'll use ``reduce()``,
``compose()`` and ``partial()`` (the last of which is provided by both
``functional`` and ``functools``). ::

    from functional import compose, partial

    multi_compose = partial(reduce, compose)


We can also use ``map()``, ``compose()`` and ``partial()`` to craft a version of
``"".join(...)`` that converts its arguments to string::

    from functional import compose, partial

    join = compose("".join, partial(map, str))
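If you don't have the ``functional`` package installed, you can still try out
these ideas.  The ``compose()`` below is a stand-in I wrote for this
illustration, following the behaviour described above; it is not the
package's actual implementation.  ``partial`` and ``reduce`` come from
:mod:`functools`.

```python
from functools import partial, reduce


def compose(outer, inner, unpack=False):
    """Stand-in for functional.compose, written only for this sketch."""
    def composed(*args, **kwargs):
        result = inner(*args, **kwargs)
        if unpack:
            return outer(*result)   # expand a tuple into positional arguments
        return outer(result)
    return composed


def add(a, b):
    return a + b

def double(a):
    return 2 * a

def increment(a):
    return a + 1


doubled_sum = compose(double, add)(5, 6)               # double(add(5, 6))
unpacked = compose(add, divmod, unpack=True)(7, 2)     # add(*divmod(7, 2))

# reduce(compose, [f, g, h]) builds compose(compose(f, g), h).
multi_compose = partial(reduce, compose)
pipeline = multi_compose([str, double, increment])
piped = pipeline(3)                                    # str(double(increment(3)))

join = compose("".join, partial(map, str))
joined = join([1, 2, 3])
```

The functions passed to ``multi_compose()`` are applied right to left, just
as with nested calls: ``increment`` runs first and ``str`` last.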
Georg Brandl	8ec7f65	2007-08-15 14:28:01 +0000	[diff] [blame]	1188
				1189
``flip(func)``

``flip()`` wraps the callable in ``func`` and causes it to receive its
non-keyword arguments in reverse order. ::

    >>> def triple(a, b, c):
    ...     return (a, b, c)
    ...
    >>> triple(5, 6, 7)
    (5, 6, 7)
    >>>
    >>> flipped_triple = flip(triple)
    >>> flipped_triple(5, 6, 7)
    (7, 6, 5)

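One plausible way to write such a wrapper is sketched below. This is a hypothetical re-implementation for illustration, not the code shipped in ``functional``. It reverses only the positional arguments and passes keyword arguments through unchanged.

```python
from functools import wraps


def flip(func):
    """Wrap func so its positional arguments arrive in reverse order.

    Hypothetical re-implementation for illustration only.
    """
    @wraps(func)
    def flipped(*args, **kwargs):
        # Only positional arguments are reversed; keywords pass through as-is.
        return func(*reversed(args), **kwargs)
    return flipped


def triple(a, b, c):
    return (a, b, c)


flipped_triple = flip(triple)
result = flipped_triple(5, 6, 7)

# Keyword arguments are not reordered: this calls triple(6, 5, c=7).
with_keyword = flipped_triple(5, 6, c=7)
```

In this sketch ``result`` is ``(7, 6, 5)`` and ``with_keyword`` is ``(6, 5, 7)``.
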
``foldl(func, start, iterable)``

``foldl()`` takes a binary function, a starting value (usually some kind of
'zero'), and an iterable. The function is applied to the starting value and the
first element of the iterable, then to that result and the second element, then
to that result and the third element, and so on.

This means that a call such as::

    foldl(f, 0, [1, 2, 3])

is equivalent to::

    f(f(f(0, 1), 2), 3)


``foldl()`` is roughly equivalent to the following recursive function::

    def foldl(func, start, seq):
        if len(seq) == 0:
            return start

        return foldl(func, func(start, seq[0]), seq[1:])

Speaking of equivalence, the above ``foldl`` call can be expressed in terms of
the built-in ``reduce`` like so::

    reduce(f, [1, 2, 3], 0)


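The recursive version above relies on ``len()`` and slicing, so it needs a real sequence, and very long inputs can exceed Python's recursion limit. A loop-based sketch avoids both restrictions. It is a hypothetical stand-in, not the ``functional`` module's own code, and it imports ``reduce`` from ``functools`` so that it also runs where ``reduce`` is not a built-in.

```python
from functools import reduce
import operator


def foldl(func, start, iterable):
    """Left fold written as a loop, so it accepts any iterable.

    Hypothetical stand-in, not the ``functional`` module's own code.
    """
    accumulator = start
    for element in iterable:
        accumulator = func(accumulator, element)
    return accumulator


# Subtraction is not associative, so it exposes the left-to-right grouping:
# ((0 - 1) - 2) - 3
folded = foldl(operator.sub, 0, [1, 2, 3])
reduced = reduce(operator.sub, [1, 2, 3], 0)

# A generator works too, because nothing is indexed or sliced.
total = foldl(operator.add, 0, (n * n for n in range(4)))
```

Both ``folded`` and ``reduced`` come out as ``-6``, and ``total`` is ``14``. Note the different argument order: ``foldl()`` takes the starting value second, while ``reduce()`` takes it last.
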
We can use ``foldl()``, ``operator.concat()`` and ``partial()`` to write a
cleaner, more aesthetically pleasing version of Python's ``"".join(...)``
idiom::

    from functional import foldl, partial
    from operator import concat

    join = partial(foldl, concat, "")


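The same fold can be sketched with the standard library alone. Because ``reduce()`` takes its starting value as the last argument, this version uses a small wrapper function (named ``join`` here purely for illustration) instead of ``partial()``.

```python
from functools import reduce
from operator import concat


def join(strings):
    # Fold concat over the strings, starting from the empty string:
    # concat(concat(concat("", "func"), "tion"), "al")
    return reduce(concat, strings, "")


pieces = ["func", "tion", "al"]
folded = join(pieces)
builtin = "".join(pieces)
```

Both spellings produce ``"functional"``. The folded version builds a new intermediate string at every step, so for long inputs ``"".join(...)`` is likely to remain the faster choice.
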
Revision History and Acknowledgements
=====================================

The author would like to thank the following people for offering suggestions,
corrections and assistance with various drafts of this article: Ian Bicking,
Nick Coghlan, Nick Efford, Raymond Hettinger, Jim Jewett, Mike Krell, Leandro
Lameiro, Jussi Salmela, Collin Winter, Blake Winton.

Version 0.1: posted June 30 2006.

Version 0.11: posted July 1 2006. Typo fixes.

Version 0.2: posted July 10 2006. Merged genexp and listcomp sections into one.
Typo fixes.

Version 0.21: Added more references suggested on the tutor mailing list.

Version 0.30: Adds a section on the ``functional`` module written by Collin
Winter; adds short section on the operator module; a few other edits.


References
==========

General
-------

Structure and Interpretation of Computer Programs, by Harold Abelson and
Gerald Jay Sussman with Julie Sussman. Full text at
http://mitpress.mit.edu/sicp/. In this classic textbook of computer science,
chapters 2 and 3 discuss the use of sequences and streams to organize the data
flow inside a program. The book uses Scheme for its examples, but many of the
design approaches described in these chapters are applicable to functional-style
Python code.

http://www.defmacro.org/ramblings/fp.html: A general introduction to functional
programming that uses Java examples and has a lengthy historical introduction.

http://en.wikipedia.org/wiki/Functional_programming: General Wikipedia entry
describing functional programming.

http://en.wikipedia.org/wiki/Coroutine: Entry for coroutines.

http://en.wikipedia.org/wiki/Currying: Entry for the concept of currying.

Python-specific
---------------

http://gnosis.cx/TPiP/: The first chapter of David Mertz's book
:title-reference:`Text Processing in Python` discusses functional programming
for text processing, in the section titled "Utilizing Higher-Order Functions in
Text Processing".

Mertz also wrote a 3-part series of articles on functional programming
for IBM's DeveloperWorks site; see
`part 1 <http://www-128.ibm.com/developerworks/library/l-prog.html>`__,
`part 2 <http://www-128.ibm.com/developerworks/library/l-prog2.html>`__, and
`part 3 <http://www-128.ibm.com/developerworks/linux/library/l-prog3.html>`__.


Python documentation
--------------------

Documentation for the :mod:`itertools` module.

Documentation for the :mod:`operator` module.

:pep:`289`: "Generator Expressions"

:pep:`342`: "Coroutines via Enhanced Generators" describes the new generator
features in Python 2.5.

.. comment

    Topics to place
    -----------------------------

    XXX os.walk()

    XXX Need a large example.

    But will an example add much? I'll post a first draft and see
    what the comments say.

.. comment

    Original outline:
    Introduction
            Idea of FP
                    Programs built out of functions
                    Functions are strictly input-output, no internal state
            Opposed to OO programming, where objects have state

            Why FP?
                    Formal provability
                            Assignment is difficult to reason about
                            Not very relevant to Python
                    Modularity
                            Small functions that do one thing
                    Debuggability:
                            Easy to test due to lack of state
                            Easy to verify output from intermediate steps
                    Composability
                            You assemble a toolbox of functions that can be mixed

    Tackling a problem
            Need a significant example

    Iterators
    Generators
    The itertools module
    List comprehensions
    Small functions and the lambda statement
    Built-in functions
            map
            filter
            reduce

.. comment

    Handy little function for printing part of an iterator -- used
    while writing this document.

    import itertools
    import sys

    def print_iter(it):
        # Show at most the first 10 elements, separated by commas.
        # islice objects cannot be sliced, so collect them into a list first.
        elems = list(itertools.islice(it, 10))
        sys.stdout.write(', '.join(str(elem) for elem in elems))
        sys.stdout.write('\n')
