\documentclass{howto}

% $Id$

\title{What's New in Python 2.2}
\release{0.05}
\author{A.M. Kuchling}
\authoraddress{\email{akuchlin@mems-exchange.org}}
\begin{document}
\maketitle\tableofcontents

\section{Introduction}

{\large This document is a draft, and is subject to change until the
final version of Python 2.2 is released. Currently it's up to date
for Python 2.2 alpha 1. Please send any comments, bug reports, or
questions, no matter how minor, to \email{akuchlin@mems-exchange.org}.
}

This article explains the new features in Python 2.2.

Python 2.2 can be thought of as the ``cleanup release''. There are some
features such as generators and iterators that are completely new, but
most of the changes, significant and far-reaching though they may be,
are aimed at cleaning up irregularities and dark corners of the
language design.

This article doesn't attempt to provide a complete specification of
the new features, but instead provides a convenient overview. For
full details, you should refer to the documentation for Python 2.2,
such as the
\citetitle[http://python.sourceforge.net/devel-docs/lib/lib.html]{Python
Library Reference} and the
\citetitle[http://python.sourceforge.net/devel-docs/ref/ref.html]{Python
Reference Manual}.
% XXX These \citetitle marks should get the python.org URLs for the final
% release, just as soon as the docs are published there.
If you want to understand the complete implementation and design
rationale for a change, refer to the PEP for a particular new feature.


The final release of Python 2.2 is planned for October 2001.

\begin{seealso}

\seeurl{http://www.unixreview.com/documents/s=1356/urm0109h/0109h.htm}
{``What's So Special About Python 2.2?'' is also about the new 2.2
features, and was written by Cameron Laird and Kathryn Soraiz.}

\end{seealso}


%======================================================================
\section{PEP 252: Type and Class Changes}

XXX I need to read and digest the relevant PEPs.

\begin{seealso}

\seepep{252}{Making Types Look More Like Classes}{Written and implemented
by Guido van Rossum.}

\seeurl{http://www.python.org/2.2/descrintro.html}{A tutorial
on the type/class changes in 2.2.}

\end{seealso}


%======================================================================
\section{PEP 234: Iterators}

A significant addition to 2.2 is an iteration interface at both the C
and Python levels. Objects can define how they can be looped over by
callers.

In Python versions up to 2.1, the usual way to make \code{for item in
obj} work is to define a \method{__getitem__()} method that looks
something like this:

\begin{verbatim}
    def __getitem__(self, index):
        return <next item>
\end{verbatim}

\method{__getitem__()} is more properly used to define an indexing
operation on an object so that you can write \code{obj[5]} to retrieve
the sixth element. It's a bit misleading when you're using this only
to support \keyword{for} loops. Consider some file-like object that
wants to be looped over; the \var{index} parameter is essentially
meaningless, as the class probably assumes that a series of
\method{__getitem__()} calls will be made, with \var{index}
incrementing by one each time. In other words, the presence of the
\method{__getitem__()} method doesn't mean that \code{file[5]} will
work, though it really should.

In Python 2.2, iteration can be implemented separately, and
\method{__getitem__()} methods can be limited to classes that really
do support random access. The basic idea of iterators is quite
simple. A new built-in function, \function{iter(obj)}, returns an
iterator for the object \var{obj}. (It can also take two arguments:
\code{iter(\var{C}, \var{sentinel})} will call the callable \var{C}
repeatedly until it returns \var{sentinel}, which signals that the
iterator is done. This form probably won't be used very often.)
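
As a rough illustration of the two-argument form, here is a small
sketch. The \function{read_value()} function and the \code{pending}
list are made up for this example; any callable that takes no
arguments would do in their place.

```python
# Hypothetical data source: the numbers waiting to be read.
# The value 0 plays the role of the sentinel.
pending = [3, 1, 4, 0, 5]

def read_value():
    # Hand out the next stored number each time we're called.
    return pending.pop(0)

# iter() keeps calling read_value() until it returns the sentinel 0;
# the sentinel itself is not produced.
values = list(iter(read_value, 0))
```

Note that the sentinel is consumed from the data source but never
handed to the caller, so anything after it is left untouched.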

Python classes can define an \method{__iter__()} method, which should
create and return a new iterator for the object; if the object is its
own iterator, this method can just return \code{self}. In particular,
iterators will usually be their own iterators. Extension types
implemented in C can implement a \code{tp_iter} function in order to
return an iterator, and extension types that want to behave as
iterators can define a \code{tp_iternext} function.

So what do iterators do? They have one required method,
\method{next()}, which takes no arguments and returns the next value.
When there are no more values to be returned, calling \method{next()}
should raise the \exception{StopIteration} exception.

\begin{verbatim}
>>> L = [1,2,3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>
\end{verbatim}
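
To show what a Python class has to provide, here is a minimal sketch
of an object that is its own iterator. The \class{Countdown} class is
invented for this example. The \code{__next__ = next} alias is an
assumption on my part, added only so the sketch also runs on
interpreters that look the method up under that name; the protocol
described here needs nothing more than \method{next()}.

```python
class Countdown:
    """Hypothetical example: iterate from 'start' down to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # The object is its own iterator, so just return self.
        return self

    def next(self):
        if self.current <= 0:
            # No more values: signal the end of the iteration.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    # Assumption: alias for interpreters that call __next__() instead.
    __next__ = next

# The for statement calls iter() and then next() behind the scenes.
result = []
for n in Countdown(3):
    result.append(n)
```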

In 2.2, Python's \keyword{for} statement no longer expects a sequence;
it expects something for which \function{iter()} will return an iterator.
For backward compatibility, and convenience, an iterator is
automatically constructed for sequences that don't implement
\method{__iter__()} or a \code{tp_iter} slot, so \code{for i in
[1,2,3]} will still work. Wherever the Python interpreter loops over
a sequence, it's been changed to use the iterator protocol. This
means you can do things like this:

\begin{verbatim}
>>> i = iter(L)
>>> a,b,c = i
>>> a,b,c
(1, 2, 3)
>>>
\end{verbatim}

Iterator support has been added to some of Python's basic types.
Calling \function{iter()} on a dictionary will return an iterator
which loops over its keys:

\begin{verbatim}
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m: print key, m[key]
...
Mar 3
Feb 2
Aug 8
Sep 9
May 5
Jun 6
Jul 7
Jan 1
Apr 4
Nov 11
Dec 12
Oct 10
>>>
\end{verbatim}

That's just the default behaviour. If you want to iterate over keys,
values, or key/value pairs, you can explicitly call the
\method{iterkeys()}, \method{itervalues()}, or \method{iteritems()}
methods to get an appropriate iterator. In a minor related change,
the \keyword{in} operator now works on dictionaries, so
\code{\var{key} in dict} is now equivalent to
\code{dict.has_key(\var{key})}.
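
A short sketch of the two dictionary changes together, using a
cut-down version of the month table above. The variable names are
only illustrative, and the keys are sorted because a dictionary makes
no promise about the order in which it yields them.

```python
m = {'Jan': 1, 'Feb': 2, 'Mar': 3}

# Looping over the dictionary itself yields its keys.
names = []
for key in m:
    names.append(key)
names.sort()

# The 'in' operator tests for the presence of a key.
has_feb = 'Feb' in m
has_dec = 'Dec' in m
```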


Files also provide an iterator, which calls the \method{readline()}
method until there are no more lines in the file. This means you can
now read each line of a file using code like this:

\begin{verbatim}
for line in file:
    # do something for each line
\end{verbatim}

Note that you can only go forward in an iterator; there's no way to
get the previous element, reset the iterator, or make a copy of it.
An iterator object could provide such additional capabilities, but the
iterator protocol only requires a \method{next()} method.
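
The forward-only behaviour is easy to see in a small experiment; this
is just a sketch using a list iterator, with variable names chosen for
the example.

```python
it = iter([1, 2, 3, 4])

# Consume the first two items, then stop early.
seen = []
for x in it:
    seen.append(x)
    if x == 2:
        break

# The iterator carries on from where the loop stopped...
rest = list(it)

# ...and once it is exhausted there is no way to rewind it.
again = list(it)
```

To loop over the data a second time, you have to call
\function{iter()} on the original list again.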

\begin{seealso}

\seepep{234}{Iterators}{Written by Ka-Ping Yee and GvR; implemented
by the Python Labs crew, mostly by GvR and Tim Peters.}

\end{seealso}


%======================================================================
\section{PEP 255: Simple Generators}

Generators are another new feature, one that interacts with the
introduction of iterators.

You're doubtless familiar with how function calls work in Python or
C. When you call a function, it gets a private area where its local
variables are created. When the function reaches a \keyword{return}
statement, the local variables are destroyed and the resulting value
is returned to the caller. A later call to the same function will get
a fresh new set of local variables. But what if the local variables
weren't destroyed on exiting a function? What if you could later
resume the function where it left off? This is what generators
provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function:

\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i
\end{verbatim}

A new keyword, \keyword{yield}, was introduced for generators. Any
function containing a \keyword{yield} statement is a generator
function; this is detected by Python's bytecode compiler, which
compiles the function specially as a result. Because a new keyword was
introduced, generators must be explicitly enabled in a module by
including a \code{from __future__ import generators} statement near
the top of the module's source code. In Python 2.3 this statement
will become unnecessary.

When you call a generator function, it doesn't return a single value;
instead it returns a generator object that supports the iterator
interface. On executing the \keyword{yield} statement, the generator
outputs the value of \code{i}, similar to a \keyword{return}
statement. The big difference between \keyword{yield} and a
\keyword{return} statement is that, on reaching a \keyword{yield}, the
generator's state of execution is suspended and local variables are
preserved. On the next call to the generator's \code{.next()} method,
the function will resume executing immediately after the
\keyword{yield} statement. (For complicated reasons, the
\keyword{yield} statement isn't allowed inside the \keyword{try} block
of a \code{try...finally} statement; read PEP 255 for a full
explanation of the interaction between \keyword{yield} and
exceptions.)

Here's a sample usage of the \function{generate_ints()} generator:

\begin{verbatim}
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration
>>>
\end{verbatim}

You could equally write \code{for i in generate_ints(5)}, or
\code{a,b,c = generate_ints(3)}.
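
To see why preserved local variables are useful, consider a generator
whose state is more than a loop counter. The \function{fib()}
function below is my own sketch, not an example from the PEP; it keeps
two running values alive between \keyword{yield} statements.

```python
# Under Python 2.2, the module would need this line near the top:
#     from __future__ import generators

def fib(limit):
    """Yield the Fibonacci numbers that are smaller than 'limit'."""
    a, b = 0, 1
    while a < limit:
        yield a
        # Execution resumes here on the next request for a value;
        # 'a' and 'b' still hold what they held before the yield.
        a, b = b, a + b

numbers = list(fib(50))
```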

Inside a generator function, the \keyword{return} statement can only
be used without a value, and signals the end of the procession of
values; afterwards the generator cannot return any further values.
\keyword{return} with a value, such as \code{return 5}, is a syntax
error inside a generator function. The end of the generator's results
can also be indicated by raising \exception{StopIteration} manually,
or by just letting the flow of execution fall off the bottom of the
function.

You could achieve the effect of generators manually by writing your
own class and storing all the local variables of the generator as
instance variables. For example, returning a list of integers could
be done by setting \code{self.count} to 0, and having the
\method{next()} method increment \code{self.count} and return it.
However, for a moderately complicated generator, writing a
corresponding class would be much messier.
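
For comparison, here is roughly what such a hand-written class might
look like for the \function{generate_ints()} example. The class name
\class{IntSequence} is made up, and the \code{__next__} alias is an
assumption added only so the sketch runs on interpreters that use that
method name. To match the generator, it returns the count before
incrementing it, so the values run from 0 to \code{N-1}.

```python
class IntSequence:
    """Hypothetical class-based equivalent of generate_ints(N)."""

    def __init__(self, N):
        # What the generator kept in local variables
        # has to be stored on the instance instead.
        self.N = N
        self.count = 0

    def __iter__(self):
        return self

    def next(self):
        if self.count >= self.N:
            raise StopIteration
        value = self.count
        self.count += 1
        return value

    # Assumption: alias for interpreters that call __next__() instead.
    __next__ = next

a, b, c = IntSequence(3)
```

Even for this trivial case the class is several times longer than the
three-line generator.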
\file{Lib/test/test_generators.py} contains a number of more
interesting examples. The simplest one implements an in-order
traversal of a tree using generators recursively.

\begin{verbatim}
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
\end{verbatim}
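
To try \function{inorder()} out, you need some kind of tree to feed
it. The \class{Tree} class below is a bare-bones stand-in that I've
invented for this sketch; it is not the class used in the test suite,
and it only supplies the \code{label}, \code{left}, and \code{right}
attributes that the generator looks at.

```python
class Tree:
    """Hypothetical minimal binary tree node."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

# The recursive generator from the text, repeated so the
# sketch is self-contained.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

# Build a small tree:      4
#                        /   \
#                       2     5
#                      / \
#                     1   3
tree = Tree(4, Tree(2, Tree(1), Tree(3)), Tree(5))
labels = list(inorder(tree))
```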

Two other examples in \file{Lib/test/test_generators.py} produce
solutions for the N-Queens problem (placing $N$ queens on an $N \times N$
chessboard so that no queen threatens another) and the Knight's Tour
(a route that takes a knight to every square of an $N \times N$ chessboard
without visiting any square twice).

The idea of generators comes from other programming languages,
especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
idea of generators is central to the language. In Icon, every
expression and function call behaves like a generator. One example
from ``An Overview of the Icon Programming Language'' at
\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
what this looks like:

\begin{verbatim}
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
\end{verbatim}

The \function{find()} function returns the indexes at which the
substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement,
\code{i} is first assigned a value of 3, but 3 is less than 5, so the
comparison fails, and Icon retries it with the second value of 23. 23
is greater than 5, so the comparison now succeeds, and the code prints
the value 23 to the screen.
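
For readers who don't know Icon, the same search can be approximated
with a Python generator. This is only a sketch of the idea: the
\function{find_all()} helper is my own invention, it adds 1 to each
position to mimic Icon's 1-based indexes, and the retrying that Icon
does implicitly has to be spelled out as a \keyword{for} loop.

```python
def find_all(sub, s):
    """Yield the 1-based positions of 'sub' in 's', one at a time."""
    start = s.find(sub)
    while start != -1:
        yield start + 1
        start = s.find(sub, start + 1)

sentence = "Store it in the neighboring harbor"

# All the candidate positions the generator can produce.
positions = list(find_all("or", sentence))

# Keep asking for candidates until one passes the test.
first_match = None
for i in find_all("or", sentence):
    if i > 5:
        first_match = i
        break
```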

Python doesn't go nearly as far as Icon in adopting generators as a
central concept. Generators are considered a new part of the core
Python language, but learning or using them isn't compulsory; if they
don't solve any problems that you have, feel free to ignore them.
This is different from Icon, where the idea of generators is a basic
concept. One novel feature of Python's interface as compared to
Icon's is that a generator's state is represented as a concrete object
that can be passed around to other functions or stored in a data
structure.

\begin{seealso}

\seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim
Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer
and Tim Peters, with other fixes from the Python Labs crew.}

\end{seealso}


%======================================================================
\section{PEP 237: Unifying Long Integers and Integers}

In recent versions, the distinction between regular integers, which
are 32-bit values on most machines, and long integers, which can be of
arbitrary size, was becoming an annoyance. For example, on platforms
that support large files (files larger than \code{2**32} bytes), the
\method{tell()} method of file objects has to return a long integer.
However, there were various bits of Python that expected plain
integers and would raise an error if a long integer was provided
instead. For example, in Python 1.5, only regular integers
could be used as a slice index, and \code{'abc'[1L:]} would raise a
\exception{TypeError} exception with the message ``slice index must be
int''.

Python 2.2 will shift values from short to long integers as required.
The \samp{L} suffix is no longer needed to indicate a long integer literal,
as now the compiler will choose the appropriate type. (Using the \samp{L}
suffix will be discouraged in future 2.x versions of Python,
triggering a warning in Python 2.4, and it will probably be dropped in
Python 3.0.) Many operations that used to raise an \exception{OverflowError}
will now return a long integer as their result. For example:

\begin{verbatim}
>>> 1234567890123
1234567890123L
>>> 2 ** 64
18446744073709551616L
\end{verbatim}

In most cases, integers and long integers will now be treated
identically. You can still distinguish them with the
\function{type()} built-in function, but that's rarely needed. The
\function{int()} function will now return a long integer if the value
is large enough.
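
The practical effect is that arithmetic simply keeps working when a
result outgrows 32 bits. The following sketch uses the largest 32-bit
signed value as its starting point; the variable names are only
illustrative.

```python
big = 2147483647          # 2**31 - 1, the largest 32-bit signed integer

# Each of these results is too large for a 32-bit integer,
# so the result is quietly promoted instead of raising an error.
bigger = big + 1
product = big * big

# int() also hands back a larger integer when it has to.
parsed = int("12345678901234567890")
```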

% XXX is there a warning-enabling command-line option for this?

\begin{seealso}

\seepep{237}{Unifying Long Integers and Integers}{Written by
Moshe Zadka and Guido van Rossum. Implemented mostly by Guido van Rossum.}

\end{seealso}

%======================================================================
\section{PEP 238: Changing the Division Operator}

The most controversial change in Python 2.2 is the start of an effort
to fix an old design flaw that's been in Python from the beginning.
Currently Python's division operator, \code{/}, behaves like C's
division operator when presented with two integer arguments. It
returns an integer result that's truncated down when there would be
a fractional part. For example, \code{3/2} is 1, not 1.5, and
\code{(-1)/2} is -1, not -0.5. This means that the results of division
can vary unexpectedly depending on the type of the two operands, and
because Python is dynamically typed, it can be difficult to determine
the possible types of the operands.

(The controversy is over whether this is \emph{really} a design flaw,
and whether it's worth breaking existing code to fix this. It's
caused endless discussions on python-dev and in July erupted into a
storm of acidly sarcastic postings on \newsgroup{comp.lang.python}. I
won't argue for either side here; read PEP 238 for a summary of
arguments and counter-arguments.)

Because this change might break code, it's being introduced very
gradually. Python 2.2 begins the transition, but the switch won't be
complete until Python 3.0.

First, some terminology from PEP 238. ``True division'' is the
division that most non-programmers are familiar with: 3/2 is 1.5, 1/4
is 0.25, and so forth. ``Floor division'' is what Python's \code{/}
operator currently does when given integer operands; the result is the
floor of the value returned by true division. ``Classic division'' is
the current mixed behaviour of \code{/}; it returns the result of
floor division when the operands are integers, and returns the result
of true division when one of the operands is a floating-point number.

Here are the changes 2.2 introduces:

\begin{itemize}

\item A new operator, \code{//}, is the floor division operator.
(Yes, we know it looks like \Cpp's comment symbol.) \code{//}
\emph{always} returns the floor division no matter what the types of
its operands are, so \code{1 // 2} is 0 and \code{1.0 // 2.0} is also
0.0.

\code{//} is always available in Python 2.2; you don't need to enable
it using a \code{__future__} statement.

\item By including a \code{from __future__ import division} in a
module, the \code{/} operator will be changed to return the result of
true division, so \code{1/2} is 0.5. Without the \code{__future__}
statement, \code{/} still means classic division. The default meaning
of \code{/} will not change until Python 3.0.

\item Classes can define methods called \method{__truediv__} and
\method{__floordiv__} to overload the two division operators. At the
C level, there are also slots in the \code{PyNumberMethods} structure
so extension types can define the two operators.

% XXX a warning someday?

\end{itemize}
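
The changes listed above can be sketched as follows. The
\class{Length} class is invented for this example. The
\code{__future__} name written in the comment is \code{division};
treat it as an assumption and check PEP 238 for the spelling your
interpreter accepts. The final statement of the sketch relies on
\code{/} meaning true division, so it assumes that the import is in
effect or that true division is already the default.

```python
# Under Python 2.2, a module wanting true division would begin with:
#     from __future__ import division

# Floor division always rounds towards negative infinity,
# whatever the operand types are.
int_floor = 7 // 2
float_floor = 7.0 // 2.0
negative_floor = (-1) // 2

class Length:
    """Hypothetical class overloading both division operators."""

    def __init__(self, metres):
        self.metres = metres

    def __truediv__(self, parts):
        # Called for '/' when true division is in effect.
        return Length(self.metres / float(parts))

    def __floordiv__(self, parts):
        # Called for '//'.
        return Length(self.metres // parts)

whole = Length(5) // 2
half = Length(5) / 2      # assumes '/' is true division
```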

\begin{seealso}

\seepep{238}{Changing the Division Operator}{Written by Moshe Zadka and
Guido van Rossum. Implemented by Guido van Rossum.}

\end{seealso}


%======================================================================
\section{Unicode Changes}

Python's Unicode support has been enhanced a bit in 2.2. Unicode
strings are usually stored as UCS-2, as 16-bit unsigned integers.
Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
integers, as its internal encoding by supplying
\longprogramopt{enable-unicode=ucs4} to the configure script. When
built to use UCS-4 (a ``wide Python''), the interpreter can natively
handle Unicode characters from U+000000 to U+10FFFF, so the range of
legal values for the \function{unichr()} function is expanded
accordingly. Using an interpreter compiled to use UCS-2 (a ``narrow
Python''), values greater than 65535 will still cause
\function{unichr()} to raise a \exception{ValueError} exception.

All this is the province of the still-unimplemented PEP 261, ``Support
for `wide' Unicode characters''; consult it for further details, and
please offer comments on the PEP and on your experiences with the
2.2 alpha releases.
% XXX update previous line once 2.2 reaches beta.

Another change is much simpler to explain. Since their introduction,
Unicode strings have supported an \method{encode()} method to convert
the string to a selected encoding such as UTF-8 or Latin-1. A
symmetric \method{decode(\optional{\var{encoding}})} method has been
added to 8-bit strings (though not to Unicode strings) in 2.2.
\method{decode()} assumes that the string is in the specified encoding
and decodes it, returning whatever is returned by the codec.
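
Here is a small round trip showing the symmetry between the two
methods, as a sketch. The sample text is arbitrary; it contains one
character that takes two bytes in UTF-8 and one that takes three, so
the encoded form is longer than the original string.

```python
# A Unicode string: "cafe" with an accented e, a space,
# a euro sign, and the digit 5 (7 characters in all).
text = u"caf\xe9 \u20ac5"

# encode() turns the Unicode string into 8-bit data...
data = text.encode("utf-8")

# ...and decode() on that data gives the Unicode string back.
restored = data.decode("utf-8")
```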

Using this new feature, codecs have been added for tasks not directly
related to Unicode. For example, codecs have been added for
uu-encoding, MIME's base64 encoding, and compression with the
\module{zlib} module:

\begin{verbatim}
>>> s = """Here is a lengthy piece of redundant, overly verbose,
... and repetitive text.
... """
>>> data = s.encode('zlib')
>>> data
'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
>>> data.decode('zlib')
'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
>>> print s.encode('uu')
begin 666 <data>
M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@
>=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X*

end
>>> "sheesh".encode('rot-13')
'furrfu'
\end{verbatim}

\method{encode()} and \method{decode()} were implemented by
Marc-Andr\'e Lemburg. The changes to support using UCS-4 internally
were implemented by Fredrik Lundh and Martin von L\"owis.

\begin{seealso}

\seepep{261}{Support for `wide' Unicode characters}{PEP written by
Paul Prescod. Not yet accepted or fully implemented.}

\end{seealso}

%======================================================================
\section{PEP 227: Nested Scopes}

In Python 2.1, statically nested scopes were added as an optional
feature, to be enabled by a \code{from __future__ import
nested_scopes} directive. In 2.2 nested scopes no longer need to be
specially enabled, but are always enabled. The rest of this section
is a copy of the description of nested scopes from my ``What's New in
Python 2.1'' document; if you read it when 2.1 came out, you can skip
the rest of this section.

The largest change introduced in Python 2.1, and made complete in 2.2,
is to Python's scoping rules. In Python 2.0, at any given time there
are at most three namespaces used to look up variable names: local,
module-level, and the built-in namespace. This often surprised people
because it didn't match their intuitive expectations. For example, a
nested recursive function definition doesn't work:

\begin{verbatim}
def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...
\end{verbatim}

The function \function{g()} will always raise a \exception{NameError}
exception, because the binding of the name \samp{g} isn't in either
its local namespace or in the module-level namespace. This isn't much
of a problem in practice (how often do you recursively define interior
functions like this?), but this also made using the \keyword{lambda}
expression clumsier, and this was a problem in practice. In code which
uses \keyword{lambda} you can often find local variables being copied
by passing them as the default values of arguments.

\begin{verbatim}
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
\end{verbatim}

The readability of Python code written in a strongly functional style
suffers greatly as a result.

The most significant change to Python 2.2 is that static scoping has
been added to the language to fix this problem. As a first effect,
the \code{name=name} default argument is now unnecessary in the above
example. Put simply, when a given variable name is not assigned a
value within a function (by an assignment, or the \keyword{def},
\keyword{class}, or \keyword{import} statements), references to the
variable will be looked up in the local namespace of the enclosing
scope. A more detailed explanation of the rules, and a dissection of
the implementation, can be found in the PEP.
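
A brief sketch of what the new rule allows. Both functions below are
my own illustrations: \function{make_adder()} is a made-up example,
and \function{find()} is a standalone variant of the method shown
earlier that takes the list as an argument instead of reading it from
\code{self}.

```python
def make_adder(n):
    def add(x):
        # 'n' isn't assigned in add(), so it is looked up
        # in the enclosing function's namespace.
        return x + n
    return add

def find(entries, name):
    "Return list of any entries equal to 'name'"
    # No name=name default argument is needed any more.
    return list(filter(lambda x: x == name, entries))

add_five = make_adder(5)
total = add_five(10)
matches = find(['spam', 'eggs', 'spam'], 'spam')
```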

This change may cause some compatibility problems for code where the
same variable name is used both at the module level and as a local
variable within a function that contains further function definitions.
This seems rather unlikely though, since such code would have been
pretty confusing to read in the first place.

One side effect of the change is that the \code{from \var{module}
import *} and \keyword{exec} statements have been made illegal inside
a function scope under certain conditions. The Python reference
manual has said all along that \code{from \var{module} import *} is
only legal at the top level of a module, but the CPython interpreter
has never enforced this before. As part of the implementation of
nested scopes, the compiler which turns Python source into bytecodes
has to generate different code to access variables in a containing
scope. \code{from \var{module} import *} and \keyword{exec} make it
impossible for the compiler to figure this out, because they add names
to the local namespace that are unknowable at compile time.
Therefore, if a function contains function definitions or
\keyword{lambda} expressions with free variables, the compiler will
flag this by raising a \exception{SyntaxError} exception.

To make the preceding explanation a bit clearer, here's an example:

\begin{verbatim}
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
\end{verbatim}

Line 4 containing the \keyword{exec} statement is a syntax error,
since \keyword{exec} would define a new local variable named \samp{x}
whose value should be accessed by \function{g()}.

This shouldn't be much of a limitation, since \keyword{exec} is rarely
used in most Python code (and when it is used, it's often a sign of a
poor design anyway).

\begin{seealso}

\seepep{227}{Statically Nested Scopes}{Written and implemented by
Jeremy Hylton.}

\end{seealso}


639%======================================================================
Andrew M. Kuchlinga8defaa2001-05-05 16:37:29 +0000640\section{New and Improved Modules}
641
642\begin{itemize}
643
  \item The \module{xmlrpclib} module was contributed to the standard
  library by Fredrik Lundh. It provides support for writing XML-RPC
  clients; XML-RPC is a simple remote procedure call protocol built on
  top of HTTP and XML. For example, the following snippet retrieves a
  list of RSS channels from the O'Reilly Network, and then retrieves a
  list of the recent headlines for one channel:

\begin{verbatim}
import xmlrpclib
s = xmlrpclib.Server(
        'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
channels = s.meerkat.getChannels()
# channels is a list of dictionaries, like this:
# [{'id': 4, 'title': 'Freshmeat Daily News'},
#  {'id': 190, 'title': '32Bits Online'},
#  {'id': 4549, 'title': '3DGamers'}, ... ]

# Get the items for one channel
items = s.meerkat.getItems( {'channel': 4} )

# 'items' is another list of dictionaries, like this:
# [{'link': 'http://freshmeat.net/releases/52719/',
#   'description': 'A utility which converts HTML to XSL FO.',
#   'title': 'html2fo 0.3 (Default)'}, ... ]
\end{verbatim}

See \url{http://www.xmlrpc.com/} for more information about XML-RPC.

  \item The \module{socket} module can be compiled to support IPv6;
  specify the \longprogramopt{enable-ipv6} option to Python's configure
  script. (Contributed by Jun-ichiro ``itojun'' Hagino.)

  \item Two new format characters were added to the \module{struct}
  module for 64-bit integers on platforms that support the C
  \ctype{long long} type. \samp{q} is for a signed 64-bit integer,
  and \samp{Q} is for an unsigned one. The value is returned in
  Python's long integer type. (Contributed by Tim Peters.)

  \item In the interpreter's interactive mode, there's a new built-in
  function, \function{help()}, that uses the \module{pydoc} module
  introduced in Python 2.1 to provide interactive help.
  \code{help(\var{object})} displays any available help text about
  \var{object}. \code{help()} with no argument puts you in an online
  help utility, where you can enter the names of functions, classes,
  or modules to read their help text.
  (Contributed by Guido van Rossum, using Ka-Ping Yee's \module{pydoc} module.)

  \item Various bugfixes and performance improvements have been made
  to the SRE engine underlying the \module{re} module. For example,
  \function{re.sub()} will now use \function{string.replace()}
  automatically when the pattern and its replacement are both just
  literal strings without regex metacharacters. Another contributed
  patch speeds up certain Unicode character ranges by a factor of
  two. (SRE is maintained by Fredrik Lundh. The BIGCHARSET patch was
  contributed by Martin von L\"owis.)

  \item The \module{smtplib} module now supports \rfc{2487}, ``Secure
  SMTP over TLS'', so it's now possible to encrypt the SMTP traffic
  between a Python program and the mail transport agent being handed a
  message. (Contributed by Gerhard H\"aring.)

  \item The \module{imaplib} module, maintained by Piers Lauder, has
  support for several new extensions: the NAMESPACE extension defined
  in \rfc{2342}, SORT, GETACL and SETACL. (Contributed by Anthony
  Baxter and Michel Pelletier.)

  \item The \module{rfc822} module's parsing of email addresses is
  now compliant with \rfc{2822}, an update to \rfc{822}. The module's
  name is \emph{not} going to be changed to \samp{rfc2822}.
  (Contributed by Barry Warsaw.)

  \item New constants \constant{ascii_letters},
  \constant{ascii_lowercase}, and \constant{ascii_uppercase} were
  added to the \module{string} module. There were several modules in
  the standard library that used \constant{string.letters} to mean the
  ranges A-Za-z, but that assumption is incorrect when locales are in
  use, because \constant{string.letters} varies depending on the set
  of legal characters defined by the current locale. The buggy
  modules have all been fixed to use \constant{ascii_letters} instead.
  (Reported by an unknown person; fixed by Fred L. Drake, Jr.)

  \item The \module{mimetypes} module now makes it easier to use
  alternative MIME-type databases by the addition of a
  \class{MimeTypes} class, which takes a list of filenames to be
  parsed. (Contributed by Fred L. Drake, Jr.)

  \item A \class{Timer} class was added to the \module{threading}
  module that allows scheduling an activity to happen at some future
  time. (Contributed by Itamar Shtull-Trauring.)

\end{itemize}
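A few of the additions listed above are easy to try out directly. The
following is only a minimal sketch, written so that it runs on a
current Python interpreter rather than on 2.2 itself; the variable
names (\samp{packed}, \samp{fired}, \samp{timer}) are made up for
illustration, and the \samp{<} prefix is used so that the 64-bit
formats have a fixed size regardless of platform.

\begin{verbatim}
import string
import struct
import threading

# 'q' packs a signed 64-bit integer and 'Q' an unsigned one; the '<'
# prefix selects little-endian byte order and standard sizes.
packed = struct.pack('<qQ', -2**63, 2**64 - 1)
signed, unsigned = struct.unpack('<qQ', packed)
print(len(packed), signed, unsigned)

# ascii_letters does not change with the current locale.
print(string.ascii_letters)

# Timer runs a callable in a separate thread after a delay in seconds.
fired = []
timer = threading.Timer(0.1, fired.append, args=('done',))
timer.start()
timer.join()      # wait for the timer thread to finish
print(fired)
\end{verbatim}

Calling \method{join()} on the timer is just a convenient way to wait
for the scheduled call in a short script; a longer-running program
would normally carry on with other work instead.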


%======================================================================
\section{Interpreter Changes and Fixes}

Some of the changes only affect people who deal with the Python
interpreter at the C level because they're writing Python extension
modules, embedding the interpreter, or just hacking on the interpreter
itself. If you only write Python code, none of the changes described
here will affect you very much.

\begin{itemize}

  \item Profiling and tracing functions can now be implemented in C.
  C functions can operate at much higher speeds than Python-based
  ones, which should reduce the overhead of enabling profiling and
  tracing; this will be of interest to authors of development
  environments for Python. Two new C functions were added to
  Python's API, \cfunction{PyEval_SetProfile()} and
  \cfunction{PyEval_SetTrace()}. The existing
  \function{sys.setprofile()} and \function{sys.settrace()} functions
  still exist, and have simply been changed to use the new C-level
  interface. (Contributed by Fred L. Drake, Jr.)

  \item Another low-level API, primarily of interest to implementors
  of Python debuggers and development tools, was added.
  \cfunction{PyInterpreterState_Head()} and
  \cfunction{PyInterpreterState_Next()} let a caller walk through all
  the existing interpreter objects;
  \cfunction{PyInterpreterState_ThreadHead()} and
  \cfunction{PyThreadState_Next()} allow looping over all the thread
  states for a given interpreter. (Contributed by David Beazley.)

  \item A new \samp{et} format sequence was added to
  \cfunction{PyArg_ParseTuple()}. \samp{et} takes both a parameter and
  an encoding name. If the parameter turns out to be a Unicode
  string, it is converted to the given encoding; if it's an 8-bit
  string, it is left alone and assumed to already be in the desired
  encoding. This differs from the \samp{es} format character, which
  assumes that 8-bit strings are in Python's default ASCII encoding
  and converts them to the specified new encoding.
  (Contributed by M.-A. Lemburg.)

  \item Two new flags \constant{METH_NOARGS} and \constant{METH_O} are
  available in method definition tables to simplify implementation of
  methods with no arguments or a single untyped argument. Calling
  such methods is more efficient than calling a corresponding method
  that uses \constant{METH_VARARGS}.
  Also, the old \constant{METH_OLDARGS} style of writing C methods is
  now officially deprecated.

  \item Two new wrapper functions, \cfunction{PyOS_snprintf()} and
  \cfunction{PyOS_vsnprintf()}, were added to provide
  cross-platform implementations of the relatively new
  \cfunction{snprintf()} and \cfunction{vsnprintf()} C lib APIs. In
  contrast to the standard \cfunction{sprintf()} and
  \cfunction{vsprintf()} functions, the Python versions check the
  bounds of the buffer used to protect against buffer overruns.
  (Contributed by M.-A. Lemburg.)

\end{itemize}
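The Python-level hooks mentioned in the first item keep working as
before. As a rough sketch, a profile function written in Python can
be installed with \function{sys.setprofile()} as shown below; the
\function{profiler()} and \function{square()} functions and the
\samp{calls} list are hypothetical names chosen for this example, and
the code is written to run on a current Python interpreter.

\begin{verbatim}
import sys

calls = []

def profiler(frame, event, arg):
    # Record the name of every Python function that gets called.
    if event == 'call':
        calls.append(frame.f_code.co_name)

def square(n):
    return n * n

sys.setprofile(profiler)
try:
    square(3)
finally:
    sys.setprofile(None)    # always remove the hook again

print(calls)
\end{verbatim}

A hook written in C and installed with \cfunction{PyEval_SetProfile()}
receives the same kind of events, without the cost of calling back
into Python code for each one.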


%======================================================================
\section{Other Changes and Fixes}

% XXX update the patch and bug figures as we go
As usual there were a bunch of other improvements and bugfixes
scattered throughout the source tree. A search through the CVS change
logs finds there were 119 patches applied, and 179 bugs fixed; both
figures are likely to be underestimates. Some of the more notable
changes are:

\begin{itemize}

  \item The code for the MacOS port for Python, maintained by Jack
  Jansen, is now kept in the main Python CVS tree, and many changes
  have been made to support MacOS X.

The most significant change is the ability to build Python as a
framework, enabled by supplying the \longprogramopt{enable-framework}
option to the configure script when compiling Python. According to
Jack Jansen, ``This installs a self-contained Python installation plus
the OSX framework `glue' into
\file{/Library/Frameworks/Python.framework} (or another location of
choice). For now there is little immediate added benefit to this
(actually, there is the disadvantage that you have to change your PATH
to be able to find Python), but it is the basis for creating a
full-blown Python application, porting the MacPython IDE, possibly
using Python as a standard OSA scripting language and much more.''

Most of the MacPython toolbox modules, which interface to MacOS APIs
such as windowing, QuickTime, scripting, etc., have been ported to OS
X, but they've been left commented out in \file{setup.py}. People who
want to experiment with these modules can uncomment them manually.

% Jack's original comments:
%The main change is the possibility to build Python as a
%framework. This installs a self-contained Python installation plus the
%OSX framework "glue" into /Library/Frameworks/Python.framework (or
%another location of choice). For now there is little immedeate added
%benefit to this (actually, there is the disadvantage that you have to
%change your PATH to be able to find Python), but it is the basis for
%creating a fullblown Python application, porting the MacPython IDE,
%possibly using Python as a standard OSA scripting language and much
%more. You enable this with "configure --enable-framework".

%The other change is that most MacPython toolbox modules, which
%interface to all the MacOS APIs such as windowing, quicktime,
%scripting, etc. have been ported. Again, most of these are not of
%immedeate use, as they need a full application to be really useful, so
%they have been commented out in setup.py. People wanting to experiment
%can uncomment them. Gestalt and Internet Config modules are enabled by
%default.


  \item Keyword arguments passed to builtin functions that don't take them
  now cause a \exception{TypeError} exception to be raised, with the
  message ``\var{function} takes no keyword arguments''.

  \item A new script, \file{Tools/scripts/cleanfuture.py} by Tim
  Peters, automatically removes obsolete \code{__future__} statements
  from Python source code.

  \item The new license introduced with Python 1.6 wasn't
  GPL-compatible. This is fixed by some minor textual changes to the
  2.2 license, so Python can now be embedded inside a GPLed program
  again. The license changes were also applied to the Python 2.0.1
  and 2.1.1 releases.

  \item When presented with a Unicode filename on Windows, Python will
  now convert it to an MBCS encoded string, as used by the Microsoft
  file APIs. As MBCS is explicitly used by the file APIs, Python's
  choice of ASCII as the default encoding turns out to be an
  annoyance.
  (Contributed by Mark Hammond with assistance from Marc-Andr\'e
  Lemburg.)

  \item Large file support is now enabled on Windows. (Contributed by
  Tim Peters.)

  \item The \file{Tools/scripts/ftpmirror.py} script
  now parses a \file{.netrc} file, if you have one.
  (Contributed by Mike Romberg.)

  \item Some features of the object returned by the
  \function{xrange()} function are now deprecated, and trigger
  warnings when they're accessed; they'll disappear in Python 2.3.
  \class{xrange} objects tried to pretend they were full sequence
  types by supporting slicing, sequence multiplication, and the
  \keyword{in} operator, but these features were rarely used and
  therefore buggy. The \method{tolist()} method and the
  \member{start}, \member{stop}, and \member{step} attributes are also
  being deprecated. At the C level, the fourth argument to the
  \cfunction{PyRange_New()} function, \samp{repeat}, has also been
  deprecated.

  \item There were a bunch of patches to the dictionary
  implementation, mostly to fix potential core dumps if a dictionary
  contains objects that sneakily changed their hash value, or mutated
  the dictionary they were contained in. For a while python-dev fell
  into a gentle rhythm of Michael Hudson finding a case that dumped
  core, Tim Peters fixing it, Michael finding another case, and round
  and round it went.

  \item On Windows, Python can now be compiled with Borland C thanks
  to a number of patches contributed by Stephen Hansen, though the
  result isn't fully functional yet. (But this \emph{is} progress...)

  \item Another Windows enhancement: Wise Solutions generously offered
  PythonLabs use of their InstallerMaster 8.1 system. Earlier
  PythonLabs Windows installers used Wise 5.0a, which was beginning to
  show its age. (Packaged up by Tim Peters.)

  \item Files ending in \samp{.pyw} can now be imported on Windows.
  \samp{.pyw} is a Windows-only thing, used to indicate that a script
  needs to be run using PYTHONW.EXE instead of PYTHON.EXE in order to
  prevent a DOS console from popping up to display the output. This
  patch makes it possible to import such scripts, in case they're also
  usable as modules. (Implemented by David Bolen.)

  \item On platforms where Python uses the C \cfunction{dlopen()} function
  to load extension modules, it's now possible to set the flags used
  by \cfunction{dlopen()} using the \function{sys.getdlopenflags()} and
  \function{sys.setdlopenflags()} functions. (Contributed by Bram Stolk.)

  \item The \function{pow()} built-in function no longer supports 3
  arguments when floating-point numbers are supplied.
  \code{pow(\var{x}, \var{y}, \var{z})} returns \code{(x**y) \% z}, but
  this is never useful for floating point numbers, and the final
  result varies unpredictably depending on the platform. A call such
  as \code{pow(2.0, 8.0, 7.0)} will now raise a \exception{TypeError}
  exception.

\end{itemize}
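Two of the changes above turn questionable calls into
\exception{TypeError} exceptions, which is easy to check
interactively. The following is a small sketch, not taken from the
Python sources: \function{raises_type_error()} is a hypothetical
helper written for this example, the keyword name \samp{obj} is
arbitrary, and the code is written to run on a current Python
interpreter.

\begin{verbatim}
def raises_type_error(func, *args, **kwargs):
    # Hypothetical helper: report whether the call raises TypeError.
    try:
        func(*args, **kwargs)
    except TypeError:
        return True
    return False

# Three-argument pow() is still fine for integers ...
print(pow(2, 8, 7))
# ... but is rejected when floating-point numbers are supplied.
print(raises_type_error(pow, 2.0, 8.0, 7.0))

# len() takes no keyword arguments, so this call is rejected too.
print(raises_type_error(len, obj=[1, 2, 3]))
\end{verbatim}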


%======================================================================
\section{Acknowledgements}

The author would like to thank the following people for offering
suggestions and corrections to various drafts of this article: Fred
Bremmer, Keith Briggs, Fred L. Drake, Jr., Carel Fellinger, Mark
Hammond, Stephen Hansen, Jack Jansen, Marc-Andr\'e Lemburg, Tim Peters, Neil
Schemenauer, and Guido van Rossum.

\end{document}