\documentclass{howto}

% $Id$

\title{What's New in Python 2.2}
\release{0.01}
\author{A.M. Kuchling}
\authoraddress{\email{akuchlin@mems-exchange.org}}
\begin{document}
\maketitle\tableofcontents

\section{Introduction}

{\large This document is a draft, and is subject to change until the
final version of Python 2.2 is released. Currently it's not up to
date at all. Please send any comments, bug reports, or questions, no
matter how minor, to \email{akuchlin@mems-exchange.org}. }

This article explains the new features in Python 2.2. Python 2.2
includes some significant changes that go far toward cleaning up the
language's darkest corners, and some exciting new features.

This article doesn't attempt to provide a complete specification for
the new features, but instead provides a convenient overview of
them. For full details, you should refer to the 2.2 documentation,
such as the Library Reference and the Reference Manual, or to the PEP
for a particular new feature.

The final release of Python 2.2 is planned for October 2001.

%======================================================================
% It looks like this set of changes will likely get into 2.2,
% so I need to read and digest the relevant PEPs.
%\section{PEP 252: Type and Class Changes}

%XXX

%\begin{seealso}

%\seepep{252}{Making Types Look More Like Classes}{Written and implemented
%by GvR.}

%\end{seealso}

%======================================================================
\section{PEP 234: Iterators}

A significant addition to 2.2 is an iteration interface at both the C
and Python levels. Objects can define how they can be looped over by
callers.

In Python versions up to 2.1, the usual way to make \code{for item in
obj} work is to define a \method{__getitem__()} method that looks
something like this:

\begin{verbatim}
    def __getitem__(self, index):
        return <next item>
\end{verbatim}

\method{__getitem__()} is more properly used to define an indexing
operation on an object so that you can write \code{obj[5]} to retrieve
the sixth element. It's a bit misleading when you're using this only
to support \keyword{for} loops. Consider some file-like object that
wants to be looped over; the \var{index} parameter is essentially
meaningless, as the class probably assumes that a series of
\method{__getitem__()} calls will be made, with \var{index}
incrementing by one each time. In other words, the presence of the
\method{__getitem__()} method doesn't mean that \code{file[5]} will
work, though it really should.

In Python 2.2, iteration can be implemented separately, and
\method{__getitem__()} methods can be limited to classes that really
do support random access. The basic idea of iterators is quite
simple. A new built-in function, \function{iter(obj)}, returns an
iterator for the object \var{obj}. (It can also take two arguments:
\code{iter(\var{C}, \var{sentinel})} will call the callable \var{C}
repeatedly until it returns \var{sentinel}, which signals that the
iterator is done. This form probably won't be used very often.)
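
As a rough illustration of the two-argument form, the sketch below
drains values from a callable until a sentinel of 0 turns up. The
names \code{readings} and \code{read_next} are invented for this
example and don't come from any library.

```python
# Hypothetical data source: each call to read_next() hands back one reading.
readings = [3, 7, 0, 9]

def read_next():
    return readings.pop(0)

# iter(callable, sentinel) keeps calling read_next() until it returns 0;
# the sentinel itself is not produced by the iterator.
values = list(iter(read_next, 0))
```

Only the readings before the sentinel are collected; anything after it
is left untouched in \code{readings}.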

Python classes can define an \method{__iter__()} method, which should
create and return a new iterator for the object; if the object is its
own iterator, this method can just return \code{self}. In particular,
iterators will usually be their own iterators. Extension types
implemented in C can implement a \code{tp_iter} function in order to
return an iterator, too.

So what do iterators do? They have one required method,
\method{next()}, which takes no arguments and returns the next value.
When there are no more values to be returned, calling \method{next()}
should raise the \exception{StopIteration} exception.

\begin{verbatim}
>>> L = [1,2,3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>
\end{verbatim}
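
A user-defined class can follow the same protocol. The following is
only a minimal sketch: \class{Countdown} is a made-up class, and the
\code{__next__} alias is an assumption added so that the example also
runs on later interpreters that spell the method differently; Python
2.2 itself only looks for \method{next()}.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator is its own iterator, so just return self.
        return self

    def next(self):
        if self.current <= 0:
            # No more values: signal the end of the iteration.
            raise StopIteration
        value = self.current
        self.current = self.current - 1
        return value

    # Assumption: alias for later Python versions, which call __next__().
    __next__ = next

collected = []
for n in Countdown(3):
    collected.append(n)
```

The \keyword{for} loop calls \method{__iter__()} once and then keeps
asking for the next value until \exception{StopIteration} is raised.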

In 2.2, Python's \keyword{for} statement no longer expects a sequence;
it expects something for which \function{iter()} will return an iterator.
For backward compatibility, and convenience, an iterator is
automatically constructed for sequences that don't implement
\method{__iter__()} or a \code{tp_iter} slot, so \code{for i in
[1,2,3]} will still work. Wherever the Python interpreter loops over
a sequence, it's been changed to use the iterator protocol. This
means you can do things like this:

\begin{verbatim}
>>> i = iter(L)
>>> a,b,c = i
>>> a,b,c
(1, 2, 3)
>>>
\end{verbatim}

Iterator support has been added to some of Python's basic types. The
\keyword{in} operator now works on dictionaries, so \code{\var{key} in
dict} is now equivalent to \code{dict.has_key(\var{key})}.
Calling \function{iter()} on a dictionary will return an iterator
which loops over its keys:

\begin{verbatim}
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m: print key, m[key]
...
Mar 3
Feb 2
Aug 8
Sep 9
May 5
Jun 6
Jul 7
Jan 1
Apr 4
Nov 11
Dec 12
Oct 10
>>>
\end{verbatim}

That's just the default behaviour. If you want to iterate over keys,
values, or key/value pairs, you can explicitly call the
\method{iterkeys()}, \method{itervalues()}, or \method{iteritems()}
methods to get an appropriate iterator.

Files also provide an iterator, which calls the file's
\method{readline()} method until there are no more lines in the file.
This means you can now read each line of a file using code like this:

\begin{verbatim}
for line in file:
    # do something for each line
\end{verbatim}

Note that you can only go forward in an iterator; there's no way to
get the previous element, reset the iterator, or make a copy of it.
An iterator object could provide such additional capabilities, but
the iterator protocol only requires a \method{next()} method.

\begin{seealso}

\seepep{234}{Iterators}{Written by Ka-Ping Yee and GvR; implemented
by the Python Labs crew, mostly by GvR and Tim Peters.}

\end{seealso}

%======================================================================
\section{PEP 255: Simple Generators}

Generators are another new feature, one that interacts with the
introduction of iterators.

You're doubtless familiar with how function calls work in Python or
C. When you call a function, it gets a private area where its local
variables are created. When the function reaches a \keyword{return}
statement, the local variables are destroyed and the resulting value
is returned to the caller. A later call to the same function will get
a fresh new set of local variables. But what if the local variables
weren't destroyed on exiting a function? What if you could later
resume the function where it left off? This is what generators
provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function:

\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i
\end{verbatim}

A new keyword, \keyword{yield}, was introduced for generators. Any
function containing a \keyword{yield} statement is a generator
function; this is detected by Python's bytecode compiler, which
compiles the function specially. When you call a generator function,
it doesn't return a single value; instead it returns a generator
object that supports the iterator interface. On executing the
\keyword{yield} statement, the generator outputs the value of
\code{i}, similar to a \keyword{return} statement. The big difference
between \keyword{yield} and a \keyword{return} statement is that, on
reaching a \keyword{yield}, the generator's state of execution is
suspended and local variables are preserved. On the next call to the
generator's \code{.next()} method, the function will resume executing
immediately after the \keyword{yield} statement. (For complicated
reasons, the \keyword{yield} statement isn't allowed inside the
\keyword{try} block of a \code{try...finally} statement; read PEP 255
for a full explanation of the interaction between \keyword{yield} and
exceptions.)

Here's a sample usage of the \function{generate_ints()} generator:

\begin{verbatim}
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration
>>>
\end{verbatim}

You could equally write \code{for i in generate_ints(5)}, or
\code{a,b,c = generate_ints(3)}.

Inside a generator function, the \keyword{return} statement can only
be used without a value, and is equivalent to raising the
\exception{StopIteration} exception; afterwards the generator cannot
return any further values. \keyword{return} with a value, such as
\code{return 5}, is a syntax error inside a generator function. You
can also raise \exception{StopIteration} manually, or just let the
thread of execution fall off the bottom of the function, to achieve
the same effect.
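
As a small sketch of a bare \keyword{return} ending a generator early,
the made-up function \function{until_blank()} below yields strings
until it meets an empty one. The function name and the sample data are
assumptions chosen for illustration.

```python
def until_blank(lines):
    """Yield lines up to, but not including, the first empty string."""
    for line in lines:
        if line == "":
            return      # bare return: the generator is finished
        yield line

header = list(until_blank(["From: a", "To: b", "", "body text"]))
```

Everything after the empty string is never produced, because the
generator has already finished by then.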

You could achieve the effect of generators manually by writing your
own class, and storing all the local variables of the generator as
instance variables. For example, returning a list of integers could
be done by setting \code{self.count} to 0, and having the
\method{next()} method increment \code{self.count} and return it.
However, for a moderately complicated generator, writing a
corresponding class would be much messier.
\file{Lib/test/test_generators.py} contains a number of more
interesting examples. The simplest one implements an in-order
traversal of a tree using generators recursively.

\begin{verbatim}
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
\end{verbatim}
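
To try the traversal out, some tree type is needed. The \class{Tree}
class below is a hypothetical stand-in written for this sketch, not
the class used in \file{Lib/test/test_generators.py}; it only assumes
the \code{left}, \code{label} and \code{right} attributes that
\function{inorder()} relies on.

```python
class Tree:
    """Hypothetical minimal binary tree node for exercising inorder()."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def inorder(t):
    # An empty subtree is represented by None, which is false.
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

#        2
#       / \
#      1   3
root = Tree(2, Tree(1), Tree(3))
labels = list(inorder(root))
```

Each recursive call creates its own generator, and the outer generator
simply re-yields whatever the inner ones produce.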

Two other examples in \file{Lib/test/test_generators.py} produce
solutions for the N-Queens problem (placing $N$ queens on an $N\times N$
chess board so that no queen threatens another) and the Knight's Tour
(a route that takes a knight to every square of an $N\times N$ chessboard
without visiting any square twice).

The idea of generators comes from other programming languages,
especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
idea of generators is central to the language. In Icon, every
expression and function call behaves like a generator. One example
from ``An Overview of the Icon Programming Language'' at
\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
what this looks like:

\begin{verbatim}
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
\end{verbatim}

The \function{find()} function returns the indexes at which the
substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement,
\code{i} is first assigned a value of 3, but 3 is less than 5, so the
comparison fails, and Icon retries it with the second value of 23. 23
is greater than 5, so the comparison now succeeds, and the code prints
the value 23 to the screen.
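
For comparison, here is a rough Python approximation of the Icon
snippet. Python has no implicit retrying, so the sketch spells the
retry out as an explicit loop. \function{find_all()} is a hypothetical
helper written for this example, and it reports 1-based positions only
to match Icon's numbering.

```python
def find_all(sub, text):
    """Yield each 1-based position of sub in text (hypothetical helper)."""
    start = text.find(sub)
    while start != -1:
        yield start + 1          # Icon counts positions from 1
        start = text.find(sub, start + 1)

sentence = "Store it in the neighboring harbor"

# Take candidate positions one at a time until one exceeds 5.
result = None
for i in find_all("or", sentence):
    if i > 5:
        result = i
        break
```

The loop stops at the first position greater than 5, so the third
match is never computed.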

Python doesn't go nearly as far as Icon in adopting generators as a
central concept. Generators are considered a new part of the core
Python language, but learning or using them isn't compulsory; if they
don't solve any problems that you have, feel free to ignore them.
One novel feature of Python's interface as compared to
Icon's is that a generator's state is represented as a concrete object
that can be passed around to other functions or stored in a data
structure.

\begin{seealso}

\seepep{255}{Simple Generators}{Written by Neil Schemenauer,
Tim Peters, Magnus Lie Hetland. Implemented mostly by Neil
Schemenauer, with fixes from the Python Labs crew.}

\end{seealso}

%======================================================================
\section{Unicode Changes}

XXX I have to figure out what the changes mean to users.
(\code{--enable-unicode} configure switch)

References:
\url{http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html}
and following thread.

%======================================================================
\section{PEP 227: Nested Scopes}

In Python 2.1, statically nested scopes were added as an optional
feature, to be enabled by a \code{from __future__ import
nested_scopes} directive. In 2.2 nested scopes no longer need to be
specially enabled, but are always enabled. The rest of this section
is a copy of the description of nested scopes from my ``What's New in
Python 2.1'' document; if you read it when 2.1 came out, you can skip
the rest of this section.

The largest change introduced in Python 2.1, and made complete in 2.2,
is to Python's scoping rules. In Python 2.0, at any given time there
are at most three namespaces used to look up variable names: local,
module-level, and the built-in namespace. This often surprised people
because it didn't match their intuitive expectations. For example, a
nested recursive function definition doesn't work:

\begin{verbatim}
def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...
\end{verbatim}

The function \function{g()} will always raise a \exception{NameError}
exception, because the binding of the name \samp{g} isn't in either
its local namespace or in the module-level namespace. This isn't much
of a problem in practice (how often do you recursively define interior
functions like this?), but this also made using the \keyword{lambda}
expression clumsier, and this was a problem in practice. In code which
uses \keyword{lambda} you can often find local variables being copied
by passing them as the default values of arguments.

\begin{verbatim}
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
\end{verbatim}

The readability of Python code written in a strongly functional style
suffers greatly as a result.

The most significant change to Python 2.2 is that static scoping has
been added to the language to fix this problem. As a first effect,
the \code{name=name} default argument is now unnecessary in the above
example. Put simply, when a given variable name is not assigned a
value within a function (by an assignment, or the \keyword{def},
\keyword{class}, or \keyword{import} statements), references to the
variable will be looked up in the local namespace of the enclosing
scope. A more detailed explanation of the rules, and a dissection of
the implementation, can be found in the PEP.
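
The two problem cases discussed in this section can be sketched as
follows once nested scopes are in effect. This is an illustrative
rewrite rather than code from the PEP: \function{find()} is turned
into a plain function taking a hypothetical \var{entries} list, and
\function{nesting_depth()} is an invented name.

```python
def find(entries, name):
    "Return list of any entries equal to 'name'"
    # 'name' is looked up in the enclosing function's namespace, so
    # the name=name default argument is no longer needed.
    # list() is a no-op where filter() already returns a list.
    return list(filter(lambda x: x == name, entries))

def nesting_depth(n):
    def g(value):
        if value == 0:
            return 0
        # g is found in the enclosing scope, so the recursion works.
        return g(value - 1) + 1
    return g(n)

matches = find(["spam", "eggs", "spam"], "spam")
depth = nesting_depth(4)
```

Both inner functions refer to names bound in the function that
contains them, which is exactly the lookup the new rules add.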

This change may cause some compatibility problems for code where the
same variable name is used both at the module level and as a local
variable within a function that contains further function definitions.
This seems rather unlikely though, since such code would have been
pretty confusing to read in the first place.

One side effect of the change is that the \code{from \var{module}
import *} and \keyword{exec} statements have been made illegal inside
a function scope under certain conditions. The Python reference
manual has said all along that \code{from \var{module} import *} is
only legal at the top level of a module, but the CPython interpreter
has never enforced this before. As part of the implementation of
nested scopes, the compiler which turns Python source into bytecodes
has to generate different code to access variables in a containing
scope. \code{from \var{module} import *} and \keyword{exec} make it
impossible for the compiler to figure this out, because they add names
to the local namespace that are unknowable at compile time.
Therefore, if a function contains function definitions or
\keyword{lambda} expressions with free variables, the compiler will
flag this by raising a \exception{SyntaxError} exception.

To make the preceding explanation a bit clearer, here's an example:

\begin{verbatim}
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
\end{verbatim}

Line 4 containing the \keyword{exec} statement is a syntax error,
since \keyword{exec} would define a new local variable named \samp{x}
whose value should be accessed by \function{g()}.

This shouldn't be much of a limitation, since \keyword{exec} is rarely
used in most Python code (and when it is used, it's often a sign of a
poor design anyway).

\begin{seealso}

\seepep{227}{Statically Nested Scopes}{Written and implemented by
Jeremy Hylton.}

\end{seealso}

%======================================================================
\section{New and Improved Modules}

\begin{itemize}

\item The \module{xmlrpclib} module was contributed to the standard
library by Fredrik Lundh. It provides support for writing XML-RPC
clients; XML-RPC is a simple remote procedure call protocol built on
top of HTTP and XML. For example, the following snippet retrieves a
list of RSS channels from the O'Reilly Network, and then retrieves a
list of the recent headlines for one channel:

\begin{verbatim}
import xmlrpclib
s = xmlrpclib.Server(
      'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
channels = s.meerkat.getChannels()
# channels is a list of dictionaries, like this:
# [{'id': 4, 'title': 'Freshmeat Daily News'},
#  {'id': 190, 'title': '32Bits Online'},
#  {'id': 4549, 'title': '3DGamers'}, ... ]

# Get the items for one channel
items = s.meerkat.getItems( {'channel': 4} )

# 'items' is another list of dictionaries, like this:
# [{'link': 'http://freshmeat.net/releases/52719/',
#   'description': 'A utility which converts HTML to XSL FO.',
#   'title': 'html2fo 0.3 (Default)'}, ... ]
\end{verbatim}

See \url{http://www.xmlrpc.com} for more information about XML-RPC.

\item The \module{socket} module can be compiled to support IPv6;
specify the \code{--enable-ipv6} option to Python's configure
script. (Contributed by Jun-ichiro ``itojun'' Hagino.)

\item Two new format characters were added to the \module{struct}
module for 64-bit integers on platforms that support the C
\ctype{long long} type. \samp{q} is for a signed 64-bit integer,
and \samp{Q} is for an unsigned one. The value is returned in
Python's long integer type. (Contributed by Tim Peters.)

\item In the interpreter's interactive mode, there's a new built-in
function \function{help()}, that uses the \module{pydoc} module
introduced in Python 2.1 to provide interactive help.
\code{help(\var{object})} displays any available help text about
\var{object}. \code{help()} with no argument puts you in an online
help utility, where you can enter the names of functions, classes,
or modules to read their help text.
(Contributed by Guido van Rossum, using Ka-Ping Yee's \module{pydoc} module.)

\item Various bugfixes and performance improvements have been made
to the SRE engine underlying the \module{re} module. For example,
\function{re.sub()} will now use \function{string.replace()}
automatically when the pattern and its replacement are both just
literal strings without regex metacharacters. Another contributed
patch speeds up certain Unicode character ranges by a factor of
two. (SRE is maintained by Fredrik Lundh. The BIGCHARSET patch
was contributed by Martin von L\"owis.)

\item The \module{imaplib} module now has support for the IMAP
NAMESPACE extension defined in \rfc{2342}. (Contributed by Michel
Pelletier.)

\end{itemize}
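
The new \samp{q} and \samp{Q} format characters mentioned above can be
tried with a short round trip. This is a minimal sketch that assumes
the \samp{<} prefix (little-endian, standard sizes) so that the result
doesn't depend on the platform's \ctype{long long}; the sample values
are arbitrary.

```python
import struct

# '<' selects little-endian byte order with standard sizes, so each of
# 'q' (signed) and 'Q' (unsigned) occupies exactly 8 bytes.
packed = struct.pack('<qQ', -2**40, 2**63)

# Unpacking returns the two 64-bit values as Python integers.
signed, unsigned = struct.unpack('<qQ', packed)
```

Note that \code{2**63} is too large for a signed 64-bit field, which
is why it is packed with \samp{Q} here.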


%======================================================================
\section{Other Changes and Fixes}

As usual there were a bunch of other improvements and bugfixes
scattered throughout the source tree. A search through the CVS change
logs finds there were XXX patches applied, and XXX bugs fixed; both
figures are likely to be underestimates. Some of the more notable
changes are:

\begin{itemize}

\item XXX C API: Reorganization of object calling

\item XXX \method{.encode()}, \method{.decode()} string methods.
Interesting new codecs such as zlib.

\item MacOS code now in main CVS tree.

\item SF patch \#418147: fixes to allow compiling with Borland, from
Stephen Hansen.

\item Add support on Windows for using ``mbcs'' as the default Unicode
encoding when dealing with the file system, as discussed on python-dev
and in patch \#410465.

\item Lots of patches to dictionaries; measure performance improvement, if any.

\item Patch \#430754: makes \file{ftpmirror.py} \file{.netrc} aware.

\item Fix bug reported by Tim Peters on python-dev: keyword arguments
passed to built-in functions that don't take them were silently
ignored, so this call succeeded:

\begin{verbatim}
>>> {}.clear(x=2)
>>>
\end{verbatim}

instead of raising an exception:

\begin{verbatim}
>>> {}.clear(x=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: clear() takes no keyword arguments
\end{verbatim}

\item Make the license GPL-compatible.

\item Two new C-level APIs were added:
\function{PyEval_SetProfile()} and \function{PyEval_SetTrace()}.
These can be used to install profile and trace functions implemented
in C, which can operate at much higher speeds than Python-based
functions. The overhead for calling a C-based profile function is a
very small fraction of a percent of the overhead involved in calling a
Python-based function.

The machinery required to call a Python-based profile or trace
function has been moved to \file{sysmodule.c}, where
\function{sys.setprofile()} and \function{sys.settrace()} simply
become users of the new interface.

\item ``Advanced'' \function{xrange()} features are now deprecated:
repeat, slice, contains, \method{tolist()}, and the start/stop/step
attributes. This includes removing the 4th (``repeat'') argument to
\function{PyRange_New()}.

\item The \function{call_object()} function, originally in
\file{ceval.c}, begins a new life as the official API
\function{PyObject_Call()}. It is also much simplified: all it does
is call the \code{tp_call} slot, or raise an exception if that's
NULL.

%The subsidiary functions (call_eval_code2(), call_cfunction(),
%call_instance(), and call_method()) have all been moved to the file
%implementing their particular object type, renamed according to the
%local convention, and added to the type's tp_call slot. Note that
%call_eval_code2() became function_call(); the tp_slot for class
%objects now simply points to PyInstance_New(), which already has the
%correct signature.

%Because of these moves, there are some more new APIs that expose
%helpers in ceval.c that are now needed outside: PyEval_GetFuncName(),
%PyEval_GetFuncDesc(), PyEval_EvalCodeEx() (formerly get_func_name(),
%get_func_desc(), and eval_code2()).

\end{itemize}



%======================================================================
\section{Acknowledgements}

The author would like to thank the following people for offering
suggestions on various drafts of this article: No one yet.

\end{document}