\documentclass{howto}

% $Id$

\title{What's New in Python 2.2}
\release{0.06}
\author{A.M. Kuchling}
\authoraddress{\email{akuchlin@mems-exchange.org}}
\begin{document}
\maketitle\tableofcontents

\section{Introduction}

{\large This document is a draft, and is subject to change until the
final version of Python 2.2 is released. Currently it's up to date
for Python 2.2 alpha 4. Please send any comments, bug reports, or
questions, no matter how minor, to \email{akuchlin@mems-exchange.org}.
}

This article explains the new features in Python 2.2.

Python 2.2 can be thought of as the ``cleanup release''. There are some
features such as generators and iterators that are completely new, but
most of the changes, significant and far-reaching though they may be,
are aimed at cleaning up irregularities and dark corners of the
language design.

This article doesn't attempt to provide a complete specification of
the new features, but instead provides a convenient overview. For
full details, you should refer to the documentation for Python 2.2,
such as the
\citetitle[http://python.sourceforge.net/devel-docs/lib/lib.html]{Python
Library Reference} and the
\citetitle[http://python.sourceforge.net/devel-docs/ref/ref.html]{Python
Reference Manual}.
% XXX These \citetitle marks should get the python.org URLs for the final
% release, just as soon as the docs are published there.
If you want to understand the complete implementation and design
rationale for a change, refer to the PEP for a particular new feature.

The final release of Python 2.2 is planned for December 2001.

\begin{seealso}

\url{http://www.unixreview.com/documents/s=1356/urm0109h/0109h.htm}
{``What's So Special About Python 2.2?'' is also about the new 2.2
features, and was written by Cameron Laird and Kathryn Soraiz.}

\end{seealso}


%======================================================================
\section{PEP 252: Type and Class Changes}

The largest and most far-reaching changes in Python 2.2 are to
Python's model of objects and classes. The changes should be backward
compatible, so it's likely that your code will continue to run
unchanged, but the changes provide some amazing new capabilities.
Before beginning this, the longest and most complicated section of
this article, I'll provide an overview of the changes and offer some
comments.

A long time ago I wrote a Web page
(\url{http://www.amk.ca/python/writing/warts.html}) listing flaws in
Python's design. One of the most significant flaws was that it's
impossible to subclass Python types implemented in C. In particular,
it's not possible to subclass built-in types, so you can't just
subclass, say, lists in order to add a single useful method to them.
The \module{UserList} module provides a class that supports all of the
methods of lists and that can be subclassed further, but there's lots
of C code that expects a regular Python list and won't accept a
\class{UserList} instance.

Python 2.2 fixes this, and in the process adds some exciting new
capabilities. A brief summary:

\begin{itemize}

\item You can subclass built-in types such as lists and even integers,
and your subclasses should work in every place that requires the
original type.

\item It's now possible to define static and class methods, in addition
to the instance methods available in previous versions of Python.

\item It's also possible to automatically call methods on accessing or
setting an instance attribute by using a new mechanism called
\dfn{properties}. Many uses of \method{__getattr__} can be rewritten
to use properties instead, making the resulting code simpler and
faster. As a small side benefit, attributes can now have docstrings,
too.

\item The list of legal attributes for an instance can be limited to a
particular set using \dfn{slots}, making it possible to safeguard
against typos and perhaps make more optimizations possible in future
versions of Python.

\end{itemize}
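
As a rough sketch of the last two items, here's what a property and a
slot declaration might look like together. The \class{Point} class and
its attribute names are invented for illustration. The spellings
\function{property()} and \member{__slots__} are those of the released
interpreter; this article hasn't described them in detail, so treat the
specifics as an assumption rather than a specification.

```python
class Point(object):
    # Only the names listed here may be set on an instance
    # (assumption: the released __slots__ spelling).
    __slots__ = ['_x']

    def __init__(self, x):
        self._x = x

    def get_x(self):
        return self._x

    def set_x(self, value):
        # Runs automatically on "p.x = value".
        if value < 0:
            raise ValueError("x must be non-negative")
        self._x = value

    # property(getter, setter, deleter, docstring)
    x = property(get_x, set_x, None, "The x coordinate.")


p = Point(1)
p.x = 5          # calls set_x() behind the scenes
```

Assigning to a misspelled name such as \code{p.y} raises
\exception{AttributeError}, which is the typo safeguard mentioned above.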

Some users have voiced concern about all these changes. Sure, they
say, the new features are neat and lend themselves to all sorts of
tricks that weren't possible in previous versions of Python, but
they also make the language more complicated. Some people have said
that they've always recommended Python for its simplicity, and feel
that its simplicity is being lost.

Personally, I think there's no need to worry. Many of the new
features are quite esoteric, and you can write a lot of Python code
without ever needing to be aware of them. Writing a simple class is no
more difficult than it ever was, so you don't need to bother learning
or teaching them unless they're actually needed. Some very
complicated tasks that were previously only possible from C will now
be possible in pure Python, and to my mind that's all for the better.

I'm not going to attempt to cover every single corner case and small
change that was required to make the new features work. Instead this
section will paint only the broad strokes. See section~\ref{sect-rellinks},
``Related Links'', for further sources of information about Python 2.2's new
object model.


\subsection{Old and New Classes}

First, you should know that Python 2.2 really has two kinds of
classes: classic or old-style classes, and new-style classes. The
old-style class model is exactly the same as the class model in
earlier versions of Python. All the new features described in this
section apply only to new-style classes. This divergence isn't
intended to last forever; eventually old-style classes will be
dropped, possibly in Python 3.0.

So how do you define a new-style class? You do it by subclassing an
existing new-style class. Most of Python's built-in types, such as
integers, lists, dictionaries, and even files, are new-style classes
now. A new-style class named \class{object}, the base class for all
built-in types, has also been added, so if no built-in type is
suitable, you can just subclass \class{object}:

\begin{verbatim}
class C(object):
    def __init__ (self):
        ...
    ...
\end{verbatim}

This means that \keyword{class} statements that don't have any base
classes are always classic classes in Python 2.2. There's actually a
way to make new-style classes without any base classes, by setting the
\member{__metaclass__} variable to XXX. (What do you set it to?)

The type objects for the built-in types are available as built-ins,
named using a clever trick. Python has always had built-in functions
named \function{int()}, \function{float()}, and \function{str()}. In
2.2, they aren't functions any more, but type objects that behave as
factories when called.

\begin{verbatim}
>>> int
<type 'int'>
>>> int('123')
123
\end{verbatim}

To make the set of types complete, new type objects such as
\function{dictionary} and \function{file} have been added.
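
To make the idea of subclassing a built-in type concrete, here is a
minimal sketch that adds a couple of convenience methods to lists, the
very thing that wasn't possible before. The \class{Stack} class and its
method names are hypothetical, chosen only for this illustration.

```python
class Stack(list):
    """A list with two extra convenience methods (hypothetical example)."""

    def push(self, item):
        self.append(item)

    def peek(self):
        # Look at the top item without removing it.
        return self[-1]


s = Stack([1, 2])
s.push(3)

# The subclass is still a real list, so code that demands a list
# (here, map() and str.join()) accepts it without complaint.
joined = ", ".join(map(str, s))
```

Because \class{Stack} inherits from the real list type rather than
wrapping one, \code{isinstance(s, list)} is true and all the usual list
methods keep working.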

Here's a more interesting example. The following class subclasses
Python's dictionary implementation in order to automatically fold all
dictionary keys to lowercase.

\begin{verbatim}
class LowerCaseDict(dictionary):
    def _fold_key (self, key):
        if not isinstance(key, str):
            raise TypeError, "All keys must be strings"
        return key.lower()

    def __getitem__ (self, key):
        key = self._fold_key(key)
        return dictionary.__getitem__(self, key)

    def __setitem__ (self, key, value):
        key = self._fold_key(key)
        dictionary.__setitem__(self, key, value)

    def __delitem__ (self, key):
        key = self._fold_key(key)
        dictionary.__delitem__(self, key)
\end{verbatim}

Trying out this class, it works as you'd expect:

\begin{verbatim}
>>> d = LowerCaseDict()
>>> d['ABC'] = 1
>>> d['abc']
1
\end{verbatim}

However, because it's a subclass of Python's dictionary type,
instances of \class{LowerCaseDict} can be used in most places where a
regular dictionary is required.

\begin{verbatim}
>>> d = LowerCaseDict()
>>> exec 'Name = 1' in d
>>> print d.items()
XXX
>>> exec 'nAmE = name + 1' in d
>>> print d.items()
XXX
\end{verbatim}

And now you can have Python with case-insensitive variable names! One
of the nice things about Python 2.2 is that it makes Python flexible
enough to solve many other past problems without hacking Python's C
code. If you want a case-insensitive Python environment, using a
case-folding dictionary and writing a case-insensitive tokenizer using
the compiler package (now automatically installed in 2.2) will make it
a straightforward task.


\subsection{Descriptors}

In previous versions of Python, there was no consistent way to
discover what attributes and methods were supported by an object.
There were some informal conventions, such as defining
\member{__members__} and \member{__methods__} attributes that were
lists of names, but often the author of an extension type or a class
wouldn't bother to define them. You could fall back on inspecting the
\member{__dict__} of an object, but when class inheritance or an
arbitrary \method{__getattr__} hook was in use this could still be
inaccurate.

The one big idea underlying the new class model is that an API for
describing the attributes of an object using \dfn{descriptors} has
been formalized. Descriptors specify the value of an attribute,
stating whether it's a method or a field. With the descriptor API,
static methods and class methods become possible, as well as more
exotic constructs.

Attribute descriptors are objects that live inside class objects, and
have a few attributes of their own:

\begin{itemize}

\item \member{__name__} is the attribute's name.

\item \member{__doc__} is the attribute's docstring.

\item \method{__get__(\var{object})} is a method that retrieves the
attribute value from \var{object}.

\item \method{__set__(\var{object}, \var{value})} sets the attribute
on \var{object} to \var{value}.

\end{itemize}

For example, when you write \code{obj.x}, the steps that Python
actually performs are:

\begin{verbatim}
descriptor = obj.__class__.x
descriptor.__get__(obj)
\end{verbatim}

For methods, \method{descriptor.__get__()} returns a temporary object that's
callable, and wraps up the instance and the method to be called on it.
This is also why static methods and class methods are now possible;
they have descriptors that wrap up just the method, or the method and
the class. As a brief explanation of these new kinds of methods,
static methods aren't passed the instance, and therefore resemble
regular functions. Class methods are passed the class of the object,
but not the object itself. Static and class methods are defined like
this:

\begin{verbatim}
class C(object):
    def f(arg1, arg2):
        ...
    f = staticmethod(f)

    def g(cls, arg1, arg2):
        ...
    g = classmethod(g)
\end{verbatim}

The \function{staticmethod()} function takes the function
\function{f}, and returns it wrapped up in a descriptor so it can be
stored in the class object. You might expect there to be special
syntax for creating such methods (\code{def static f()},
\code{defstatic f()}, or something like that) but no such syntax has
been defined yet; that's been left for future versions.
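
Here's a small sketch of how such methods behave when called. The
\class{Temperature} and \class{Kelvin} classes are made up for this
example; only \function{staticmethod()} and \function{classmethod()}
come from the text above.

```python
class Temperature(object):
    def to_fahrenheit(celsius):
        # No instance and no class is passed in: this is just a
        # function that happens to live in the class namespace.
        return celsius * 9.0 / 5.0 + 32.0
    to_fahrenheit = staticmethod(to_fahrenheit)

    def name(cls):
        # Receives the class the call was made through.
        return cls.__name__
    name = classmethod(name)


class Kelvin(Temperature):
    pass


boiling = Temperature.to_fahrenheit(100)   # called on the class
via_instance = Kelvin().name()             # called on an instance
via_class = Kelvin.name()                  # called on a subclass
```

Note that the class method sees \class{Kelvin}, not
\class{Temperature}, whether it's reached through the subclass or one
of its instances.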

More new features, such as slots and properties, are also implemented
as new kinds of descriptors, and it's not difficult to write a
descriptor class that does something novel. For example, it would be
possible to write a descriptor class that made it possible to write
Eiffel-style preconditions and postconditions for a method. A class
that used this feature might be defined like this:

\begin{verbatim}
from eiffel import eiffelmethod

class C(object):
    def f(self, arg1, arg2):
        # The actual function
        ...
    def pre_f(self):
        # Check preconditions
        ...
    def post_f(self):
        # Check postconditions
        ...

    f = eiffelmethod(f, pre_f, post_f)
\end{verbatim}

Note that a person using the new \function{eiffelmethod()} doesn't
have to understand anything about descriptors. This is why I think
the new features don't increase the basic complexity of the language.
There will be a few wizards who need to know about it in order to
write \function{eiffelmethod()} or the ZODB or whatever, but most
users will just write code on top of the resulting libraries and
ignore the implementation details.
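
For the curious, one way such a descriptor could be written is sketched
below. This is purely hypothetical: no \module{eiffel} module ships
with Python, the \class{Account} class is invented, and the
\method{__get__()} signature used here (taking the instance and,
optionally, its type) is the released form, which is an assumption
beyond the simplified one-argument description given earlier.

```python
class eiffelmethod(object):
    """Hypothetical descriptor: run checks before and after a method."""

    def __init__(self, func, pre, post):
        self.func = func
        self.pre = pre
        self.post = post

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed through the class: hand back the descriptor.
            return self

        def wrapper(*args, **kwargs):
            self.pre(obj)                       # check preconditions
            result = self.func(obj, *args, **kwargs)
            self.post(obj)                      # check postconditions
            return result
        return wrapper


class Account(object):
    def __init__(self):
        self.balance = 0
        self.log = []

    def deposit(self, amount):
        self.balance = self.balance + amount
        return self.balance

    def pre_deposit(self):
        self.log.append('pre')

    def post_deposit(self):
        if self.balance < 0:
            raise ValueError("balance went negative")
        self.log.append('post')

    deposit = eiffelmethod(deposit, pre_deposit, post_deposit)
```

Calling \code{Account().deposit(5)} runs \method{pre_deposit()}, then
the real method, then \method{post_deposit()}, and the caller never
sees the machinery.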

\subsection{Multiple Inheritance: The Diamond Rule}

Multiple inheritance has also been made more useful through changing
the rules under which names are resolved. Consider this set of classes
(diagram taken from \pep{253} by Guido van Rossum):

\begin{verbatim}
              class A:
                ^ ^  def save(self): ...
               /   \
              /     \
             /       \
            /         \
        class B     class C:
            ^         ^  def save(self): ...
             \       /
              \     /
               \   /
                \ /
              class D
\end{verbatim}

The lookup rule for classic classes is simple but not very smart; the
base classes are searched depth-first, going from left to right. A
reference to \method{D.save} will search the classes \class{D},
\class{B}, and then \class{A}, where \method{save()} would be found
and returned. \method{C.save()} would never be found at all. This is
bad, because if \class{C}'s \method{save()} method is saving some
internal state specific to \class{C}, not calling it will result in
that state never getting saved.

New-style classes follow a different algorithm that's a bit more
complicated to explain, but does the right thing in this situation.

\begin{enumerate}

\item List all the base classes, following the classic lookup rule, and
include a class multiple times if it's visited repeatedly. In the
above example, the list of visited classes is [\class{D}, \class{B},
\class{A}, \class{C}, \class{A}].

\item Scan the list for duplicated classes. If any are found, remove
all but one occurrence, leaving the \emph{last} one in the list. In
the above example, the list becomes [\class{D}, \class{B}, \class{C},
\class{A}] after dropping duplicates.

\end{enumerate}

Following this rule, referring to \method{D.save()} will return
\method{C.save()}, which is the behaviour we're after. This lookup
rule is the same as the one followed by XXX Common Lisp?.
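
The two steps can be sketched in a few lines of Python. The helper
names \function{classic_order()} and \function{diamond_order()} are my
own, and this is only an illustration of the rule as described above,
not the interpreter's actual implementation. Since the classes here
derive from \class{object}, it shows up at the end of the list as well.

```python
class A(object):
    def save(self):
        return 'A.save'

class B(A):
    pass

class C(A):
    def save(self):
        return 'C.save'

class D(B, C):
    pass


def classic_order(cls):
    # Step 1: depth-first, left to right, duplicates included.
    result = [cls]
    for base in cls.__bases__:
        result.extend(classic_order(base))
    return result


def diamond_order(cls):
    # Step 2: keep only the *last* occurrence of each class.
    visited = classic_order(cls)
    result = []
    for i in range(len(visited)):
        if visited[i] not in visited[i + 1:]:
            result.append(visited[i])
    return result


order = [k.__name__ for k in diamond_order(D)]
found = D().save()
```

For this diamond the computed order is \class{D}, \class{B},
\class{C}, \class{A}, \class{object}, so the lookup of
\method{save()} stops at \class{C}.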
375
Andrew M. Kuchling279e7442001-10-22 02:03:40 +0000376
377\subsection{Attribute Access}
378
379XXX __getattribute__, __getattr__
380
Andrew M. Kuchling4855b022001-10-23 20:26:16 +0000381XXX properties, slots
382
383
Andrew M. Kuchling279e7442001-10-22 02:03:40 +0000384\subsection{Related Links}
385\ref{sect-rellinks}
386
387This section has just been a quick overview of the new features,
388giving enough of an explanation to start you programming, but many
389details have been simplified or ignored. Where should you go to get a
390more complete picture?
391
392\url{http://www.python.org/2.2/descrintro.html} is a tutorial
393introduction to the descriptor features, written by Guido van Rossum.
394% XXX read it and comment on it
395
396Next, there are two relevant PEPs, \pep{252} and \pep{253}. \pep{252}
397is titled "Making Types Look More Like Classes", and covers the
398descriptor API. \pep{253} is titled "Subtyping Built-in Types", and
399describes the changes to type objects that make it possible to subtype
400built-in objects. This is the more complicated PEP of the two, and at
401a few points the necessary explanations of types and meta-types may
402cause your head to explode. Both PEPs were written and implemented by
403Guido van Rossum, with substantial assistance from the rest of the
404Zope Corp. team.
405
406Finally, there's the ultimate authority: the source code.
Andrew M. Kuchling4855b022001-10-23 20:26:16 +0000407typeobject.c, others?
Andrew M. Kuchling279e7442001-10-22 02:03:40 +0000408% XXX point people at the right files
Andrew M. Kuchlinga8defaa2001-05-05 16:37:29 +0000409
Andrew M. Kuchling8cfa9052001-07-19 01:19:59 +0000410
%======================================================================
\section{PEP 234: Iterators}

A significant addition to 2.2 is an iteration interface at both the C
and Python levels. Objects can define how they can be looped over by
callers.

In Python versions up to 2.1, the usual way to make \code{for item in
obj} work is to define a \method{__getitem__()} method that looks
something like this:

\begin{verbatim}
    def __getitem__(self, index):
        return <next item>
\end{verbatim}

\method{__getitem__()} is more properly used to define an indexing
operation on an object so that you can write \code{obj[5]} to retrieve
the sixth element. It's a bit misleading when you're using this only
to support \keyword{for} loops. Consider some file-like object that
wants to be looped over; the \var{index} parameter is essentially
meaningless, as the class probably assumes that a series of
\method{__getitem__()} calls will be made, with \var{index}
incrementing by one each time. In other words, the presence of the
\method{__getitem__()} method doesn't mean that using \code{file[5]}
to randomly access the sixth element will work, though it really should.

In Python 2.2, iteration can be implemented separately, and
\method{__getitem__()} methods can be limited to classes that really
do support random access. The basic idea of iterators is quite
simple. A new built-in function, \function{iter(obj)} or
\code{iter(\var{C}, \var{sentinel})}, is used to get an iterator.
\function{iter(obj)} returns an iterator for the object \var{obj},
while \code{iter(\var{C}, \var{sentinel})} returns an iterator that
will invoke the callable object \var{C} until it returns
\var{sentinel} to signal that the iterator is done.
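
The two-argument form is the less obvious of the two, so here's a
minimal sketch of it. The \function{make_countdown()} helper is
invented for this example; any callable taking no arguments would do.

```python
def make_countdown(start):
    # Returns a callable that yields start-1, start-2, ... on
    # successive calls.  The one-element list holds the mutable state.
    state = [start]

    def step():
        state[0] = state[0] - 1
        return state[0]
    return step


# Keep calling step() until it returns the sentinel value 0.
values = list(iter(make_countdown(3), 0))
```

The sentinel itself is never produced; the iterator stops as soon as
the callable returns it, leaving \code{values} as \code{[2, 1]}.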

Python classes can define an \method{__iter__()} method, which should
create and return a new iterator for the object; if the object is its
own iterator, this method can just return \code{self}. In particular,
iterators will usually be their own iterators. Extension types
implemented in C can implement a \code{tp_iter} function in order to
return an iterator, and extension types that want to behave as
iterators can define a \code{tp_iternext} function.

So what do iterators do? They have one required method,
\method{next()}, which takes no arguments and returns the next value.
When there are no more values to be returned, calling \method{next()}
should raise the \exception{StopIteration} exception.

\begin{verbatim}
>>> L = [1,2,3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>
\end{verbatim}
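
Writing your own iterator takes only a few lines. The following
\class{Countdown} class is a made-up example that acts as its own
iterator. The \code{__next__} alias at the bottom isn't part of the
2.2 protocol described here; it's an assumption added only so the
sketch also runs on later interpreters that spell the method that way.

```python
class Countdown(object):
    """Iterate from start down to 1 (hypothetical example class)."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # This object is its own iterator.
        return self

    def next(self):
        if self.current <= 0:
            # No more values: signal the end of the iteration.
            raise StopIteration
        self.current = self.current - 1
        return self.current + 1

    # Alias for later Python versions only; not needed in 2.2.
    __next__ = next


result = []
for n in Countdown(3):
    result.append(n)
```

After the loop, \code{result} holds \code{[3, 2, 1]}; the
\keyword{for} statement swallows the \exception{StopIteration} for you.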

In 2.2, Python's \keyword{for} statement no longer expects a sequence;
it expects something for which \function{iter()} will return an iterator.
For backward compatibility and convenience, an iterator is
automatically constructed for sequences that don't implement
\method{__iter__()} or a \code{tp_iter} slot, so \code{for i in
[1,2,3]} will still work. Wherever the Python interpreter loops over
a sequence, it's been changed to use the iterator protocol. This
means you can do things like this:

\begin{verbatim}
>>> i = iter(L)
>>> a,b,c = i
>>> a,b,c
(1, 2, 3)
\end{verbatim}

Iterator support has been added to some of Python's basic types.
Calling \function{iter()} on a dictionary will return an iterator
which loops over its keys:

\begin{verbatim}
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m: print key, m[key]
...
Mar 3
Feb 2
Aug 8
Sep 9
May 5
Jun 6
Jul 7
Jan 1
Apr 4
Nov 11
Dec 12
Oct 10
\end{verbatim}

That's just the default behaviour. If you want to iterate over keys,
values, or key/value pairs, you can explicitly call the
\method{iterkeys()}, \method{itervalues()}, or \method{iteritems()}
methods to get an appropriate iterator. In a minor related change,
the \keyword{in} operator now works on dictionaries, so
\code{\var{key} in dict} is now equivalent to
\code{dict.has_key(\var{key})}.

Files also provide an iterator, which calls the \method{readline()}
method until there are no more lines in the file. This means you can
now read each line of a file using code like this:

\begin{verbatim}
for line in file:
    # do something for each line
    ...
\end{verbatim}

Note that you can only go forward in an iterator; there's no way to
get the previous element, reset the iterator, or make a copy of it.
An iterator object could provide such additional capabilities, but the
iterator protocol only requires a \method{next()} method.

\begin{seealso}

\seepep{234}{Iterators}{Written by Ka-Ping Yee and GvR; implemented
by the Python Labs crew, mostly by GvR and Tim Peters.}

\end{seealso}


%======================================================================
\section{PEP 255: Simple Generators}

Generators are another new feature, one that interacts with the
introduction of iterators.

You're doubtless familiar with how function calls work in Python or
C. When you call a function, it gets a private namespace where its local
variables are created. When the function reaches a \keyword{return}
statement, the local variables are destroyed and the resulting value
is returned to the caller. A later call to the same function will get
a fresh new set of local variables. But, what if the local variables
weren't destroyed on exiting a function? What if you could later
resume the function where it left off? This is what generators
provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function:

\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i
\end{verbatim}

A new keyword, \keyword{yield}, was introduced for generators. Any
function containing a \keyword{yield} statement is a generator
function; this is detected by Python's bytecode compiler which
compiles the function specially as a result. Because a new keyword was
introduced, generators must be explicitly enabled in a module by
including a \code{from __future__ import generators} statement near
the top of the module's source code. In Python 2.3 this statement
will become unnecessary.

When you call a generator function, it doesn't return a single value;
instead it returns a generator object that supports the iterator
protocol. On executing the \keyword{yield} statement, the generator
outputs the value of \code{i}, similar to a \keyword{return}
statement. The big difference between \keyword{yield} and a
\keyword{return} statement is that on reaching a \keyword{yield} the
generator's state of execution is suspended and local variables are
preserved. On the next call to the generator's \code{.next()} method,
the function will resume executing immediately after the
\keyword{yield} statement. (For complicated reasons, the
\keyword{yield} statement isn't allowed inside the \keyword{try} block
of a \code{try...finally} statement; read PEP 255 for a full
explanation of the interaction between \keyword{yield} and
exceptions.)

Here's a sample usage of the \function{generate_ints} generator:

\begin{verbatim}
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration
\end{verbatim}

You could equally write \code{for i in generate_ints(5)}, or
\code{a,b,c = generate_ints(3)}.

Inside a generator function, the \keyword{return} statement can only
be used without a value, and signals the end of the procession of
values; afterwards the generator cannot return any further values.
\keyword{return} with a value, such as \code{return 5}, is a syntax
error inside a generator function. The end of the generator's results
can also be indicated by raising \exception{StopIteration} manually,
or by just letting the flow of execution fall off the bottom of the
function.
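
A bare \keyword{return} is handy for cutting a generator short. In the
sketch below, the function name \function{take_until_negative()} is
invented for illustration, and the \code{from __future__ import
generators} line that a 2.2 module would need is left out for brevity.

```python
def take_until_negative(values):
    for v in values:
        if v < 0:
            # A bare return ends the generator; nothing after
            # this point is ever produced.
            return
        yield v


kept = list(take_until_negative([1, 2, -1, 3]))
```

Here \code{kept} ends up as \code{[1, 2]}: the trailing 3 is never
reached because the generator finished at the first negative value.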

You could achieve the effect of generators manually by writing your
own class and storing all the local variables of the generator as
instance variables. For example, returning a list of integers could
be done by setting \code{self.count} to 0, and having the
\method{next()} method increment \code{self.count} and return it.
However, for a moderately complicated generator, writing a
corresponding class would be much messier.
\file{Lib/test/test_generators.py} contains a number of more
interesting examples. The simplest one implements an in-order
traversal of a tree using generators recursively.

\begin{verbatim}
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
\end{verbatim}
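
To try the traversal out you need some kind of tree to feed it. The
\class{Tree} class below is a bare-bones stand-in written for this
article, not the one used in the test suite; it only provides the
\member{label}, \member{left} and \member{right} attributes that
\function{inorder()} relies on.

```python
class Tree(object):
    """Minimal binary tree node (hypothetical stand-in)."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def inorder(t):
    # Same generator as above, repeated so this sketch is complete.
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x


#       2
#      / \
#     1   3
tree = Tree(2, Tree(1), Tree(3))
labels = list(inorder(tree))
```

An empty subtree is represented by \code{None}, for which the generator
simply produces nothing, so the recursion bottoms out without any
special casing.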

Two other examples in \file{Lib/test/test_generators.py} produce
solutions for the N-Queens problem (placing $N$ queens on an $N\times N$
chess board so that no queen threatens another) and the Knight's Tour
(a route that takes a knight to every square of an $N\times N$ chessboard
without visiting any square twice).

The idea of generators comes from other programming languages,
especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
idea of generators is central. In Icon, every
expression and function call behaves like a generator. One example
from ``An Overview of the Icon Programming Language'' at
\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
what this looks like:

\begin{verbatim}
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
\end{verbatim}

The \function{find()} function returns the indexes at which the
substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement,
\code{i} is first assigned a value of 3, but 3 is less than 5, so the
comparison fails, and Icon retries it with the second value of 23. 23
is greater than 5, so the comparison now succeeds, and the code prints
the value 23 to the screen.
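
For comparison, a rough Python rendering of the same search might look
like the following. The \function{find_all()} generator is my own
helper, not a built-in, and it adds 1 to each position so that the
numbers match Icon's 1-based indexes. Python has no automatic retry,
so the loop that Icon performs implicitly has to be spelled out.

```python
def find_all(sub, text):
    # Yield the 1-based position of every occurrence of sub in text.
    start = text.find(sub)
    while start != -1:
        yield start + 1
        start = text.find(sub, start + 1)


sentence = "Store it in the neighboring harbor"
positions = list(find_all("or", sentence))

# Explicit version of Icon's implicit retry: take the first
# position that passes the comparison.
first_over_5 = None
for i in find_all("or", sentence):
    if i > 5:
        first_over_5 = i
        break
```

The generator produces 3, 23 and 33 in turn, and the loop stops at 23,
the first value greater than 5.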

Python doesn't go nearly as far as Icon in adopting generators as a
central concept. Generators are considered a new part of the core
Python language, but learning or using them isn't compulsory; if they
don't solve any problems that you have, feel free to ignore them.
One novel feature of Python's interface as compared to
Icon's is that a generator's state is represented as a concrete object
that can be passed around to other functions or stored in a data
structure.
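
As an illustration of a generator's state being an ordinary object,
the sketch below stores two generators in a dictionary and hands one of
them to another function. The names \function{countdown()} and
\function{take()} are made up for the example, and it uses the
\function{next()} built-in of later Python versions; in 2.2 itself the
equivalent spelling is the generator's \method{next()} method.

```python
def countdown(n):
    """Yield n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1

def take(gen, count):
    """Pull up to count values from a generator created elsewhere."""
    values = []
    for _ in range(count):
        try:
            values.append(next(gen))
        except StopIteration:
            break
    return values

# Generator objects are ordinary values: keep several in a dictionary.
pending = {"short": countdown(2), "long": countdown(5)}
first_batch = take(pending["long"], 2)     # advances only that generator
second_batch = take(pending["long"], 2)    # resumes where it left off
short_all = take(pending["short"], 10)     # stops early when exhausted
```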

\begin{seealso}

\seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim
Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer
and Tim Peters, with other fixes from the Python Labs crew.}

\end{seealso}


%======================================================================
\section{PEP 237: Unifying Long Integers and Integers}

In recent versions, the distinction between regular integers, which
are 32-bit values on most machines, and long integers, which can be of
arbitrary size, was becoming an annoyance. For example, on platforms
that support files larger than \code{2**32} bytes, the
\method{tell()} method of file objects has to return a long integer.
However, there were various bits of Python that expected plain
integers and would raise an error if a long integer was provided
instead. For example, in Python 1.5, only regular integers
could be used as a slice index, and \code{'abc'[1L:]} would raise a
\exception{TypeError} exception with the message ``slice index must be
int''.

Python 2.2 will shift values from short to long integers as required.
The `L' suffix is no longer needed to indicate a long integer literal,
as now the compiler will choose the appropriate type. (Using the `L'
suffix will be discouraged in future 2.x versions of Python,
triggering a warning in Python 2.4, and the suffix will probably be
dropped in Python 3.0.) Many operations that used to raise an
\exception{OverflowError}
will now return a long integer as their result. For example:

\begin{verbatim}
>>> 1234567890123
1234567890123L
>>> 2 ** 64
18446744073709551616L
\end{verbatim}

In most cases, integers and long integers will now be treated
identically. You can still distinguish them with the
\function{type()} built-in function, but that's rarely needed. The
\function{int()} constructor will now return a long integer if the value
is large enough.
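
The practical effect is that arithmetic no longer has to care where the
machine-word boundary lies. The sketch below assumes a current
interpreter, where the two types have been merged completely (so the
results print without an `L' suffix and share one type); under 2.2 the
same code runs without an \exception{OverflowError}, but the second
result is a long integer. \function{factorial()} is just an
illustrative helper.

```python
def factorial(n):
    """Multiply 1..n; the result silently outgrows a machine word."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

small = factorial(12)    # 479001600 still fits in 32 bits
large = factorial(25)    # needs far more than 64 bits
same_type = type(small) is type(large)
```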

\begin{seealso}

\seepep{237}{Unifying Long Integers and Integers}{Written by
Moshe Zadka and Guido van Rossum. Implemented mostly by Guido van Rossum.}

\end{seealso}


%======================================================================
\section{PEP 238: Changing the Division Operator}

The most controversial change in Python 2.2 is the start of an effort
to fix an old design flaw that's been in Python from the beginning.
Currently Python's division operator, \code{/}, behaves like C's
division operator when presented with two integer arguments: it
returns an integer result that's truncated down when there would be
a fractional part. For example, \code{3/2} is 1, not 1.5, and
\code{(-1)/2} is -1, not -0.5. This means that the results of division
can vary unexpectedly depending on the type of the two operands, and
because Python is dynamically typed, it can be difficult to determine
the possible types of the operands.

(The controversy is over whether this is \emph{really} a design flaw,
and whether it's worth breaking existing code to fix this. It's
caused endless discussions on python-dev and in July erupted into a
storm of acidly sarcastic postings on \newsgroup{comp.lang.python}. I
won't argue for either side here and will stick to describing what's
implemented in 2.2. Read PEP 238 for a summary of arguments and
counter-arguments.)

Because this change might break code, it's being introduced very
gradually. Python 2.2 begins the transition, but the switch won't be
complete until Python 3.0.

First, I'll borrow some terminology from PEP 238. ``True division'' is the
division that most non-programmers are familiar with: 3/2 is 1.5, 1/4
is 0.25, and so forth. ``Floor division'' is what Python's \code{/}
operator currently does when given integer operands; the result is the
floor of the value returned by true division. ``Classic division'' is
the current mixed behaviour of \code{/}; it returns the result of
floor division when the operands are integers, and returns the result
of true division when one of the operands is a floating-point number.

Here are the changes 2.2 introduces:

\begin{itemize}

\item A new operator, \code{//}, is the floor division operator.
(Yes, we know it looks like \Cpp's comment symbol.) \code{//}
\emph{always} performs floor division no matter what the types of
its operands are, so \code{1 // 2} is 0 and \code{1.0 // 2.0} is
0.0.

\code{//} is always available in Python 2.2; you don't need to enable
it using a \code{__future__} statement.

\item By including a \code{from __future__ import division} statement
in a module, the \code{/} operator will be changed to return the result of
true division, so \code{1/2} is 0.5. Without the \code{__future__}
statement, \code{/} still means classic division. The default meaning
of \code{/} will not change until Python 3.0.

\item Classes can define methods called \method{__truediv__} and
\method{__floordiv__} to overload the two division operators. At the
C level, there are also slots in the \code{PyNumberMethods} structure
so extension types can define the two operators.

% XXX a warning someday?

\end{itemize}
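
The three points above can be tried out together. This is only a
sketch: \class{Length} is a hypothetical class invented for the
example, and the code assumes that \code{/} already means true
division, which is the default on a current interpreter but in 2.2
requires the \code{__future__} statement at the top of the module.

```python
class Length:
    """A toy measurement type that overloads both division operators."""

    def __init__(self, metres):
        self.metres = metres

    def __truediv__(self, parts):
        # Called for "/" when true division is in effect.
        return Length(self.metres / parts)

    def __floordiv__(self, parts):
        # Called for "//" regardless of any __future__ statement.
        return Length(self.metres // parts)

floor_int = 1 // 2            # 0
floor_float = 1.0 // 2.0      # 0.0
floor_negative = (-1) // 2    # -1: the floor, not truncation toward zero
true_half = 1 / 2             # 0.5 under true division

plank = Length(7)
exact_share = (plank / 2).metres
whole_share = (plank // 2).metres
```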

\begin{seealso}

\seepep{238}{Changing the Division Operator}{Written by Moshe Zadka and
Guido van Rossum. Implemented by Guido van Rossum.}

\end{seealso}


%======================================================================
\section{Unicode Changes}

Python's Unicode support has been enhanced a bit in 2.2. Unicode
strings are usually stored as UCS-2, as 16-bit unsigned integers.
Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
integers, as its internal encoding by supplying
\longprogramopt{enable-unicode=ucs4} to the configure script. When
built to use UCS-4 (a ``wide Python''), the interpreter can natively
handle Unicode characters from U+000000 to U+10FFFF, so the range of
legal values for the \function{unichr()} function is expanded
accordingly. On an interpreter compiled to use UCS-2 (a ``narrow
Python''), values greater than 65535 will still cause
\function{unichr()} to raise a \exception{ValueError} exception.
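
The limits of the legal range can be probed directly. The sketch below
is written for a current interpreter, where \function{unichr()} has
been folded into \function{chr()} and the wide range is always
available; on a 2.2 build the same test would be made with
\function{unichr()}, and the outcome for values above 65535 depends on
how the interpreter was configured.
\function{is_legal_code_point()} is a hypothetical helper.

```python
def is_legal_code_point(value):
    """Report whether chr() accepts value on this interpreter."""
    try:
        chr(value)
    except ValueError:
        return False
    return True

highest_ok = is_legal_code_point(0x10FFFF)    # last Unicode code point
past_the_end = is_legal_code_point(0x110000)  # one beyond the range
```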

% XXX is this still unimplemented?
All this is the province of the still-unimplemented PEP 261, ``Support
for `wide' Unicode characters''; consult it for further details, and
please offer comments on the PEP and on your experiences with the
2.2 beta releases.
% XXX update previous line once 2.2 reaches beta or final.

Another change is much simpler to explain. Since their introduction,
Unicode strings have supported an \method{encode()} method to convert
the string to a selected encoding such as UTF-8 or Latin-1. A
symmetric \method{decode(\optional{\var{encoding}})} method has been
added to 8-bit strings (though not to Unicode strings) in 2.2.
\method{decode()} assumes that the string is in the specified encoding
and decodes it, returning whatever is returned by the codec.
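
A short round trip shows the symmetry between the two methods. The
sketch assumes a current interpreter, where text strings are Unicode
and 8-bit strings are the \class{bytes} type; in 2.2 the text value
would be written as a \code{u'...'} literal and the encoded values
would be ordinary strings.

```python
text = "caf\u00e9"                     # a Unicode string ending in e-acute

utf8_bytes = text.encode("utf-8")      # Unicode -> 8-bit data
latin1_bytes = text.encode("latin-1")

# decode() is the mirror image: it assumes the data is in the named
# encoding and hands back whatever the codec produces.
from_utf8 = utf8_bytes.decode("utf-8")
from_latin1 = latin1_bytes.decode("latin-1")
```

The same character takes two bytes in UTF-8 and one in Latin-1, which
is why \method{decode()} has to be told which encoding to assume.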

Using this new feature, codecs have been added for tasks not directly
related to Unicode. For example, codecs have been added for
uu-encoding, MIME's base64 encoding, and compression with the
\module{zlib} module:

\begin{verbatim}
>>> s = """Here is a lengthy piece of redundant, overly verbose,
... and repetitive text.
... """
>>> data = s.encode('zlib')
>>> data
'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
>>> data.decode('zlib')
'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
>>> print s.encode('uu')
begin 666 <data>
M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@
>=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X*

end
>>> "sheesh".encode('rot-13')
'furrfu'
\end{verbatim}
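
For readers following along on a current interpreter, where the string
methods only convert between text and bytes, the same codecs are
reached through the \module{codecs} module instead. The following is a
sketch under that assumption, not the 2.2 spelling shown above; the
codec names used are \code{zlib}, \code{base64} and \code{rot_13}.

```python
import codecs

text = ("Here is a lengthy piece of redundant, overly verbose,\n"
        "and repetitive text.\n")
raw = text.encode("ascii")

# Bytes-to-bytes codecs: compress and restore.
packed = codecs.encode(raw, "zlib")
restored = codecs.decode(packed, "zlib")

# base64 follows the same pattern.
b64_data = codecs.encode(b"sheesh", "base64")
b64_back = codecs.decode(b64_data, "base64")

# rot-13 is a text-to-text codec.
scrambled = codecs.encode("sheesh", "rot_13")
```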

To convert a class instance to Unicode, a class can define a
\method{__unicode__} method, analogous to \method{__str__}.
% XXX who implemented that?

\method{encode()} and \method{decode()} were implemented by
Marc-Andr\'e Lemburg. The changes to support using UCS-4 internally
were implemented by Fredrik Lundh and Martin von L\"owis.

\begin{seealso}

\seepep{261}{Support for `wide' Unicode characters}{PEP written by
Paul Prescod. Not yet accepted or fully implemented.}

\end{seealso}

%======================================================================
\section{PEP 227: Nested Scopes}

In Python 2.1, statically nested scopes were added as an optional
feature, to be enabled by a \code{from __future__ import
nested_scopes} directive. In 2.2 nested scopes no longer need to be
specially enabled, and are now always present. The rest of this section
is a copy of the description of nested scopes from my ``What's New in
Python 2.1'' document; if you read it when 2.1 came out, you can skip
the rest of this section.

The largest change introduced in Python 2.1, and made complete in 2.2,
is to Python's scoping rules. In Python 2.0, at any given time there
are at most three namespaces used to look up variable names: local,
module-level, and the built-in namespace. This often surprised people
because it didn't match their intuitive expectations. For example, a
nested recursive function definition doesn't work:

\begin{verbatim}
def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...
\end{verbatim}

The function \function{g()} will always raise a \exception{NameError}
exception, because the binding of the name \samp{g} isn't in either
its local namespace or the module-level namespace. This isn't much
of a problem in practice (how often do you recursively define interior
functions like this?), but this also made using the \keyword{lambda}
expression clumsier, and this was a problem in practice. In code which
uses \keyword{lambda} you can often find local variables being copied
by passing them as the default values of arguments.

\begin{verbatim}
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
\end{verbatim}

The readability of Python code written in a strongly functional style
suffers greatly as a result.

The most significant change to Python 2.2 is that static scoping has
been added to the language to fix this problem. As a first effect,
the \code{name=name} default argument is now unnecessary in the above
example. Put simply, when a given variable name is not assigned a
value within a function (by an assignment, or the \keyword{def},
\keyword{class}, or \keyword{import} statements), references to the
variable will be looked up in the local namespace of the enclosing
scope. A more detailed explanation of the rules, and a dissection of
the implementation, can be found in the PEP.
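
With nested scopes in effect, both of the earlier examples work as one
would hope. The sketch below fills in the elided parts with
placeholder logic; \function{depth()} and the \class{Directory} class
are invented for the illustration, and the \function{list()} call is
there only so that the result is a list on current interpreters too.

```python
def depth(n):
    """Count down recursively with a helper defined inside the function."""
    def g(value):
        if value <= 0:
            return 0
        # "g" is found in the enclosing scope of depth(), not in g's own
        # locals and not at module level.
        return g(value - 1) + 1
    return g(n)

class Directory:
    def __init__(self, entries):
        self.list_attribute = list(entries)

    def find(self, name):
        "Return list of any entries equal to 'name'"
        # No name=name default needed: the lambda sees find()'s local.
        return list(filter(lambda x: x == name, self.list_attribute))

steps = depth(3)
matches = Directory(["spam", "eggs", "spam"]).find("spam")
```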

This change may cause some compatibility problems for code where the
same variable name is used both at the module level and as a local
variable within a function that contains further function definitions.
This seems rather unlikely though, since such code would have been
pretty confusing to read in the first place.

One side effect of the change is that the \code{from \var{module}
import *} and \keyword{exec} statements have been made illegal inside
a function scope under certain conditions. The Python reference
manual has said all along that \code{from \var{module} import *} is
only legal at the top level of a module, but the CPython interpreter
has never enforced this before. As part of the implementation of
nested scopes, the compiler which turns Python source into bytecodes
has to generate different code to access variables in a containing
scope. \code{from \var{module} import *} and \keyword{exec} make it
impossible for the compiler to figure this out, because they add names
to the local namespace that are unknowable at compile time.
Therefore, if a function contains function definitions or
\keyword{lambda} expressions with free variables, the compiler will
flag this by raising a \exception{SyntaxError} exception.

To make the preceding explanation a bit clearer, here's an example:

\begin{verbatim}
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
\end{verbatim}

Line 4 containing the \keyword{exec} statement is a syntax error,
since \keyword{exec} would define a new local variable named \samp{x}
whose value should be accessed by \function{g()}.

This shouldn't be much of a limitation, since \keyword{exec} is rarely
used in most Python code (and when it is used, it's often a sign of a
poor design anyway).

\begin{seealso}

\seepep{227}{Statically Nested Scopes}{Written and implemented by
Jeremy Hylton.}

\end{seealso}


%======================================================================
\section{New and Improved Modules}

\begin{itemize}

  \item The \module{xmlrpclib} module was contributed to the standard
  library by Fredrik Lundh, providing support for writing XML-RPC
  clients. XML-RPC is a simple remote procedure call protocol built on
  top of HTTP and XML. For example, the following snippet retrieves a
  list of RSS channels from the O'Reilly Network, and then
  lists the recent headlines for one channel:

\begin{verbatim}
import xmlrpclib
s = xmlrpclib.Server(
      'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
channels = s.meerkat.getChannels()
# channels is a list of dictionaries, like this:
# [{'id': 4, 'title': 'Freshmeat Daily News'},
#  {'id': 190, 'title': '32Bits Online'},
#  {'id': 4549, 'title': '3DGamers'}, ... ]

# Get the items for one channel
items = s.meerkat.getItems( {'channel': 4} )

# 'items' is another list of dictionaries, like this:
# [{'link': 'http://freshmeat.net/releases/52719/',
#   'description': 'A utility which converts HTML to XSL FO.',
#   'title': 'html2fo 0.3 (Default)'}, ... ]
\end{verbatim}

The \module{SimpleXMLRPCServer} module makes it easy to create
straightforward XML-RPC servers. See \url{http://www.xmlrpc.com/} for
more information about XML-RPC.

  \item The new \module{hmac} module implements the HMAC
  algorithm described by \rfc{2104}.

  \item The Python profiler has been extensively reworked and various
  errors in its output have been corrected. (Contributed by
  Fred~L. Drake, Jr. and Tim Peters.)

  \item The \module{socket} module can be compiled to support IPv6;
  specify the \longprogramopt{enable-ipv6} option to Python's configure
  script. (Contributed by Jun-ichiro ``itojun'' Hagino.)

  \item Two new format characters were added to the \module{struct}
  module for 64-bit integers on platforms that support the C
  \ctype{long long} type. \samp{q} is for a signed 64-bit integer,
  and \samp{Q} is for an unsigned one. The value is returned in
  Python's long integer type. (Contributed by Tim Peters.)

  \item In the interpreter's interactive mode, there's a new built-in
  function \function{help()} that uses the \module{pydoc} module
  introduced in Python 2.1 to provide interactive help.
  \code{help(\var{object})} displays any available help text about
  \var{object}. \code{help()} with no argument puts you in an online
  help utility, where you can enter the names of functions, classes,
  or modules to read their help text.
  (Contributed by Guido van Rossum, using Ka-Ping Yee's \module{pydoc} module.)

  \item Various bugfixes and performance improvements have been made
  to the SRE engine underlying the \module{re} module. For example,
  the \function{re.sub()} and \function{re.split()} functions have
  been rewritten in C. Another contributed patch speeds up matching of
  certain Unicode character ranges by a factor of two. (SRE is
  maintained by Fredrik Lundh. The BIGCHARSET patch was contributed by
  Martin von L\"owis.)

  \item The \module{smtplib} module now supports \rfc{2487}, ``Secure
  SMTP over TLS'', so it's now possible to encrypt the SMTP traffic
  between a Python program and the mail transport agent being handed a
  message. (Contributed by Gerhard H\"aring.)

  \item The \module{imaplib} module, maintained by Piers Lauder, has
  support for several new extensions: the NAMESPACE extension defined
  in \rfc{2342}, SORT, GETACL and SETACL. (Contributed by Anthony
  Baxter and Michel Pelletier.)

  \item The \module{rfc822} module's parsing of email addresses is now
  compliant with \rfc{2822}, an update to \rfc{822}. (The module's
  name is \emph{not} going to be changed to \samp{rfc2822}.) A new
  package, \module{email}, has also been added for parsing and
  generating e-mail messages. (Contributed by Barry Warsaw, and
  arising out of his work on Mailman.)

  \item New constants \constant{ascii_letters},
  \constant{ascii_lowercase}, and \constant{ascii_uppercase} were
  added to the \module{string} module. There were several modules in
  the standard library that used \constant{string.letters} to mean the
  ranges A-Za-z, but that assumption is incorrect when locales are in
  use, because \constant{string.letters} varies depending on the set
  of legal characters defined by the current locale. The buggy
  modules have all been fixed to use \constant{ascii_letters} instead.
  (Reported by an unknown person; fixed by Fred~L. Drake, Jr.)

  \item The \module{mimetypes} module now makes it easier to use
  alternative MIME-type databases by the addition of a
  \class{MimeTypes} class, which takes a list of filenames to be
  parsed. (Contributed by Fred~L. Drake, Jr.)

  \item A \class{Timer} class was added to the \module{threading}
  module that allows scheduling an activity to happen at some future
  time. (Contributed by Itamar Shtull-Trauring.)

\end{itemize}
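
A few of the additions listed above are small enough to try in one
short script. The following is a sketch for a current interpreter, not
2.2 code: it leans on \module{hashlib} and SHA-256, both of which
postdate this release, and the key, message and delay are arbitrary
values chosen for the example. The \module{hmac} part checks the
module against the construction given in \rfc{2104}.

```python
import hashlib
import hmac
import string
import struct
import threading

# hmac: compare the module against RFC 2104's definition,
# H((K xor opad) || H((K xor ipad) || message)), here with SHA-256.
key = b"secret key"
message = b"what's new"
block = key.ljust(64, b"\x00")                 # SHA-256 block size is 64
inner_pad = bytes(b ^ 0x36 for b in block)
outer_pad = bytes(b ^ 0x5C for b in block)
by_hand = hashlib.sha256(
    outer_pad + hashlib.sha256(inner_pad + message).digest()).digest()
by_module = hmac.new(key, message, hashlib.sha256).digest()

# struct: 'q' packs a signed 64-bit integer, 'Q' an unsigned one.
packed = struct.pack("<q", -1)
(as_unsigned,) = struct.unpack("<Q", packed)

# string: the ascii_* constants never vary with the locale.
ascii_only = string.ascii_lowercase + string.ascii_uppercase

# threading.Timer: run a callable after a delay, on its own thread.
fired = []
timer = threading.Timer(0.05, fired.append, args=("done",))
timer.start()
timer.join()          # wait for the timer thread to finish
```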


%======================================================================
\section{Interpreter Changes and Fixes}

Some of the changes only affect people who deal with the Python
interpreter at the C level because they're writing Python extension modules,
embedding the interpreter, or just hacking on the interpreter itself.
If you only write Python code, none of the changes described here will
affect you very much.

\begin{itemize}

  \item Profiling and tracing functions can now be implemented in C,
  which can operate at much higher speeds than Python-based functions
  and should reduce the overhead of profiling and tracing. This
  will be of interest to authors of development environments for
  Python. Two new C functions were added to Python's API,
  \cfunction{PyEval_SetProfile()} and \cfunction{PyEval_SetTrace()}.
  The existing \function{sys.setprofile()} and
  \function{sys.settrace()} functions remain available, and have simply
  been changed to use the new C-level interface. (Contributed by
  Fred~L. Drake, Jr.)

  \item Another low-level API, primarily of interest to implementors
  of Python debuggers and development tools, was added.
  \cfunction{PyInterpreterState_Head()} and
  \cfunction{PyInterpreterState_Next()} let a caller walk through all
  the existing interpreter objects;
  \cfunction{PyInterpreterState_ThreadHead()} and
  \cfunction{PyThreadState_Next()} allow looping over all the thread
  states for a given interpreter. (Contributed by David Beazley.)

  \item A new \samp{et} format sequence was added to
  \cfunction{PyArg_ParseTuple()}; \samp{et} takes both a parameter and
  an encoding name, and converts the parameter to the given encoding
  if the parameter turns out to be a Unicode string, or leaves it
  alone if it's an 8-bit string, assuming it to already be in the
  desired encoding. This differs from the \samp{es} format character,
  which assumes that 8-bit strings are in Python's default ASCII
  encoding and converts them to the specified new encoding.
  (Contributed by M.-A. Lemburg, and used for the MBCS support on
  Windows described in the following section.)

  \item Two new flags \constant{METH_NOARGS} and \constant{METH_O} are
  available in method definition tables to simplify implementation of
  methods with no arguments or a single untyped argument. Calling
  such methods is more efficient than calling a corresponding method
  that uses \constant{METH_VARARGS}.
  Also, the old \constant{METH_OLDARGS} style of writing C methods is
  now officially deprecated.

  \item Two new wrapper functions, \cfunction{PyOS_snprintf()} and
  \cfunction{PyOS_vsnprintf()}, were added to provide
  cross-platform implementations for the relatively new
  \cfunction{snprintf()} and \cfunction{vsnprintf()} C lib APIs. In
  contrast to the standard \cfunction{sprintf()} and
  \cfunction{vsprintf()} functions, the Python versions check the
  bounds of the buffer used to protect against buffer overruns.
  (Contributed by M.-A. Lemburg.)

\end{itemize}
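
The C-level hooks themselves can't be shown in a few lines, but the
Python-level \function{sys.setprofile()} interface that now sits on top
of them can. The sketch below counts function entries with a profile
function written in Python; \function{count_calls()} and
\function{fib()} are hypothetical names, and the code is written for a
current interpreter.

```python
import sys

def count_calls(func, *args):
    """Run func(*args) and count how often each Python function is entered."""
    counts = {}

    def profiler(frame, event, arg):
        # "call" is reported when a Python-level function is entered;
        # every other event is ignored here.
        if event == "call":
            name = frame.f_code.co_name
            counts[name] = counts.get(name, 0) + 1

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)      # always uninstall the hook
    return counts

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

calls = count_calls(fib, 5)
```

A profile function like this one is called for every event, which is
exactly the overhead that a C implementation registered through
\cfunction{PyEval_SetProfile()} is meant to reduce.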


%======================================================================
\section{Other Changes and Fixes}

% XXX update the patch and bug figures as we go
As usual there were a bunch of other improvements and bugfixes
scattered throughout the source tree. A search through the CVS change
logs finds there were 312 patches applied, and 391 bugs fixed; both
figures are likely to be underestimates. Some of the more notable
changes are:

\begin{itemize}

  \item The code for the MacOS port of Python, maintained by Jack
  Jansen, is now kept in the main Python CVS tree, and many changes
  have been made to support MacOS~X.

The most significant change is the ability to build Python as a
framework, enabled by supplying the \longprogramopt{enable-framework}
option to the configure script when compiling Python. According to
Jack Jansen, ``This installs a self-contained Python installation plus
the OS~X framework `glue' into
\file{/Library/Frameworks/Python.framework} (or another location of
choice). For now there is little immediate added benefit to this
(actually, there is the disadvantage that you have to change your PATH
to be able to find Python), but it is the basis for creating a
full-blown Python application, porting the MacPython IDE, possibly
using Python as a standard OSA scripting language and much more.''

Most of the MacPython toolbox modules, which interface to MacOS APIs
such as windowing, QuickTime, scripting, etc., have been ported to OS~X,
but they've been left commented out in \file{setup.py}. People who want
to experiment with these modules can uncomment them manually.

% Jack's original comments:
%The main change is the possibility to build Python as a
%framework. This installs a self-contained Python installation plus the
%OSX framework "glue" into /Library/Frameworks/Python.framework (or
%another location of choice). For now there is little immedeate added
%benefit to this (actually, there is the disadvantage that you have to
%change your PATH to be able to find Python), but it is the basis for
%creating a fullblown Python application, porting the MacPython IDE,
%possibly using Python as a standard OSA scripting language and much
%more. You enable this with "configure --enable-framework".

%The other change is that most MacPython toolbox modules, which
%interface to all the MacOS APIs such as windowing, quicktime,
%scripting, etc. have been ported. Again, most of these are not of
%immedeate use, as they need a full application to be really useful, so
%they have been commented out in setup.py. People wanting to experiment
%can uncomment them. Gestalt and Internet Config modules are enabled by
%default.
Andrew M. Kuchling0e03f582001-08-30 21:30:16 +00001204
Andrew M. Kuchling2cd712b2001-07-16 13:39:08 +00001205 \item Keyword arguments passed to builtin functions that don't take them
1206 now cause a \exception{TypeError} exception to be raised, with the
1207 message "\var{function} takes no keyword arguments".
1208
Andrew M. Kuchling8b42f012001-10-22 02:00:11 +00001209 \item Weak references, added in Python 2.1 as an extension module,
1210 are now part of the core because they're used in the implementation
1211 of new-style classes. The \exception{ReferenceError} exception has
1212 therefore moved from the \module{weakref} module to become a
1213 built-in exception.
1214
Andrew M. Kuchling94a7eba2001-08-15 15:55:48 +00001215 \item A new script, \file{Tools/scripts/cleanfuture.py} by Tim
1216 Peters, automatically removes obsolete \code{__future__} statements
1217 from Python source code.
Andrew M. Kuchling2cd712b2001-07-16 13:39:08 +00001218
1219 \item The new license introduced with Python 1.6 wasn't
1220 GPL-compatible. This is fixed by some minor textual changes to the
Andrew M. Kuchling8b42f012001-10-22 02:00:11 +00001221 2.2 license, so it's now legal to embed Python inside a GPLed
1222 program again. Note that Python itself is not GPLed, but instead is
1223 under a license that's essentially equivalent to the BSD license,
1224 same as it always was. The license changes were also applied to the
1225 Python 2.0.1 and 2.1.1 releases.
Andrew M. Kuchling2cd712b2001-07-16 13:39:08 +00001226
Andrew M. Kuchlingf4ccf582001-07-31 01:11:36 +00001227 \item When presented with a Unicode filename on Windows, Python will
1228 now convert it to an MBCS encoded string, as used by the Microsoft
1229 file APIs. As MBCS is explicitly used by the file APIs, Python's
1230 choice of ASCII as the default encoding turns out to be an
1231 annoyance.
Andrew M. Kuchling8cfa9052001-07-19 01:19:59 +00001232 (Contributed by Mark Hammond with assistance from Marc-Andr\'e
1233 Lemburg.)
1234
  \item Large file support is now enabled on Windows.  (Contributed by
  Tim Peters.)

  \item The \file{Tools/scripts/ftpmirror.py} script
  now parses a \file{.netrc} file, if you have one.
  (Contributed by Mike Romberg.)

  \item Some features of the object returned by the
  \function{xrange()} function are now deprecated, and trigger
  warnings when they're accessed; they'll disappear in Python 2.3.
  \class{xrange} objects tried to pretend they were full sequence
  types by supporting slicing, sequence multiplication, and the
  \keyword{in} operator, but these features were rarely used and
  therefore buggy.  The \method{tolist()} method and the
  \member{start}, \member{stop}, and \member{step} attributes are also
  being deprecated.  At the C level, the fourth argument to the
  \cfunction{PyRange_New()} function, \samp{repeat}, has also been
  deprecated.

  \item There were a bunch of patches to the dictionary
  implementation, mostly to fix potential core dumps if a dictionary
  contained objects that sneakily changed their hash value or mutated
  the dictionary they were contained in.  For a while python-dev fell
  into a gentle rhythm of Michael Hudson finding a case that dumped
  core, Tim Peters fixing the bug, Michael finding another case, and round
  and round it went.

  \item On Windows, Python can now be compiled with Borland C thanks
  to a number of patches contributed by Stephen Hansen, though the
  result isn't fully functional yet.  (But this \emph{is} progress...)

  \item Another Windows enhancement: Wise Solutions generously offered
  PythonLabs use of their InstallerMaster 8.1 system.  Earlier
  PythonLabs Windows installers used Wise 5.0a, which was beginning to
  show its age.  (Packaged up by Tim Peters.)

  \item Files ending in \samp{.pyw} can now be imported on Windows.
  \samp{.pyw} is a Windows-only thing, used to indicate that a script
  needs to be run using PYTHONW.EXE instead of PYTHON.EXE in order to
  prevent a DOS console from popping up to display the output.  This
  patch makes it possible to import such scripts, in case they're also
  usable as modules.  (Implemented by David Bolen.)

  \item On platforms where Python uses the C \cfunction{dlopen()} function
  to load extension modules, it's now possible to set the flags used
  by \cfunction{dlopen()} using the \function{sys.getdlopenflags()} and
  \function{sys.setdlopenflags()} functions.  (Contributed by Bram Stolk.)

  \item The \function{pow()} built-in function no longer supports 3
  arguments when floating-point numbers are supplied.
  \code{pow(\var{x}, \var{y}, \var{z})} returns \code{(x**y) \% z}, but
  this is never useful for floating-point numbers, and the final
  result varies unpredictably depending on the platform.  A call such
  as \code{pow(2.0, 8.0, 7.0)} will now raise a \exception{TypeError}
  exception.

\end{itemize}
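
Three of the items above (keyword arguments to built-in functions, the
built-in \exception{ReferenceError}, and three-argument
\function{pow()} with floats) can be illustrated with a minimal
sketch.  It is written to run on a current Python interpreter, where
these behaviours still hold, and isn't taken from the Python 2.2
sources.  The \function{raises()} helper, the \class{Node} class, and
the \code{sequence} keyword name are hypothetical, invented for this
example; the weak-reference part assumes CPython's reference counting,
which frees an object as soon as its last strong reference goes away.

```python
import weakref


def raises(exc_type, func, *args, **kwargs):
    """Return True if calling func(*args, **kwargs) raises exc_type."""
    try:
        func(*args, **kwargs)
    except exc_type:
        return True
    return False


class Node(object):
    """Hypothetical class; exists only to be weakly referenceable."""
    pass


def dead_proxy_raises():
    """Return True if using a proxy to a freed object raises ReferenceError."""
    node = Node()
    proxy = weakref.proxy(node)
    del node  # drop the only strong reference (assumes reference counting)
    try:
        proxy.attribute
    except ReferenceError:  # built-in name; no "weakref." prefix is needed
        return True
    except AttributeError:
        return False
    return False


# Three-argument pow() with floating-point arguments is rejected.
float_pow_rejected = raises(TypeError, pow, 2.0, 8.0, 7.0)

# Integer arguments are unaffected: (2 ** 8) % 7 == 256 % 7 == 4.
int_pow = pow(2, 8, 7)

# len() takes no keyword arguments, so any keyword is refused.
keyword_rejected = raises(TypeError, len, sequence=[1, 2, 3])
```

Each check catches the exception instead of printing a traceback, so
the sketch doesn't depend on the exact wording of the error messages.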


%======================================================================
\section{Acknowledgements}

The author would like to thank the following people for offering
suggestions and corrections to various drafts of this article: Fred
Bremmer, Keith Briggs, Andrew Dalke, Fred~L. Drake, Jr., Carel
Fellinger, Mark Hammond, Stephen Hansen, Jack Jansen, Marc-Andr\'e
Lemburg, Fredrik Lundh, Tim Peters, Neil Schemenauer, and Guido van
Rossum.

\end{document}