****************************
  What's New in Python 2.3
****************************

:Author: A.M. Kuchling

.. |release| replace:: 1.01

.. $Id: whatsnew23.tex 54631 2007-03-31 11:58:36Z georg.brandl $

This article explains the new features in Python 2.3. Python 2.3 was released
on July 29, 2003.

The main themes for Python 2.3 are polishing some of the features added in 2.2,
adding various small but useful enhancements to the core language, and expanding
the standard library. The new object model introduced in the previous version
has benefited from 18 months of bugfixes and from optimization efforts that have
improved the performance of new-style classes. A few new built-in functions
have been added such as :func:`sum` and :func:`enumerate`. The :keyword:`in`
operator can now be used for substring searches (e.g. ``"ab" in "abc"`` returns
:const:`True`).

Some of the many new library features include Boolean, set, heap, and date/time
data types, the ability to import modules from ZIP-format archives, metadata
support for the long-awaited Python catalog, an updated version of IDLE, and
modules for logging messages, wrapping text, parsing CSV files, processing
command-line options, using BerkeleyDB databases... the list of new and
enhanced modules is lengthy.

This article doesn't attempt to provide a complete specification of the new
features, but instead provides a convenient overview. For full details, you
should refer to the documentation for Python 2.3, such as the Python Library
Reference and the Python Reference Manual. If you want to understand the
complete implementation and design rationale, refer to the PEP for a particular
new feature.

.. ======================================================================


PEP 218: A Standard Set Datatype
================================

The new :mod:`sets` module contains an implementation of a set datatype. The
:class:`Set` class is for mutable sets, sets that can have members added and
removed. The :class:`ImmutableSet` class is for sets that can't be modified,
and instances of :class:`ImmutableSet` can therefore be used as dictionary keys.
Sets are built on top of dictionaries, so the elements within a set must be
hashable.

Here's a simple example::

   >>> import sets
   >>> S = sets.Set([1,2,3])
   >>> S
   Set([1, 2, 3])
   >>> 1 in S
   True
   >>> 0 in S
   False
   >>> S.add(5)
   >>> S.remove(3)
   >>> S
   Set([1, 2, 5])
   >>>

The union and intersection of sets can be computed with the :meth:`union` and
:meth:`intersection` methods; an alternative notation uses the bitwise operators
``&`` and ``|``. Mutable sets also have in-place versions of these methods,
:meth:`union_update` and :meth:`intersection_update`. ::

   >>> S1 = sets.Set([1,2,3])
   >>> S2 = sets.Set([4,5,6])
   >>> S1.union(S2)
   Set([1, 2, 3, 4, 5, 6])
   >>> S1 | S2                  # Alternative notation
   Set([1, 2, 3, 4, 5, 6])
   >>> S1.intersection(S2)
   Set([])
   >>> S1 & S2                  # Alternative notation
   Set([])
   >>> S1.union_update(S2)
   >>> S1
   Set([1, 2, 3, 4, 5, 6])
   >>>

It's also possible to take the symmetric difference of two sets. This is the
set of all elements in the union that aren't in the intersection. Another way
of putting it is that the symmetric difference contains all elements that are in
exactly one set. Again, there's an alternative notation (``^``), and an
in-place version with the ungainly name :meth:`symmetric_difference_update`. ::

   >>> S1 = sets.Set([1,2,3,4])
   >>> S2 = sets.Set([3,4,5,6])
   >>> S1.symmetric_difference(S2)
   Set([1, 2, 5, 6])
   >>> S1 ^ S2
   Set([1, 2, 5, 6])
   >>>

There are also :meth:`issubset` and :meth:`issuperset` methods for checking
whether one set is a subset or superset of another::

   >>> S1 = sets.Set([1,2,3])
   >>> S2 = sets.Set([2,3])
   >>> S2.issubset(S1)
   True
   >>> S1.issubset(S2)
   False
   >>> S1.issuperset(S2)
   True
   >>>


.. seealso::

   :pep:`218` - Adding a Built-In Set Object Type
      PEP written by Greg V. Wilson. Implemented by Greg V. Wilson, Alex Martelli, and
      GvR.

.. ======================================================================


.. _section-generators:

PEP 255: Simple Generators
==========================

In Python 2.2, generators were added as an optional feature, to be enabled by a
``from __future__ import generators`` directive. In 2.3 generators no longer
need to be specially enabled, and are now always present; this means that
:keyword:`yield` is now always a keyword. The rest of this section is a copy of
the description of generators from the "What's New in Python 2.2" document; if
you read it back when Python 2.2 came out, you can skip the rest of this
section.

You're doubtless familiar with how function calls work in Python or C. When you
call a function, it gets a private namespace where its local variables are
created. When the function reaches a :keyword:`return` statement, the local
variables are destroyed and the resulting value is returned to the caller. A
later call to the same function will get a fresh new set of local variables.
But, what if the local variables weren't thrown away on exiting a function?
What if you could later resume the function where it left off? This is what
generators provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function::

   def generate_ints(N):
       for i in range(N):
           yield i

A new keyword, :keyword:`yield`, was introduced for generators. Any function
containing a :keyword:`yield` statement is a generator function; this is
detected by Python's bytecode compiler which compiles the function specially as
a result.

When you call a generator function, it doesn't return a single value; instead it
returns a generator object that supports the iterator protocol. On executing
the :keyword:`yield` statement, the generator outputs the value of ``i``,
similar to a :keyword:`return` statement. The big difference between
:keyword:`yield` and a :keyword:`return` statement is that on reaching a
:keyword:`yield` the generator's state of execution is suspended and local
variables are preserved. On the next call to the generator's ``.next()``
method, the function will resume executing immediately after the
:keyword:`yield` statement. (For complicated reasons, the :keyword:`yield`
statement isn't allowed inside the :keyword:`try` block of a
:keyword:`try`...\ :keyword:`finally` statement; read :pep:`255` for a full
explanation of the interaction between :keyword:`yield` and exceptions.)

Here's a sample usage of the :func:`generate_ints` generator::

   >>> gen = generate_ints(3)
   >>> gen
   <generator object at 0x8117f90>
   >>> gen.next()
   0
   >>> gen.next()
   1
   >>> gen.next()
   2
   >>> gen.next()
   Traceback (most recent call last):
     File "stdin", line 1, in ?
     File "stdin", line 2, in generate_ints
   StopIteration

You could equally write ``for i in generate_ints(5)``, or ``a,b,c =
generate_ints(3)``.

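As a minimal sketch of those two spellings, the generator defined earlier can be
consumed by a loop or by tuple unpacking. The names ``total``, ``a``, ``b`` and
``c`` are only illustrative::

   def generate_ints(N):
       for i in range(N):
           yield i

   # A for loop keeps asking the generator for values until it is exhausted.
   total = 0
   for i in generate_ints(5):
       total += i                 # total ends up as 0+1+2+3+4 == 10

   # Unpacking works because exactly three values are produced.
   a, b, c = generate_ints(3)     # a == 0, b == 1, c == 2
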
Inside a generator function, the :keyword:`return` statement can only be used
without a value, and signals the end of the procession of values; afterwards the
generator cannot return any further values. :keyword:`return` with a value, such
as ``return 5``, is a syntax error inside a generator function. The end of the
generator's results can also be indicated by raising :exc:`StopIteration`
manually, or by just letting the flow of execution fall off the bottom of the
function.

You could achieve the effect of generators manually by writing your own class
and storing all the local variables of the generator as instance variables. For
example, returning a list of integers could be done by setting ``self.count`` to
0, and having the :meth:`next` method increment ``self.count`` and return it.
However, for a moderately complicated generator, writing a corresponding class
would be much messier. :file:`Lib/test/test_generators.py` contains a number of
more interesting examples. The simplest one implements an in-order traversal of
a tree using generators recursively. ::

   # A recursive generator that generates Tree leaves in in-order.
   def inorder(t):
       if t:
           for x in inorder(t.left):
               yield x
           yield t.label
           for x in inorder(t.right):
               yield x

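For comparison, a hand-written class along the lines described above might look
like the following sketch. The class name ``IntCounter`` is invented for
illustration, and the ``__next__`` alias is an assumption added so that the same
code also iterates on later Python versions, which spell the method
differently::

   class IntCounter:
       """Hand-written counterpart of generate_ints(N)."""

       def __init__(self, N):
           self.N = N
           self.count = 0          # state a generator would keep in a local

       def __iter__(self):
           return self

       def next(self):
           if self.count >= self.N:
               raise StopIteration  # signals the end of the values
           value = self.count
           self.count += 1
           return value

       __next__ = next             # alias; an assumption for later versions

   values = list(IntCounter(3))    # [0, 1, 2]
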
Two other examples in :file:`Lib/test/test_generators.py` produce solutions for
the N-Queens problem (placing *N* queens on an *N*-by-*N* chess board so that no
queen threatens another) and the Knight's Tour (a route that takes a knight to
every square of an *N*-by-*N* chessboard without visiting any square twice).

The idea of generators comes from other programming languages, especially Icon
(http://www.cs.arizona.edu/icon/), where the idea of generators is central. In
Icon, every expression and function call behaves like a generator. One example
from "An Overview of the Icon Programming Language" at
http://www.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks
like::

   sentence := "Store it in the neighboring harbor"
   if (i := find("or", sentence)) > 5 then write(i)

In Icon the :func:`find` function returns the indexes at which the substring
"or" is found: 3, 23, 33. In the :keyword:`if` statement, ``i`` is first
assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon
retries it with the second value of 23. 23 is greater than 5, so the comparison
now succeeds, and the code prints the value 23 to the screen.

Python doesn't go nearly as far as Icon in adopting generators as a central
concept. Generators are considered part of the core Python language, but
learning or using them isn't compulsory; if they don't solve any problems that
you have, feel free to ignore them. One novel feature of Python's interface as
compared to Icon's is that a generator's state is represented as a concrete
object (the iterator) that can be passed around to other functions or stored in
a data structure.


.. seealso::

   :pep:`255` - Simple Generators
      Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. Implemented mostly
      by Neil Schemenauer and Tim Peters, with other fixes from the Python Labs crew.

.. ======================================================================


.. _section-encodings:

PEP 263: Source Code Encodings
==============================

Python source files can now be declared as being in different character set
encodings. Encodings are declared by including a specially formatted comment in
the first or second line of the source file. For example, a UTF-8 file can be
declared with::

   #!/usr/bin/env python
   # -*- coding: UTF-8 -*-

Without such an encoding declaration, the default encoding used is 7-bit ASCII.
Executing or importing modules that contain string literals with 8-bit
characters and have no encoding declaration will result in a
:exc:`DeprecationWarning` being signalled by Python 2.3; in 2.4 this will be a
syntax error.

The encoding declaration only affects Unicode string literals, which will be
converted to Unicode using the specified encoding. Note that Python identifiers
are still restricted to ASCII characters, so you can't have variable names that
use characters outside of the usual alphanumerics.


.. seealso::

   :pep:`263` - Defining Python Source Code Encodings
      Written by Marc-André Lemburg and Martin von Löwis; implemented by Suzuki Hisao
      and Martin von Löwis.

.. ======================================================================


PEP 273: Importing Modules from ZIP Archives
============================================

The new :mod:`zipimport` module adds support for importing modules from a
ZIP-format archive. You don't need to import the module explicitly; it will be
automatically imported if a ZIP archive's filename is added to ``sys.path``.
For example::

   amk@nyman:~/src/python$ unzip -l /tmp/example.zip
   Archive:  /tmp/example.zip
     Length     Date   Time    Name
    --------    ----   ----    ----
       8467  11-26-02 22:30   jwzthreading.py
    --------                   -------
       8467                   1 file
   amk@nyman:~/src/python$ ./python
   Python 2.3 (#1, Aug 1 2003, 19:54:32)
   >>> import sys
   >>> sys.path.insert(0, '/tmp/example.zip')  # Add .zip file to front of path
   >>> import jwzthreading
   >>> jwzthreading.__file__
   '/tmp/example.zip/jwzthreading.py'
   >>>

An entry in ``sys.path`` can now be the filename of a ZIP archive. The ZIP
archive can contain any kind of files, but only files named :file:`\*.py`,
:file:`\*.pyc`, or :file:`\*.pyo` can be imported. If an archive only contains
:file:`\*.py` files, Python will not attempt to modify the archive by adding the
corresponding :file:`\*.pyc` file, meaning that if a ZIP archive doesn't contain
:file:`\*.pyc` files, importing may be rather slow.

A path within the archive can also be specified to only import from a
subdirectory; for example, the path :file:`/tmp/example.zip/lib/` would only
import from the :file:`lib/` subdirectory within the archive.

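A self-contained sketch of the same idea builds a small archive in a temporary
directory and then imports from it. The module name ``zipdemo_hello`` and its
``GREETING`` attribute are made up for this example::

   import os
   import sys
   import tempfile
   import zipfile

   # Create an archive containing a single one-line module.
   tmpdir = tempfile.mkdtemp()
   archive = os.path.join(tmpdir, 'example.zip')
   zf = zipfile.ZipFile(archive, 'w')
   zf.writestr('zipdemo_hello.py', 'GREETING = "hello from a ZIP archive"\n')
   zf.close()

   # Putting the archive's filename on sys.path is all that is needed.
   sys.path.insert(0, archive)
   import zipdemo_hello

   greeting = zipdemo_hello.GREETING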

.. seealso::

   :pep:`273` - Import Modules from Zip Archives
      Written by James C. Ahlstrom, who also provided an implementation. Python 2.3
      follows the specification in :pep:`273`, but uses an implementation written by
      Just van Rossum that uses the import hooks described in :pep:`302`. See section
      :ref:`section-pep302` for a description of the new import hooks.

.. ======================================================================


PEP 277: Unicode file name support for Windows NT
=================================================

On Windows NT, 2000, and XP, the system stores file names as Unicode strings.
Traditionally, Python has represented file names as byte strings, which is
inadequate because it renders some file names inaccessible.

Python now allows using arbitrary Unicode strings (within the limitations of the
file system) for all functions that expect file names, most notably the
:func:`open` built-in function. If a Unicode string is passed to
:func:`os.listdir`, Python now returns a list of Unicode strings. A new
function, :func:`os.getcwdu`, returns the current directory as a Unicode string.

Byte strings still work as file names, and on Windows Python will transparently
convert them to Unicode using the ``mbcs`` encoding.

Other systems also allow Unicode strings as file names but convert them to byte
strings before passing them to the system, which can cause a :exc:`UnicodeError`
to be raised. Applications can test whether arbitrary Unicode strings are
supported as file names by checking :attr:`os.path.supports_unicode_filenames`,
a Boolean value.

Under MacOS, :func:`os.listdir` may now return Unicode filenames.


.. seealso::

   :pep:`277` - Unicode file name support for Windows NT
      Written by Neil Hodgson; implemented by Neil Hodgson, Martin von Löwis, and Mark
      Hammond.

.. ======================================================================


.. index::
   single: universal newlines; What's new

PEP 278: Universal Newline Support
==================================

The three major operating systems used today are Microsoft Windows, Apple's
Macintosh OS, and the various Unix derivatives. A minor irritation of
cross-platform work is that these three platforms all use different characters
to mark the ends of lines in text files. Unix uses the linefeed (ASCII character
10), MacOS uses the carriage return (ASCII character 13), and Windows uses a
two-character sequence of a carriage return plus a newline.

Python's file objects can now support end of line conventions other than the
one followed by the platform on which Python is running. Opening a file with
the mode ``'U'`` or ``'rU'`` will open a file for reading in :term:`universal
newlines` mode. All three line ending conventions will be translated to a
``'\n'`` in the strings returned by the various file methods such as
:meth:`read` and :meth:`readline`.

Universal newline support is also used when importing modules and when executing
a file with the :func:`execfile` function. This means that Python modules can
be shared between all three operating systems without needing to convert the
line-endings.

This feature can be disabled when compiling Python by specifying the
:option:`--without-universal-newlines` switch when running Python's
:program:`configure` script.


.. seealso::

   :pep:`278` - Universal Newline Support
      Written and implemented by Jack Jansen.

.. ======================================================================


.. _section-enumerate:

PEP 279: enumerate()
====================

A new built-in function, :func:`enumerate`, will make certain loops a bit
clearer. ``enumerate(thing)``, where *thing* is either an iterator or a
sequence, returns an iterator that will return ``(0, thing[0])``, ``(1,
thing[1])``, ``(2, thing[2])``, and so forth.

A common idiom to change every element of a list looks like this::

   for i in range(len(L)):
       item = L[i]
       # ... compute some result based on item ...
       L[i] = result

This can be rewritten using :func:`enumerate` as::

   for i, item in enumerate(L):
       # ... compute some result based on item ...
       L[i] = result

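To make the pattern concrete, here is a small runnable sketch in which the
"result" is simply ten times the item. The list contents and the name ``pairs``
are arbitrary choices for illustration::

   L = [1, 2, 3]
   for i, item in enumerate(L):
       L[i] = item * 10           # L becomes [10, 20, 30]

   # The (index, value) pairs can also be collected directly.
   pairs = list(enumerate(['a', 'b', 'c']))
   # pairs == [(0, 'a'), (1, 'b'), (2, 'c')]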

.. seealso::

   :pep:`279` - The enumerate() built-in function
      Written and implemented by Raymond D. Hettinger.

.. ======================================================================


PEP 282: The logging Package
============================

A standard package for writing logs, :mod:`logging`, has been added to Python
2.3. It provides a powerful and flexible mechanism for generating logging
output which can then be filtered and processed in various ways. A
configuration file written in a standard format can be used to control the
logging behavior of a program. Python includes handlers that will write log
records to standard error or to a file or socket, send them to the system log,
or even e-mail them to a particular address; of course, it's also possible to
write your own handler classes.

The :class:`Logger` class is the primary class. Most application code will deal
with one or more :class:`Logger` objects, each one used by a particular
subsystem of the application. Each :class:`Logger` is identified by a name, and
names are organized into a hierarchy using ``.`` as the component separator.
For example, you might have :class:`Logger` instances named ``server``,
``server.auth`` and ``server.network``. The latter two instances are below
``server`` in the hierarchy. This means that if you turn up the verbosity for
``server`` or direct ``server`` messages to a different handler, the changes
will also apply to records logged to ``server.auth`` and ``server.network``.
There's also a root :class:`Logger` that's the parent of all other loggers.

For simple uses, the :mod:`logging` package contains some convenience functions
that always use the root log::

   import logging

   logging.debug('Debugging information')
   logging.info('Informational message')
   logging.warning('Warning:config file %s not found', 'server.conf')
   logging.error('Error occurred')
   logging.critical('Critical error -- shutting down')

This produces the following output::

   WARNING:root:Warning:config file server.conf not found
   ERROR:root:Error occurred
   CRITICAL:root:Critical error -- shutting down

In the default configuration, informational and debugging messages are
suppressed and the output is sent to standard error. You can enable the display
of informational and debugging messages by calling the :meth:`setLevel` method
on the root logger.

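A minimal sketch of that call, assuming nothing else in the program has
configured logging yet::

   import logging

   root = logging.getLogger()        # called without a name: the root logger
   root.setLevel(logging.DEBUG)      # let DEBUG and INFO records through

   logging.debug('Debugging information')   # no longer suppressed
   logging.info('Informational message')    # no longer suppressed
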
Notice the :func:`warning` call's use of string formatting operators; all of the
functions for logging messages take the arguments ``(msg, arg1, arg2, ...)`` and
log the string resulting from ``msg % (arg1, arg2, ...)``.

There's also an :func:`exception` function that records the most recent
traceback. Any of the other functions will also record the traceback if you
specify a true value for the keyword argument *exc_info*. ::

   def f():
       try:    1/0
       except: logging.exception('Problem recorded')

   f()

This produces the following output::

   ERROR:root:Problem recorded
   Traceback (most recent call last):
     File "t.py", line 6, in f
       1/0
   ZeroDivisionError: integer division or modulo by zero

Slightly more advanced programs will use a logger other than the root logger.
The ``getLogger(name)`` function is used to get a particular log, creating
it if it doesn't exist yet. ``getLogger(None)`` returns the root logger. ::

   log = logging.getLogger('server')
   ...
   log.info('Listening on port %i', port)
   ...
   log.critical('Disk full')
   ...

Log records are usually propagated up the hierarchy, so a message logged to
``server.auth`` is also seen by ``server`` and ``root``, but a :class:`Logger`
can prevent this by setting its :attr:`propagate` attribute to :const:`False`.

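The following sketch illustrates propagation with a tiny custom handler. The
``ListHandler`` class is hypothetical, written only for this example; it keeps
the formatted messages in a list so that the effect of :attr:`propagate` can be
inspected::

   import logging

   class ListHandler(logging.Handler):
       """Hypothetical handler that stores each message in a list."""

       def __init__(self):
           logging.Handler.__init__(self)
           self.messages = []

       def emit(self, record):
           self.messages.append(record.getMessage())

   server_handler = ListHandler()
   server = logging.getLogger('server')
   server.setLevel(logging.INFO)
   server.addHandler(server_handler)

   auth = logging.getLogger('server.auth')
   auth.info('login by %s', 'amk')     # propagates up to server's handler

   auth.propagate = False
   auth.info('not seen by server')     # stops at server.auth

   # server_handler.messages == ['login by amk']
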
There are more classes provided by the :mod:`logging` package that can be
customized. When a :class:`Logger` instance is told to log a message, it
creates a :class:`LogRecord` instance that is sent to any number of different
:class:`Handler` instances. Loggers and handlers can also have an attached list
of filters, and each filter can cause the :class:`LogRecord` to be ignored or
can modify the record before passing it along. When they're finally output,
:class:`LogRecord` instances are converted to text by a :class:`Formatter`
class. All of these classes can be replaced by your own specially-written
classes.

With all of these features the :mod:`logging` package should provide enough
flexibility for even the most complicated applications. This is only an
incomplete overview of its features, so please see the package's reference
documentation for all of the details. Reading :pep:`282` will also be helpful.


.. seealso::

   :pep:`282` - A Logging System
      Written by Vinay Sajip and Trent Mick; implemented by Vinay Sajip.

.. ======================================================================


.. _section-bool:

PEP 285: A Boolean Type
=======================

A Boolean type was added to Python 2.3. Two new constants were added to the
:mod:`__builtin__` module, :const:`True` and :const:`False`. (:const:`True` and
:const:`False` constants were added to the built-ins in Python 2.2.1, but the
2.2.1 versions are simply set to integer values of 1 and 0 and aren't a
different type.)

The type object for this new type is named :class:`bool`; the constructor for it
takes any Python value and converts it to :const:`True` or :const:`False`. ::

   >>> bool(1)
   True
   >>> bool(0)
   False
   >>> bool([])
   False
   >>> bool( (1,) )
   True

Most of the standard library modules and built-in functions have been changed to
return Booleans. ::

   >>> obj = []
   >>> hasattr(obj, 'append')
   True
   >>> isinstance(obj, list)
   True
   >>> isinstance(obj, tuple)
   False

Python's Booleans were added with the primary goal of making code clearer. For
example, if you're reading a function and encounter the statement ``return 1``,
you might wonder whether the ``1`` represents a Boolean truth value, an index,
or a coefficient that multiplies some other quantity. If the statement is
``return True``, however, the meaning of the return value is quite clear.

Python's Booleans were *not* added for the sake of strict type-checking. A very
strict language such as Pascal would also prevent you performing arithmetic with
Booleans, and would require that the expression in an :keyword:`if` statement
always evaluate to a Boolean result. Python is not this strict and never will
be, as :pep:`285` explicitly says. This means you can still use any expression
in an :keyword:`if` statement, even ones that evaluate to a list or tuple or
some random object. The Boolean type is a subclass of the :class:`int` class so
that arithmetic using a Boolean still works. ::

   >>> True + 1
   2
   >>> False + 1
   1
   >>> False * 75
   0
   >>> True * 75
   75

To sum up :const:`True` and :const:`False` in a sentence: they're alternative
ways to spell the integer values 1 and 0, with the single difference that
:func:`str` and :func:`repr` return the strings ``'True'`` and ``'False'``
instead of ``'1'`` and ``'0'``.

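That summary can be checked with a few lines of code. This is only a sketch, and
the variable names are arbitrary::

   is_int = isinstance(True, int)          # True: bool is a subclass of int
   same_values = (True == 1 and False == 0)
   as_text = (str(True), repr(False))      # ('True', 'False')

   # Because Booleans are integers, they can be summed to count matches.
   values = [1, 2, 3, 4]
   matches = sum([x > 2 for x in values])  # 2, since only 3 and 4 exceed 2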

.. seealso::

   :pep:`285` - Adding a bool type
      Written and implemented by GvR.

.. ======================================================================


PEP 293: Codec Error Handling Callbacks
=======================================

When encoding a Unicode string into a byte string, unencodable characters may be
encountered. So far, Python has allowed specifying the error processing as
either "strict" (raising :exc:`UnicodeError`), "ignore" (skipping the
character), or "replace" (using a question mark in the output string), with
"strict" being the default behavior. It may be desirable to specify alternative
processing of such errors, such as inserting an XML character reference or HTML
entity reference into the converted string.

Python now has a flexible framework to add different processing strategies. New
error handlers can be added with :func:`codecs.register_error`, and codecs then
can access the error handler with :func:`codecs.lookup_error`. An equivalent C
API has been added for codecs written in C. The error handler gets the necessary
state information such as the string being converted, the position in the string
where the error was detected, and the target encoding. The handler can then
either raise an exception or return a replacement string.

Two additional error handlers have been implemented using this framework:
"backslashreplace" uses Python backslash quoting to represent unencodable
characters and "xmlcharrefreplace" emits XML character references.

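The sketch below exercises the two new handlers and then registers a custom
one. The handler name ``'underscore'`` and the function ``underscore_errors``
are invented for illustration. Note that, as specified in :pep:`293`, a handler
returns the replacement together with the position at which encoding should
resume::

   import codecs

   text = u'caf\xe9 \u20ac'

   # The two handlers added in 2.3.
   escaped = text.encode('ascii', 'backslashreplace')   # caf\xe9 \u20ac
   xml = text.encode('ascii', 'xmlcharrefreplace')      # caf&#233; &#8364;

   def underscore_errors(exc):
       """Hypothetical handler: replace each unencodable character with '_'."""
       if not isinstance(exc, UnicodeEncodeError):
           raise exc
       # Return (replacement, position at which to continue encoding).
       return (u'_' * (exc.end - exc.start), exc.end)

   codecs.register_error('underscore', underscore_errors)
   masked = text.encode('ascii', 'underscore')          # caf_ _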

.. seealso::

   :pep:`293` - Codec Error Handling Callbacks
      Written and implemented by Walter Dörwald.

.. ======================================================================


.. _section-pep301:

PEP 301: Package Index and Metadata for Distutils
=================================================

Support for the long-requested Python catalog makes its first appearance in 2.3.

The heart of the catalog is the new Distutils :command:`register` command.
Running ``python setup.py register`` will collect the metadata describing a
package, such as its name, version, maintainer, description, etc., and send it
to a central catalog server. The resulting catalog is available from
https://pypi.python.org/pypi.

To make the catalog a bit more useful, a new optional *classifiers* keyword
argument has been added to the Distutils :func:`setup` function. A list of
`Trove <http://catb.org/~esr/trove/>`_-style strings can be supplied to help
classify the software.

Here's an example :file:`setup.py` with classifiers, written to be compatible
with older versions of the Distutils::

   from distutils import core
   kw = {'name': "Quixote",
         'version': "0.5.1",
         'description': "A highly Pythonic Web application framework",
         # ...
         }

   # Only pass 'classifiers' if this version of the Distutils understands it.
   if (hasattr(core, 'setup_keywords') and
       'classifiers' in core.setup_keywords):
       kw['classifiers'] = \
           ['Topic :: Internet :: WWW/HTTP :: Dynamic Content',
            'Environment :: No Input/Output (Daemon)',
            'Intended Audience :: Developers']

   core.setup(**kw)

686The full list of classifiers can be obtained by running ``python setup.py
687register --list-classifiers``.
688
689
690.. seealso::
691
692 :pep:`301` - Package Index and Metadata for Distutils
693 Written and implemented by Richard Jones.
694
Christian Heimes5b5e81c2007-12-31 16:14:33 +0000695.. ======================================================================


.. _section-pep302:

PEP 302: New Import Hooks
=========================

While it's been possible to write custom import hooks ever since the
:mod:`ihooks` module was introduced in Python 1.3, no one has ever been really
happy with it because writing new import hooks is difficult and messy. There
have been various proposed alternatives such as the :mod:`imputil` and :mod:`iu`
modules, but none of them has ever gained much acceptance, and none of them was
easily usable from C code.

:pep:`302` borrows ideas from its predecessors, especially from Gordon
McMillan's :mod:`iu` module. Three new items are added to the :mod:`sys`
module:

* ``sys.path_hooks`` is a list of callable objects; most often they'll be
  classes. Each callable takes a string containing a path and either returns an
  importer object that will handle imports from this path or raises an
  :exc:`ImportError` exception if it can't handle this path.

* ``sys.path_importer_cache`` caches importer objects for each path, so
  ``sys.path_hooks`` will only need to be traversed once for each path.

* ``sys.meta_path`` is a list of importer objects that will be traversed before
  ``sys.path`` is checked. This list is initially empty, but user code can add
  objects to it. Additional built-in and frozen modules can be imported by an
  object added to this list.

Importer objects must have a single method, ``find_module(fullname,
path=None)``. *fullname* will be a module or package name, e.g. ``string`` or
``distutils.core``. :meth:`find_module` must return a loader object that has a
single method, ``load_module(fullname)``, that creates and returns the
corresponding module object.

Pseudo-code for Python's new import logic, therefore, looks something like this
(simplified a bit; see :pep:`302` for the full details)::

   for mp in sys.meta_path:
       loader = mp.find_module(fullname)
       if loader is not None:
           <module> = loader.load_module(fullname)

   for path in sys.path:
       for hook in sys.path_hooks:
           try:
               importer = hook(path)
           except ImportError:
               # ImportError, so try the other path hooks
               pass
           else:
               loader = importer.find_module(fullname)
               if loader is not None:
                   <module> = loader.load_module(fullname)

   # Not found!
   raise ImportError
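
As a rough illustration of the importer and loader protocol described above,
the following sketch walks a ``sys.meta_path``-style list by hand rather than
installing a real hook. The ``DictImporter`` class and the ``toy_import``
function are hypothetical names invented for this example, and the code
deliberately leaves the interpreter's own import machinery untouched so that it
can be run on any Python version.

```python
import types


class DictImporter(object):
    """Hypothetical importer that serves modules from a dict of source strings."""

    def __init__(self, sources):
        self.sources = sources

    def find_module(self, fullname, path=None):
        # Return a loader (here, the importer itself) or None if unknown.
        if fullname in self.sources:
            return self
        return None

    def load_module(self, fullname):
        # Create the module object and run its source in the module namespace.
        module = types.ModuleType(fullname)
        exec(self.sources[fullname], module.__dict__)
        return module


def toy_import(fullname, meta_path):
    """Try each importer in turn, mirroring the first loop of the pseudo-code."""
    for importer in meta_path:
        loader = importer.find_module(fullname)
        if loader is not None:
            return loader.load_module(fullname)
    # Not found!
    raise ImportError(fullname)


importer = DictImporter({'greeting': "message = 'hello'"})
module = toy_import('greeting', [importer])
```

Here ``module.message`` holds the string defined in the toy source, and asking
``toy_import`` for a name the importer doesn't know raises :exc:`ImportError`.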


.. seealso::

   :pep:`302` - New Import Hooks
      Written by Just van Rossum and Paul Moore. Implemented by Just van Rossum.

.. ======================================================================


.. _section-pep305:

PEP 305: Comma-separated Files
==============================

Comma-separated files are a format frequently used for exporting data from
databases and spreadsheets. Python 2.3 adds a parser for comma-separated files.

Comma-separated format is deceptively simple at first glance::

   Costs,150,200,3.95

Read a line and call ``line.split(',')``: what could be simpler? But toss in
string data that can contain commas, and things get more complicated::

   "Costs",150,200,3.95,"Includes taxes, shipping, and sundry items"

A big ugly regular expression can parse this, but using the new :mod:`csv`
package is much simpler::

   import csv

   input = open('datafile', 'rb')
   reader = csv.reader(input)
   for line in reader:
       print line

The :func:`reader` function takes a number of different options. The field
separator isn't limited to the comma and can be changed to any character, and so
can the quoting and line-ending characters.

Different dialects of comma-separated files can be defined and registered;
currently there are two dialects, both used by Microsoft Excel. A separate
:class:`csv.writer` class will generate comma-separated files from a succession
of tuples or lists, quoting strings that contain the delimiter.
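
To make the difference concrete, here is a small sketch that parses the quoted
line shown earlier both ways and then writes a row back out. It is written for
a current Python 3 interpreter and uses ``io.StringIO`` in place of a real
file; both choices are assumptions made to keep the example self-contained.

```python
import csv
import io

line = '"Costs",150,200,3.95,"Includes taxes, shipping, and sundry items"\n'

# Naive splitting breaks the quoted field apart at its embedded commas.
naive_fields = line.strip().split(',')

# csv.reader honours the quotes and yields one list of strings per row.
rows = list(csv.reader(io.StringIO(line)))

# csv.writer quotes any field that contains the delimiter.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['Costs', 150, 'taxes, shipping'])
written = buffer.getvalue()
```

The naive split produces seven pieces, while the reader returns the five fields
that were actually intended.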


.. seealso::

   :pep:`305` - CSV File API
      Written and implemented by Kevin Altis, Dave Cole, Andrew McNamara, Skip
      Montanaro, Cliff Wells.

.. ======================================================================


.. _section-pep307:

PEP 307: Pickle Enhancements
============================

The :mod:`pickle` and :mod:`cPickle` modules received some attention during the
2.3 development cycle. In 2.2, new-style classes could be pickled without
difficulty, but they weren't pickled very compactly; :pep:`307` quotes a trivial
example where a new-style class results in a pickled string three times longer
than that for a classic class.

The solution was to invent a new pickle protocol. The :func:`pickle.dumps`
function has supported a text-or-binary flag for a long time. In 2.3, this
flag is redefined from a Boolean to an integer: 0 is the old text-mode pickle
format, 1 is the old binary format, and now 2 is a new 2.3-specific format. A
new constant, :const:`pickle.HIGHEST_PROTOCOL`, can be used to select the
fanciest protocol available.

Unpickling is no longer considered a safe operation. 2.2's :mod:`pickle`
provided hooks for trying to prevent unsafe classes from being unpickled
(specifically, a :attr:`__safe_for_unpickling__` attribute), but none of this
code was ever audited and therefore it's all been ripped out in 2.3. You should
not unpickle untrusted data in any version of Python.

To reduce the pickling overhead for new-style classes, a new interface for
customizing pickling was added using three special methods:
:meth:`__getstate__`, :meth:`__setstate__`, and :meth:`__getnewargs__`. Consult
:pep:`307` for the full semantics of these methods.

As a way to compress pickles yet further, it's now possible to use integer codes
instead of long strings to identify pickled classes. The Python Software
Foundation will maintain a list of standardized codes; there's also a range of
codes for private use. Currently no codes have been specified.
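
The integer protocol argument can be tried out with a short sketch such as the
one below. The ``record`` dictionary is made-up sample data, and the code only
pickles built-in types, so it says nothing about the size savings for new-style
classes discussed above.

```python
import pickle

# Made-up sample data consisting only of built-in types.
record = {'name': 'Quixote', 'version': (0, 5, 1), 'tags': ['web', 'framework']}

# Round-trip the same object through each protocol number mentioned above.
sizes = {}
for protocol in (0, 1, 2):
    data = pickle.dumps(record, protocol)
    assert pickle.loads(data) == record
    sizes[protocol] = len(data)

# HIGHEST_PROTOCOL selects the newest format the running interpreter supports.
newest = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(newest)
```

Whatever protocol is used for writing, :func:`pickle.loads` detects it
automatically, so only the pickling side needs to choose a number.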


.. seealso::

   :pep:`307` - Extensions to the pickle protocol
      Written and implemented by Guido van Rossum and Tim Peters.

.. ======================================================================


.. _section-slices:

Extended Slices
===============

Ever since Python 1.4, the slicing syntax has supported an optional third "step"
or "stride" argument. For example, these are all legal Python syntax:
``L[1:10:2]``, ``L[:-1:1]``, ``L[::-1]``. This was added to Python at the
request of the developers of Numerical Python, which uses the third argument
extensively. However, Python's built-in list, tuple, and string sequence types
have never supported this feature, raising a :exc:`TypeError` if you tried it.
Michael Hudson contributed a patch to fix this shortcoming.

For example, you can now easily extract the elements of a list that have even
indexes::

   >>> L = range(10)
   >>> L[::2]
   [0, 2, 4, 6, 8]

Negative values also work to make a copy of the same list in reverse order::

   >>> L[::-1]
   [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

This also works for tuples, arrays, and strings::

   >>> s='abcd'
   >>> s[::2]
   'ac'
   >>> s[::-1]
   'dcba'

If you have a mutable sequence such as a list or an array you can assign to or
delete an extended slice, but there are some differences between assignment to
extended and regular slices. Assignment to a regular slice can be used to
change the length of the sequence::

   >>> a = range(3)
   >>> a
   [0, 1, 2]
   >>> a[1:3] = [4, 5, 6]
   >>> a
   [0, 4, 5, 6]

Extended slices aren't this flexible. When assigning to an extended slice, the
list on the right hand side of the statement must contain the same number of
items as the slice it is replacing::

   >>> a = range(4)
   >>> a
   [0, 1, 2, 3]
   >>> a[::2]
   [0, 2]
   >>> a[::2] = [0, -1]
   >>> a
   [0, 1, -1, 3]
   >>> a[::2] = [0,1,2]
   Traceback (most recent call last):
     File "<stdin>", line 1, in ?
   ValueError: attempt to assign sequence of size 3 to extended slice of size 2

Deletion is more straightforward::

   >>> a = range(4)
   >>> a
   [0, 1, 2, 3]
   >>> a[::2]
   [0, 2]
   >>> del a[::2]
   >>> a
   [1, 3]

One can also now pass slice objects to the :meth:`__getitem__` methods of the
built-in sequences::

   >>> range(10).__getitem__(slice(0, 5, 2))
   [0, 2, 4]

Or use slice objects directly in subscripts::

   >>> range(10)[slice(0, 5, 2)]
   [0, 2, 4]

To simplify implementing sequences that support extended slicing, slice objects
now have a method ``indices(length)`` which, given the length of a sequence,
returns a ``(start, stop, step)`` tuple that can be passed directly to
:func:`range`. :meth:`indices` handles omitted and out-of-bounds indices in a
manner consistent with regular slices (and this innocuous phrase hides a welter
of confusing details!). The method is intended to be used like this::

   class FakeSeq:
       ...
       def calc_item(self, i):
           ...
       def __getitem__(self, item):
           if isinstance(item, slice):
               indices = item.indices(len(self))
               return FakeSeq([self.calc_item(i) for i in range(*indices)])
           else:
               return self.calc_item(item)

From this example you can also see that the built-in :class:`slice` object is
now the type object for the slice type, and is no longer a function. This is
consistent with Python 2.2, where :class:`int`, :class:`str`, etc., underwent
the same change.
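
For a version of this pattern that can actually be run, consider the sketch
below. The ``Squares`` class is a hypothetical sequence invented for this
example; it computes each item on demand and, as a simplification, returns a
plain list for slices instead of a new ``Squares`` instance.

```python
class Squares(object):
    """Hypothetical read-only sequence whose item at index i is i * i."""

    def __init__(self, length):
        self.length = length

    def __len__(self):
        return self.length

    def calc_item(self, i):
        # Support negative indexes the way built-in sequences do.
        if i < 0:
            i += self.length
        if not 0 <= i < self.length:
            raise IndexError(i)
        return i * i

    def __getitem__(self, item):
        if isinstance(item, slice):
            # indices() clips the slice to this sequence's length.
            start, stop, step = item.indices(len(self))
            return [self.calc_item(i) for i in range(start, stop, step)]
        return self.calc_item(item)


squares = Squares(10)
evens = squares[::2]
backwards = squares[::-1]
clipped = squares[7:100]
```

Because :meth:`indices` does the clipping, an out-of-range stop value such as
``100`` needs no special handling in :meth:`__getitem__`.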

.. ======================================================================


Other Language Changes
======================

Here are all of the changes that Python 2.3 makes to the core Python language.

* The :keyword:`yield` statement is now always a keyword, as described in
  section :ref:`section-generators` of this document.

* A new built-in function :func:`enumerate` was added, as described in section
  :ref:`section-enumerate` of this document.

* Two new constants, :const:`True` and :const:`False` were added along with the
  built-in :class:`bool` type, as described in section :ref:`section-bool` of this
  document.

* The :func:`int` type constructor will now return a long integer instead of
  raising an :exc:`OverflowError` when a string or floating-point number is too
  large to fit into an integer. This can lead to the paradoxical result that
  ``isinstance(int(expression), int)`` is false, but that seems unlikely to cause
  problems in practice.

* Built-in types now support the extended slicing syntax, as described in
  section :ref:`section-slices` of this document.

* A new built-in function, ``sum(iterable, start=0)``, adds up the numeric
  items in the iterable object and returns their sum. :func:`sum` only accepts
  numbers, meaning that you can't use it to concatenate a bunch of strings.
  (Contributed by Alex Martelli.)

* ``list.insert(pos, value)`` used to insert *value* at the front of the list
  when *pos* was negative. The behaviour has now been changed to be consistent
  with slice indexing, so when *pos* is -1 the value will be inserted before the
  last element, and so forth.

* ``list.index(value)``, which searches for *value* within the list and returns
  its index, now takes optional *start* and *stop* arguments to limit the search
  to only part of the list.

* Dictionaries have a new method, ``pop(key[, default])``, that returns
  the value corresponding to *key* and removes that key/value pair from the
  dictionary. If the requested key isn't present in the dictionary, *default* is
  returned if it's specified and :exc:`KeyError` is raised if it isn't. ::

     >>> d = {1:2}
     >>> d
     {1: 2}
     >>> d.pop(4)
     Traceback (most recent call last):
       File "<stdin>", line 1, in ?
     KeyError: 4
     >>> d.pop(1)
     2
     >>> d.pop(1)
     Traceback (most recent call last):
       File "<stdin>", line 1, in ?
     KeyError: 'pop(): dictionary is empty'
     >>> d
     {}

  There's also a new class method, ``dict.fromkeys(iterable, value)``, that
  creates a dictionary with keys taken from the supplied iterator *iterable* and
  all values set to *value*, defaulting to ``None``.

  (Patches contributed by Raymond Hettinger.)

  Also, the :func:`dict` constructor now accepts keyword arguments to simplify
  creating small dictionaries::

     >>> dict(red=1, blue=2, green=3, black=4)
     {'blue': 2, 'black': 4, 'green': 3, 'red': 1}

  (Contributed by Just van Rossum.)

* The :keyword:`assert` statement no longer checks the ``__debug__`` flag, so
  you can no longer disable assertions by assigning to ``__debug__``. Running
  Python with the :option:`-O` switch will still generate code that doesn't
  execute any assertions.

* Most type objects are now callable, so you can use them to create new objects
  such as functions, classes, and modules. (This means that the :mod:`new` module
  can be deprecated in a future Python version, because you can now use the type
  objects available in the :mod:`types` module.) For example, you can create a new
  module object with the following code:

  ::

     >>> import types
     >>> m = types.ModuleType('abc','docstring')
     >>> m
     <module 'abc' (built-in)>
     >>> m.__doc__
     'docstring'

* A new warning, :exc:`PendingDeprecationWarning` was added to indicate features
  which are in the process of being deprecated. The warning will *not* be printed
  by default. To check for use of features that will be deprecated in the future,
  supply :option:`-Walways::PendingDeprecationWarning::` on the command line or
  use :func:`warnings.filterwarnings`.

* The process of deprecating string-based exceptions, as in ``raise "Error
  occurred"``, has begun. Raising a string will now trigger
  :exc:`PendingDeprecationWarning`.

* Using ``None`` as a variable name will now result in a :exc:`SyntaxWarning`
  warning. In a future version of Python, ``None`` may finally become a keyword.

* The :meth:`xreadlines` method of file objects, introduced in Python 2.1, is no
  longer necessary because files now behave as their own iterator.
  :meth:`xreadlines` was originally introduced as a faster way to loop over all
  the lines in a file, but now you can simply write ``for line in file_obj``.
  File objects also have a new read-only :attr:`encoding` attribute that gives the
  encoding used by the file; Unicode strings written to the file will be
  automatically converted to bytes using the given encoding.

* The method resolution order used by new-style classes has changed, though
  you'll only notice the difference if you have a really complicated inheritance
  hierarchy. Classic classes are unaffected by this change. Python 2.2
  originally used a topological sort of a class's ancestors, but 2.3 now uses the
  C3 algorithm as described in the paper `"A Monotonic Superclass Linearization
  for Dylan" <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.3910>`_. To
  understand the motivation for this change, read Michele Simionato's article
  `"Python 2.3 Method Resolution Order" <http://www.phyast.pitt.edu/~micheles/mro.html>`_, or
  read the thread on python-dev starting with the message at
  https://mail.python.org/pipermail/python-dev/2002-October/029035.html. Samuele
  Pedroni first pointed out the problem and also implemented the fix by coding the
  C3 algorithm.

* Python runs multithreaded programs by switching between threads after
  executing N bytecodes. The default value for N has been increased from 10 to
  100 bytecodes, speeding up single-threaded applications by reducing the
  switching overhead. Some multithreaded applications may suffer slower response
  time, but that's easily fixed by setting the limit back to a lower number using
  ``sys.setcheckinterval(N)``. The limit can be retrieved with the new
  :func:`sys.getcheckinterval` function.

* One minor but far-reaching change is that the names of extension types defined
  by the modules included with Python now contain the module and a ``'.'`` in
  front of the type name. For example, in Python 2.2, if you created a socket and
  printed its :attr:`__class__`, you'd get this output::

     >>> s = socket.socket()
     >>> s.__class__
     <type 'socket'>

  In 2.3, you get this::

     >>> s.__class__
     <type '_socket.socket'>

* One of the noted incompatibilities between old- and new-style classes has been
  removed: you can now assign to the :attr:`__name__` and :attr:`__bases__`
  attributes of new-style classes. There are some restrictions on what can be
  assigned to :attr:`__bases__` along the lines of those relating to assigning to
  an instance's :attr:`__class__` attribute.
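
Several of the list and dictionary changes above are easiest to absorb side by
side. The following sketch exercises them with made-up sample values; it is
written to run unchanged on a current Python 3 interpreter, which is an
assumption of this example and not something the 2.3 release notes promise.

```python
# sum() adds numbers, optionally starting from a given value.
total = sum([1, 2, 3], 10)

# A negative position now counts from the end, as in slicing.
letters = ['a', 'b', 'd']
letters.insert(-1, 'c')

# index() can restrict the search to part of the list.
values = [5, 1, 5, 1, 5]
second_five = values.index(5, 1)   # start searching at index 1
bounded_one = values.index(1, 2, 4)  # search only indexes 2 and 3

# pop() removes a key; a default avoids KeyError for missing keys.
settings = {'debug': True}
debug = settings.pop('debug')
verbose = settings.pop('verbose', False)

# fromkeys() and keyword arguments build small dictionaries.
blank = dict.fromkeys(['x', 'y'])
colours = dict(red=1, blue=2)
```

After the two :meth:`pop` calls the ``settings`` dictionary is empty: the first
call removed the only key, and the second returned its default without touching
the dictionary.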

.. ======================================================================


String Changes
--------------

* The :keyword:`in` operator now works differently for strings. Previously, when
  evaluating ``X in Y`` where *X* and *Y* are strings, *X* could only be a single
  character. That's now changed; *X* can be a string of any length, and ``X in Y``
  will return :const:`True` if *X* is a substring of *Y*. If *X* is the empty
  string, the result is always :const:`True`. ::

     >>> 'ab' in 'abcd'
     True
     >>> 'ad' in 'abcd'
     False
     >>> '' in 'abcd'
     True

  Note that this doesn't tell you where the substring starts; if you need that
  information, use the :meth:`find` string method.

* The :meth:`strip`, :meth:`lstrip`, and :meth:`rstrip` string methods now have
  an optional argument for specifying the characters to strip. The default is
  still to remove all whitespace characters::

     >>> ' abc '.strip()
     'abc'
     >>> '><><abc<><><>'.strip('<>')
     'abc'
     >>> '><><abc<><><>\n'.strip('<>')
     'abc<><><>\n'
     >>> u'\u4000\u4001abc\u4000'.strip(u'\u4000')
     u'\u4001abc'
     >>>

  (Suggested by Simon Brunning and implemented by Walter Dörwald.)

* The :meth:`startswith` and :meth:`endswith` string methods now accept negative
  numbers for the *start* and *end* parameters.

* Another new string method is :meth:`zfill`, originally a function in the
  :mod:`string` module. :meth:`zfill` pads a numeric string with zeros on the
  left until it's the specified width. Note that the ``%`` operator is still more
  flexible and powerful than :meth:`zfill`. ::

     >>> '45'.zfill(4)
     '0045'
     >>> '12345'.zfill(4)
     '12345'
     >>> 'goofy'.zfill(6)
     '0goofy'

  (Contributed by Walter Dörwald.)

* A new type object, :class:`basestring`, has been added. Both 8-bit strings and
  Unicode strings inherit from this type, so ``isinstance(obj, basestring)`` will
  return :const:`True` for either kind of string. It's a completely abstract
  type, so you can't create :class:`basestring` instances.

* Interned strings are no longer immortal and will now be garbage-collected in
  the usual way when the only reference to them is from the internal dictionary of
  interned strings. (Implemented by Oren Tirosh.)
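
The string changes listed above combine naturally, as the short sketch below
tries to show. The sample strings are invented for illustration, and
:class:`basestring` is left out so that the code also runs on current Python
releases.

```python
text = 'abcd'

# "in" reports whether a substring occurs; find() reports where.
found = 'bc' in text
position = text.find('bc')

# strip() and lstrip() accept the set of characters to remove.
trimmed = '<<value>>'.strip('<>')
left_trimmed = 'xxabcxx'.lstrip('x')

# zfill() pads on the left with zeros up to the given width.
padded = '45'.zfill(4)

# Negative start positions count from the end of the string.
has_suffix = 'setup.py'.endswith('.py', -3)
has_prefix = 'setup.py'.startswith('up', -5)
```

Note that the argument to :meth:`strip` is treated as a set of characters, not
as a prefix or suffix, which is why ``'<>'`` removes both ``<<`` and ``>>``.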

.. ======================================================================


Optimizations
-------------

* The creation of new-style class instances has been made much faster; they're
  now faster than classic classes!

* The :meth:`sort` method of list objects has been extensively rewritten by Tim
  Peters, and the implementation is significantly faster.

* Multiplication of large long integers is now much faster thanks to an
  implementation of Karatsuba multiplication, an algorithm that scales better than
  the O(n\*n) required for the grade-school multiplication algorithm. (Original
  patch by Christopher A. Craig, and significantly reworked by Tim Peters.)

* The ``SET_LINENO`` opcode is now gone. This may provide a small speed
  increase, depending on your compiler's idiosyncrasies. See section
  :ref:`23section-other` for a longer explanation. (Removed by Michael Hudson.)

* :func:`xrange` objects now have their own iterator, making ``for i in
  xrange(n)`` slightly faster than ``for i in range(n)``. (Patch by Raymond
  Hettinger.)

* A number of small rearrangements have been made in various hotspots to improve
  performance, such as inlining a function or removing some code. (Implemented
  mostly by GvR, but lots of people have contributed single changes.)

The net result of the 2.3 optimizations is that Python 2.3 runs the pystone
benchmark around 25% faster than Python 2.2.

.. ======================================================================


New, Improved, and Deprecated Modules
=====================================

As usual, Python's standard library received a number of enhancements and bug
fixes. Here's a partial list of the most notable changes, sorted alphabetically
by module name. Consult the :file:`Misc/NEWS` file in the source tree for a more
complete list of changes, or look through the CVS logs for all the details.

* The :mod:`array` module now supports arrays of Unicode characters using the
  ``'u'`` format character. Arrays also now support using the ``+=`` assignment
  operator to add another array's contents, and the ``*=`` assignment operator to
  repeat an array. (Contributed by Jason Orendorff.)

* The :mod:`bsddb` module has been replaced by version 4.1.6 of the `PyBSDDB
  <http://pybsddb.sourceforge.net>`_ package, providing a more complete interface
  to the transactional features of the BerkeleyDB library.

  The old version of the module has been renamed to :mod:`bsddb185` and is no
  longer built automatically; you'll have to edit :file:`Modules/Setup` to enable
  it. Note that the new :mod:`bsddb` package is intended to be compatible with
  the old module, so be sure to file bugs if you discover any incompatibilities.
  When upgrading to Python 2.3, if the new interpreter is compiled with a new
  version of the underlying BerkeleyDB library, you will almost certainly have to
  convert your database files to the new version. You can do this fairly easily
  with the new scripts :file:`db2pickle.py` and :file:`pickle2db.py` which you
  will find in the distribution's :file:`Tools/scripts` directory. If you've
  already been using the PyBSDDB package and importing it as :mod:`bsddb3`, you
  will have to change your ``import`` statements to import it as :mod:`bsddb`.

* The new :mod:`bz2` module is an interface to the bz2 data compression library.
  bz2-compressed data is usually smaller than corresponding :mod:`zlib`\
  -compressed data. (Contributed by Gustavo Niemeyer.)

* A set of standard date/time types has been added in the new :mod:`datetime`
  module. See the following section for more details.

* The Distutils :class:`Extension` class now supports an extra constructor
  argument named *depends* for listing additional source files that an extension
  depends on. This lets Distutils recompile the module if any of the dependency
  files are modified. For example, if :file:`sampmodule.c` includes the header
  file :file:`sample.h`, you would create the :class:`Extension` object like
  this::

     ext = Extension("samp",
                     sources=["sampmodule.c"],
                     depends=["sample.h"])

  Modifying :file:`sample.h` would then cause the module to be recompiled.
  (Contributed by Jeremy Hylton.)

* Other minor changes to Distutils: it now checks for the :envvar:`CC`,
  :envvar:`CFLAGS`, :envvar:`CPP`, :envvar:`LDFLAGS`, and :envvar:`CPPFLAGS`
  environment variables, using them to override the settings in Python's
  configuration (contributed by Robert Weber).

* Previously the :mod:`doctest` module would only search the docstrings of
  public methods and functions for test cases, but it now also examines private
  ones as well. The :func:`DocTestSuite` function creates a
  :class:`unittest.TestSuite` object from a set of :mod:`doctest` tests.

* The new ``gc.get_referents(object)`` function returns a list of all the
  objects referenced by *object*.

* The :mod:`getopt` module gained a new function, :func:`gnu_getopt`, that
  supports the same arguments as the existing :func:`getopt` function but uses
  GNU-style scanning mode. The existing :func:`getopt` stops processing options as
  soon as a non-option argument is encountered, but in GNU-style mode processing
  continues, meaning that options and arguments can be mixed. For example::

     >>> getopt.getopt(['-f', 'filename', 'output', '-v'], 'f:v')
     ([('-f', 'filename')], ['output', '-v'])
     >>> getopt.gnu_getopt(['-f', 'filename', 'output', '-v'], 'f:v')
     ([('-f', 'filename'), ('-v', '')], ['output'])

  (Contributed by Peter Åstrand.)

* The :mod:`grp`, :mod:`pwd`, and :mod:`resource` modules now return enhanced
  tuples::

     >>> import grp
     >>> g = grp.getgrnam('amk')
     >>> g.gr_name, g.gr_gid
     ('amk', 500)

* The :mod:`gzip` module can now handle files exceeding 2 GiB.

* The new :mod:`heapq` module contains an implementation of a heap queue
  algorithm. A heap is an array-like data structure that keeps items in a
  partially sorted order such that, for every index *k*, ``heap[k] <=
  heap[2*k+1]`` and ``heap[k] <= heap[2*k+2]``. This makes it quick to remove the
  smallest item, and inserting a new item while maintaining the heap property is
  O(lg n). (See https://xlinux.nist.gov/dads//HTML/priorityque.html for more
  information about the priority queue data structure.)

  The :mod:`heapq` module provides :func:`heappush` and :func:`heappop` functions
  for adding and removing items while maintaining the heap property on top of some
  other mutable Python sequence type. Here's an example that uses a Python list::

     >>> import heapq
     >>> heap = []
     >>> for item in [3, 7, 5, 11, 1]:
     ...    heapq.heappush(heap, item)
     ...
     >>> heap
     [1, 3, 5, 11, 7]
     >>> heapq.heappop(heap)
     1
     >>> heapq.heappop(heap)
     3
     >>> heap
     [5, 7, 11]

  (Contributed by Kevin O'Connor.)

* The IDLE integrated development environment has been updated using the code
  from the IDLEfork project (http://idlefork.sourceforge.net). The most notable feature is
  that the code being developed is now executed in a subprocess, meaning that
  there's no longer any need for manual ``reload()`` operations. IDLE's core code
  has been incorporated into the standard library as the :mod:`idlelib` package.

* The :mod:`imaplib` module now supports IMAP over SSL. (Contributed by Piers
  Lauder and Tino Lange.)

* The :mod:`itertools` module contains a number of useful functions for use with
  iterators, inspired by various functions provided by the ML and Haskell
  languages. For example, ``itertools.ifilter(predicate, iterator)`` returns all
  elements in the iterator for which the function :func:`predicate` returns
  :const:`True`, and ``itertools.repeat(obj, N)`` returns ``obj`` *N* times.
  There are a number of other functions in the module; see the module's reference
  documentation for details.
  (Contributed by Raymond Hettinger.)

* Two new functions in the :mod:`math` module, ``degrees(rads)`` and
  ``radians(degs)``, convert between radians and degrees. Other functions in
  the :mod:`math` module such as :func:`math.sin` and :func:`math.cos` have always
  required input values measured in radians. Also, an optional *base* argument
  was added to :func:`math.log` to make it easier to compute logarithms for bases
  other than ``e`` and ``10``. (Contributed by Raymond Hettinger.)

* Several new POSIX functions (:func:`getpgid`, :func:`killpg`, :func:`lchown`,
  :func:`getloadavg`, :func:`major`, :func:`makedev`, :func:`minor`, and
  :func:`mknod`) were added to the :mod:`posix` module that underlies the
  :mod:`os` module. (Contributed by Gustavo Niemeyer, Geert Jansen, and Denis S.
  Otkidach.)

* In the :mod:`os` module, the :func:`\*stat` family of functions can now report
  fractions of a second in a timestamp. Such time stamps are represented as
  floats, similar to the value returned by :func:`time.time`.

  During testing, it was found that some applications will break if time stamps
  are floats. For compatibility, when using the tuple interface of the
  :class:`stat_result`, time stamps will be represented as integers. When using
  named fields (a feature first introduced in Python 2.2), time stamps are still
  represented as integers, unless :func:`os.stat_float_times` is invoked to enable
  float return values::

     >>> os.stat("/tmp").st_mtime
     1034791200
     >>> os.stat_float_times(True)
     >>> os.stat("/tmp").st_mtime
     1034791200.6335014

  In Python 2.4, the default will change to always returning floats.

  Application developers should enable this feature only if all their libraries
  work properly when confronted with floating point time stamps, or if they use
  the tuple API. If used, the feature should be activated on an application level
  instead of trying to enable it on a per-use basis.

* The :mod:`optparse` module contains a new parser for command-line arguments
  that can convert option values to a particular Python type and will
  automatically generate a usage message. See the following section for more
  details.

* The old and never-documented :mod:`linuxaudiodev` module has been deprecated,
  and a new version named :mod:`ossaudiodev` has been added. The module was
  renamed because the OSS sound drivers can be used on platforms other than Linux,
  and the interface has also been tidied and brought up to date in various ways.
  (Contributed by Greg Ward and Nicholas FitzRoy-Dale.)

* The new :mod:`platform` module contains a number of functions that try to
  determine various properties of the platform you're running on. There are
  functions for getting the architecture, CPU type, the Windows OS version, and
  even the Linux distribution version. (Contributed by Marc-André Lemburg.)
1402
1403* The parser objects provided by the :mod:`pyexpat` module can now optionally
1404 buffer character data, resulting in fewer calls to your character data handler
1405 and therefore faster performance. Setting the parser object's
1406 :attr:`buffer_text` attribute to :const:`True` will enable buffering.
1407
* The ``sample(population, k)`` function was added to the :mod:`random`
  module. *population* is a sequence or :class:`xrange` object containing the
  elements of a population, and :func:`sample` chooses *k* elements from the
  population without replacing chosen elements. *k* can be any value up to
  ``len(population)``. For example::

     >>> days = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'St', 'Sn']
     >>> random.sample(days, 3)      # Choose 3 elements
     ['St', 'Sn', 'Th']
     >>> random.sample(days, 7)      # Choose 7 elements
     ['Tu', 'Th', 'Mo', 'We', 'St', 'Fr', 'Sn']
     >>> random.sample(days, 7)      # Choose 7 again
     ['We', 'Mo', 'Sn', 'Fr', 'Tu', 'St', 'Th']
     >>> random.sample(days, 8)      # Can't choose eight
     Traceback (most recent call last):
       File "<stdin>", line 1, in ?
       File "random.py", line 414, in sample
         raise ValueError, "sample larger than population"
     ValueError: sample larger than population
     >>> random.sample(xrange(1,10000,2), 10)   # Choose ten odd nos. under 10000
     [3407, 3805, 1505, 7023, 2401, 2267, 9733, 3151, 8083, 9195]
1429
1430 The :mod:`random` module now uses a new algorithm, the Mersenne Twister,
1431 implemented in C. It's faster and more extensively studied than the previous
1432 algorithm.
1433
1434 (All changes contributed by Raymond Hettinger.)
1435
1436* The :mod:`readline` module also gained a number of new functions:
1437 :func:`get_history_item`, :func:`get_current_history_length`, and
1438 :func:`redisplay`.
1439
1440* The :mod:`rexec` and :mod:`Bastion` modules have been declared dead, and
1441 attempts to import them will fail with a :exc:`RuntimeError`. New-style classes
1442 provide new ways to break out of the restricted execution environment provided
1443 by :mod:`rexec`, and no one has interest in fixing them or time to do so. If
1444 you have applications using :mod:`rexec`, rewrite them to use something else.
1445
1446 (Sticking with Python 2.2 or 2.1 will not make your applications any safer
1447 because there are known bugs in the :mod:`rexec` module in those versions. To
1448 repeat: if you're using :mod:`rexec`, stop using it immediately.)
1449
1450* The :mod:`rotor` module has been deprecated because the algorithm it uses for
1451 encryption is not believed to be secure. If you need encryption, use one of the
1452 several AES Python modules that are available separately.
1453
* The :mod:`shutil` module gained a ``move(src, dest)`` function that
  recursively moves a file or directory to a new location.

* Support for more advanced POSIX signal handling was added to the :mod:`signal`
  module but then removed again as it proved impossible to make it work reliably
  across platforms.

* The :mod:`socket` module now supports timeouts. You can call the
  ``settimeout(t)`` method on a socket object to set a timeout of *t* seconds.
  Subsequent socket operations that take longer than *t* seconds to complete will
  abort and raise a :exc:`socket.timeout` exception.
1465
1466 The original timeout implementation was by Tim O'Malley. Michael Gilfix
1467 integrated it into the Python :mod:`socket` module and shepherded it through a
1468 lengthy review. After the code was checked in, Guido van Rossum rewrote parts
1469 of it. (This is a good example of a collaborative development process in
1470 action.)
1471
1472* On Windows, the :mod:`socket` module now ships with Secure Sockets Layer
1473 (SSL) support.
1474
1475* The value of the C :const:`PYTHON_API_VERSION` macro is now exposed at the
1476 Python level as ``sys.api_version``. The current exception can be cleared by
1477 calling the new :func:`sys.exc_clear` function.
1478
1479* The new :mod:`tarfile` module allows reading from and writing to
1480 :program:`tar`\ -format archive files. (Contributed by Lars Gustäbel.)
1481
* The new :mod:`textwrap` module contains functions for wrapping strings
  containing paragraphs of text. The ``wrap(text, width)`` function takes a
  string and returns a list containing the text split into lines of no more than
  the chosen width. The ``fill(text, width)`` function returns a single
  string, reformatted to fit into lines no longer than the chosen width. (As you
  can guess, :func:`fill` is built on top of :func:`wrap`.) For example::
1488
1489 >>> import textwrap
1490 >>> paragraph = "Not a whit, we defy augury: ... more text ..."
1491 >>> textwrap.wrap(paragraph, 60)
1492 ["Not a whit, we defy augury: there's a special providence in",
1493 "the fall of a sparrow. If it be now, 'tis not to come; if it",
1494 ...]
1495 >>> print textwrap.fill(paragraph, 35)
1496 Not a whit, we defy augury: there's
1497 a special providence in the fall of
1498 a sparrow. If it be now, 'tis not
1499 to come; if it be not to come, it
1500 will be now; if it be not now, yet
1501 it will come: the readiness is all.
1502 >>>
1503
1504 The module also contains a :class:`TextWrapper` class that actually implements
1505 the text wrapping strategy. Both the :class:`TextWrapper` class and the
1506 :func:`wrap` and :func:`fill` functions support a number of additional keyword
1507 arguments for fine-tuning the formatting; consult the module's documentation
1508 for details. (Contributed by Greg Ward.)
1509
1510* The :mod:`thread` and :mod:`threading` modules now have companion modules,
1511 :mod:`dummy_thread` and :mod:`dummy_threading`, that provide a do-nothing
1512 implementation of the :mod:`thread` module's interface for platforms where
1513 threads are not supported. The intention is to simplify thread-aware modules
1514 (ones that *don't* rely on threads to run) by putting the following code at the
1515 top::
1516
     try:
         import threading as _threading
     except ImportError:
         import dummy_threading as _threading
1521
1522 In this example, :mod:`_threading` is used as the module name to make it clear
1523 that the module being used is not necessarily the actual :mod:`threading`
1524 module. Code can call functions and use classes in :mod:`_threading` whether or
1525 not threads are supported, avoiding an :keyword:`if` statement and making the
1526 code slightly clearer. This module will not magically make multithreaded code
1527 run without threads; code that waits for another thread to return or to do
1528 something will simply hang forever.
1529
1530* The :mod:`time` module's :func:`strptime` function has long been an annoyance
1531 because it uses the platform C library's :func:`strptime` implementation, and
1532 different platforms sometimes have odd bugs. Brett Cannon contributed a
1533 portable implementation that's written in pure Python and should behave
1534 identically on all platforms.
1535
1536* The new :mod:`timeit` module helps measure how long snippets of Python code
1537 take to execute. The :file:`timeit.py` file can be run directly from the
1538 command line, or the module's :class:`Timer` class can be imported and used
1539 directly. Here's a short example that figures out whether it's faster to
1540 convert an 8-bit string to Unicode by appending an empty Unicode string to it or
1541 by using the :func:`unicode` function::
1542
1543 import timeit
1544
1545 timer1 = timeit.Timer('unicode("abc")')
1546 timer2 = timeit.Timer('"abc" + u""')
1547
1548 # Run three trials
1549 print timer1.repeat(repeat=3, number=100000)
1550 print timer2.repeat(repeat=3, number=100000)
1551
1552 # On my laptop this outputs:
1553 # [0.36831796169281006, 0.37441694736480713, 0.35304892063140869]
1554 # [0.17574405670166016, 0.18193507194519043, 0.17565798759460449]
1555
1556* The :mod:`Tix` module has received various bug fixes and updates for the
1557 current version of the Tix package.
1558
1559* The :mod:`Tkinter` module now works with a thread-enabled version of Tcl.
1560 Tcl's threading model requires that widgets only be accessed from the thread in
1561 which they're created; accesses from another thread can cause Tcl to panic. For
1562 certain Tcl interfaces, :mod:`Tkinter` will now automatically avoid this when a
1563 widget is accessed from a different thread by marshalling a command, passing it
1564 to the correct thread, and waiting for the results. Other interfaces can't be
1565 handled automatically but :mod:`Tkinter` will now raise an exception on such an
1566 access so that you can at least find out about the problem. See
  https://mail.python.org/pipermail/python-dev/2002-December/031107.html for a more
  detailed explanation of this change. (Implemented by Martin von Löwis.)

* Calling Tcl methods through :mod:`_tkinter` no longer returns only strings.
1571 Instead, if Tcl returns other objects those objects are converted to their
1572 Python equivalent, if one exists, or wrapped with a :class:`_tkinter.Tcl_Obj`
1573 object if no Python equivalent exists. This behavior can be controlled through
1574 the :meth:`wantobjects` method of :class:`tkapp` objects.
1575
1576 When using :mod:`_tkinter` through the :mod:`Tkinter` module (as most Tkinter
1577 applications will), this feature is always activated. It should not cause
1578 compatibility problems, since Tkinter would always convert string results to
1579 Python types where possible.
1580
1581 If any incompatibilities are found, the old behavior can be restored by setting
1582 the :attr:`wantobjects` variable in the :mod:`Tkinter` module to false before
1583 creating the first :class:`tkapp` object. ::
1584
1585 import Tkinter
1586 Tkinter.wantobjects = 0
1587
1588 Any breakage caused by this change should be reported as a bug.
1589
1590* The :mod:`UserDict` module has a new :class:`DictMixin` class which defines
1591 all dictionary methods for classes that already have a minimum mapping
1592 interface. This greatly simplifies writing classes that need to be
1593 substitutable for dictionaries, such as the classes in the :mod:`shelve`
1594 module.
1595
1596 Adding the mix-in as a superclass provides the full dictionary interface
1597 whenever the class defines :meth:`__getitem__`, :meth:`__setitem__`,
1598 :meth:`__delitem__`, and :meth:`keys`. For example::
1599
     >>> import UserDict
     >>> class SeqDict(UserDict.DictMixin):
     ...     """Dictionary lookalike implemented with lists."""
     ...     def __init__(self):
     ...         self.keylist = []
     ...         self.valuelist = []
     ...     def __getitem__(self, key):
     ...         try:
     ...             i = self.keylist.index(key)
     ...         except ValueError:
     ...             raise KeyError
     ...         return self.valuelist[i]
     ...     def __setitem__(self, key, value):
     ...         try:
     ...             i = self.keylist.index(key)
     ...             self.valuelist[i] = value
     ...         except ValueError:
     ...             self.keylist.append(key)
     ...             self.valuelist.append(value)
     ...     def __delitem__(self, key):
     ...         try:
     ...             i = self.keylist.index(key)
     ...         except ValueError:
     ...             raise KeyError
     ...         self.keylist.pop(i)
     ...         self.valuelist.pop(i)
     ...     def keys(self):
     ...         return list(self.keylist)
     ...
     >>> s = SeqDict()
     >>> dir(s)      # See that other dictionary methods are implemented
     ['__cmp__', '__contains__', '__delitem__', '__doc__', '__getitem__',
      '__init__', '__iter__', '__len__', '__module__', '__repr__',
      '__setitem__', 'clear', 'get', 'has_key', 'items', 'iteritems',
      'iterkeys', 'itervalues', 'keylist', 'keys', 'pop', 'popitem',
      'setdefault', 'update', 'valuelist', 'values']
1636
1637 (Contributed by Raymond Hettinger.)
1638
1639* The DOM implementation in :mod:`xml.dom.minidom` can now generate XML output
1640 in a particular encoding by providing an optional encoding argument to the
1641 :meth:`toxml` and :meth:`toprettyxml` methods of DOM nodes.
1642
1643* The :mod:`xmlrpclib` module now supports an XML-RPC extension for handling nil
1644 data values such as Python's ``None``. Nil values are always supported on
1645 unmarshalling an XML-RPC response. To generate requests containing ``None``,
1646 you must supply a true value for the *allow_none* parameter when creating a
1647 :class:`Marshaller` instance.
1648
1649* The new :mod:`DocXMLRPCServer` module allows writing self-documenting XML-RPC
1650 servers. Run it in demo mode (as a program) to see it in action. Pointing the
1651 Web browser to the RPC server produces pydoc-style documentation; pointing
1652 xmlrpclib to the server allows invoking the actual methods. (Contributed by
1653 Brian Quinlan.)
1654
1655* Support for internationalized domain names (RFCs 3454, 3490, 3491, and 3492)
1656 has been added. The "idna" encoding can be used to convert between a Unicode
1657 domain name and the ASCII-compatible encoding (ACE) of that name. ::
1658
     >>> u"www.Alliancefrançaise.nu".encode("idna")
     'www.xn--alliancefranaise-npb.nu'
1661
  The :mod:`socket` module has also been extended to transparently convert
  Unicode hostnames to the ACE version before passing them to the C library.
  Modules that deal with hostnames (such as :mod:`httplib` and :mod:`ftplib`)
  also support Unicode host names; :mod:`httplib` also sends HTTP ``Host``
  headers using the ACE version of the domain name. :mod:`urllib` supports
  Unicode URLs with non-ASCII host names as long as the ``path`` part of the URL
  is ASCII only.
1669
1670 To implement this change, the :mod:`stringprep` module, the ``mkstringprep``
1671 tool and the ``punycode`` encoding have been added.
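As a rough illustration of the ``idna`` and ``punycode`` codecs mentioned in the last item above, the sketch below round-trips a host name through its ACE form. It is written so that it also runs on current Python 3 interpreters, where ``encode`` returns a bytes object rather than an 8-bit string; the variable names are illustrative only.

```python
# Round-trip an internationalized host name through the "idna" codec.
name = u"www.alliancefran\xe7aise.nu"      # \xe7 is a c with cedilla

ace = name.encode("idna")                  # ASCII-compatible encoding (ACE)
restored = ace.decode("idna")              # back to the Unicode form

# Each non-ASCII label is Punycode-encoded and then prefixed with "xn--".
label = u"alliancefran\xe7aise".encode("punycode")

print(ace)
print(restored == name)
print(label)
```

Labels that are already plain ASCII, such as ``www`` and ``nu`` here, pass through the ``idna`` codec unchanged; only the middle label is rewritten.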
1672
.. ======================================================================


Date/Time Type
--------------
1678
1679Date and time types suitable for expressing timestamps were added as the
1680:mod:`datetime` module. The types don't support different calendars or many
1681fancy features, and just stick to the basics of representing time.
1682
1683The three primary types are: :class:`date`, representing a day, month, and year;
1684:class:`time`, consisting of hour, minute, and second; and :class:`datetime`,
1685which contains all the attributes of both :class:`date` and :class:`time`.
1686There's also a :class:`timedelta` class representing differences between two
1687points in time, and time zone logic is implemented by classes inheriting from
1688the abstract :class:`tzinfo` class.
1689
1690You can create instances of :class:`date` and :class:`time` by either supplying
1691keyword arguments to the appropriate constructor, e.g.
1692``datetime.date(year=1972, month=10, day=15)``, or by using one of a number of
1693class methods. For example, the :meth:`date.today` class method returns the
1694current local date.
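A small sketch of both construction styles follows. The dates and variable names are arbitrary examples chosen for illustration, not values taken from the module's documentation.

```python
import datetime

# Keyword arguments passed straight to the constructors.
birthday = datetime.date(year=1972, month=10, day=15)
meeting = datetime.time(hour=14, minute=30)

# Class methods act as alternative constructors.
today = datetime.date.today()                          # current local date
moment = datetime.datetime.combine(birthday, meeting)  # date + time

print(birthday.isoformat())    # 1972-10-15
print(moment.isoformat())      # 1972-10-15T14:30:00
print(today.isoformat())       # varies with the day the code is run
```

Fields that are not supplied to :class:`time`, such as the seconds here, default to zero.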
1695
1696Once created, instances of the date/time classes are all immutable. There are a
1697number of methods for producing formatted strings from objects::
1698
1699 >>> import datetime
1700 >>> now = datetime.datetime.now()
1701 >>> now.isoformat()
1702 '2002-12-30T21:27:03.994956'
1703 >>> now.ctime() # Only available on date, datetime
1704 'Mon Dec 30 21:27:03 2002'
1705 >>> now.strftime('%Y %d %b')
1706 '2002 30 Dec'
1707
1708The :meth:`replace` method allows modifying one or more fields of a
1709:class:`date` or :class:`datetime` instance, returning a new instance::
1710
1711 >>> d = datetime.datetime.now()
1712 >>> d
1713 datetime.datetime(2002, 12, 30, 22, 15, 38, 827738)
1714 >>> d.replace(year=2001, hour = 12)
1715 datetime.datetime(2001, 12, 30, 12, 15, 38, 827738)
1716 >>>
1717
1718Instances can be compared, hashed, and converted to strings (the result is the
1719same as that of :meth:`isoformat`). :class:`date` and :class:`datetime`
1720instances can be subtracted from each other, and added to :class:`timedelta`
1721instances. The largest missing feature is that there's no standard library
1722support for parsing strings and getting back a :class:`date` or
1723:class:`datetime`.
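The comparison and arithmetic rules above can be sketched in a few lines. The two dates are examples only (the second is the 2.3 release date given at the top of this article), and the dictionary is just a stand-in to show that instances are hashable.

```python
import datetime

start = datetime.date(2002, 12, 30)
release = datetime.date(2003, 7, 29)

# Subtracting two dates yields a timedelta.
elapsed = release - start
print(elapsed.days)                       # 211

# Adding a timedelta to a date yields a new date.
one_week = datetime.timedelta(days=7)
print(start + one_week)                   # 2003-01-06

# Instances compare chronologically and can be used as dictionary keys.
print(start < release)                    # True
events = {release: "Python 2.3 released"}
print(events[datetime.date(2003, 7, 29)])

# str() gives the same text as isoformat().
print(str(start) == start.isoformat())    # True
```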
1724
1725For more information, refer to the module's reference documentation.
1726(Contributed by Tim Peters.)
1727
.. ======================================================================


The optparse Module
-------------------
1733
1734The :mod:`getopt` module provides simple parsing of command-line arguments. The
1735new :mod:`optparse` module (originally named Optik) provides more elaborate
1736command-line parsing that follows the Unix conventions, automatically creates
1737the output for :option:`--help`, and can perform different actions for different
1738options.
1739
1740You start by creating an instance of :class:`OptionParser` and telling it what
1741your program's options are. ::
1742
   import sys
   from optparse import OptionParser

   op = OptionParser()
   op.add_option('-i', '--input',
                 action='store', type='string', dest='input',
                 help='set input filename')
   op.add_option('-l', '--length',
                 action='store', type='int', dest='length',
                 help='set maximum length of output')
1753
1754Parsing a command line is then done by calling the :meth:`parse_args` method. ::
1755
1756 options, args = op.parse_args(sys.argv[1:])
1757 print options
1758 print args
1759
1760This returns an object containing all of the option values, and a list of
1761strings containing the remaining arguments.
1762
1763Invoking the script with the various arguments now works as you'd expect it to.
1764Note that the length argument is automatically converted to an integer. ::
1765
1766 $ ./python opt.py -i data arg1
1767 <Values at 0x400cad4c: {'input': 'data', 'length': None}>
1768 ['arg1']
1769 $ ./python opt.py --input=data --length=4
1770 <Values at 0x400cad2c: {'input': 'data', 'length': 4}>
1771 []
1772 $
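The same conversion can be checked without a shell by handing :meth:`parse_args` an explicit argument list. The sketch below rebuilds the parser from the earlier listing so that it is self-contained; the argument values are made up for illustration.

```python
from optparse import OptionParser

op = OptionParser()
op.add_option('-i', '--input',
              action='store', type='string', dest='input',
              help='set input filename')
op.add_option('-l', '--length',
              action='store', type='int', dest='length',
              help='set maximum length of output')

# Pass the arguments directly instead of reading sys.argv.
options, args = op.parse_args(['--input=data', '-l', '4', 'arg1'])

print(options.input)        # data
print(options.length + 1)   # 5, so the value is an int, not the string '4'
print(args)                 # ['arg1']
```

Each option value is stored as an attribute named after the option's ``dest``, and anything that is not an option ends up in the list of remaining arguments.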
1773
1774The help message is automatically generated for you::
1775
   $ ./python opt.py --help
   usage: opt.py [options]

   options:
     -h, --help            show this help message and exit
     -iINPUT, --input=INPUT
                           set input filename
     -lLENGTH, --length=LENGTH
                           set maximum length of output
   $

1787See the module's documentation for more details.
1788
1789
1790Optik was written by Greg Ward, with suggestions from the readers of the Getopt
1791SIG.
1792
.. ======================================================================


.. _section-pymalloc:

Pymalloc: A Specialized Object Allocator
========================================
1800
Pymalloc, a specialized object allocator written by Vladimir Marangozov, was a
feature added to Python 2.1. Pymalloc is intended to be faster than the system
:c:func:`malloc` and to have less memory overhead for allocation patterns typical
of Python programs. The allocator uses C's :c:func:`malloc` function to get large
pools of memory and then fulfills smaller memory requests from these pools.
1806
1807In 2.1 and 2.2, pymalloc was an experimental feature and wasn't enabled by
1808default; you had to explicitly enable it when compiling Python by providing the
1809:option:`--with-pymalloc` option to the :program:`configure` script. In 2.3,
1810pymalloc has had further enhancements and is now enabled by default; you'll have
1811to supply :option:`--without-pymalloc` to disable it.
1812
1813This change is transparent to code written in Python; however, pymalloc may
1814expose bugs in C extensions. Authors of C extension modules should test their
1815code with pymalloc enabled, because some incorrect code may cause core dumps at
1816runtime.
1817
There's one particularly common error that causes problems. There are a number
of memory allocation functions in Python's C API that have previously just been
aliases for the C library's :c:func:`malloc` and :c:func:`free`, meaning that if
you accidentally called mismatched functions the error wouldn't be noticeable.
When the object allocator is enabled, these functions aren't aliases of
:c:func:`malloc` and :c:func:`free` any more, and calling the wrong function to
free memory may get you a core dump. For example, if memory was allocated using
:c:func:`PyObject_Malloc`, it has to be freed using :c:func:`PyObject_Free`, not
:c:func:`free`. A few modules included with Python fell afoul of this and had to
be fixed; doubtless there are more third-party modules that will have the same
problem.
1829
1830As part of this change, the confusing multiple interfaces for allocating memory
1831have been consolidated down into two API families. Memory allocated with one
1832family must not be manipulated with functions from the other family. There is
1833one family for allocating chunks of memory and another family of functions
1834specifically for allocating Python objects.
1835
* To allocate and free an undistinguished chunk of memory use the "raw memory"
  family: :c:func:`PyMem_Malloc`, :c:func:`PyMem_Realloc`, and :c:func:`PyMem_Free`.

* The "object memory" family is the interface to the pymalloc facility described
  above and is biased towards a large number of "small" allocations:
  :c:func:`PyObject_Malloc`, :c:func:`PyObject_Realloc`, and :c:func:`PyObject_Free`.

* To allocate and free Python objects, use the "object" family:
  :c:func:`PyObject_New`, :c:func:`PyObject_NewVar`, and :c:func:`PyObject_Del`.

1846Thanks to lots of work by Tim Peters, pymalloc in 2.3 also provides debugging
1847features to catch memory overwrites and doubled frees in both extension modules
1848and in the interpreter itself. To enable this support, compile a debugging
1849version of the Python interpreter by running :program:`configure` with
1850:option:`--with-pydebug`.
1851
1852To aid extension writers, a header file :file:`Misc/pymemcompat.h` is
1853distributed with the source to Python 2.3 that allows Python extensions to use
1854the 2.3 interfaces to memory allocation while compiling against any version of
1855Python since 1.5.2. You would copy the file from Python's source distribution
1856and bundle it with the source of your extension.
1857
1858
.. seealso::

   https://svn.python.org/view/python/trunk/Objects/obmalloc.c
      For the full details of the pymalloc implementation, see the comments at
      the top of the file :file:`Objects/obmalloc.c` in the Python source code.
      The above link points to the file within the python.org SVN browser.

.. ======================================================================


Build and C API Changes
=======================
1871
1872Changes to Python's build process and to the C API include:
1873
1874* The cycle detection implementation used by the garbage collection has proven
1875 to be stable, so it's now been made mandatory. You can no longer compile Python
1876 without it, and the :option:`--with-cycle-gc` switch to :program:`configure` has
1877 been removed.
1878
1879* Python can now optionally be built as a shared library
1880 (:file:`libpython2.3.so`) by supplying :option:`--enable-shared` when running
1881 Python's :program:`configure` script. (Contributed by Ondrej Palkovsky.)
1882
* The :c:macro:`DL_EXPORT` and :c:macro:`DL_IMPORT` macros are now deprecated.
  Initialization functions for Python extension modules should now be declared
  using the new macro :c:macro:`PyMODINIT_FUNC`, while the Python core will
  generally use the :c:macro:`PyAPI_FUNC` and :c:macro:`PyAPI_DATA` macros.

1888* The interpreter can be compiled without any docstrings for the built-in
1889 functions and modules by supplying :option:`--without-doc-strings` to the
1890 :program:`configure` script. This makes the Python executable about 10% smaller,
1891 but will also mean that you can't get help for Python's built-ins. (Contributed
1892 by Gustavo Niemeyer.)
1893
* The :c:func:`PyArg_NoArgs` macro is now deprecated, and code that uses it
  should be changed. For Python 2.2 and later, the method definition table can
  specify the :const:`METH_NOARGS` flag, signalling that there are no arguments,
  and the argument checking can then be removed. If compatibility with pre-2.2
  versions of Python is important, the code could use ``PyArg_ParseTuple(args,
  "")`` instead, but this will be slower than using :const:`METH_NOARGS`.
1900
* :c:func:`PyArg_ParseTuple` accepts new format characters for various sizes of
  unsigned integers: ``B`` for :c:type:`unsigned char`, ``H`` for :c:type:`unsigned
  short int`, ``I`` for :c:type:`unsigned int`, and ``K`` for :c:type:`unsigned
  long long`.
1905
* A new function, ``PyObject_DelItemString(mapping, char *key)``, was added
  as shorthand for ``PyObject_DelItem(mapping, PyString_FromString(key))``.
1908
1909* File objects now manage their internal string buffer differently, increasing
1910 it exponentially when needed. This results in the benchmark tests in
1911 :file:`Lib/test/test_bufio.py` speeding up considerably (from 57 seconds to 1.7
1912 seconds, according to one measurement).
1913
* It's now possible to define class and static methods for a C extension type by
  setting either the :const:`METH_CLASS` or :const:`METH_STATIC` flags in a
  method's :c:type:`PyMethodDef` structure.

1918* Python now includes a copy of the Expat XML parser's source code, removing any
1919 dependence on a system version or local installation of Expat.
1920
1921* If you dynamically allocate type objects in your extension, you should be
1922 aware of a change in the rules relating to the :attr:`__module__` and
1923 :attr:`__name__` attributes. In summary, you will want to ensure the type's
1924 dictionary contains a ``'__module__'`` key; making the module name the part of
1925 the type name leading up to the final period will no longer have the desired
1926 effect. For more detail, read the API reference documentation or the source.
1927
.. ======================================================================


Port-Specific Changes
---------------------
1933
1934Support for a port to IBM's OS/2 using the EMX runtime environment was merged
1935into the main Python source tree. EMX is a POSIX emulation layer over the OS/2
1936system APIs. The Python port for EMX tries to support all the POSIX-like
1937capability exposed by the EMX runtime, and mostly succeeds; :func:`fork` and
1938:func:`fcntl` are restricted by the limitations of the underlying emulation
1939layer. The standard OS/2 port, which uses IBM's Visual Age compiler, also
1940gained support for case-sensitive import semantics as part of the integration of
1941the EMX port into CVS. (Contributed by Andrew MacIntyre.)
1942
On MacOS, most toolbox modules have been weaklinked to improve backward
compatibility. This means that modules will no longer fail to load if a single
routine is missing on the current OS version. Instead, calling the missing
routine will raise an exception. (Contributed by Jack Jansen.)
1947
1948The RPM spec files, found in the :file:`Misc/RPM/` directory in the Python
1949source distribution, were updated for 2.3. (Contributed by Sean Reifschneider.)
1950
Other new platforms now supported by Python include AtheOS
(http://atheos.cx/), GNU/Hurd, and OpenVMS.

.. ======================================================================


.. _23section-other:

Other Changes and Fixes
=======================
1961
1962As usual, there were a bunch of other improvements and bugfixes scattered
1963throughout the source tree. A search through the CVS change logs finds there
1964were 523 patches applied and 514 bugs fixed between Python 2.2 and 2.3. Both
1965figures are likely to be underestimates.
1966
1967Some of the more notable changes are:
1968
1969* If the :envvar:`PYTHONINSPECT` environment variable is set, the Python
1970 interpreter will enter the interactive prompt after running a Python program, as
1971 if Python had been invoked with the :option:`-i` option. The environment
1972 variable can be set before running the Python interpreter, or it can be set by
1973 the Python program as part of its execution.
1974
* The :file:`regrtest.py` script now provides a way to allow "all resources
  except *foo*." A resource name passed to the :option:`-u` option can now be
  prefixed with a hyphen (``'-'``) to mean "remove this resource." For example,
  the option ``-uall,-bsddb`` could be used to enable the use of all resources
  except ``bsddb``.
1980
1981* The tools used to build the documentation now work under Cygwin as well as
1982 Unix.
1983
1984* The ``SET_LINENO`` opcode has been removed. Back in the mists of time, this
1985 opcode was needed to produce line numbers in tracebacks and support trace
1986 functions (for, e.g., :mod:`pdb`). Since Python 1.5, the line numbers in
1987 tracebacks have been computed using a different mechanism that works with
1988 "python -O". For Python 2.3 Michael Hudson implemented a similar scheme to
1989 determine when to call the trace function, removing the need for ``SET_LINENO``
1990 entirely.
1991
1992 It would be difficult to detect any resulting difference from Python code, apart
1993 from a slight speed up when Python is run without :option:`-O`.
1994
1995 C extensions that access the :attr:`f_lineno` field of frame objects should
1996 instead call ``PyCode_Addr2Line(f->f_code, f->f_lasti)``. This will have the
1997 added effect of making the code work as desired under "python -O" in earlier
1998 versions of Python.
1999
2000 A nifty new feature is that trace functions can now assign to the
2001 :attr:`f_lineno` attribute of frame objects, changing the line that will be
2002 executed next. A ``jump`` command has been added to the :mod:`pdb` debugger
2003 taking advantage of this new feature. (Implemented by Richie Hindle.)
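
The ``f_lineno`` assignment described in the last item can be sketched with
:func:`sys.settrace`. This is only an illustration, written for a current
Python interpreter rather than for 2.3 itself; ``demo`` and ``run_with_jump``
are hypothetical names invented for the example, and the line offsets assume
the function body is laid out exactly as shown.

```python
import sys


def demo():
    steps = []
    steps.append("first")
    steps.append("skipped")  # the trace function jumps over this line
    steps.append("last")
    return steps


def run_with_jump(func, from_offset, to_offset):
    """Call func() under a trace function that redirects one line.

    Offsets are counted from the line holding the ``def`` statement.
    """
    code = func.__code__
    first = code.co_firstlineno

    def local_trace(frame, event, arg):
        # Reassign f_lineno just before the chosen line would execute.
        if event == "line" and frame.f_lineno == first + from_offset:
            frame.f_lineno = first + to_offset
        return local_trace

    def global_trace(frame, event, arg):
        # Trace only the frame that runs func; ignore every other call.
        if event == "call" and frame.f_code is code:
            return local_trace
        return None

    previous = sys.gettrace()
    sys.settrace(global_trace)
    try:
        return func()
    finally:
        # Restore whatever trace function was active before.
        sys.settrace(previous)


print(run_with_jump(demo, 3, 4))
```

Run as written, the traced call should return ``['first', 'last']``, while a
plain ``demo()`` call still returns all three entries. The interpreter rejects
some jumps (for example, on events other than ``line``), so a real debugger
needs error handling that this sketch omits.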

.. ======================================================================


Porting to Python 2.3
=====================

This section lists previously described changes that may require changes to your
code:

* :keyword:`yield` is now always a keyword; if it's used as a variable name in
  your code, a different name must be chosen.

* For strings *X* and *Y*, ``X in Y`` now works if *X* is more than one
  character long.

* The :func:`int` type constructor will now return a long integer instead of
  raising an :exc:`OverflowError` when a string or floating-point number is too
  large to fit into an integer.

* If you have Unicode strings that contain 8-bit characters, you must declare
  the file's encoding (UTF-8, Latin-1, or whatever) by adding a comment to the top
  of the file. See section :ref:`section-encodings` for more information.

* Calling Tcl methods through :mod:`_tkinter` no longer returns only strings.
  Instead, if Tcl returns other objects those objects are converted to their
  Python equivalent, if one exists, or wrapped with a :class:`_tkinter.Tcl_Obj`
  object if no Python equivalent exists.

* Large octal and hex literals such as ``0xffffffff`` now trigger a
  :exc:`FutureWarning`. Currently they're stored as 32-bit numbers and result in a
  negative value, but in Python 2.4 they'll become positive long integers.

  There are a few ways to fix this warning. If you really need a positive number,
  just add an ``L`` to the end of the literal. If you're trying to get a 32-bit
  integer with low bits set and have previously used an expression such as
  ``~(1 << 31)``, it's probably clearest to start with all bits set and clear the
  desired upper bits. For example, to clear just the top bit (bit 31), you could
  write ``0xffffffffL & ~(1L << 31)``.

* You can no longer disable assertions by assigning to ``__debug__``.

* The Distutils :func:`setup` function has gained various new keyword arguments
  such as *depends*. Old versions of the Distutils will abort if passed unknown
  keywords. A solution is to check for the presence of the new
  :func:`get_distutil_options` function in your :file:`setup.py` and only use the
  new keywords with a version of the Distutils that supports them::

     from distutils import core

     # Keyword arguments understood by every Distutils version.
     kw = {'sources': ['foo.c'], ...}
     # Pass 'depends' only if this Distutils version supports it.
     if hasattr(core, 'get_distutil_options'):
         kw['depends'] = ['foo.h']
     ext = core.Extension(**kw)

* Using ``None`` as a variable name will now result in a :exc:`SyntaxWarning`.

* Names of extension types defined by the modules included with Python now
  contain the module and a ``'.'`` in front of the type name.
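
Three of the items above (substring tests, :func:`int` on large values, and
large hex literals) can be checked with a short script. The sketch below
assumes a current Python 3 interpreter, where ``int`` and ``long`` are unified
and the ``L`` suffix no longer exists, so it shows the end state those changes
were heading toward rather than 2.3's transitional behaviour. The function
names are made up for the example.

```python
import sys


def substring_checks():
    """Substring tests: the left operand may be longer than one character."""
    return ("ab" in "abc", "ac" in "abc")


def big_int_checks():
    """int() yields a large integer instead of raising OverflowError."""
    from_string = int("9" * 30)
    from_float = int(1e30)
    return from_string > sys.maxsize and from_float > sys.maxsize


def clear_top_bit():
    """Start with all 32 bits set, then clear bit 31."""
    all_bits = 0xffffffff          # positive: 4294967295
    return all_bits & ~(1 << 31)   # 0x7fffffff


print(substring_checks())    # (True, False)
print(big_int_checks())      # True
print(hex(clear_top_bit()))  # 0x7fffffff
```

Under Python 2.3 itself the bit-clearing expression would need the ``L``
suffixes shown in the list to avoid the :exc:`FutureWarning`.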

.. ======================================================================


.. _23acks:

Acknowledgements
================

The author would like to thank the following people for offering suggestions,
corrections and assistance with various drafts of this article: Jeff Bauer,
Simon Brunning, Brett Cannon, Michael Chermside, Andrew Dalke, Scott David
Daniels, Fred L. Drake, Jr., David Fraser, Kelly Gerber, Raymond Hettinger,
Michael Hudson, Chris Lambert, Detlef Lannert, Martin von Löwis, Andrew
MacIntyre, Lalo Martins, Chad Netzer, Gustavo Niemeyer, Neal Norwitz, Hans
Nowak, Chris Reedy, Francesco Ricciardi, Vinay Sajip, Neil Schemenauer, Roman
Suzi, Jason Tishler, Just van Rossum.