Update docs w.r.t. PEP 3100 changes -- patch for GHOP by Dan Finnie.

commit: f69451833191454bfef75804c2654dc37e8f3e93 [log] [tgz]
author: Georg Brandl <georg@python.org> Fri Feb 01 11:56:49 2008 +0000
committer: Georg Brandl <georg@python.org> Fri Feb 01 11:56:49 2008 +0000
tree: 7e81560f5276c35f68b7b02e75feb9221a82ae5d
parent: f25ef50549d9f2bcb6294fe61a9902490728edcc [diff]
diff --git a/Doc/howto/functional.rst b/Doc/howto/functional.rst
index 1557f55..e62d224 100644
--- a/Doc/howto/functional.rst
+++ b/Doc/howto/functional.rst

@@ -314,7 +314,7 @@
 Sets can take their contents from an iterable and let you iterate over the set's
 elements::
 
-    S = set((2, 3, 5, 7, 11, 13))
+    S = {2, 3, 5, 7, 11, 13}
     for i in S:
         print(i)
 
@@ -616,29 +616,26 @@
 
 Let's look in more detail at built-in functions often used with iterators.
 
-Two of Python's built-in functions, :func:`map` and :func:`filter`, are somewhat
-obsolete; they duplicate the features of list comprehensions but return actual
-lists instead of iterators.
+Two of Python's built-in functions, :func:`map` and :func:`filter` duplicate the
+features of generator expressions:
 
-``map(f, iterA, iterB, ...)`` returns a list containing ``f(iterA[0], iterB[0]),
-f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``.
+``map(f, iterA, iterB, ...)`` returns an iterator over the sequence 
+ ``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``.
 
 ::
 
     def upper(s):
         return s.upper()
-    map(upper, ['sentence', 'fragment']) =>
+    list(map(upper, ['sentence', 'fragment'])) =>
       ['SENTENCE', 'FRAGMENT']
 
-    [upper(s) for s in ['sentence', 'fragment']] =>
+    list(upper(s) for s in ['sentence', 'fragment']) =>
       ['SENTENCE', 'FRAGMENT']
 
-As shown above, you can achieve the same effect with a list comprehension.  The
-:func:`itertools.imap` function does the same thing but can handle infinite
-iterators; it'll be discussed later, in the section on the :mod:`itertools` module.
+You can of course achieve the same effect with a list comprehension. 
 
-``filter(predicate, iter)`` returns a list that contains all the sequence
-elements that meet a certain condition, and is similarly duplicated by list
+``filter(predicate, iter)`` returns an iterator over all the sequence elements
+that meet a certain condition, and is similarly duplicated by list
 comprehensions.  A **predicate** is a function that returns the truth value of
 some condition; for use with :func:`filter`, the predicate must take a single
 value.
@@ -648,69 +645,61 @@
     def is_even(x):
         return (x % 2) == 0
 
-    filter(is_even, range(10)) =>
+    list(filter(is_even, range(10))) =>
       [0, 2, 4, 6, 8]
 
-This can also be written as a list comprehension::
+This can also be written as a generator expression::
 
-    >>> [x for x in range(10) if is_even(x)]
+    >>> list(x for x in range(10) if is_even(x))
     [0, 2, 4, 6, 8]
 
-:func:`filter` also has a counterpart in the :mod:`itertools` module,
-:func:`itertools.ifilter`, that returns an iterator and can therefore handle
-infinite sequences just as :func:`itertools.imap` can.
+``functools.reduce(func, iter, [initial_value])`` cumulatively performs an
+operation on all the iterable's elements and, therefore, can't be applied to
+infinite iterables.  ``func`` must be a function that takes two elements and
+returns a single value.  :func:`functools.reduce` takes the first two elements A
+and B returned by the iterator and calculates ``func(A, B)``.  It then requests
+the third element, C, calculates ``func(func(A, B), C)``, combines this result
+with the fourth element returned, and continues until the iterable is exhausted.
+If the iterable returns no values at all, a :exc:`TypeError` exception is
+raised.  If the initial value is supplied, it's used as a starting point and
+``func(initial_value, A)`` is the first calculation. ::
 
-``reduce(func, iter, [initial_value])`` doesn't have a counterpart in the
-:mod:`itertools` module because it cumulatively performs an operation on all the
-iterable's elements and therefore can't be applied to infinite iterables.
-``func`` must be a function that takes two elements and returns a single value.
-:func:`reduce` takes the first two elements A and B returned by the iterator and
-calculates ``func(A, B)``.  It then requests the third element, C, calculates
-``func(func(A, B), C)``, combines this result with the fourth element returned,
-and continues until the iterable is exhausted.  If the iterable returns no
-values at all, a :exc:`TypeError` exception is raised.  If the initial value is
-supplied, it's used as a starting point and ``func(initial_value, A)`` is the
-first calculation.
-
-::
-
-    import operator
-    reduce(operator.concat, ['A', 'BB', 'C']) =>
-      'ABBC'
-    reduce(operator.concat, []) =>
-      TypeError: reduce() of empty sequence with no initial value
-    reduce(operator.mul, [1,2,3], 1) =>
-      6
-    reduce(operator.mul, [], 1) =>
-      1
-
-If you use :func:`operator.add` with :func:`reduce`, you'll add up all the
-elements of the iterable.  This case is so common that there's a special
+   import operator
+   import functools
+   functools.reduce(operator.concat, ['A', 'BB', 'C']) =>
+     'ABBC'
+   functools.reduce(operator.concat, []) =>
+     TypeError: reduce() of empty sequence with no initial value
+   functools.reduce(operator.mul, [1,2,3], 1) =>
+     6
+   functools.reduce(operator.mul, [], 1) =>
+     1
+ 
+If you use :func:`operator.add` with :func:`functools.reduce`, you'll add up all
+the elements of the iterable.  This case is so common that there's a special
 built-in called :func:`sum` to compute it::
 
-    reduce(operator.add, [1,2,3,4], 0) =>
-      10
-    sum([1,2,3,4]) =>
-      10
-    sum([]) =>
-      0
+   functools.reduce(operator.add, [1,2,3,4], 0) =>
+     10
+   sum([1,2,3,4]) =>
+     10
+   sum([]) =>
+     0
 
 For many uses of :func:`reduce`, though, it can be clearer to just write the
 obvious :keyword:`for` loop::
 
-    # Instead of:
-    product = reduce(operator.mul, [1,2,3], 1)
+   # Instead of:
+   product = functools.reduce(operator.mul, [1,2,3], 1)
 
-    # You can write:
-    product = 1
-    for i in [1,2,3]:
-        product *= i
+   # You can write:
+   product = 1
+   for i in [1,2,3]:
+       product *= i
 
 
 ``enumerate(iter)`` counts off the elements in the iterable, returning 2-tuples
-containing the count and each element.
-
-::
+containing the count and each element. ::
 
     enumerate(['subject', 'verb', 'object']) =>
       (0, 'subject'), (1, 'verb'), (2, 'object')
@@ -723,12 +712,10 @@
         if line.strip() == '':
             print('Blank line at line #%i' % i)
 
-``sorted(iterable, [cmp=None], [key=None], [reverse=False)`` collects all the
-elements of the iterable into a list, sorts the list, and returns the sorted
-result.  The ``cmp``, ``key``, and ``reverse`` arguments are passed through to
-the constructed list's ``.sort()`` method.
-
-::
+``sorted(iterable, [key=None], [reverse=False)`` collects all the elements of
+the iterable into a list, sorts the list, and returns the sorted result.  The
+``key``, and ``reverse`` arguments are passed through to the constructed list's
+``sort()`` method. ::
 
     import random
     # Generate 8 random numbers between [0, 10000)
@@ -962,14 +949,7 @@
 Calling functions on elements
 -----------------------------
 
-Two functions are used for calling other functions on the contents of an
-iterable.
-
-``itertools.imap(f, iterA, iterB, ...)`` returns a stream containing
-``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``::
-
-    itertools.imap(operator.add, [5, 6, 5], [1, 2, 3]) =>
-      6, 8, 8
+``itertools.imap(func, iter)`` is the same as built-in :func:`map`.
 
 The ``operator`` module contains a set of functions corresponding to Python's
 operators.  Some examples are ``operator.add(a, b)`` (adds two values),
@@ -992,14 +972,7 @@
 Another group of functions chooses a subset of an iterator's elements based on a
 predicate.
 
-``itertools.ifilter(predicate, iter)`` returns all the elements for which the
-predicate returns true::
-
-    def is_even(x):
-        return (x % 2) == 0
-
-    itertools.ifilter(is_even, itertools.count()) =>
-      0, 2, 4, 6, 8, 10, 12, 14, ...
+``itertools.ifilter(predicate, iter)`` is the same as built-in :func:`filter`.
 
 ``itertools.ifilterfalse(predicate, iter)`` is the opposite, returning all
 elements for which the predicate returns false::
@@ -1117,8 +1090,7 @@
 
 Some of the functions in this module are:
 
-* Math operations: ``add()``, ``sub()``, ``mul()``, ``div()``, ``floordiv()``,
-  ``abs()``, ...
+* Math operations: ``add()``, ``sub()``, ``mul()``, ``floordiv()``, ``abs()``, ...
 * Logical operations: ``not_()``, ``truth()``.
 * Bitwise operations: ``and_()``, ``or_()``, ``invert()``.
 * Comparisons: ``eq()``, ``ne()``, ``lt()``, ``le()``, ``gt()``, and ``ge()``.
@@ -1190,7 +1162,7 @@
         f(*g(5, 6))
 
 Even though ``compose()`` only accepts two functions, it's trivial to build up a
-version that will compose any number of functions. We'll use ``reduce()``,
+version that will compose any number of functions. We'll use ``functools.reduce()``,
 ``compose()`` and ``partial()`` (the last of which is provided by both
 ``functional`` and ``functools``).
 
@@ -1198,7 +1170,7 @@
 
         from functional import compose, partial
         
-        multi_compose = partial(reduce, compose)
+        multi_compose = partial(functools.reduce, compose)
         
     
 We can also use ``map()``, ``compose()`` and ``partial()`` to craft a version of

diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst
index d6c6b0a..794c945 100644
--- a/Doc/howto/regex.rst
+++ b/Doc/howto/regex.rst

@@ -497,7 +497,7 @@
 the same ones in several locations, then it might be worthwhile to collect all
 the definitions in one place, in a section of code that compiles all the REs
 ahead of time.  To take an example from the standard library, here's an extract
-from :file:`xmllib.py`::
+from the now deprecated :file:`xmllib.py`::
 
    ref = re.compile( ... )
    entityref = re.compile( ... )

diff --git a/Doc/howto/unicode.rst b/Doc/howto/unicode.rst
index 8b52039..40c77d6 100644
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst

@@ -237,129 +237,83 @@
 Now that you've learned the rudiments of Unicode, we can look at Python's
 Unicode features.
 
+The String Type
+---------------
 
-The Unicode Type
-----------------
+Since Python 3.0, the language features a ``str`` type that contain Unicode
+characters, meaning any string created using ``"unicode rocks!"``, ``'unicode
+rocks!``, or the triple-quoted string syntax is stored as Unicode.
 
-Unicode strings are expressed as instances of the :class:`unicode` type, one of
-Python's repertoire of built-in types.  It derives from an abstract type called
-:class:`basestring`, which is also an ancestor of the :class:`str` type; you can
-therefore check if a value is a string type with ``isinstance(value,
-basestring)``.  Under the hood, Python represents Unicode strings as either 16-
-or 32-bit integers, depending on how the Python interpreter was compiled.
+To insert a Unicode character that is not part ASCII, e.g., any letters with
+accents, one can use escape sequences in their string literals as such::
 
-The :func:`unicode` constructor has the signature ``unicode(string[, encoding,
-errors])``.  All of its arguments should be 8-bit strings.  The first argument
-is converted to Unicode using the specified encoding; if you leave off the
-``encoding`` argument, the ASCII encoding is used for the conversion, so
-characters greater than 127 will be treated as errors::
+   >>> "\N{GREEK CAPITAL LETTER DELTA}"  # Using the character name
+   '\u0394'
+   >>> "\u0394"                          # Using a 16-bit hex value
+   '\u0394'
+   >>> "\U00000394"                      # Using a 32-bit hex value
+   '\u0394'
 
-    >>> unicode('abcdef')
-    u'abcdef'
-    >>> s = unicode('abcdef')
-    >>> type(s)
-    <type 'unicode'>
-    >>> unicode('abcdef' + chr(255))
-    Traceback (most recent call last):
-      File "<stdin>", line 1, in ?
-    UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6:
-                        ordinal not in range(128)
+In addition, one can create a string using the :func:`decode` method of
+:class:`bytes`.  This method takes an encoding, such as UTF-8, and, optionally,
+an *errors* argument.
 
-The ``errors`` argument specifies the response when the input string can't be
+The *errors* argument specifies the response when the input string can't be
 converted according to the encoding's rules.  Legal values for this argument are
-'strict' (raise a ``UnicodeDecodeError`` exception), 'replace' (add U+FFFD,
+'strict' (raise a :exc:`UnicodeDecodeError` exception), 'replace' (add U+FFFD,
 'REPLACEMENT CHARACTER'), or 'ignore' (just leave the character out of the
 Unicode result).  The following examples show the differences::
 
-    >>> unicode('\x80abc', errors='strict')
+    >>> b'\x80abc'.decode("utf-8", "strict")
     Traceback (most recent call last):
       File "<stdin>", line 1, in ?
     UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
                         ordinal not in range(128)
-    >>> unicode('\x80abc', errors='replace')
-    u'\ufffdabc'
-    >>> unicode('\x80abc', errors='ignore')
-    u'abc'
+    >>> b'\x80abc'.decode("utf-8", "replace")
+    '\ufffdabc'
+    >>> b'\x80abc'.decode("utf-8", "ignore")
+    'abc'
 
-Encodings are specified as strings containing the encoding's name.  Python 2.4
+Encodings are specified as strings containing the encoding's name.  Python
 comes with roughly 100 different encodings; see the Python Library Reference at
 <http://docs.python.org/lib/standard-encodings.html> for a list.  Some encodings
 have multiple names; for example, 'latin-1', 'iso_8859_1' and '8859' are all
 synonyms for the same encoding.
 
-One-character Unicode strings can also be created with the :func:`unichr`
+One-character Unicode strings can also be created with the :func:`chr`
 built-in function, which takes integers and returns a Unicode string of length 1
 that contains the corresponding code point.  The reverse operation is the
 built-in :func:`ord` function that takes a one-character Unicode string and
 returns the code point value::
 
-    >>> unichr(40960)
-    u'\ua000'
-    >>> ord(u'\ua000')
+    >>> chr(40960)
+    '\ua000'
+    >>> ord('\ua000')
     40960
 
-Instances of the :class:`unicode` type have many of the same methods as the
-8-bit string type for operations such as searching and formatting::
+Converting to Bytes
+-------------------
 
-    >>> s = u'Was ever feather so lightly blown to and fro as this multitude?'
-    >>> s.count('e')
-    5
-    >>> s.find('feather')
-    9
-    >>> s.find('bird')
-    -1
-    >>> s.replace('feather', 'sand')
-    u'Was ever sand so lightly blown to and fro as this multitude?'
-    >>> s.upper()
-    u'WAS EVER FEATHER SO LIGHTLY BLOWN TO AND FRO AS THIS MULTITUDE?'
-
-Note that the arguments to these methods can be Unicode strings or 8-bit
-strings.  8-bit strings will be converted to Unicode before carrying out the
-operation; Python's default ASCII encoding will be used, so characters greater
-than 127 will cause an exception::
-
-    >>> s.find('Was\x9f')
-    Traceback (most recent call last):
-      File "<stdin>", line 1, in ?
-    UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 3: ordinal not in range(128)
-    >>> s.find(u'Was\x9f')
-    -1
-
-Much Python code that operates on strings will therefore work with Unicode
-strings without requiring any changes to the code.  (Input and output code needs
-more updating for Unicode; more on this later.)
-
-Another important method is ``.encode([encoding], [errors='strict'])``, which
-returns an 8-bit string version of the Unicode string, encoded in the requested
-encoding.  The ``errors`` parameter is the same as the parameter of the
-``unicode()`` constructor, with one additional possibility; as well as 'strict',
+Another important str method is ``.encode([encoding], [errors='strict'])``,
+which returns a ``bytes`` representation of the Unicode string, encoded in the
+requested encoding.  The ``errors`` parameter is the same as the parameter of
+the :meth:`decode` method, with one additional possibility; as well as 'strict',
 'ignore', and 'replace', you can also pass 'xmlcharrefreplace' which uses XML's
 character references.  The following example shows the different results::
 
-    >>> u = unichr(40960) + u'abcd' + unichr(1972)
+    >>> u = chr(40960) + 'abcd' + chr(1972)
     >>> u.encode('utf-8')
-    '\xea\x80\x80abcd\xde\xb4'
+    b'\xea\x80\x80abcd\xde\xb4'
     >>> u.encode('ascii')
     Traceback (most recent call last):
       File "<stdin>", line 1, in ?
     UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128)
     >>> u.encode('ascii', 'ignore')
-    'abcd'
+    b'abcd'
     >>> u.encode('ascii', 'replace')
-    '?abcd?'
+    b'?abcd?'
     >>> u.encode('ascii', 'xmlcharrefreplace')
-    '&#40960;abcd&#1972;'
-
-Python's 8-bit strings have a ``.decode([encoding], [errors])`` method that
-interprets the string using the given encoding::
-
-    >>> u = unichr(40960) + u'abcd' + unichr(1972)   # Assemble a string
-    >>> utf8_version = u.encode('utf-8')             # Encode as UTF-8
-    >>> type(utf8_version), utf8_version
-    (<type 'str'>, '\xea\x80\x80abcd\xde\xb4')
-    >>> u2 = utf8_version.decode('utf-8')            # Decode using UTF-8
-    >>> u == u2                                      # The two strings match
-    True
+    b'&#40960;abcd&#1972;'
 
 The low-level routines for registering and accessing the available encodings are
 found in the :mod:`codecs` module.  However, the encoding and decoding functions
@@ -377,22 +331,14 @@
 Unicode Literals in Python Source Code
 --------------------------------------
 
-In Python source code, Unicode literals are written as strings prefixed with the
-'u' or 'U' character: ``u'abcdefghijk'``.  Specific code points can be written
-using the ``\u`` escape sequence, which is followed by four hex digits giving
-the code point.  The ``\U`` escape sequence is similar, but expects 8 hex
-digits, not 4.
+In Python source code, specific Unicode code points can be written using the
+``\u`` escape sequence, which is followed by four hex digits giving the code
+point.  The ``\U`` escape sequence is similar, but expects 8 hex digits, not 4::
 
-Unicode literals can also use the same escape sequences as 8-bit strings,
-including ``\x``, but ``\x`` only takes two hex digits so it can't express an
-arbitrary code point.  Octal escapes can go up to U+01ff, which is octal 777.
-
-::
-
-    >>> s = u"a\xac\u1234\u20ac\U00008000"
-               ^^^^ two-digit hex escape
-                   ^^^^^^ four-digit Unicode escape
-                               ^^^^^^^^^^ eight-digit Unicode escape
+    >>> s = "a\xac\u1234\u20ac\U00008000"
+              ^^^^ two-digit hex escape
+                   ^^^^^ four-digit Unicode escape
+                              ^^^^^^^^^^ eight-digit Unicode escape
     >>> for c in s:  print(ord(c), end=" ")
     ...
     97 172 4660 8364 32768
@@ -400,7 +346,7 @@
 Using escape sequences for code points greater than 127 is fine in small doses,
 but becomes an annoyance if you're using many accented characters, as you would
 in a program with messages in French or some other accent-using language.  You
-can also assemble strings using the :func:`unichr` built-in function, but this is
+can also assemble strings using the :func:`chr` built-in function, but this is
 even more tedious.
 
 Ideally, you'd want to be able to write literals in your language's natural
@@ -408,14 +354,15 @@
 which would display the accented characters naturally, and have the right
 characters used at runtime.
 
-Python supports writing Unicode literals in any encoding, but you have to
-declare the encoding being used.  This is done by including a special comment as
-either the first or second line of the source file::
+Python supports writing Unicode literals in UTF-8 by default, but you can use
+(almost) any encoding if you declare the encoding being used.  This is done by
+including a special comment as either the first or second line of the source
+file::
 
     #!/usr/bin/env python
     # -*- coding: latin-1 -*-
 
-    u = u'abcdé'
+    u = 'abcdé'
     print(ord(u[-1]))
 
 The syntax is inspired by Emacs's notation for specifying variables local to a
@@ -424,22 +371,8 @@
 them, you must supply the name ``coding`` and the name of your chosen encoding,
 separated by ``':'``.
 
-If you don't include such a comment, the default encoding used will be ASCII.
-Versions of Python before 2.4 were Euro-centric and assumed Latin-1 as a default
-encoding for string literals; in Python 2.4, characters greater than 127 still
-work but result in a warning.  For example, the following program has no
-encoding declaration::
-
-    #!/usr/bin/env python
-    u = u'abcdé'
-    print(ord(u[-1]))
-
-When you run it with Python 2.4, it will output the following warning::
-
-    amk:~$ python p263.py
-    sys:1: DeprecationWarning: Non-ASCII character '\xe9'
-         in file p263.py on line 2, but no encoding declared;
-         see http://www.python.org/peps/pep-0263.html for details
+If you don't include such a comment, the default encoding used will be UTF-8 as
+already mentioned.
 
 
 Unicode Properties
@@ -457,7 +390,7 @@
 
     import unicodedata
 
-    u = unichr(233) + unichr(0x0bf2) + unichr(3972) + unichr(6000) + unichr(13231)
+    u = chr(233) + chr(0x0bf2) + chr(3972) + chr(6000) + chr(13231)
 
     for i, c in enumerate(u):
         print(i, '%04x' % ord(c), unicodedata.category(c), end=" ")
@@ -487,8 +420,8 @@
 References
 ----------
 
-The Unicode and 8-bit string types are described in the Python library reference
-at :ref:`typesseq`.
+The ``str`` type is described in the Python library reference at
+:ref:`typesseq`.
 
 The documentation for the :mod:`unicodedata` module.
 
@@ -557,7 +490,7 @@
 writing::
 
     f = codecs.open('test', encoding='utf-8', mode='w+')
-    f.write(u'\u4500 blah blah blah\n')
+    f.write('\u4500 blah blah blah\n')
     f.seek(0)
     print(repr(f.readline()[:1]))
     f.close()
@@ -590,7 +523,7 @@
 usually just provide the Unicode string as the filename, and it will be
 automatically converted to the right encoding for you::
 
-    filename = u'filename\u4500abc'
+    filename = 'filename\u4500abc'
     f = open(filename, 'w')
     f.write('blah\n')
     f.close()
@@ -607,7 +540,7 @@
 path will return the 8-bit versions of the filenames.  For example, assuming the
 default filesystem encoding is UTF-8, running the following program::
 
-	fn = u'filename\u4500abc'
+	fn = 'filename\u4500abc'
 	f = open(fn, 'w')
 	f.close()
 
@@ -619,7 +552,7 @@
 
 	amk:~$ python t.py
 	['.svn', 'filename\xe4\x94\x80abc', ...]
-	[u'.svn', u'filename\u4500abc', ...]
+	['.svn', 'filename\u4500abc', ...]
 
 The first list contains UTF-8-encoded filenames, and the second list contains
 the Unicode versions.
commit	f69451833191454bfef75804c2654dc37e8f3e93	[log] [tgz]
author	Georg Brandl <georg@python.org>	Fri Feb 01 11:56:49 2008 +0000
committer	Georg Brandl <georg@python.org>	Fri Feb 01 11:56:49 2008 +0000
tree	7e81560f5276c35f68b7b02e75feb9221a82ae5d
parent	f25ef50549d9f2bcb6294fe61a9902490728edcc [diff]