Review of the documentation changes for the creation of the urllib package.
diff --git a/Doc/howto/urllib2.rst b/Doc/howto/urllib2.rst
index 6342b6e..5d32d4a 100644
--- a/Doc/howto/urllib2.rst
+++ b/Doc/howto/urllib2.rst
@@ -1,12 +1,12 @@
-*****************************************************
- HOWTO Fetch Internet Resources Using urllib package
-*****************************************************
+***********************************************************
+ HOWTO Fetch Internet Resources Using The urllib Package
+***********************************************************
:Author: `Michael Foord <http://www.voidspace.org.uk/python/index.shtml>`_
.. note::
- There is an French translation of an earlier revision of this
+ There is a French translation of an earlier revision of this
HOWTO, available at `urllib2 - Le Manuel manquant
<http://www.voidspace.org.uk/python/articles/urllib2_francais.shtml>`_.
@@ -18,7 +18,7 @@
.. sidebar:: Related Articles
You may also find useful the following article on fetching web resources
- with Python :
+ with Python:
* `Basic Authentication <http://www.voidspace.org.uk/python/articles/authentication.shtml>`_
@@ -94,8 +94,8 @@
all POSTs have to come from forms: you can use a POST to transmit arbitrary data
to your own application. In the common case of HTML forms, the data needs to be
encoded in a standard way, and then passed to the Request object as the ``data``
-argument. The encoding is done using a function from the ``urllib.parse`` library
-*not* from ``urllib.request``. ::
+argument. The encoding is done using a function from the :mod:`urllib.parse`
+library. ::
import urllib.parse
import urllib.request
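
For reference, the encode-then-POST pattern this hunk documents comes out like this under the new package layout (a minimal sketch; the URL and form values are placeholders)::

   import urllib.parse
   import urllib.request

   url = 'http://www.example.com/cgi-bin/register.cgi'   # placeholder URL
   values = {'name': 'Michael Foord', 'location': 'Northampton'}

   # urlencode() lives in urllib.parse; urlopen() needs the result
   # as bytes for the POST body.
   data = urllib.parse.urlencode(values).encode('ascii')
   response = urllib.request.urlopen(url, data)
   print(response.read())
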
@@ -115,7 +115,7 @@
<http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13>`_ for more
details).
-If you do not pass the ``data`` argument, urllib.request uses a **GET** request. One
+If you do not pass the ``data`` argument, urllib uses a **GET** request. One
way in which GET and POST requests differ is that POST requests often have
"side-effects": they change the state of the system in some way (for example by
placing an order with the website for a hundredweight of tinned spam to be
@@ -182,13 +182,15 @@
Handling Exceptions
===================
-*urllib.error* raises ``URLError`` when it cannot handle a response (though as usual
+*urlopen* raises ``URLError`` when it cannot handle a response (though as usual
with Python APIs, builtin exceptions such as ValueError, TypeError etc. may also
be raised).
``HTTPError`` is the subclass of ``URLError`` raised in the specific case of
HTTP URLs.
+The exception classes are exported from the :mod:`urllib.error` module.
+
URLError
--------
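
The split described here (URLError as the base class, HTTPError for HTTP-specific failures, both exported from urllib.error) maps onto the usual handling pattern; a sketch, with a placeholder URL::

   import urllib.request
   import urllib.error

   req = urllib.request.Request('http://www.example.com/')  # placeholder URL
   try:
       response = urllib.request.urlopen(req)
   except urllib.error.HTTPError as e:
       # HTTPError is a subclass of URLError, so catch it first.
       print('The server could not fulfill the request.')
       print('Error code:', e.code)
   except urllib.error.URLError as e:
       print('We failed to reach a server.')
       print('Reason:', e.reason)
   else:
       print(response.read())
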
@@ -214,7 +216,7 @@
the status code indicates that the server is unable to fulfil the request. The
default handlers will handle some of these responses for you (for example, if
the response is a "redirection" that requests the client fetch the document from
-a different URL, urllib.request will handle that for you). For those it can't handle,
+a different URL, urllib will handle that for you). For those it can't handle,
urlopen will raise an ``HTTPError``. Typical errors include '404' (page not
found), '403' (request forbidden), and '401' (authentication required).
@@ -380,7 +382,7 @@
The response returned by urlopen (or the ``HTTPError`` instance) has two useful
methods ``info`` and ``geturl`` and is defined in the module
-``urllib.response``.
+:mod:`urllib.response`.
**geturl** - this returns the real URL of the page fetched. This is useful
because ``urlopen`` (or the opener object used) may have followed a
@@ -388,7 +390,7 @@
**info** - this returns a dictionary-like object that describes the page
fetched, particularly the headers sent by the server. It is currently an
-``http.client.HTTPMessage`` instance.
+:class:`http.client.HTTPMessage` instance.
Typical headers include 'Content-length', 'Content-type', and so on. See the
`Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_
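
As a quick illustration of the two methods this hunk refers to (assuming any reachable URL)::

   from urllib.request import urlopen

   response = urlopen('http://www.python.org/')  # any reachable URL
   print(response.geturl())      # the URL actually fetched, after redirects
   info = response.info()        # an http.client.HTTPMessage instance
   print(info['Content-Type'])   # headers support dict-style access
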
@@ -508,7 +510,7 @@
Proxies
=======
-**urllib.request** will auto-detect your proxy settings and use those. This is through
+**urllib** will auto-detect your proxy settings and use those. This is through
the ``ProxyHandler`` which is part of the normal handler chain. Normally that's
a good thing, but there are occasions when it may not be helpful [#]_. One way
to do this is to setup our own ``ProxyHandler``, with no proxies defined. This
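
The empty ProxyHandler trick mentioned here, sketched out (this mirrors the howto's own approach)::

   import urllib.request

   # An opener built with an empty ProxyHandler ignores any
   # auto-detected proxy settings.
   proxy_support = urllib.request.ProxyHandler({})
   opener = urllib.request.build_opener(proxy_support)
   urllib.request.install_opener(opener)
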
@@ -528,8 +530,8 @@
Sockets and Layers
==================
-The Python support for fetching resources from the web is layered.
-urllib.request uses the http.client library, which in turn uses the socket library.
+The Python support for fetching resources from the web is layered. urllib uses
+the :mod:`http.client` library, which in turn uses the socket library.
As of Python 2.3 you can specify how long a socket should wait for a response
before timing out. This can be useful in applications which have to fetch web
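
The timeout mechanism described in this hunk, as a short sketch (the 10-second value and URL are arbitrary)::

   import socket
   import urllib.request

   # Set a module-wide default timeout, in seconds, for new socket
   # connections, then fetch as usual.
   socket.setdefaulttimeout(10)
   response = urllib.request.urlopen('http://www.example.com/')  # placeholder URL
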
@@ -573,9 +575,9 @@
`Quick Reference to HTTP Headers`_.
.. [#] In my case I have to use a proxy to access the internet at work. If you
attempt to fetch *localhost* URLs through this proxy it blocks them. IE
- is set to use the proxy, which urllib2 picks up on. In order to test
- scripts with a localhost server, I have to prevent urllib2 from using
+ is set to use the proxy, which urllib picks up on. In order to test
+ scripts with a localhost server, I have to prevent urllib from using
the proxy.
-.. [#] urllib2 opener for SSL proxy (CONNECT method): `ASPN Cookbook Recipe
+.. [#] urllib opener for SSL proxy (CONNECT method): `ASPN Cookbook Recipe
<http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/456195>`_.
diff --git a/Doc/library/contextlib.rst b/Doc/library/contextlib.rst
index 2cd97c2..74a68cf 100644
--- a/Doc/library/contextlib.rst
+++ b/Doc/library/contextlib.rst
@@ -98,9 +98,9 @@
And lets you write code like this::
from contextlib import closing
- import urllib.request
+ from urllib.request import urlopen
- with closing(urllib.request.urlopen('http://www.python.org')) as page:
+ with closing(urlopen('http://www.python.org')) as page:
for line in page:
print(line)
diff --git a/Doc/library/http.client.rst b/Doc/library/http.client.rst
index 1ea3576..bcda4c9 100644
--- a/Doc/library/http.client.rst
+++ b/Doc/library/http.client.rst
@@ -13,8 +13,7 @@
This module defines classes which implement the client side of the HTTP and
HTTPS protocols. It is normally not used directly --- the module
-:mod:`urllib.request`
-uses it to handle URLs that use HTTP and HTTPS.
+:mod:`urllib.request` uses it to handle URLs that use HTTP and HTTPS.
.. note::
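
For context, the lower-level interface that urllib.request builds on looks roughly like this when used directly (a sketch; the host is just an example)::

   import http.client

   conn = http.client.HTTPConnection('www.python.org')
   conn.request('GET', '/')
   resp = conn.getresponse()
   print(resp.status, resp.reason)
   conn.close()
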
diff --git a/Doc/library/robotparser.rst b/Doc/library/robotparser.rst
deleted file mode 100644
index cce7966..0000000
--- a/Doc/library/robotparser.rst
+++ /dev/null
@@ -1,73 +0,0 @@
-
-:mod:`robotparser` --- Parser for robots.txt
-=============================================
-
-.. module:: robotparser
- :synopsis: Loads a robots.txt file and answers questions about
- fetchability of other URLs.
-.. sectionauthor:: Skip Montanaro <skip@pobox.com>
-
-
-.. index::
- single: WWW
- single: World Wide Web
- single: URL
- single: robots.txt
-
-This module provides a single class, :class:`RobotFileParser`, which answers
-questions about whether or not a particular user agent can fetch a URL on the
-Web site that published the :file:`robots.txt` file. For more details on the
-structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.
-
-
-.. class:: RobotFileParser()
-
- This class provides a set of methods to read, parse and answer questions
- about a single :file:`robots.txt` file.
-
-
- .. method:: set_url(url)
-
- Sets the URL referring to a :file:`robots.txt` file.
-
-
- .. method:: read()
-
- Reads the :file:`robots.txt` URL and feeds it to the parser.
-
-
- .. method:: parse(lines)
-
- Parses the lines argument.
-
-
- .. method:: can_fetch(useragent, url)
-
- Returns ``True`` if the *useragent* is allowed to fetch the *url*
- according to the rules contained in the parsed :file:`robots.txt`
- file.
-
-
- .. method:: mtime()
-
- Returns the time the ``robots.txt`` file was last fetched. This is
- useful for long-running web spiders that need to check for new
- ``robots.txt`` files periodically.
-
-
- .. method:: modified()
-
- Sets the time the ``robots.txt`` file was last fetched to the current
- time.
-
-The following example demonstrates basic use of the RobotFileParser class. ::
-
- >>> import robotparser
- >>> rp = robotparser.RobotFileParser()
- >>> rp.set_url("http://www.musi-cal.com/robots.txt")
- >>> rp.read()
- >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
- False
- >>> rp.can_fetch("*", "http://www.musi-cal.com/")
- True
-
diff --git a/Doc/library/urllib.error.rst b/Doc/library/urllib.error.rst
index 1cbfe7d..bd76860 100644
--- a/Doc/library/urllib.error.rst
+++ b/Doc/library/urllib.error.rst
@@ -2,47 +2,47 @@
==================================================================
.. module:: urllib.error
- :synopsis: Next generation URL opening library.
+ :synopsis: Exception classes raised by urllib.request.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Senthil Kumaran <orsenthil@gmail.com>
-The :mod:`urllib.error` module defines exception classes raise by
-urllib.request. The base exception class is URLError, which inherits from
-IOError.
+The :mod:`urllib.error` module defines the exception classes for exceptions
+raised by :mod:`urllib.request`. The base exception class is :exc:`URLError`,
+which inherits from :exc:`IOError`.
The following exceptions are raised by :mod:`urllib.error` as appropriate:
-
.. exception:: URLError
- The handlers raise this exception (or derived exceptions) when they run into a
- problem. It is a subclass of :exc:`IOError`.
+ The handlers raise this exception (or derived exceptions) when they run into
+ a problem. It is a subclass of :exc:`IOError`.
.. attribute:: reason
- The reason for this error. It can be a message string or another exception
- instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
- URLs).
+ The reason for this error. It can be a message string or another
+ exception instance (:exc:`socket.error` for remote URLs, :exc:`OSError`
+ for local URLs).
.. exception:: HTTPError
- Though being an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
- can also function as a non-exceptional file-like return value (the same thing
- that :func:`urlopen` returns). This is useful when handling exotic HTTP
- errors, such as requests for authentication.
+ Though being an exception (a subclass of :exc:`URLError`), an
+ :exc:`HTTPError` can also function as a non-exceptional file-like return
+ value (the same thing that :func:`urlopen` returns). This is useful when
+ handling exotic HTTP errors, such as requests for authentication.
.. attribute:: code
- An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
- This numeric value corresponds to a value found in the dictionary of
- codes as found in :attr:`http.server.BaseHTTPRequestHandler.responses`.
+ An HTTP status code as defined in `RFC 2616
+ <http://www.faqs.org/rfcs/rfc2616.html>`_. This numeric value corresponds
+ to a value found in the dictionary of codes as found in
+ :attr:`http.server.BaseHTTPRequestHandler.responses`.
.. exception:: ContentTooShortError(msg[, content])
- This exception is raised when the :func:`urlretrieve` function detects that the
- amount of the downloaded data is less than the expected amount (given by the
- *Content-Length* header). The :attr:`content` attribute stores the downloaded
- (and supposedly truncated) data.
+ This exception is raised when the :func:`urlretrieve` function detects that
+ the amount of the downloaded data is less than the expected amount (given by
+ the *Content-Length* header). The :attr:`content` attribute stores the
+ downloaded (and supposedly truncated) data.
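
The dual nature of HTTPError described above, an exception that also behaves like the file-like response urlopen would have returned, in practice (a sketch with a placeholder URL)::

   import urllib.request
   import urllib.error

   try:
       urllib.request.urlopen('http://www.example.com/missing')  # placeholder URL
   except urllib.error.HTTPError as e:
       print(e.code)     # the HTTP status, e.g. 404
       print(e.read())   # the error page body, read like a normal response
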
diff --git a/Doc/library/urllib.parse.rst b/Doc/library/urllib.parse.rst
index affa406..a5463e6 100644
--- a/Doc/library/urllib.parse.rst
+++ b/Doc/library/urllib.parse.rst
@@ -20,13 +20,12 @@
The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators (and discovered a bug in an earlier draft!). It supports the
following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
-``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``, ``prospero``,
-``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
-``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.
+``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``, ``prospero``,
+``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
+``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.
The :mod:`urllib.parse` module defines the following functions:
-
.. function:: urlparse(urlstring[, default_scheme[, allow_fragments]])
Parse a URL into six components, returning a 6-tuple. This corresponds to the
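
The 6-tuple behaviour of urlparse, as a quick sketch::

   from urllib.parse import urlparse

   parsed = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
   # The result is a 6-tuple whose components are also named attributes.
   print(parsed.scheme)   # 'http'
   print(parsed.netloc)   # 'www.cwi.nl:80'
   print(parsed.path)     # '/%7Eguido/Python.html'
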
@@ -92,11 +91,11 @@
.. function:: urlunparse(parts)
- Construct a URL from a tuple as returned by ``urlparse()``. The *parts* argument
- can be any six-item iterable. This may result in a slightly different, but
- equivalent URL, if the URL that was parsed originally had unnecessary delimiters
- (for example, a ? with an empty query; the RFC states that these are
- equivalent).
+ Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
+ argument can be any six-item iterable. This may result in a slightly
+ different, but equivalent URL, if the URL that was parsed originally had
+ unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
+ states that these are equivalent).
.. function:: urlsplit(urlstring[, default_scheme[, allow_fragments]])
@@ -140,19 +139,19 @@
.. function:: urlunsplit(parts)
- Combine the elements of a tuple as returned by :func:`urlsplit` into a complete
- URL as a string. The *parts* argument can be any five-item iterable. This may
- result in a slightly different, but equivalent URL, if the URL that was parsed
- originally had unnecessary delimiters (for example, a ? with an empty query; the
- RFC states that these are equivalent).
+ Combine the elements of a tuple as returned by :func:`urlsplit` into a
+ complete URL as a string. The *parts* argument can be any five-item
+ iterable. This may result in a slightly different, but equivalent URL, if the
+   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
+ with an empty query; the RFC states that these are equivalent).
.. function:: urljoin(base, url[, allow_fragments])
Construct a full ("absolute") URL by combining a "base URL" (*base*) with
another URL (*url*). Informally, this uses components of the base URL, in
- particular the addressing scheme, the network location and (part of) the path,
- to provide missing components in the relative URL. For example:
+ particular the addressing scheme, the network location and (part of) the
+ path, to provide missing components in the relative URL. For example:
>>> from urllib.parse import urljoin
>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
@@ -178,10 +177,10 @@
.. function:: urldefrag(url)
- If *url* contains a fragment identifier, returns a modified version of *url*
- with no fragment identifier, and the fragment identifier as a separate string.
- If there is no fragment identifier in *url*, returns *url* unmodified and an
- empty string.
+ If *url* contains a fragment identifier, return a modified version of *url*
+ with no fragment identifier, and the fragment identifier as a separate
+ string. If there is no fragment identifier in *url*, return *url* unmodified
+ and an empty string.
.. function:: quote(string[, safe])
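
The urldefrag behaviour documented in this hunk, sketched::

   from urllib.parse import urldefrag

   url, frag = urldefrag('http://www.python.org/doc/#intro')
   print(url)    # 'http://www.python.org/doc/'
   print(frag)   # 'intro'
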
@@ -195,9 +194,10 @@
.. function:: quote_plus(string[, safe])
- Like :func:`quote`, but also replaces spaces by plus signs, as required for
- quoting HTML form values. Plus signs in the original string are escaped unless
- they are included in *safe*. It also does not have *safe* default to ``'/'``.
+ Like :func:`quote`, but also replace spaces by plus signs, as required for
+ quoting HTML form values. Plus signs in the original string are escaped
+ unless they are included in *safe*. It also does not have *safe* default to
+ ``'/'``.
.. function:: unquote(string)
@@ -209,7 +209,7 @@
.. function:: unquote_plus(string)
- Like :func:`unquote`, but also replaces plus signs by spaces, as required for
+ Like :func:`unquote`, but also replace plus signs by spaces, as required for
unquoting HTML form values.
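
A quick round-trip showing the plus-sign handling that distinguishes quote_plus and unquote_plus::

   from urllib.parse import quote_plus, unquote_plus

   encoded = quote_plus('spam & eggs / cheese')
   print(encoded)                # 'spam+%26+eggs+%2F+cheese'
   print(unquote_plus(encoded))  # 'spam & eggs / cheese'
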
@@ -254,7 +254,6 @@
subclasses of the :class:`tuple` type. These subclasses add the attributes
described in those functions, as well as provide an additional method:
-
.. method:: ParseResult.geturl()
Return the re-combined version of the original URL as a string. This may differ
@@ -279,13 +278,12 @@
The following classes provide the implementations of the parse results::
-
.. class:: BaseResult
- Base class for the concrete result classes. This provides most of the attribute
- definitions. It does not provide a :meth:`geturl` method. It is derived from
- :class:`tuple`, but does not override the :meth:`__init__` or :meth:`__new__`
- methods.
+ Base class for the concrete result classes. This provides most of the
+ attribute definitions. It does not provide a :meth:`geturl` method. It is
+ derived from :class:`tuple`, but does not override the :meth:`__init__` or
+ :meth:`__new__` methods.
.. class:: ParseResult(scheme, netloc, path, params, query, fragment)
diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index 4262836..d124d9a 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -7,9 +7,9 @@
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
-The :mod:`urllib.request` module defines functions and classes which help in opening
-URLs (mostly HTTP) in a complex world --- basic and digest authentication,
-redirections, cookies and more.
+The :mod:`urllib.request` module defines functions and classes which help in
+opening URLs (mostly HTTP) in a complex world --- basic and digest
+authentication, redirections, cookies and more.
The :mod:`urllib.request` module defines the following functions:
@@ -180,7 +180,7 @@
the ``User-Agent`` header, which is used by a browser to identify itself --
some HTTP servers only allow requests coming from common browsers as opposed
to scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
- (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
+ (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib`'s
default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).
The final two arguments are only of interest for correct handling of third-party
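
Supplying the User-Agent header discussed here via a Request object (a sketch; the header value is illustrative)::

   import urllib.request

   headers = {'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686)'}  # illustrative value
   req = urllib.request.Request('http://www.example.com/', headers=headers)
   response = urllib.request.urlopen(req)
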
@@ -1005,10 +1005,11 @@
For non-200 error codes, this simply passes the job on to the
:meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
- Eventually, :class:`urllib2.HTTPDefaultErrorHandler` will raise an
+ Eventually, :class:`HTTPDefaultErrorHandler` will raise an
:exc:`HTTPError` if no other handler handles the error.
-.. _urllib2-examples:
+
+.. _urllib-request-examples:
Examples
--------
@@ -1180,15 +1181,18 @@
using the :mod:`ftplib` module, subclassing :class:`FancyURLOpener`, or changing
*_urlopener* to meet your needs.
+
+
:mod:`urllib.response` --- Response classes used by urllib.
===========================================================
+
.. module:: urllib.response
:synopsis: Response classes used by urllib.
The :mod:`urllib.response` module defines functions and classes which define a
-minimal file like interface, including read() and readline(). The typical
-response object is an addinfourl instance, which defines and info() method and
-that returns headers and a geturl() method that returns the url.
+minimal file-like interface, including ``read()`` and ``readline()``. The
+typical response object is an ``addinfourl`` instance, which defines an ``info()``
+method that returns headers and a ``geturl()`` method that returns the URL.
Functions defined by this module are used internally by the
:mod:`urllib.request` module.
diff --git a/Doc/library/urllib.robotparser.rst b/Doc/library/urllib.robotparser.rst
index e351c56..0cac2ad 100644
--- a/Doc/library/urllib.robotparser.rst
+++ b/Doc/library/urllib.robotparser.rst
@@ -1,9 +1,8 @@
-
:mod:`urllib.robotparser` --- Parser for robots.txt
====================================================
.. module:: urllib.robotparser
- :synopsis: Loads a robots.txt file and answers questions about
+ :synopsis: Load a robots.txt file and answer questions about
fetchability of other URLs.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>
@@ -25,42 +24,37 @@
This class provides a set of methods to read, parse and answer questions
about a single :file:`robots.txt` file.
-
.. method:: set_url(url)
Sets the URL referring to a :file:`robots.txt` file.
-
.. method:: read()
Reads the :file:`robots.txt` URL and feeds it to the parser.
-
.. method:: parse(lines)
Parses the lines argument.
-
.. method:: can_fetch(useragent, url)
Returns ``True`` if the *useragent* is allowed to fetch the *url*
according to the rules contained in the parsed :file:`robots.txt`
file.
-
.. method:: mtime()
Returns the time the ``robots.txt`` file was last fetched. This is
useful for long-running web spiders that need to check for new
``robots.txt`` files periodically.
-
.. method:: modified()
Sets the time the ``robots.txt`` file was last fetched to the current
time.
-The following example demonstrates basic use of the RobotFileParser class. ::
+
+The following example demonstrates basic use of the RobotFileParser class. ::
>>> import urllib.robotparser
>>> rp = urllib.robotparser.RobotFileParser()
diff --git a/Doc/tutorial/stdlib.rst b/Doc/tutorial/stdlib.rst
index b0c6e8e..9bc0890 100644
--- a/Doc/tutorial/stdlib.rst
+++ b/Doc/tutorial/stdlib.rst
@@ -150,8 +150,8 @@
protocols. Two of the simplest are :mod:`urllib.request` for retrieving data
from urls and :mod:`smtplib` for sending mail::
- >>> import urllib.request
- >>> for line in urllib.request.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
+ >>> from urllib.request import urlopen
+ >>> for line in urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
... if 'EST' in line or 'EDT' in line: # look for Eastern Time
... print(line)
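
One caveat for this tutorial hunk: in Python 3, urlopen yields bytes, so the string membership test needs a decode step first. A corrected sketch::

   from urllib.request import urlopen

   for line in urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
       line = line.decode('utf-8')             # urlopen returns bytes in Python 3
       if 'EST' in line or 'EDT' in line:      # look for Eastern Time
           print(line)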