Review of the documentation changes for the creation of the urllib package.
diff --git a/Doc/howto/urllib2.rst b/Doc/howto/urllib2.rst
index 6342b6e..5d32d4a 100644
--- a/Doc/howto/urllib2.rst
+++ b/Doc/howto/urllib2.rst
@@ -1,12 +1,12 @@
-*****************************************************
- HOWTO Fetch Internet Resources Using urllib package
-*****************************************************
+***********************************************************
+ HOWTO Fetch Internet Resources Using The urllib Package
+***********************************************************
:Author: `Michael Foord <http://www.voidspace.org.uk/python/index.shtml>`_
.. note::
- There is an French translation of an earlier revision of this
+ There is a French translation of an earlier revision of this
HOWTO, available at `urllib2 - Le Manuel manquant
<http://www.voidspace.org.uk/python/articles/urllib2_francais.shtml>`_.
@@ -18,7 +18,7 @@
.. sidebar:: Related Articles
You may also find useful the following article on fetching web resources
- with Python :
+ with Python:
* `Basic Authentication <http://www.voidspace.org.uk/python/articles/authentication.shtml>`_
@@ -94,8 +94,8 @@
all POSTs have to come from forms: you can use a POST to transmit arbitrary data
to your own application. In the common case of HTML forms, the data needs to be
encoded in a standard way, and then passed to the Request object as the ``data``
-argument. The encoding is done using a function from the ``urllib.parse`` library
-*not* from ``urllib.request``. ::
+argument. The encoding is done using a function from the :mod:`urllib.parse`
+library. ::
import urllib.parse
import urllib.request
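
For reference, the encode-then-POST pattern this hunk documents comes out like this under the new package layout (a minimal sketch; the URL and form values are placeholders)::

   import urllib.parse
   import urllib.request

   url = 'http://www.example.com/cgi-bin/register.cgi'   # placeholder URL
   values = {'name': 'Michael Foord', 'location': 'Northampton'}

   # urlencode() lives in urllib.parse; urlopen() needs the result
   # as bytes for the POST body.
   data = urllib.parse.urlencode(values).encode('ascii')
   response = urllib.request.urlopen(url, data)
   print(response.read())
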
@@ -115,7 +115,7 @@
<http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13>`_ for more
details).
-If you do not pass the ``data`` argument, urllib.request uses a **GET** request. One
+If you do not pass the ``data`` argument, urllib uses a **GET** request. One
way in which GET and POST requests differ is that POST requests often have
"side-effects": they change the state of the system in some way (for example by
placing an order with the website for a hundredweight of tinned spam to be
@@ -182,13 +182,15 @@
Handling Exceptions
===================
-*urllib.error* raises ``URLError`` when it cannot handle a response (though as usual
+*urlopen* raises ``URLError`` when it cannot handle a response (though as usual
with Python APIs, builtin exceptions such as ValueError, TypeError etc. may also
be raised).
``HTTPError`` is the subclass of ``URLError`` raised in the specific case of
HTTP URLs.
+The exception classes are exported from the :mod:`urllib.error` module.
+
URLError
--------
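
The split described here (URLError as the base class, HTTPError for HTTP-specific failures, both exported from urllib.error) maps onto the usual handling pattern; a sketch, with a placeholder URL::

   import urllib.request
   import urllib.error

   req = urllib.request.Request('http://www.example.com/')  # placeholder URL
   try:
       response = urllib.request.urlopen(req)
   except urllib.error.HTTPError as e:
       # HTTPError is a subclass of URLError, so catch it first.
       print('The server could not fulfill the request.')
       print('Error code:', e.code)
   except urllib.error.URLError as e:
       print('We failed to reach a server.')
       print('Reason:', e.reason)
   else:
       print(response.read())
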
@@ -214,7 +216,7 @@
the status code indicates that the server is unable to fulfil the request. The
default handlers will handle some of these responses for you (for example, if
the response is a "redirection" that requests the client fetch the document from
-a different URL, urllib.request will handle that for you). For those it can't handle,
+a different URL, urllib will handle that for you). For those it can't handle,
urlopen will raise an ``HTTPError``. Typical errors include '404' (page not
found), '403' (request forbidden), and '401' (authentication required).
@@ -380,7 +382,7 @@
The response returned by urlopen (or the ``HTTPError`` instance) has two useful
methods ``info`` and ``geturl`` and is defined in the module
-``urllib.response``.
+:mod:`urllib.response`.
**geturl** - this returns the real URL of the page fetched. This is useful
because ``urlopen`` (or the opener object used) may have followed a
@@ -388,7 +390,7 @@
**info** - this returns a dictionary-like object that describes the page
fetched, particularly the headers sent by the server. It is currently an
-``http.client.HTTPMessage`` instance.
+:class:`http.client.HTTPMessage` instance.
Typical headers include 'Content-length', 'Content-type', and so on. See the
`Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_
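
As a quick illustration of the two methods this hunk refers to (assuming any reachable URL)::

   from urllib.request import urlopen

   response = urlopen('http://www.python.org/')  # any reachable URL
   print(response.geturl())      # the URL actually fetched, after redirects
   info = response.info()        # an http.client.HTTPMessage instance
   print(info['Content-Type'])   # headers support dict-style access
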
@@ -508,7 +510,7 @@
Proxies
=======
-**urllib.request** will auto-detect your proxy settings and use those. This is through
+**urllib** will auto-detect your proxy settings and use those. This is through
the ``ProxyHandler`` which is part of the normal handler chain. Normally that's
a good thing, but there are occasions when it may not be helpful [#]_. One way
to do this is to setup our own ``ProxyHandler``, with no proxies defined. This
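
The empty ProxyHandler trick mentioned here, sketched out (this mirrors the howto's own approach)::

   import urllib.request

   # An opener built with an empty ProxyHandler ignores any
   # auto-detected proxy settings.
   proxy_support = urllib.request.ProxyHandler({})
   opener = urllib.request.build_opener(proxy_support)
   urllib.request.install_opener(opener)
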
@@ -528,8 +530,8 @@
Sockets and Layers
==================
-The Python support for fetching resources from the web is layered.
-urllib.request uses the http.client library, which in turn uses the socket library.
+The Python support for fetching resources from the web is layered. urllib uses
+the :mod:`http.client` library, which in turn uses the socket library.
As of Python 2.3 you can specify how long a socket should wait for a response
before timing out. This can be useful in applications which have to fetch web
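
The timeout mechanism described in this hunk, as a short sketch (the 10-second value and URL are arbitrary)::

   import socket
   import urllib.request

   # Set a module-wide default timeout, in seconds, for new socket
   # connections, then fetch as usual.
   socket.setdefaulttimeout(10)
   response = urllib.request.urlopen('http://www.example.com/')  # placeholder URL
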
@@ -573,9 +575,9 @@
`Quick Reference to HTTP Headers`_.
.. [#] In my case I have to use a proxy to access the internet at work. If you
attempt to fetch *localhost* URLs through this proxy it blocks them. IE
- is set to use the proxy, which urllib2 picks up on. In order to test
- scripts with a localhost server, I have to prevent urllib2 from using
+ is set to use the proxy, which urllib picks up on. In order to test
+ scripts with a localhost server, I have to prevent urllib from using
the proxy.
-.. [#] urllib2 opener for SSL proxy (CONNECT method): `ASPN Cookbook Recipe
+.. [#] urllib opener for SSL proxy (CONNECT method): `ASPN Cookbook Recipe
<http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/456195>`_.
diff --git a/Doc/library/contextlib.rst b/Doc/library/contextlib.rst
index 2cd97c2..74a68cf 100644
--- a/Doc/library/contextlib.rst
+++ b/Doc/library/contextlib.rst
@@ -98,9 +98,9 @@
And lets you write code like this::
from contextlib import closing
- import urllib.request
+ from urllib.request import urlopen
- with closing(urllib.request.urlopen('http://www.python.org')) as page:
+ with closing(urlopen('http://www.python.org')) as page:
for line in page:
print(line)
diff --git a/Doc/library/http.client.rst b/Doc/library/http.client.rst
index 1ea3576..bcda4c9 100644
--- a/Doc/library/http.client.rst
+++ b/Doc/library/http.client.rst
@@ -13,8 +13,7 @@
This module defines classes which implement the client side of the HTTP and
HTTPS protocols. It is normally not used directly --- the module
-:mod:`urllib.request`
-uses it to handle URLs that use HTTP and HTTPS.
+:mod:`urllib.request` uses it to handle URLs that use HTTP and HTTPS.
.. note::
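
For context, the lower-level interface that urllib.request builds on looks roughly like this when used directly (a sketch; the host is just an example)::

   import http.client

   conn = http.client.HTTPConnection('www.python.org')
   conn.request('GET', '/')
   resp = conn.getresponse()
   print(resp.status, resp.reason)
   conn.close()
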
diff --git a/Doc/library/robotparser.rst b/Doc/library/robotparser.rst
deleted file mode 100644
index cce7966..0000000
--- a/Doc/library/robotparser.rst
+++ /dev/null
@@ -1,73 +0,0 @@
-
-:mod:`robotparser` --- Parser for robots.txt
-=============================================
-
-.. module:: robotparser
- :synopsis: Loads a robots.txt file and answers questions about
- fetchability of other URLs.
-.. sectionauthor:: Skip Montanaro <skip@pobox.com>
-
-
-.. index::
- single: WWW
- single: World Wide Web
- single: URL
- single: robots.txt
-
-This module provides a single class, :class:`RobotFileParser`, which answers
-questions about whether or not a particular user agent can fetch a URL on the
-Web site that published the :file:`robots.txt` file. For more details on the
-structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.
-
-
-.. class:: RobotFileParser()
-
- This class provides a set of methods to read, parse and answer questions
- about a single :file:`robots.txt` file.
-
-
- .. method:: set_url(url)
-
- Sets the URL referring to a :file:`robots.txt` file.
-
-
- .. method:: read()
-
- Reads the :file:`robots.txt` URL and feeds it to the parser.
-
-
- .. method:: parse(lines)
-
- Parses the lines argument.
-
-
- .. method:: can_fetch(useragent, url)
-
- Returns ``True`` if the *useragent* is allowed to fetch the *url*
- according to the rules contained in the parsed :file:`robots.txt`
- file.
-
-
- .. method:: mtime()
-
- Returns the time the ``robots.txt`` file was last fetched. This is
- useful for long-running web spiders that need to check for new
- ``robots.txt`` files periodically.
-
-
- .. method:: modified()
-
- Sets the time the ``robots.txt`` file was last fetched to the current
- time.
-
-The following example demonstrates basic use of the RobotFileParser class. ::
-
- >>> import robotparser
- >>> rp = robotparser.RobotFileParser()
- >>> rp.set_url("http://www.musi-cal.com/robots.txt")
- >>> rp.read()
- >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
- False
- >>> rp.can_fetch("*", "http://www.musi-cal.com/")
- True
-
diff --git a/Doc/library/urllib.error.rst b/Doc/library/urllib.error.rst
index 1cbfe7d..bd76860 100644
--- a/Doc/library/urllib.error.rst
+++ b/Doc/library/urllib.error.rst
@@ -2,47 +2,47 @@
==================================================================
.. module:: urllib.error
- :synopsis: Next generation URL opening library.
+ :synopsis: Exception classes raised by urllib.request.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Senthil Kumaran <orsenthil@gmail.com>
-The :mod:`urllib.error` module defines exception classes raise by
-urllib.request. The base exception class is URLError, which inherits from
-IOError.
+The :mod:`urllib.error` module defines the exception classes for exceptions
+raised by :mod:`urllib.request`. The base exception class is :exc:`URLError`,
+which inherits from :exc:`IOError`.
The following exceptions are raised by :mod:`urllib.error` as appropriate:
-
.. exception:: URLError
- The handlers raise this exception (or derived exceptions) when they run into a
- problem. It is a subclass of :exc:`IOError`.
+ The handlers raise this exception (or derived exceptions) when they run into
+ a problem. It is a subclass of :exc:`IOError`.
.. attribute:: reason
- The reason for this error. It can be a message string or another exception
- instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
- URLs).
+ The reason for this error. It can be a message string or another
+ exception instance (:exc:`socket.error` for remote URLs, :exc:`OSError`
+ for local URLs).
.. exception:: HTTPError
- Though being an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
- can also function as a non-exceptional file-like return value (the same thing
- that :func:`urlopen` returns). This is useful when handling exotic HTTP
- errors, such as requests for authentication.
+ Though being an exception (a subclass of :exc:`URLError`), an
+ :exc:`HTTPError` can also function as a non-exceptional file-like return
+ value (the same thing that :func:`urlopen` returns). This is useful when
+ handling exotic HTTP errors, such as requests for authentication.
.. attribute:: code
- An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
- This numeric value corresponds to a value found in the dictionary of
- codes as found in :attr:`http.server.BaseHTTPRequestHandler.responses`.
+ An HTTP status code as defined in `RFC 2616
+ <http://www.faqs.org/rfcs/rfc2616.html>`_. This numeric value corresponds
+ to a value found in the dictionary of codes as found in
+ :attr:`http.server.BaseHTTPRequestHandler.responses`.
.. exception:: ContentTooShortError(msg[, content])
- This exception is raised when the :func:`urlretrieve` function detects that the
- amount of the downloaded data is less than the expected amount (given by the
- *Content-Length* header). The :attr:`content` attribute stores the downloaded
- (and supposedly truncated) data.
+ This exception is raised when the :func:`urlretrieve` function detects that
+ the amount of the downloaded data is less than the expected amount (given by
+ the *Content-Length* header). The :attr:`content` attribute stores the
+ downloaded (and supposedly truncated) data.
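
The dual nature of HTTPError described above, an exception that also behaves like the file-like response urlopen would have returned, in practice (a sketch with a placeholder URL)::

   import urllib.request
   import urllib.error

   try:
       urllib.request.urlopen('http://www.example.com/missing')  # placeholder URL
   except urllib.error.HTTPError as e:
       print(e.code)     # the HTTP status, e.g. 404
       print(e.read())   # the error page body, read like a normal response
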
diff --git a/Doc/library/urllib.parse.rst b/Doc/library/urllib.parse.rst
index affa406..a5463e6 100644
--- a/Doc/library/urllib.parse.rst
+++ b/Doc/library/urllib.parse.rst
@@ -20,13 +20,12 @@
The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators (and discovered a bug in an earlier draft!). It supports the
following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
-``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``, ``prospero``,
-``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
-``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.
+``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``, ``prospero``,
+``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
+``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.
The :mod:`urllib.parse` module defines the following functions:
-
.. function:: urlparse(urlstring[, default_scheme[, allow_fragments]])
Parse a URL into six components, returning a 6-tuple. This corresponds to the
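
The 6-tuple behaviour of urlparse, as a quick sketch::

   from urllib.parse import urlparse

   parsed = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
   # The result is a 6-tuple whose components are also named attributes.
   print(parsed.scheme)   # 'http'
   print(parsed.netloc)   # 'www.cwi.nl:80'
   print(parsed.path)     # '/%7Eguido/Python.html'
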
@@ -92,11 +91,11 @@
.. function:: urlunparse(parts)
- Construct a URL from a tuple as returned by ``urlparse()``. The *parts* argument
- can be any six-item iterable. This may result in a slightly different, but
- equivalent URL, if the URL that was parsed originally had unnecessary delimiters
- (for example, a ? with an empty query; the RFC states that these are
- equivalent).
+ Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
+ argument can be any six-item iterable. This may result in a slightly
+ different, but equivalent URL, if the URL that was parsed originally had
+ unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
+ states that these are equivalent).
.. function:: urlsplit(urlstring[, default_scheme[, allow_fragments]])
@@ -140,19 +139,19 @@
.. function:: urlunsplit(parts)
- Combine the elements of a tuple as returned by :func:`urlsplit` into a complete
- URL as a string. The *parts* argument can be any five-item iterable. This may
- result in a slightly different, but equivalent URL, if the URL that was parsed
- originally had unnecessary delimiters (for example, a ? with an empty query; the
- RFC states that these are equivalent).
+ Combine the elements of a tuple as returned by :func:`urlsplit` into a
+ complete URL as a string. The *parts* argument can be any five-item
+ iterable. This may result in a slightly different, but equivalent URL, if the
+   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
+ with an empty query; the RFC states that these are equivalent).
.. function:: urljoin(base, url[, allow_fragments])
Construct a full ("absolute") URL by combining a "base URL" (*base*) with
another URL (*url*). Informally, this uses components of the base URL, in
- particular the addressing scheme, the network location and (part of) the path,
- to provide missing components in the relative URL. For example:
+ particular the addressing scheme, the network location and (part of) the
+ path, to provide missing components in the relative URL. For example:
>>> from urllib.parse import urljoin
>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
@@ -178,10 +177,10 @@
.. function:: urldefrag(url)
- If *url* contains a fragment identifier, returns a modified version of *url*
- with no fragment identifier, and the fragment identifier as a separate string.
- If there is no fragment identifier in *url*, returns *url* unmodified and an
- empty string.
+ If *url* contains a fragment identifier, return a modified version of *url*
+ with no fragment identifier, and the fragment identifier as a separate
+ string. If there is no fragment identifier in *url*, return *url* unmodified
+ and an empty string.
.. function:: quote(string[, safe])
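
The urldefrag behaviour documented in this hunk, sketched::

   from urllib.parse import urldefrag

   url, frag = urldefrag('http://www.python.org/doc/#intro')
   print(url)    # 'http://www.python.org/doc/'
   print(frag)   # 'intro'
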
@@ -195,9 +194,10 @@
.. function:: quote_plus(string[, safe])
- Like :func:`quote`, but also replaces spaces by plus signs, as required for
- quoting HTML form values. Plus signs in the original string are escaped unless
- they are included in *safe*. It also does not have *safe* default to ``'/'``.
+ Like :func:`quote`, but also replace spaces by plus signs, as required for
+ quoting HTML form values. Plus signs in the original string are escaped
+ unless they are included in *safe*. It also does not have *safe* default to
+ ``'/'``.
.. function:: unquote(string)
@@ -209,7 +209,7 @@
.. function:: unquote_plus(string)
- Like :func:`unquote`, but also replaces plus signs by spaces, as required for
+ Like :func:`unquote`, but also replace plus signs by spaces, as required for
unquoting HTML form values.
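
A quick round-trip showing the plus-sign handling that distinguishes quote_plus and unquote_plus::

   from urllib.parse import quote_plus, unquote_plus

   encoded = quote_plus('spam & eggs / cheese')
   print(encoded)                # 'spam+%26+eggs+%2F+cheese'
   print(unquote_plus(encoded))  # 'spam & eggs / cheese'
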
@@ -254,7 +254,6 @@
subclasses of the :class:`tuple` type. These subclasses add the attributes
described in those functions, as well as provide an additional method:
-
.. method:: ParseResult.geturl()
Return the re-combined version of the original URL as a string. This may differ
@@ -279,13 +278,12 @@
The following classes provide the implementations of the parse results::
-
.. class:: BaseResult
- Base class for the concrete result classes. This provides most of the attribute
- definitions. It does not provide a :meth:`geturl` method. It is derived from
- :class:`tuple`, but does not override the :meth:`__init__` or :meth:`__new__`
- methods.
+ Base class for the concrete result classes. This provides most of the
+ attribute definitions. It does not provide a :meth:`geturl` method. It is
+ derived from :class:`tuple`, but does not override the :meth:`__init__` or
+ :meth:`__new__` methods.
.. class:: ParseResult(scheme, netloc, path, params, query, fragment)
diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index 4262836..d124d9a 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -7,9 +7,9 @@
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
-The :mod:`urllib.request` module defines functions and classes which help in opening
-URLs (mostly HTTP) in a complex world --- basic and digest authentication,
-redirections, cookies and more.
+The :mod:`urllib.request` module defines functions and classes which help in
+opening URLs (mostly HTTP) in a complex world --- basic and digest
+authentication, redirections, cookies and more.
The :mod:`urllib.request` module defines the following functions:
@@ -180,7 +180,7 @@
the ``User-Agent`` header, which is used by a browser to identify itself --
some HTTP servers only allow requests coming from common browsers as opposed
to scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
- (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
+ (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib`'s
default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).
The final two arguments are only of interest for correct handling of third-party
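
Supplying the User-Agent header discussed here via a Request object (a sketch; the header value is illustrative)::

   import urllib.request

   headers = {'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686)'}  # illustrative value
   req = urllib.request.Request('http://www.example.com/', headers=headers)
   response = urllib.request.urlopen(req)
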
@@ -1005,10 +1005,11 @@
For non-200 error codes, this simply passes the job on to the
:meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
- Eventually, :class:`urllib2.HTTPDefaultErrorHandler` will raise an
+ Eventually, :class:`HTTPDefaultErrorHandler` will raise an
:exc:`HTTPError` if no other handler handles the error.
-.. _urllib2-examples:
+
+.. _urllib-request-examples:
Examples
--------
@@ -1180,15 +1181,18 @@
using the :mod:`ftplib` module, subclassing :class:`FancyURLOpener`, or changing
*_urlopener* to meet your needs.
+
+
:mod:`urllib.response` --- Response classes used by urllib.
===========================================================
+
.. module:: urllib.response
:synopsis: Response classes used by urllib.
The :mod:`urllib.response` module defines functions and classes which define a
-minimal file like interface, including read() and readline(). The typical
-response object is an addinfourl instance, which defines and info() method and
-that returns headers and a geturl() method that returns the url.
+minimal file-like interface, including ``read()`` and ``readline()``. The
+typical response object is an ``addinfourl`` instance, which defines an ``info()``
+method that returns headers and a ``geturl()`` method that returns the URL.
Functions defined by this module are used internally by the
:mod:`urllib.request` module.
diff --git a/Doc/library/urllib.robotparser.rst b/Doc/library/urllib.robotparser.rst
index e351c56..0cac2ad 100644
--- a/Doc/library/urllib.robotparser.rst
+++ b/Doc/library/urllib.robotparser.rst
@@ -1,9 +1,8 @@
-
:mod:`urllib.robotparser` --- Parser for robots.txt
====================================================
.. module:: urllib.robotparser
- :synopsis: Loads a robots.txt file and answers questions about
+ :synopsis: Load a robots.txt file and answer questions about
fetchability of other URLs.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>
@@ -25,42 +24,37 @@
This class provides a set of methods to read, parse and answer questions
about a single :file:`robots.txt` file.
-
.. method:: set_url(url)
Sets the URL referring to a :file:`robots.txt` file.
-
.. method:: read()
Reads the :file:`robots.txt` URL and feeds it to the parser.
-
.. method:: parse(lines)
Parses the lines argument.
-
.. method:: can_fetch(useragent, url)
Returns ``True`` if the *useragent* is allowed to fetch the *url*
according to the rules contained in the parsed :file:`robots.txt`
file.
-
.. method:: mtime()
Returns the time the ``robots.txt`` file was last fetched. This is
useful for long-running web spiders that need to check for new
``robots.txt`` files periodically.
-
.. method:: modified()
Sets the time the ``robots.txt`` file was last fetched to the current
time.
-The following example demonstrates basic use of the RobotFileParser class. ::
+
+The following example demonstrates basic use of the RobotFileParser class. ::
>>> import urllib.robotparser
>>> rp = urllib.robotparser.RobotFileParser()
diff --git a/Doc/tutorial/stdlib.rst b/Doc/tutorial/stdlib.rst
index b0c6e8e..9bc0890 100644
--- a/Doc/tutorial/stdlib.rst
+++ b/Doc/tutorial/stdlib.rst
@@ -150,8 +150,8 @@
protocols. Two of the simplest are :mod:`urllib.request` for retrieving data
from urls and :mod:`smtplib` for sending mail::
- >>> import urllib.request
- >>> for line in urllib.request.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
+ >>> from urllib.request import urlopen
+ >>> for line in urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
... if 'EST' in line or 'EDT' in line: # look for Eastern Time
... print(line)
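
One caveat for this tutorial hunk: in Python 3, urlopen yields bytes, so the string membership test needs a decode step first. A corrected sketch::

   from urllib.request import urlopen

   for line in urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
       line = line.decode('utf-8')             # urlopen returns bytes in Python 3
       if 'EST' in line or 'EDT' in line:      # look for Eastern Time
           print(line)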