:mod:`urllib.request` --- extensible library for opening URLs
=============================================================

.. module:: urllib.request
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


The :mod:`urllib.request` module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.


The :mod:`urllib.request` module defines the following functions:


.. function:: urlopen(url, data=None[, timeout])

   Open the URL *url*, which can be either a string or a
   :class:`Request` object.

   .. warning::
      HTTPS (or FTPS) requests do not do any verification of the server's
      certificate.

   *data* may be a string specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*; the HTTP request will
   be a POST instead of a GET when the *data* parameter is provided.
   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns a string in this format. The
   :mod:`urllib.request` module uses HTTP/1.1 and includes a
   ``Connection: close`` header in its HTTP requests.

   The optional *timeout* parameter specifies a timeout in seconds for
   blocking operations like the connection attempt (if not specified,
   the global default timeout setting will be used). This only works
   for HTTP, HTTPS, FTP and FTPS connections.

   This function returns a file-like object with two additional methods from
   the :mod:`urllib.response` module:

   * :meth:`geturl` --- return the URL of the resource retrieved,
     commonly used to determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers,
     in the form of an :class:`email.message.Message` instance, as returned by
     :func:`email.message_from_string` (see
     `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though
   the default installed global :class:`OpenerDirector` uses
   :class:`UnknownHandler` to ensure this never happens).

   In addition, the default installed :class:`ProxyHandler` routes requests
   through the configured proxies when proxy settings are present.

   The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
   discontinued; :func:`urlopen` corresponds to the old ``urllib2.urlopen``.
   Proxy handling, which was done by passing a dictionary parameter to
   ``urllib.urlopen``, can be obtained by using :class:`ProxyHandler` objects.
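
The return value of :func:`urlopen` described above can be explored without
network access. The following is a minimal sketch that assumes a ``file:`` URL
pointing at a temporary file; the helper name ``read_local_url`` and the file
name ``sample.txt`` are made up for illustration.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def read_local_url(text):
    """Write *text* to a temporary file and read it back through urlopen()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.txt"           # hypothetical file name
        path.write_text(text, encoding="utf-8")
        response = urlopen(path.as_uri())         # a file: URL, no network needed
        try:
            body = response.read().decode("utf-8")
            final_url = response.geturl()         # URL of the retrieved resource
            headers = response.info()             # header container (Message-like)
            return body, final_url, headers["Content-Length"]
        finally:
            response.close()


body, final_url, length = read_local_url("hello, urlopen")
print(body)
print(final_url)
print(length)
```

For HTTP URLs the same three calls (``read()``, ``geturl()``, ``info()``) apply;
only the URL and the set of headers differ.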

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler`,
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
   member variable to modify its position in the handlers list.
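
As a rough illustration of :func:`build_opener` and :attr:`handler_order`, the
sketch below passes one handler class and one handler instance. The class
``TaggingProcessor`` and the ``X-Example`` header are hypothetical, and reading
``opener.handlers`` relies on an attribute of CPython's implementation rather
than on anything documented in this section.

```python
from urllib.request import (
    BaseHandler,
    HTTPCookieProcessor,
    HTTPHandler,
    build_opener,
)


class TaggingProcessor(BaseHandler):
    """Hypothetical processor that asks to be sorted ahead of the defaults."""

    # BaseHandler uses 500; a smaller value sorts earlier in the handlers list.
    handler_order = 100

    def http_request(self, req):
        # Pre-process every http request by attaching a made-up header.
        req.add_unredirected_header("X-Example", "1")
        return req


# A class is instantiated by build_opener(); an instance is used as given.
opener = build_opener(HTTPCookieProcessor, TaggingProcessor())

# Assumption: the ``handlers`` list is an implementation detail of CPython.
handler_types = [type(handler) for handler in opener.handlers]
print(handler_types.index(TaggingProcessor) < handler_types.index(HTTPHandler))
```

Calling ``opener.open(...)`` uses this chain directly; ``install_opener(opener)``
is only needed if plain :func:`urlopen` calls should use it as well.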


.. function:: urlretrieve(url, filename=None, reporthook=None, data=None)

   Copy a network object denoted by a URL to a local file, if necessary. If the URL
   points to a local file, or a valid cached copy of the object exists, the object
   is not copied. Return a tuple ``(filename, headers)`` where *filename* is the
   local file name under which the object can be found, and *headers* is whatever
   the :meth:`info` method of the object returned by :func:`urlopen` returned (for
   a remote object, possibly cached). Exceptions are the same as for
   :func:`urlopen`.

   The second argument, if present, specifies the file location to copy to (if
   absent, the location will be a tempfile with a generated name). The third
   argument, if present, is a hook function that will be called once on
   establishment of the network connection and once after each block read
   thereafter. The hook will be passed three arguments: a count of blocks
   transferred so far, a block size in bytes, and the total size of the file. The
   third argument passed to the hook may be ``-1`` on older FTP servers which do
   not return a file size in response to a retrieval request.

   If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   argument may be given to specify a ``POST`` request (normally the request type
   is ``GET``). The *data* argument must be in standard
   :mimetype:`application/x-www-form-urlencoded` format; see the
   :func:`urllib.parse.urlencode` function.

   :func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that
   the amount of data available was less than the expected amount (which is the
   size reported by a *Content-Length* header). This can occur, for example, when
   the download is interrupted.

   The *Content-Length* is treated as a lower bound: if there's more data to read,
   :func:`urlretrieve` reads more data, but if less data is available, it raises
   the exception.

   You can still retrieve the downloaded data in this case; it is stored in the
   :attr:`content` attribute of the exception instance.

   If no *Content-Length* header was supplied, :func:`urlretrieve` cannot check
   the size of the data it has downloaded, and just returns it. In this case you
   just have to assume that the download was successful.
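
The *reporthook* protocol described above can be sketched with a local
``file:`` URL, so that no network is involved. The helper name
``copy_with_progress`` and the file names are assumptions made for this
example; passing an explicit *filename* is what forces a copy for a local
source.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def copy_with_progress(payload):
    """Copy *payload* through urlretrieve() and record every hook call."""
    calls = []

    def reporthook(block_count, block_size, total_size):
        # Called once before the first block and once after each block read.
        calls.append((block_count, block_size, total_size))

    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source.bin"    # hypothetical file names
        target = Path(tmp) / "target.bin"
        source.write_bytes(payload)
        filename, headers = urlretrieve(source.as_uri(), str(target), reporthook)
        copied = Path(filename).read_bytes()
    return copied, calls


copied, calls = copy_with_progress(b"x" * 20000)
for block_count, block_size, total_size in calls:
    print(block_count, block_size, total_size)
```

A progress display would typically compute
``min(block_count * block_size, total_size)`` and treat a *total_size* of
``-1`` as "unknown".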

.. function:: urlcleanup()

   Clear the cache that may have been built up by previous calls to
   :func:`urlretrieve`.

.. function:: pathname2url(path)

   Convert the pathname *path* from the local syntax for a path to the form used in
   the path component of a URL. This does not produce a complete URL. The return
   value will already be quoted using the :func:`quote` function.


.. function:: url2pathname(path)

   Convert the path component *path* from a percent-encoded URL to the local
   syntax for a path. This does not accept a complete URL. This function uses
   :func:`unquote` to decode *path*.
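
The two conversion functions are inverses of each other for ordinary paths. A
small sketch, assuming a made-up relative path that contains a space:

```python
import os
from urllib.request import pathname2url, url2pathname

# Hypothetical relative path in the local syntax of the current platform.
local_path = os.path.join("docs", "release notes.txt")

url_path = pathname2url(local_path)     # quoted form, the space becomes %20
round_trip = url2pathname(url_path)     # unquoted, back in the local syntax

print(url_path)
print(round_trip)
```

Neither function adds or removes a scheme such as ``file:``; they operate on the
path component only.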

.. function:: getproxies()

   This helper function returns a dictionary of scheme to proxy server URL
   mappings. On all operating systems it first scans the environment for
   variables named ``<scheme>_proxy``. If it finds none, it looks for proxy
   information in the System Configuration on Mac OS X and in the Windows
   Systems Registry on Windows.
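
The environment lookup can be tried out by temporarily setting a variable. In
this sketch the proxy address is a placeholder, and ``unittest.mock.patch.dict``
is used only so that the real environment is restored afterwards.

```python
import os
from unittest import mock
from urllib.request import getproxies

# Placeholder proxy URL, not a real server.
fake_env = {"http_proxy": "http://proxy.example:3128"}

with mock.patch.dict(os.environ, fake_env):
    proxies = getproxies()      # maps scheme name -> proxy URL

print(proxies.get("http"))
```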


The following classes are provided:

.. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False)

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*; the HTTP request will
   be a POST instead of a GET when the *data* parameter is provided.
   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns a string in this format.

   *headers* should be a dictionary, and will be treated as if
   :meth:`add_header` was called with each key and value as arguments.
   This is often used to "spoof" the ``User-Agent`` header, which is
   used by a browser to identify itself -- some HTTP servers only
   allow requests coming from common browsers as opposed to scripts.
   For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while
   :mod:`urllib`'s default user agent string is
   ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling
   of third-party HTTP cookies:

   *origin_req_host* should be the request-host of the origin
   transaction, as defined by :rfc:`2965`. It defaults to
   ``http.cookiejar.request_host(self)``. This is the host name or IP
   address of the original request that was initiated by the user.
   For example, if the request is for an image in an HTML document,
   this should be the request-host of the request for the page
   containing the image.

   *unverifiable* should indicate whether the request is unverifiable,
   as defined by :rfc:`2965`. It defaults to ``False``. An unverifiable
   request is one whose URL the user did not have the option to
   approve. For example, if the request is for an image in an HTML
   document, and the user had no option to approve the automatic
   fetching of the image, this should be true.
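
A short sketch of how the *data* and *headers* arguments fit together. The URL,
form fields and agent string are placeholders, and nothing is sent over the
network. Note one assumption that goes beyond the text above: current Python 3
releases expect *data* to be bytes, so the urlencoded string is encoded first.

```python
from urllib.parse import urlencode
from urllib.request import Request

form = {"name": "Ada Lovelace", "lang": "en"}      # made-up form fields
# Assumption: current Python 3 releases want bytes for *data*.
payload = urlencode(form).encode("ascii")

request = Request(
    "http://www.example.com/submit",               # placeholder URL
    data=payload,
    headers={"User-Agent": "example-client/0.1"},  # hypothetical agent string
)
plain_request = Request("http://www.example.com/submit")

print(request.get_method())        # POST, because data was supplied
print(plain_request.get_method())  # GET
print(request.data)
```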


.. class:: URLopener(proxies=None, **x509)

   Base class for opening and reading URLs. Unless you need to support opening
   objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   you probably want to use :class:`FancyURLopener`.

   By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   Applications can define their own :mailheader:`User-Agent` header by subclassing
   :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   :attr:`version` to an appropriate string value in the subclass definition.

   The optional *proxies* parameter should be a dictionary mapping scheme names to
   proxy URLs, where an empty dictionary turns proxies off completely. Its default
   value is ``None``, in which case environmental proxy settings will be used if
   present, as discussed in the definition of :func:`urlopen`, above.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme. The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`IOError` exception if the server
   returns an error code.

   .. method:: open(fullurl, data=None)

      Open *fullurl* using the appropriate protocol. This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments. If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.


   .. method:: open_unknown(fullurl, data=None)

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url, filename=None, reporthook=None, data=None)

      Retrieves the contents of *url* and places it in *filename*. The return value
      is a tuple consisting of a local filename and either a
      :class:`email.message.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs). The caller must then open and read the
      contents of *filename*. If *filename* is not given and the URL refers to a
      local file, the input filename is returned. If the URL is non-local and
      *filename* is not given, the filename is the output of :func:`tempfile.mktemp`
      with a suffix that matches the suffix of the last path component of the input
      URL. If *reporthook* is given, it must be a function accepting three numeric
      parameters. It will be called after each chunk of data is read from the
      network. *reporthook* is ignored for local URLs.

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``). The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the
      :func:`urllib.parse.urlencode` function.


   .. attribute:: version

      Variable that specifies the user agent of the opener object. To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.


.. class:: FancyURLopener(...)

   :class:`FancyURLopener` subclasses :class:`URLopener` providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL. For 401 response codes (authentication required), basic HTTP
   authentication is performed. For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`http_error_default` is called,
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user. In
      reality, browsers do allow automatic redirection of these responses, changing
      the POST to a GET, and :mod:`urllib` reproduces this behaviour.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method. The default implementation asks the
      users for the required information on the controlling terminal. A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overridden to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm. The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The implementation prompts for this information on the terminal; an application
      should override this method to use an appropriate interaction model in the local
      environment.

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor(cookiejar=None)

   A class to handle HTTP Cookies.


.. class:: ProxyHandler(proxies=None)

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read the
   list of proxies from the environment variables :envvar:`<protocol>_proxy`.
   If no proxy environment variables are set, then in a Windows environment proxy
   settings are obtained from the registry's Internet Settings section, and in a
   Mac OS X environment proxy information is retrieved from the OS X System
   Configuration Framework.

   To disable autodetected proxies, pass an empty dictionary.
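
The effect of passing an empty dictionary can be observed as follows. This is a
sketch under two assumptions: the proxy address is a placeholder set only for
the duration of the ``with`` block, and the ``handlers`` list inspected by the
hypothetical helper ``active_proxy_handlers`` is an implementation detail of
CPython's :class:`OpenerDirector`.

```python
import os
from unittest import mock
from urllib.request import ProxyHandler, build_opener


def active_proxy_handlers(opener):
    """Return the ProxyHandler instances that are part of the opener's chain."""
    return [h for h in opener.handlers if isinstance(h, ProxyHandler)]


# Placeholder proxy setting, visible only inside the with block.
with mock.patch.dict(os.environ, {"http_proxy": "http://proxy.example:3128"}):
    default_opener = build_opener()                 # autodetects the proxy
    direct_opener = build_opener(ProxyHandler({}))  # autodetection disabled

print(len(active_proxy_handlers(default_opener)))
print(len(active_proxy_handlers(direct_opener)))
```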


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.
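
A minimal sketch of how a password manager and a basic-auth handler are
typically combined. The URL and the credentials are invented, and no request is
made; the lookup at the end only shows how the catch-all realm answers for a
realm that was never registered.

```python
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPPasswordMgrWithDefaultRealm,
    build_opener,
)

password_mgr = HTTPPasswordMgrWithDefaultRealm()
# A realm of None is the catch-all realm. URL and credentials are made up.
password_mgr.add_password(None, "http://www.example.com/private/", "alice", "s3cret")

auth_handler = HTTPBasicAuthHandler(password_mgr)
opener = build_opener(auth_handler)   # would answer 401 challenges with these credentials

# Any realm name falls back to the catch-all entry for URLs below /private/.
credentials = password_mgr.find_user_password(
    "Some Realm", "http://www.example.com/private/report.html"
)
unknown = password_mgr.find_user_password("Some Realm", "http://other.example/")

print(credentials)
print(unknown)
```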


.. class:: ProxyBasicAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. _request-objects:

Request Objects
---------------

The following methods describe :class:`Request`'s public interface,
and so all may be overridden in subclasses. It also defines several
public attributes that can be used by clients to inspect the parsed
request.

.. attribute:: Request.full_url

   The original URL passed to the constructor.

.. attribute:: Request.type

   The URI scheme.

.. attribute:: Request.host

   The URI authority, typically a host, but may also contain a port
   separated by a colon.

.. attribute:: Request.origin_req_host

   The original host for the request, without port.

.. attribute:: Request.selector

   The URI path. If the :class:`Request` uses a proxy, then selector
   will be the full URL that is passed to the proxy.

.. attribute:: Request.data

   The entity body for the request, or ``None`` if not specified.

.. attribute:: Request.unverifiable

   Boolean that indicates whether the request is unverifiable as defined
   by :rfc:`2965`.
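
The attributes above can be inspected directly on a freshly built request. The
URL in this sketch is a placeholder and is never contacted.

```python
from urllib.request import Request

request = Request("http://www.example.com:8080/docs/index.html?lang=en")

print(request.full_url)         # the URL as passed in
print(request.type)             # scheme
print(request.host)             # authority, including the port
print(request.origin_req_host)  # host only, port removed
print(request.selector)         # path plus query string
print(request.data)             # None, no body was given
print(request.unverifiable)     # False by default
```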

.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has a non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).
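
The header methods can be exercised without sending anything. In this sketch
the URL, agent strings and credential value are placeholders. It also relies on
two details of CPython's implementation that the text above does not state:
header names are stored in capitalized form (``User-agent``), and a
``get_header()`` method is available for reading a value back.

```python
from urllib.request import Request

request = Request("http://www.example.com/")    # placeholder URL

request.add_header("User-Agent", "first-agent/1.0")
request.add_header("User-Agent", "second-agent/2.0")   # overwrites the first value
request.add_unredirected_header("Authorization", "Basic ZXhhbXBsZQ==")

# Assumption: names are stored capitalized, so look them up in that form.
print(request.has_header("User-agent"))
print(request.has_header("Authorization"))      # unredirected headers count too
print(request.get_header("User-agent"))
```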


.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by :rfc:`2965`. See the
   documentation for the :class:`Request` constructor.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following methods
   are searched, and added to the possible chains (note that HTTP errors are a
   special case).

   * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
     URLs.

   * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

   * :meth:`protocol_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`protocol_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

   * :meth:`protocol_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url, data=None[, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature works only for
   HTTP, HTTPS, FTP and FTPS connections.


.. method:: OpenerDirector.error(proto, *args)

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e., a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`protocol_open`. If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :meth:`protocol_response` has that
   method called to post-process the response.
652
653
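The three stages can be observed with a small sketch. It assumes a made-up
``demo`` URL scheme and a hypothetical ``DemoHandler`` class, neither of which
is part of this module; the handler records each stage as it runs and needs no
network access.

```python
import io
import urllib.request
import urllib.response
from email.message import Message


class DemoHandler(urllib.request.BaseHandler):
    """Hypothetical handler for a made-up ``demo`` URL scheme."""

    def __init__(self):
        self.calls = []  # records the order in which the stages run

    def demo_request(self, req):
        # Stage 1: pre-process the request and hand it back.
        self.calls.append("request")
        return req

    def demo_open(self, req):
        # Stage 2: returning a non-None value (a response) ends the stage.
        self.calls.append("open")
        body = io.BytesIO(b"hello from " + req.full_url.encode("ascii"))
        return urllib.response.addinfourl(body, Message(), req.full_url)

    def demo_response(self, req, response):
        # Stage 3: post-process the response and hand it back.
        self.calls.append("response")
        return response


handler = DemoHandler()
opener = urllib.request.OpenerDirector()
opener.add_handler(handler)

response = opener.open("demo:greeting")
body = response.read()
print(handler.calls)
print(body)
```

A bare :class:`OpenerDirector` is used here rather than :func:`build_opener`
so that only the sketched handler takes part in the request.
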
.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following members and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.


.. method:: BaseHandler.protocol_open(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open them.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.


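As an illustration of how :meth:`http_error_nnn` and
:meth:`http_error_default` cooperate, the sketch below defines a hypothetical
``ErrorPolicy`` handler and drives it through :meth:`OpenerDirector.error`
directly, so no server is needed. The URL, the fallback body and the class
name are assumptions made for the example.

```python
import io
import urllib.error
import urllib.request
import urllib.response
from email.message import Message


class ErrorPolicy(urllib.request.BaseHandler):
    """Hypothetical handler: serve a fallback for 404, fail on anything else."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        # Called only for HTTP errors with code 404.
        fallback = io.BytesIO(b"fallback page")
        return urllib.response.addinfourl(fallback, hdrs, req.full_url, code)

    def http_error_default(self, req, fp, code, msg, hdrs):
        # Catch-all for every HTTP error without a more specific method.
        raise urllib.error.HTTPError(req.full_url, code, msg, hdrs, fp)


opener = urllib.request.OpenerDirector()
opener.add_handler(ErrorPolicy())
req = urllib.request.Request("http://www.example.com/missing")

# Simulate the dispatch that happens when a server answers with an error.
recovered = opener.error("http", req, io.BytesIO(b""), 404, "Not Found", Message())
recovered_body = recovered.read()

try:
    opener.error("http", req, io.BytesIO(b""), 500, "Server Error", Message())
    failed_code = None
except urllib.error.HTTPError as exc:
    failed_code = exc.code
```

In normal use the error is raised by the HTTP machinery itself; calling
:meth:`OpenerDirector.error` by hand is done here only to keep the sketch
self-contained.
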
.. method:: BaseHandler.protocol_request(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if you can't handle it but
   another handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the ``POST`` to a
      ``GET``, and the default implementation reproduces this behavior.


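A redirect policy can be sketched by overriding :meth:`redirect_request`. The
``SameHostRedirectHandler`` class below is hypothetical, as are the URLs; it
refuses redirects that leave the original host and otherwise defers to the
default implementation. The method is called directly so the sketch runs
without a network connection.

```python
import io
import urllib.error
import urllib.request
from email.message import Message
from urllib.parse import urlsplit


class SameHostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical policy: follow redirects only within the original host."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        if urlsplit(newurl).hostname != urlsplit(req.full_url).hostname:
            # No other handler should try this URL, so raise HTTPError.
            raise urllib.error.HTTPError(
                req.full_url, code, "cross-host redirect refused", hdrs, fp)
        # Same host: let the default implementation build the new Request.
        return super().redirect_request(req, fp, code, msg, hdrs, newurl)


handler = SameHostRedirectHandler()
req = urllib.request.Request("http://www.example.com/old")

followed = handler.redirect_request(
    req, io.BytesIO(b""), 302, "Found", Message(),
    "http://www.example.com/new")

try:
    handler.redirect_request(
        req, io.BytesIO(b""), 302, "Found", Message(),
        "http://other.example.org/new")
    refused = False
except urllib.error.HTTPError:
    refused = True
```

Passing an instance of such a subclass to :func:`build_opener` should make it
take the place of the default :class:`HTTPRedirectHandler`.
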
.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

:class:`HTTPCookieProcessor` instances have one attribute:

.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`http.cookiejar.CookieJar` in which cookies are stored.


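A minimal sketch of wiring a cookie jar into an opener follows. It only
constructs the objects and inspects the :attr:`cookiejar` attribute; no
request is sent, so the jar stays empty.

```python
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()
processor = urllib.request.HTTPCookieProcessor(jar)
opener = urllib.request.build_opener(processor)

# The jar passed to the constructor is reachable through the attribute.
# It is filled as responses carrying Set-Cookie headers pass through
# the opener.
same_jar = processor.cookiejar is jar
cookie_count = len(jar)
```

Keeping a reference to the jar makes it possible to inspect or persist the
stored cookies after requests made with ``opener.open()``.
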
.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.


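The per-protocol methods can be observed on an instance. In this sketch the
proxy URL is a placeholder, and only ``http`` is listed in the *proxies*
dictionary, so only an ``http_open`` method is expected to exist.

```python
import urllib.request

proxy_url = "http://proxy.example.com:3128/"  # placeholder proxy address
handler = urllib.request.ProxyHandler({"http": proxy_url})

# One <protocol>_open method is created per key of the proxies mapping.
has_http_open = hasattr(handler, "http_open")
has_ftp_open = hasattr(handler, "ftp_open")
print(has_http_open, has_ftp_open)
```
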
.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.


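The lookup rules can be tried without any network traffic. The sketch below
registers credentials under the default realm (``None``) for a placeholder URI
and then queries two URIs, one below the registered path and one outside it;
the user name and password are made up.

```python
import urllib.request

manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
manager.add_password(None, "http://www.example.com/private/", "klem", "secret")

# A URI below the registered one matches, whatever realm is asked for.
inside = manager.find_user_password(
    "Some Realm", "http://www.example.com/private/report.html")

# A URI outside the registered path has no credentials.
outside = manager.find_user_password(
    "Some Realm", "http://www.example.com/public/")
```
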
.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.


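Opening a local file through a ``file:`` URL can be sketched as follows. The
example writes a temporary file (the file name and contents are arbitrary) and
reads it back with :func:`urlopen`, relying on the default opener including a
:class:`FileHandler`.

```python
import pathlib
import tempfile
import urllib.request

with tempfile.TemporaryDirectory() as directory:
    path = pathlib.Path(directory) / "note.txt"
    path.write_bytes(b"local data")

    # Path.as_uri() produces a file: URL with no host name, so the file
    # is opened locally.
    with urllib.request.urlopen(path.as_uri()) as f:
        content = f.read()

print(content)
```
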
.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.


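A short configuration sketch: the values 30 and 4 are arbitrary choices, and
no FTP connection is opened. Because :class:`CacheFTPHandler` derives from
:class:`FTPHandler`, passing the instance to :func:`build_opener` is expected
to replace the default FTP handler rather than add a second one.

```python
import urllib.request

handler = urllib.request.CacheFTPHandler()
handler.setTimeout(30)   # cached connections time out after 30 seconds
handler.setMaxConns(4)   # keep at most 4 cached connections

opener = urllib.request.build_opener(handler)
ftp_handlers = [h for h in opener.handlers
                if isinstance(h, urllib.request.FTPHandler)]
```
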
.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For 200 error codes, the response object is returned immediately.

   For non-200 error codes, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
   :exc:`HTTPError` if no other handler handles the error.


.. _urllib-request-examples:

Examples
--------

This example gets the python.org main page and displays the first 300 bytes of
it. ::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(300))
   b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
   xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
   <title>Python Programming '

Note that reading from the object returned by urlopen yields a bytes object.
This is because there is no way for urlopen to automatically determine the
encoding of the byte stream it receives from the HTTP server. In general, a
program will decode the returned bytes object to a string once it determines or
guesses the appropriate encoding.

The following W3C document, http://www.w3.org/International/O-charset , lists
the various ways in which an (X)HTML or an XML document could have specified its
encoding information.

As the python.org website uses *utf-8* encoding, as specified in its meta tag,
we will use the same encoding for decoding the bytes object. ::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(100).decode('utf-8'))
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm


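When the server declares a charset in its :mailheader:`Content-Type` header,
that value can be used instead of a guess. The sketch below assumes a
hypothetical helper, ``decode_body``, and builds the headers by hand so that
it runs offline; with a real response, the headers object returned by its
``info()`` method could be passed instead.

```python
from email.message import Message


def decode_body(body, headers, default="utf-8"):
    """Decode *body* using the charset declared in *headers*, if any.

    Hypothetical helper; *default* is used when no charset is declared.
    """
    charset = headers.get_content_charset() or default
    return body.decode(charset)


# Stand-in for the headers of an HTTP response.
headers = Message()
headers["Content-Type"] = "text/html; charset=iso-8859-1"

declared = decode_body("caf\u00e9".encode("iso-8859-1"), headers)
fallback = decode_body("caf\u00e9".encode("utf-8"), Message())
```

This only covers the HTTP header; an encoding declared inside the document
itself, such as in a meta tag, still has to be handled separately.
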
In the following example, we are sending a data-stream to the stdin of a CGI
and reading the data it returns to us. Note that this example will only work
when the Python installation supports SSL. ::

   >>> import urllib.request
   >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data=b'This data is passed to stdin of the CGI')
   >>> f = urllib.request.urlopen(req)
   >>> print(f.read().decode('utf-8'))
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)

Use of Basic HTTP Authentication::

   import urllib.request
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib.request.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib.request.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib.request.install_opener(opener)
   urllib.request.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib.request
   req = urllib.request.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib.request.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib.request
   opener = urllib.request.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).

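The *headers* argument mentioned above can be sketched as follows; nothing is
sent over the network. The header values are placeholders. One detail worth
noting is that a :class:`Request` stores header names in capitalized form, so
lookups with ``get_header()`` need that spelling.

```python
import urllib.request

req = urllib.request.Request(
    "http://www.example.com/",
    headers={"Referer": "http://www.python.org/"})
req.add_header("Accept-Language", "en")

referer = req.get_header("Referer")
# Names are stored via str.capitalize(), hence the lower-case "l".
language = req.get_header("Accept-language")
missing = req.get_header("Accept-Language")
```
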
.. _urllib-examples:

Here is an example session that uses the ``GET`` method to retrieve a URL
containing parameters::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
   >>> print(f.read().decode('utf-8'))

The following example uses the ``POST`` method instead. Note that the encoded
parameters are converted to bytes before being passed as *data*::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> params = params.encode('utf-8')
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
   >>> print(f.read().decode('utf-8'))

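The difference between the two sessions above can be checked without
contacting a server by building the :class:`Request` objects and asking which
method they would use. The base URL here is a placeholder.

```python
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({"spam": 1, "eggs": 2, "bacon": 0})
base_url = "http://www.example.com/cgi-bin/query"  # placeholder URL

# Parameters in the URL: the request is a GET.
get_req = urllib.request.Request(base_url + "?" + params)

# Parameters passed as bytes in *data*: the request becomes a POST.
post_req = urllib.request.Request(base_url, data=params.encode("ascii"))

methods = (get_req.get_method(), post_req.get_method())
print(params)
print(methods)
```
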
The following example uses an explicitly specified HTTP proxy, overriding
environment settings::

   >>> import urllib.request
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.request.FancyURLopener(proxies)
   >>> f = opener.open("http://www.python.org")
   >>> f.read().decode('utf-8')

The following example uses no proxies at all, overriding environment settings::

   >>> import urllib.request
   >>> opener = urllib.request.FancyURLopener({})
   >>> f = opener.open("http://www.python.org/")
   >>> f.read().decode('utf-8')


:mod:`urllib.request` Restrictions
----------------------------------

  .. index::
     pair: HTTP; protocol
     pair: FTP; protocol

* Currently, only the following protocols are supported: HTTP (versions 0.9 and
  1.0), FTP, and local files.

* The caching feature of :func:`urlretrieve` has been disabled until someone
  finds the time to hack proper processing of Expiration time headers.

* There should be a function to query whether a particular URL is in the cache.

* For backward compatibility, if a URL appears to point to a local file but the
  file can't be opened, the URL is re-interpreted using the FTP protocol. This
  can sometimes cause confusing error messages.

* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
  long delays while waiting for a network connection to be set up. This means
  that it is difficult to build an interactive Web client using these functions
  without using threads.

  .. index::
     single: HTML
     pair: HTTP; protocol

* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
  returned by the server. This may be binary data (such as an image), plain text
  or (for example) HTML. The HTTP protocol provides type information in the reply
  header, which can be inspected by looking at the :mailheader:`Content-Type`
  header. If the returned data is HTML, you can use the module
  :mod:`html.parser` to parse it.

  .. index:: single: FTP

* The code handling the FTP protocol cannot differentiate between a file and a
  directory. This can lead to unexpected behavior when attempting to read a URL
  that points to a file that is not accessible. If the URL ends in a ``/``, it is
  assumed to refer to a directory and will be handled accordingly. But if an
  attempt to read a file leads to a 550 error (meaning the URL cannot be found or
  is not accessible, often for permission reasons), then the path is treated as a
  directory in order to handle the case when a directory is specified by a URL but
  the trailing ``/`` has been left off. This can cause misleading results when
  you try to fetch a file whose read permissions make it inaccessible; the FTP
  code will try to read it, fail with a 550 error, and then perform a directory
  listing for the unreadable file. If fine-grained control is needed, consider
  using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
  *_urlopener* to meet your needs.



:mod:`urllib.response` --- Response classes used by urllib.
===========================================================

.. module:: urllib.response
   :synopsis: Response classes used by urllib.

The :mod:`urllib.response` module defines functions and classes which define a
minimal file-like interface, including ``read()`` and ``readline()``. The
typical response object is an addinfourl instance, which defines an ``info()``
method that returns headers and a ``geturl()`` method that returns the URL.
Functions defined by this module are used internally by the
:mod:`urllib.request` module.
