:mod:`urllib.request` --- extensible library for opening URLs
=============================================================

.. module:: urllib.request
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


The :mod:`urllib.request` module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.


The :mod:`urllib.request` module defines the following functions:


.. function:: urlopen(url, data=None[, timeout])

   Open the URL *url*, which can be either a string or a
   :class:`Request` object.

   .. warning::
      HTTPS requests do not do any verification of the server's certificate.

   *data* may be a string specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*; the HTTP request will
   be a POST instead of a GET when the *data* parameter is provided.
   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns a string in this format.

   The optional *timeout* parameter specifies a timeout in seconds for
   blocking operations like the connection attempt (if not specified,
   the global default timeout setting will be used). The timeout
   only applies to HTTP, HTTPS and FTP connections.

   This function returns a file-like object with two additional methods from
   the :mod:`urllib.response` module:

   * :meth:`geturl` --- return the URL of the resource retrieved,
     commonly used to determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers,
     in the form of an :func:`email.message_from_string` instance (see
     `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though
   the default installed global :class:`OpenerDirector` uses
   :class:`UnknownHandler` to ensure this never happens).

   In addition, the default installed :class:`ProxyHandler` makes sure that
   requests are sent through a proxy when proxy settings are detected.

   The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
   discontinued; :func:`urlopen` corresponds to the old ``urllib2.urlopen``.
   Proxy handling, which was done by passing a dictionary parameter to
   ``urllib.urlopen``, can be obtained by using :class:`ProxyHandler` objects.
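
As a minimal sketch of how *data* changes the request, the following builds (but
does not send) two :class:`Request` objects. The URL and the form fields are
placeholders, and the body is encoded to bytes on the assumption that a recent
Python 3 release is in use, where bytes rather than a string are expected.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields; urlencode() produces the
# application/x-www-form-urlencoded representation described above.
fields = {"name": "spam", "count": 3}
body = urlencode(fields).encode("ascii")  # assumption: bytes are required

get_request = Request("http://www.example.com/form")
post_request = Request("http://www.example.com/form", data=body)

print(get_request.get_method())   # no data, so a GET
print(post_request.get_method())  # data supplied, so a POST
print(post_request.data)
```

Either object could then be passed to :func:`urlopen`, optionally with a
*timeout* in seconds, to actually perform the request.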

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler`,
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
   member variable to modify its position in the handlers list.
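
The effect of :attr:`handler_order` can be sketched as follows. ``EarlyHandler``
is a hypothetical class introduced only for illustration, and the value ``50``
is chosen on the assumption that every default handler uses a larger number.

```python
from urllib.request import BaseHandler, build_opener


class EarlyHandler(BaseHandler):
    """Hypothetical handler that asks to be sorted ahead of the defaults."""

    handler_order = 50  # assumption: lower than any default handler's order

    def default_open(self, req):
        # Returning None lets the remaining handlers try the request.
        return None


# A class (not an instance) is passed, so build_opener() instantiates it.
opener = build_opener(EarlyHandler)
handler_names = [type(h).__name__ for h in opener.handlers]
print(handler_names)
```

The default handlers are still present in the resulting opener; only the
position of the custom handler changes.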


.. function:: urlretrieve(url, filename=None, reporthook=None, data=None)

   Copy a network object denoted by a URL to a local file, if necessary. If the URL
   points to a local file, or a valid cached copy of the object exists, the object
   is not copied. Return a tuple ``(filename, headers)`` where *filename* is the
   local file name under which the object can be found, and *headers* is whatever
   the :meth:`info` method of the object returned by :func:`urlopen` returned (for
   a remote object, possibly cached). Exceptions are the same as for
   :func:`urlopen`.

   The second argument, if present, specifies the file location to copy to (if
   absent, the location will be a tempfile with a generated name). The third
   argument, if present, is a hook function that will be called once on
   establishment of the network connection and once after each block read
   thereafter. The hook will be passed three arguments: a count of blocks
   transferred so far, a block size in bytes, and the total size of the file. The
   total size passed to the hook may be ``-1`` on older FTP servers which do not
   return a file size in response to a retrieval request.

   If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   argument may be given to specify a ``POST`` request (normally the request type
   is ``GET``). The *data* argument must be in standard
   :mimetype:`application/x-www-form-urlencoded` format; see the
   :func:`urllib.parse.urlencode` function.

   :func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that
   the amount of data available was less than the expected amount (which is the
   size reported by a *Content-Length* header). This can occur, for example, when
   the download is interrupted.

   The *Content-Length* is treated as a lower bound: if there's more data to read,
   :func:`urlretrieve` reads more data, but if less data is available, it raises
   the exception.

   You can still retrieve the downloaded data in this case; it is stored in the
   :attr:`content` attribute of the exception instance.

   If no *Content-Length* header was supplied, :func:`urlretrieve` cannot check
   the size of the data it has downloaded, and just returns it. In this case you
   just have to assume that the download was successful.
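
The hook protocol described above can be sketched without a network connection
by retrieving a ``file:`` URL into an explicit target file. The file names and
the ``record_progress`` function are made up for this illustration, and the
block size passed to the hook is whatever the implementation chooses.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

progress = []


def record_progress(block_count, block_size, total_size):
    # Called once when the transfer starts, then once after each block read.
    progress.append((block_count, block_size, total_size))


with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "source.bin"
    source.write_bytes(b"x" * 20000)
    target = os.path.join(workdir, "copy.bin")

    # A file: URL stands in for a remote resource so no network is needed.
    filename, headers = urlretrieve(source.as_uri(), target, record_progress)
    copied = Path(filename).read_bytes()

print(progress[0], progress[-1])
```

A real progress display would typically compute ``block_count * block_size``
against ``total_size``, guarding against a total of ``-1``.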

.. function:: urlcleanup()

   Clear the cache that may have been built up by previous calls to
   :func:`urlretrieve`.

.. function:: pathname2url(path)

   Convert the pathname *path* from the local syntax for a path to the form used in
   the path component of a URL. This does not produce a complete URL. The return
   value will already be quoted using the :func:`quote` function.


.. function:: url2pathname(path)

   Convert the path component *path* from a percent-encoded URL to the local syntax for a
   path. This does not accept a complete URL. This function uses :func:`unquote`
   to decode *path*.
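
A short round trip illustrates the two conversions. The path below is an
arbitrary relative path containing spaces; the exact quoted form depends on the
platform, so only the round trip itself is relied upon here.

```python
import os
from urllib.request import pathname2url, url2pathname

# Hypothetical local path with characters that need quoting.
local_path = os.path.join("my documents", "report 1.txt")

url_path = pathname2url(local_path)    # quoted, URL-style path component
round_tripped = url2pathname(url_path)  # back to the local syntax

print(url_path)
print(round_tripped == local_path)
```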

.. function:: getproxies()

   This helper function returns a dictionary of scheme to proxy server URL
   mappings. On all operating systems it first scans the environment for
   variables named ``<scheme>_proxy``. If none are found, it looks for proxy
   information in the Mac OS X System Configuration on Mac OS X and in the
   Windows Systems Registry on Windows.
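
The environment lookup can be sketched by setting a variable temporarily. The
proxy address is a placeholder, and :func:`unittest.mock.patch.dict` is used
here only so that the change does not outlive the example.

```python
import os
from unittest import mock
from urllib.request import getproxies

# Placeholder proxy address, set only for the duration of the block.
with mock.patch.dict(os.environ, {"http_proxy": "http://proxy.example.com:3128"}):
    proxies = getproxies()

print(proxies.get("http"))
```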


The following classes are provided:

.. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False)

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*; the HTTP request will
   be a POST instead of a GET when the *data* parameter is provided.
   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns a string in this format.

   *headers* should be a dictionary, and will be treated as if
   :meth:`add_header` was called with each key and value as arguments.
   This is often used to "spoof" the ``User-Agent`` header, which is
   used by a browser to identify itself; some HTTP servers only
   allow requests coming from common browsers as opposed to scripts.
   For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while
   :mod:`urllib`'s default user agent string is
   ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling
   of third-party HTTP cookies:

   *origin_req_host* should be the request-host of the origin
   transaction, as defined by :rfc:`2965`. It defaults to
   ``http.cookiejar.request_host(self)``. This is the host name or IP
   address of the original request that was initiated by the user.
   For example, if the request is for an image in an HTML document,
   this should be the request-host of the request for the page
   containing the image.

   *unverifiable* should indicate whether the request is unverifiable,
   as defined by :rfc:`2965`. It defaults to ``False``. An unverifiable
   request is one whose URL the user did not have the option to
   approve. For example, if the request is for an image in an HTML
   document, and the user had no option to approve the automatic
   fetching of the image, this should be true.
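
The constructor arguments can be sketched as follows for the image case
mentioned above. The URL and the agent string are placeholders. Note that, in
the implementations this sketch was checked against, header names are stored in
capitalized form, which is why the lookup uses ``"User-agent"``.

```python
from urllib.request import Request

request = Request(
    "http://www.example.com/logo.png",
    headers={"User-Agent": "ExampleClient/0.1"},  # hypothetical agent string
    origin_req_host="www.example.com",  # host of the page embedding the image
    unverifiable=True,  # the user did not approve this fetch explicitly
)

print(request.get_header("User-agent"))
print(request.origin_req_host, request.unverifiable)
```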


.. class:: URLopener(proxies=None, **x509)

   Base class for opening and reading URLs. Unless you need to support opening
   objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   you probably want to use :class:`FancyURLopener`.

   By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   Applications can define their own :mailheader:`User-Agent` header by subclassing
   :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   :attr:`version` to an appropriate string value in the subclass definition.

   The optional *proxies* parameter should be a dictionary mapping scheme names to
   proxy URLs, where an empty dictionary turns proxies off completely. Its default
   value is ``None``, in which case environmental proxy settings will be used if
   present, as discussed in the definition of :func:`urlopen`, above.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme. The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`IOError` exception if the server
   returns an error code.

   .. method:: open(fullurl, data=None)

      Open *fullurl* using the appropriate protocol. This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments. If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.


   .. method:: open_unknown(fullurl, data=None)

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url, filename=None, reporthook=None, data=None)

      Retrieves the contents of *url* and places it in *filename*. The return value
      is a tuple consisting of a local filename and either a
      :class:`email.message.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs). The caller must then open and read the
      contents of *filename*. If *filename* is not given and the URL refers to a
      local file, the input filename is returned. If the URL is non-local and
      *filename* is not given, the filename is the output of :func:`tempfile.mktemp`
      with a suffix that matches the suffix of the last path component of the input
      URL. If *reporthook* is given, it must be a function accepting three numeric
      parameters. It will be called after each chunk of data is read from the
      network. *reporthook* is ignored for local URLs.

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``). The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the
      :func:`urllib.parse.urlencode` function.


   .. attribute:: version

      Variable that specifies the user agent of the opener object. To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.


.. class:: FancyURLopener(...)

   :class:`FancyURLopener` subclasses :class:`URLopener` providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL. For 401 response codes (authentication required), basic HTTP
   authentication is performed. For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`http_error_default` is called
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user. In
      reality, browsers do allow automatic redirection of these responses, changing
      the POST to a GET, and :mod:`urllib` reproduces this behaviour.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method. The default implementation asks the
      users for the required information on the controlling terminal. A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overridden to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm. The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The implementation prompts for this information on the terminal; an application
      should override this method to use an appropriate interaction model in the local
      environment.

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor(cookiejar=None)

   A class to handle HTTP Cookies.


.. class:: ProxyHandler(proxies=None)

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read the
   list of proxies from the environment variables :envvar:`<protocol>_proxy`.
   If no proxy environment variables are set, in a Windows environment, proxy
   settings are obtained from the registry's Internet Settings section and in a
   Mac OS X environment, proxy information is retrieved from the OS X System
   Configuration Framework.

   To disable an autodetected proxy, pass an empty dictionary.
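
A minimal sketch of both uses follows. The proxy URL is a placeholder and
nothing is sent over the network; the example only shows how the handler is
combined with :func:`build_opener`.

```python
from urllib.request import ProxyHandler, build_opener

# Explicit mapping: route http requests through a (hypothetical) proxy.
proxy_handler = ProxyHandler({"http": "http://proxy.example.com:3128"})
opener = build_opener(proxy_handler)

# Empty mapping: ignore any autodetected proxy settings.
direct_opener = build_opener(ProxyHandler({}))

# The supplied instance takes the place of the default ProxyHandler.
proxy_handlers = [h for h in opener.handlers if isinstance(h, ProxyHandler)]
print(len(proxy_handlers), proxy_handler.proxies)
```

Requests would then be made with ``opener.open(...)``, or the opener could be
installed globally with :func:`install_opener`.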


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.
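
The catch-all realm can be sketched as follows. The URI, user name and password
are made-up values, and the lookup is performed directly on the manager so that
no request needs to be sent.

```python
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPPasswordMgrWithDefaultRealm,
    build_opener,
)

password_mgr = HTTPPasswordMgrWithDefaultRealm()
# realm=None registers the credentials for any realm under this URI.
password_mgr.add_password(None, "http://www.example.com/private/", "alice", "secret")

# No entry exists for "Some Realm", so the catch-all entry is used.
credentials = password_mgr.find_user_password(
    "Some Realm", "http://www.example.com/private/report.html"
)
print(credentials)

# The manager is normally handed to an authentication handler.
opener = build_opener(HTTPBasicAuthHandler(password_mgr))
```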


.. class:: AbstractBasicAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. _request-objects:

Request Objects
---------------

The following methods describe :class:`Request`'s public interface,
and so all may be overridden in subclasses. The class also defines several
public attributes that can be used by clients to inspect the parsed
request.

.. attribute:: Request.full_url

   The original URL passed to the constructor.

.. attribute:: Request.type

   The URI scheme.

.. attribute:: Request.host

   The URI authority, typically a host, but may also contain a port
   separated by a colon.

.. attribute:: Request.origin_req_host

   The original host for the request, without port.

.. attribute:: Request.selector

   The URI path. If the :class:`Request` uses a proxy, then selector
   will be the full URL that is passed to the proxy.

.. attribute:: Request.data

   The entity body for the request, or ``None`` if not specified.

.. attribute:: Request.unverifiable

   A boolean indicating whether the request is unverifiable as defined
   by RFC 2965.
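
The attributes above can be inspected on a freshly built request, as in this
small sketch. The URL is a placeholder chosen to include a port and a query
string.

```python
from urllib.request import Request

url = "http://www.example.com:8080/docs/index.html?lang=en"
request = Request(url)

print(request.full_url)         # the URL as given
print(request.type)             # scheme
print(request.host)             # authority, including the port
print(request.origin_req_host)  # host only, without the port
print(request.selector)         # path and query sent to the server
print(request.data, request.unverifiable)
```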

.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.
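
The overwrite behaviour can be sketched as follows. The URL and header values
are placeholders, and ``get_header()`` and ``header_items()`` are used here only
to inspect the result.

```python
from urllib.request import Request

request = Request("http://www.example.com/")
request.add_header("Accept", "text/plain")
request.add_header("Accept", "text/html")  # same key: replaces the earlier value

accept_entries = [name for name, _ in request.header_items() if name == "Accept"]
print(request.get_header("Accept"))
print(len(accept_entries))
```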


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).


.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.
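
The effect on the request can be sketched like this for a plain ``http`` URL.
The proxy address is a placeholder and no connection is made.

```python
from urllib.request import Request

request = Request("http://www.example.com/index.html")
request.set_proxy("proxy.example.com:3128", "http")

print(request.host)      # now the proxy's host and port
print(request.type)
print(request.selector)  # the original URL, as it will be sent to the proxy
```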


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by RFC 2965. See the
   documentation for the :class:`Request` constructor.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following methods
   are searched, and added to the possible chains (note that HTTP errors are a
   special case).

   * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
     URLs.

   * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

   * :meth:`protocol_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`protocol_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

   * :meth:`protocol_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url, data=None[, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature only works for
   HTTP, HTTPS and FTP connections.


.. method:: OpenerDirector.error(proto, *args)

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

:class:`OpenerDirector` objects open URLs in three stages. Within each stage,
the order in which the handler methods are called is determined by sorting the
handler instances.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`protocol_open`. If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :meth:`protocol_response` has that
   method called to post-process the response.
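
The three stages can be traced with a small sketch that never touches the
network. The ``demo`` scheme and the ``DemoHandler`` class are invented for this
illustration, and ``urllib.response.addinfourl`` is used as an assumption about
a convenient way to build a file-like response object.

```python
import io
from email.message import Message
from urllib.request import BaseHandler, OpenerDirector
from urllib.response import addinfourl


class DemoHandler(BaseHandler):
    """Hypothetical handler for a made-up ``demo`` scheme."""

    def __init__(self):
        self.calls = []

    def demo_request(self, req):
        # Stage 1: pre-process the request and hand it back.
        self.calls.append("request")
        return req

    def demo_open(self, req):
        # Stage 2: produce a response (a non-None value ends this stage).
        self.calls.append("open")
        return addinfourl(io.BytesIO(b"hello from demo"), Message(), req.full_url, 200)

    def demo_response(self, req, response):
        # Stage 3: post-process the response and hand it back.
        self.calls.append("response")
        return response


handler = DemoHandler()
opener = OpenerDirector()
opener.add_handler(handler)

with opener.open("demo://example/resource") as response:
    body = response.read()

print(handler.calls)
print(body)
```

A bare :class:`OpenerDirector` is used instead of :func:`build_opener` so that
only the illustrative handler takes part in the chain.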
650
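The three stages can be observed without any network access. The following is
a minimal sketch, not the library's own test code: ``RecordingProcessor`` and
``CannedHandler`` are hypothetical classes written for this illustration, and
the URL ``http://example.invalid/`` is never actually contacted because
``http_open`` returns a canned response.

```python
import io
import urllib.request
import urllib.response
from email.message import Message

calls = []  # records the order in which handler methods run


class RecordingProcessor(urllib.request.BaseHandler):
    """Hypothetical processor: takes part in stages 1 and 3 for http URLs."""

    def http_request(self, req):
        calls.append('http_request')
        return req

    def http_response(self, req, response):
        calls.append('http_response')
        return response


class CannedHandler(urllib.request.BaseHandler):
    """Hypothetical handler: takes part in stage 2 without using the network."""

    def default_open(self, req):
        calls.append('default_open')
        return None  # decline, so the protocol-specific methods are tried

    def http_open(self, req):
        calls.append('http_open')
        return urllib.response.addinfourl(
            io.BytesIO(b'canned body'), Message(), req.full_url, 200)


opener = urllib.request.OpenerDirector()
opener.add_handler(RecordingProcessor())
opener.add_handler(CannedHandler())

response = opener.open('http://example.invalid/')
body = response.read()
response.close()
# calls is now ['http_request', 'default_open', 'http_open', 'http_response']
```

Because ``default_open`` returns ``None``, the opener falls through to
``http_open``; ``unknown_open`` would only be consulted if that also declined.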

.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following members and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.


.. method:: BaseHandler.protocol_open(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.


.. method:: BaseHandler.protocol_request(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.

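As an illustration of the :meth:`http_error_nnn` convention, the sketch below
defines a handler whose ``http_error_404`` method substitutes a replacement
document. ``Canned404Handler`` and ``Fallback404Handler`` are hypothetical
names invented for this example, and the canned handler stands in for a real
server so that nothing is sent over the network.

```python
import io
import urllib.request
import urllib.response
from email.message import Message


class Canned404Handler(urllib.request.BaseHandler):
    """Hypothetical stand-in for a server that always answers 404."""

    def http_open(self, req):
        resp = urllib.response.addinfourl(
            io.BytesIO(b'not here'), Message(), req.full_url, 404)
        resp.msg = 'Not Found'  # HTTPErrorProcessor reads response.msg
        return resp


class Fallback404Handler(urllib.request.BaseHandler):
    """Handles 404 errors by returning a replacement response."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        fp.close()  # discard the body of the error response
        return urllib.response.addinfourl(
            io.BytesIO(b'fallback page'), hdrs, req.full_url, 200)


opener = urllib.request.OpenerDirector()
for handler in (Canned404Handler(),
                urllib.request.HTTPErrorProcessor(),
                Fallback404Handler()):
    opener.add_handler(handler)

response = opener.open('http://example.invalid/missing')
status, body = response.code, response.read()
```

:class:`HTTPErrorProcessor` is what routes the 404 response to
:meth:`OpenerDirector.error`; without it in the chain, ``http_error_404``
would never be called.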

.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if you can't but another
   handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the POST to a
      ``GET``, and the default implementation reproduces this behavior.


.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.

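One way to customize redirect handling is to override
:meth:`redirect_request` in a subclass. The sketch below only logs each
redirect and then defers to the default policy. ``CannedSite`` and
``LoggingRedirectHandler`` are hypothetical classes made up for this
illustration, and the two-page "site" is simulated in memory.

```python
import io
import urllib.request
import urllib.response
from email.message import Message

followed = []  # (code, newurl) pairs seen by redirect_request


class CannedSite(urllib.request.BaseHandler):
    """Hypothetical two-page site: /old redirects to /new."""

    def http_open(self, req):
        headers = Message()
        if req.full_url.endswith('/old'):
            headers['Location'] = 'http://example.invalid/new'
            code, msg, body = 302, 'Found', b''
        else:
            code, msg, body = 200, 'OK', b'new page'
        resp = urllib.response.addinfourl(
            io.BytesIO(body), headers, req.full_url, code)
        resp.msg = msg  # HTTPErrorProcessor reads response.msg
        return resp


class LoggingRedirectHandler(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        followed.append((code, newurl))
        # Defer to the default policy for building the new Request.
        return super().redirect_request(req, fp, code, msg, hdrs, newurl)


opener = urllib.request.OpenerDirector()
for handler in (CannedSite(),
                urllib.request.HTTPErrorProcessor(),
                LoggingRedirectHandler()):
    opener.add_handler(handler)

response = opener.open('http://example.invalid/old')
body = response.read()
```

Raising :exc:`HTTPError` from the override instead of calling the base
implementation would stop the redirect from being followed at all.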

.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

:class:`HTTPCookieProcessor` instances have one attribute:

.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`http.cookiejar.CookieJar` in which cookies are stored.

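The following sketch shows the jar being filled from one response and
consulted for the next request. ``CannedSite`` is a hypothetical handler that
plays the role of the server, and the cookie name and value are assumptions
chosen for the illustration.

```python
import http.cookiejar
import io
import urllib.request
import urllib.response
from email.message import Message

sent_cookies = []  # Cookie header value seen on each outgoing request


class CannedSite(urllib.request.BaseHandler):
    """Hypothetical server that sets one cookie on every response."""

    def http_open(self, req):
        sent_cookies.append(req.get_header('Cookie'))
        headers = Message()
        headers['Set-Cookie'] = 'session=abc123; Path=/'
        return urllib.response.addinfourl(
            io.BytesIO(b'ok'), headers, req.full_url, 200)


jar = http.cookiejar.CookieJar()
processor = urllib.request.HTTPCookieProcessor(jar)

opener = urllib.request.OpenerDirector()
opener.add_handler(processor)
opener.add_handler(CannedSite())

# The first request carries no cookie; the second returns the stored one.
opener.open('http://www.example.com/first').close()
opener.open('http://www.example.com/second').close()

stored = [cookie.name for cookie in processor.cookiejar]
```

Passing the same :class:`~http.cookiejar.CookieJar` to several openers lets
them share cookies.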

.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.

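A short sketch of the per-protocol methods described above; the proxy URL is
a placeholder and no connection is made.

```python
import urllib.request

proxy_handler = urllib.request.ProxyHandler(
    {'http': 'http://proxy.example.com:3128/'})

# One <protocol>_open method exists per key of the proxies dictionary.
has_http = hasattr(proxy_handler, 'http_open')   # 'http' was configured
has_ftp = hasattr(proxy_handler, 'ftp_open')     # 'ftp' was not
```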

.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.

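The lookup rules can be tried directly on the password managers. In this
sketch the realms, URIs and credentials are made-up values.

```python
import urllib.request

# A plain manager only answers for the exact realm that was registered.
mgr = urllib.request.HTTPPasswordMgr()
mgr.add_password('PDQ Application', 'https://mahler:8092/site-updates.py',
                 'klem', 'kadidd!ehopper')
exact = mgr.find_user_password('PDQ Application',
                               'https://mahler:8092/site-updates.py')
wrong_realm = mgr.find_user_password('Other Realm',
                                     'https://mahler:8092/site-updates.py')

# With a default realm, entries stored under None match any realm, and a
# registered URI also covers the URIs below it.
default_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
default_mgr.add_password(None, 'http://www.example.com/private/',
                         'joe', 'secret')
below = default_mgr.find_user_password(
    'Any Realm', 'http://www.example.com/private/report.html')
outside = default_mgr.find_user_password(
    'Any Realm', 'http://www.example.com/public/')
```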

.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.


.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For 2xx (success) status codes, the response object is returned immediately.

   For all other status codes, this simply passes the job on to the
   :meth:`http_error_nnn` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
   :exc:`HTTPError` if no other handler handles the error.

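The division of labour between :class:`HTTPErrorProcessor` and
:class:`HTTPDefaultErrorHandler` can be sketched with a simulated server.
``CannedServer`` is a hypothetical handler written for this example; the
status codes it returns are arbitrary.

```python
import io
import urllib.error
import urllib.request
import urllib.response
from email.message import Message


class CannedServer(urllib.request.BaseHandler):
    """Hypothetical handler: /ok answers 200, anything else answers 503."""

    def http_open(self, req):
        if req.full_url.endswith('/ok'):
            code, msg = 200, 'OK'
        else:
            code, msg = 503, 'Service Unavailable'
        resp = urllib.response.addinfourl(
            io.BytesIO(b'body'), Message(), req.full_url, code)
        resp.msg = msg  # HTTPErrorProcessor reads response.msg
        return resp


opener = urllib.request.OpenerDirector()
for handler in (CannedServer(),
                urllib.request.HTTPErrorProcessor(),
                urllib.request.HTTPDefaultErrorHandler()):
    opener.add_handler(handler)

# A success response passes straight through the processor.
ok_code = opener.open('http://example.invalid/ok').code

# An error response is routed to the error handlers, ending in HTTPError.
try:
    opener.open('http://example.invalid/down')
except urllib.error.HTTPError as err:
    error_code = err.code
else:
    error_code = None
```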

.. _urllib-request-examples:

Examples
--------

This example gets the python.org main page and displays the first 300 bytes of
it. ::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(300))
   b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
   xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
   <title>Python Programming '

Note that the data read from the object returned by urlopen is a bytes object.
This is because there is no way for urlopen to automatically determine the
encoding of the byte stream it receives from the HTTP server. In general, a
program will decode the returned bytes object to a string once it determines
or guesses the appropriate encoding.

The following W3C document, http://www.w3.org/International/O-charset , lists
the various ways in which an (X)HTML or an XML document could have specified
its encoding information.

As the python.org website uses *utf-8* encoding as specified in its meta tag,
we will use the same for decoding the bytes object. ::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(100).decode('utf-8'))
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm

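When the server declares a charset in its :mailheader:`Content-Type` header,
that value can be used instead of a hard-coded encoding. The sketch below
assumes the headers object returned by ``info()`` behaves like an
:class:`email.message.Message`; a hand-built message and made-up bytes stand
in for a real response so that the example runs without a network.

```python
from email.message import Message

# Stand-in for the headers object of a response; the header value is an
# assumption made for this sketch.
headers = Message()
headers['Content-Type'] = 'text/html; charset=ISO-8859-1'

raw = b'caf\xe9'  # bytes as they might arrive from the server

# Fall back to UTF-8 when the header does not name a charset.
charset = headers.get_content_charset(failobj='utf-8')
text = raw.decode(charset)
```

A charset declared only inside the document, for example in a meta tag, is
not visible to this approach and still has to be detected from the content.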

In the following example, we are sending a data-stream to the stdin of a CGI
and reading the data it returns to us. Note that this example will only work
when the Python installation supports SSL. ::

   >>> import urllib.request
   >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                              data=b'This data is passed to stdin of the CGI')
   >>> f = urllib.request.urlopen(req)
   >>> print(f.read().decode('utf-8'))
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)

Use of Basic HTTP Authentication::

   import urllib.request
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib.request.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib.request.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib.request.install_opener(opener)
   urllib.request.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib.request
   req = urllib.request.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib.request.urlopen(req)

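For comparison, a sketch of the constructor form mentioned above. The request
is only built and inspected here, not sent.

```python
import urllib.request

# Supply the header up front instead of calling add_header() afterwards.
req = urllib.request.Request(
    'http://www.example.com/',
    headers={'Referer': 'http://www.python.org/'})

referer = req.get_header('Referer')
all_headers = req.header_items()
```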
:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib.request
   opener = urllib.request.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).

.. _urllib-examples:

Here is an example session that uses the ``GET`` method to retrieve a URL
containing parameters::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
   >>> print(f.read().decode('utf-8'))

The following example uses the ``POST`` method instead. The encoded parameters
are converted to bytes before being passed as *data*::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> params = params.encode('utf-8')
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
   >>> print(f.read().decode('utf-8'))

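The difference between the two sessions comes down to whether *data* is
supplied. The sketch below builds both requests without sending them, so it
can be run offline; it assumes a Python version in which dictionaries keep
insertion order.

```python
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})

# Parameters in the URL: the request is a GET.
get_req = urllib.request.Request(
    'http://www.musi-cal.com/cgi-bin/query?%s' % params)

# Parameters in the body (as bytes): the request becomes a POST.
post_req = urllib.request.Request(
    'http://www.musi-cal.com/cgi-bin/query', data=params.encode('utf-8'))

methods = (get_req.get_method(), post_req.get_method())
```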
The following example uses an explicitly specified HTTP proxy, overriding
environment settings::

   >>> import urllib.request
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.request.FancyURLopener(proxies)
   >>> f = opener.open("http://www.python.org")
   >>> f.read().decode('utf-8')

The following example uses no proxies at all, overriding environment settings::

   >>> import urllib.request
   >>> opener = urllib.request.FancyURLopener({})
   >>> f = opener.open("http://www.python.org/")
   >>> f.read().decode('utf-8')


:mod:`urllib.request` Restrictions
----------------------------------

  .. index::
     pair: HTTP; protocol
     pair: FTP; protocol

* Currently, only the following protocols are supported: HTTP (versions 0.9 and
  1.0), FTP, and local files.

* The caching feature of :func:`urlretrieve` has been disabled until proper
  processing of expiration time headers is implemented.

* There should be a function to query whether a particular URL is in the cache.

* For backward compatibility, if a URL appears to point to a local file but the
  file can't be opened, the URL is re-interpreted using the FTP protocol. This
  can sometimes cause confusing error messages.

* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
  long delays while waiting for a network connection to be set up. This means
  that it is difficult to build an interactive Web client using these functions
  without using threads.

  .. index::
     single: HTML
     pair: HTTP; protocol

* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
  returned by the server. This may be binary data (such as an image), plain text
  or (for example) HTML. The HTTP protocol provides type information in the reply
  header, which can be inspected by looking at the :mailheader:`Content-Type`
  header. If the returned data is HTML, you can use the module
  :mod:`html.parser` to parse it.

  .. index:: single: FTP

* The code handling the FTP protocol cannot differentiate between a file and a
  directory. This can lead to unexpected behavior when attempting to read a URL
  that points to a file that is not accessible. If the URL ends in a ``/``, it is
  assumed to refer to a directory and will be handled accordingly. But if an
  attempt to read a file leads to a 550 error (meaning the URL cannot be found or
  is not accessible, often for permission reasons), then the path is treated as a
  directory in order to handle the case when a directory is specified by a URL but
  the trailing ``/`` has been left off. This can cause misleading results when
  you try to fetch a file whose read permissions make it inaccessible; the FTP
  code will try to read it, fail with a 550 error, and then perform a directory
  listing for the unreadable file. If fine-grained control is needed, consider
  using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
  *_urlopener* to meet your needs.



:mod:`urllib.response` --- Response classes used by urllib.
===========================================================

.. module:: urllib.response
   :synopsis: Response classes used by urllib.

The :mod:`urllib.response` module defines functions and classes which define a
minimal file-like interface, including ``read()`` and ``readline()``. The
typical response object is an addinfourl instance, which defines an ``info()``
method that returns headers and a ``geturl()`` method that returns the URL.
Functions defined by this module are used internally by the
:mod:`urllib.request` module.

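The interface described above can be exercised by constructing an
``addinfourl`` by hand. This is only a sketch: in normal use these objects
are created by :mod:`urllib.request`, and the body, header and URL below are
made-up values.

```python
import io
import urllib.response
from email.message import Message

headers = Message()
headers['Content-Type'] = 'text/plain'

# Wrap an in-memory byte stream together with its headers and URL.
response = urllib.response.addinfourl(
    io.BytesIO(b'first line\nsecond line\n'), headers,
    'http://www.example.com/')

first = response.readline()   # file-like reading is delegated to the stream
rest = response.read()
url = response.geturl()
content_type = response.info()['Content-Type']
response.close()
```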