:mod:`urllib.request` --- Extensible library for opening URLs
=============================================================

.. module:: urllib.request
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


The :mod:`urllib.request` module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.


The :mod:`urllib.request` module defines the following functions:


.. function:: urlopen(url, data=None[, timeout])

   Open the URL *url*, which can be either a string or a
   :class:`Request` object.

   .. warning::
      HTTPS requests do not do any verification of the server's certificate.

   *data* may be a string specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*; the HTTP request will
   be a POST instead of a GET when the *data* parameter is provided.
   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns a string in this format.

   The optional *timeout* parameter specifies a timeout in seconds for
   blocking operations like the connection attempt (if not specified,
   the global default timeout setting will be used). This only works
   for HTTP, HTTPS and FTP connections.

   This function returns a file-like object with two additional methods from
   the :mod:`urllib.response` module:

   * :meth:`geturl` --- return the URL of the resource retrieved,
     commonly used to determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers,
     in the form of a message object as returned by
     :func:`email.message_from_string` (see
     `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though
   the default installed global :class:`OpenerDirector` uses
   :class:`UnknownHandler` to ensure this never happens).

   In addition, the default installed :class:`ProxyHandler` makes sure that
   requests are handled through the proxy when proxy settings are detected.

   The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
   discontinued; :func:`urlopen` corresponds to the old ``urllib2.urlopen``.
   Proxy handling, which was done by passing a dictionary parameter to
   ``urllib.urlopen``, can be obtained by using :class:`ProxyHandler` objects.
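
As a rough illustration of the object returned by :func:`urlopen`, the following sketch opens a local ``file:`` URL so that no network access is needed. The temporary file and its contents are made up for the example, and the behaviour shown is that of current Python 3 releases, which may differ in detail from the version described here.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlopen

# Create a small local file to stand in for a remote resource (illustrative only).
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(b"hello from urlopen")
    local_path = tmp.name

try:
    # Path.as_uri() builds a file: URL for the absolute path.
    response = urlopen(Path(local_path).as_uri())
    try:
        body = response.read()
        final_url = response.geturl()   # URL of the resource actually retrieved
        headers = response.info()       # header container for the response
    finally:
        response.close()
finally:
    os.remove(local_path)

print(final_url)
print(headers["Content-length"], body)
```

For an HTTP URL the same two methods are available; ``geturl()`` would then reveal whether a redirect was followed.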

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use
   that opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler`,
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
   member variable to modify its position in the handlers list.
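
The sketch below shows one way :func:`build_opener` might be called with a handler class and a handler instance. ``TaggingProcessor`` and its header are hypothetical names invented for the example, and the ``handlers`` list inspected at the end is an undocumented attribute of the CPython implementation, used here only to look at the result.

```python
from urllib.request import (
    BaseHandler,
    OpenerDirector,
    ProxyHandler,
    build_opener,
)


class TaggingProcessor(BaseHandler):
    """Hypothetical handler that stamps outgoing HTTP requests."""

    # Lower values sort earlier in the handler list (the default is 500).
    handler_order = 400

    def http_request(self, req):
        req.add_unredirected_header("X-Example-Tag", "demo")
        return req


# A handler *class* is instantiated by build_opener(); an *instance* is used as is.
# Passing ProxyHandler({}) stands in for the default ProxyHandler and disables proxies.
opener = build_opener(TaggingProcessor, ProxyHandler({}))

# Inspect the chain (implementation detail, for illustration only).
handler_names = [type(h).__name__ for h in opener.handlers]
orders = [h.handler_order for h in opener.handlers]
print(handler_names)
```

The opener can then be used directly through its ``open()`` method, or installed globally with :func:`install_opener`.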


.. function:: urlretrieve(url, filename=None, reporthook=None, data=None)

   Copy a network object denoted by a URL to a local file, if necessary. If the URL
   points to a local file, or a valid cached copy of the object exists, the object
   is not copied. Return a tuple ``(filename, headers)`` where *filename* is the
   local file name under which the object can be found, and *headers* is whatever
   the :meth:`info` method of the object returned by :func:`urlopen` returned (for
   a remote object, possibly cached). Exceptions are the same as for
   :func:`urlopen`.

   The second argument, if present, specifies the file location to copy to (if
   absent, the location will be a tempfile with a generated name). The third
   argument, if present, is a hook function that will be called once on
   establishment of the network connection and once after each block read
   thereafter. The hook will be passed three arguments: a count of blocks
   transferred so far, a block size in bytes, and the total size of the file. The
   third hook argument may be ``-1`` on older FTP servers which do not return a
   file size in response to a retrieval request.

   If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   argument may be given to specify a ``POST`` request (normally the request type
   is ``GET``). The *data* argument must be in standard
   :mimetype:`application/x-www-form-urlencoded` format; see the
   :func:`urllib.parse.urlencode` function.

   :func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that
   the amount of data available was less than the expected amount (which is the
   size reported by a *Content-Length* header). This can occur, for example, when
   the download is interrupted.

   The *Content-Length* is treated as a lower bound: if there's more data to read,
   :func:`urlretrieve` reads more data, but if less data is available, it raises
   the exception.

   You can still retrieve the downloaded data in this case; it is stored in the
   :attr:`content` attribute of the exception instance.

   If no *Content-Length* header was supplied, :func:`urlretrieve` cannot check
   the size of the data it has downloaded, and just returns it. In this case you
   just have to assume that the download was successful.
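
The reporting hook described above can be sketched as follows. The example copies a local file through a ``file:`` URL so it runs without a network; the file names are invented, and it relies on current Python 3 releases making a copy when a target file name is given explicitly.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

progress = []  # (block_count, block_size, total_size) tuples seen by the hook


def report(block_count, block_size, total_size):
    """Record every call made by urlretrieve()."""
    progress.append((block_count, block_size, total_size))


with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "source.bin"
    target = Path(workdir) / "copy.bin"
    source.write_bytes(b"x" * 20000)

    # The hook is called once up front and then after each block read.
    filename, headers = urlretrieve(source.as_uri(), str(target), reporthook=report)
    copied = target.read_bytes()

print(filename, len(copied), progress[0])
```

A real progress display would typically compute ``block_count * block_size / total_size``, guarding against a ``total_size`` of ``-1``.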

.. function:: urlcleanup()

   Clear the cache that may have been built up by previous calls to
   :func:`urlretrieve`.

.. function:: pathname2url(path)

   Convert the pathname *path* from the local syntax for a path to the form used in
   the path component of a URL. This does not produce a complete URL. The return
   value will already be quoted using the :func:`urllib.parse.quote` function.


.. function:: url2pathname(path)

   Convert the path component *path* from a percent-encoded URL to the local syntax
   for a path. This does not accept a complete URL. This function uses
   :func:`urllib.parse.unquote` to decode *path*.
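
A minimal round trip through the two conversion functions might look like this. The file name is hypothetical and the file does not need to exist; the exact form of the URL path depends on the platform and Python version.

```python
import os
import tempfile
from urllib.request import pathname2url, url2pathname

# A local path containing a space (made-up file name).
local_path = os.path.join(tempfile.gettempdir(), "my report.txt")

url_path = pathname2url(local_path)    # percent-encoded path component, not a full URL
round_trip = url2pathname(url_path)    # back to the local path syntax

print(url_path)
print(round_trip)
```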

.. function:: getproxies()

   This helper function returns a dictionary of scheme to proxy server URL
   mappings. On all operating systems it first scans the environment for
   variables named ``<scheme>_proxy``. If none is found, it looks up proxy
   information in the Mac OS X System Configuration on Mac OS X and in the
   Windows Systems Registry on Windows.
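
The environment lookup can be tried out with a temporary variable, as in this sketch. The proxy host name is a placeholder, and the previous value of the variable is restored afterwards.

```python
import os
from urllib.request import getproxies

# Temporarily set a made-up proxy for the "http" scheme.
saved = os.environ.get("http_proxy")
os.environ["http_proxy"] = "http://proxy.example.com:3128"
try:
    proxies = getproxies()
finally:
    # Restore the previous environment.
    if saved is None:
        del os.environ["http_proxy"]
    else:
        os.environ["http_proxy"] = saved

print(proxies.get("http"))
```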


The following classes are provided:

.. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False)

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*; the HTTP request will
   be a POST instead of a GET when the *data* parameter is provided.
   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns a string in this format.

   *headers* should be a dictionary, and will be treated as if
   :meth:`add_header` was called with each key and value as arguments.
   This is often used to "spoof" the ``User-Agent`` header, which is
   used by a browser to identify itself; some HTTP servers only
   allow requests coming from common browsers as opposed to scripts.
   For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while
   :mod:`urllib`'s default user agent string is
   ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling
   of third-party HTTP cookies:

   *origin_req_host* should be the request-host of the origin
   transaction, as defined by :rfc:`2965`. It defaults to
   ``http.cookiejar.request_host(self)``. This is the host name or IP
   address of the original request that was initiated by the user.
   For example, if the request is for an image in an HTML document,
   this should be the request-host of the request for the page
   containing the image.

   *unverifiable* should indicate whether the request is unverifiable,
   as defined by :rfc:`2965`. It defaults to ``False``. An unverifiable
   request is one whose URL the user did not have the option to
   approve. For example, if the request is for an image in an HTML
   document, and the user had no option to approve the automatic
   fetching of the image, this should be true.
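
A short sketch of constructing a :class:`Request` with form data and a custom ``User-Agent`` follows. The URL, form fields and agent string are invented, nothing is sent over the network, and the body is encoded to bytes because that is what current Python 3 releases expect for *data* (an assumption that goes beyond the text above).

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form fields are made up for the example.
form = {"name": "Ada Lovelace", "lang": "en"}
# urlencode() returns a str; encode it to bytes before using it as the body.
body = urlencode(form).encode("ascii")

request = Request(
    "http://www.example.com/submit",
    data=body,
    headers={"User-Agent": "ExampleClient/0.1"},
)

print(request.get_method())     # POST, because data was supplied
print(request.data)
# Header names are stored capitalized, e.g. "User-agent".
print(request.header_items())
```

Passing the resulting object to :func:`urlopen` would perform the actual POST.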


.. class:: URLopener(proxies=None, **x509)

   Base class for opening and reading URLs. Unless you need to support opening
   objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   you probably want to use :class:`FancyURLopener`.

   By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   Applications can define their own :mailheader:`User-Agent` header by subclassing
   :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   :attr:`version` to an appropriate string value in the subclass definition.

   The optional *proxies* parameter should be a dictionary mapping scheme names to
   proxy URLs, where an empty dictionary turns proxies off completely. Its default
   value is ``None``, in which case environmental proxy settings will be used if
   present, as discussed in the definition of :func:`urlopen`, above.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme. The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`IOError` exception if the server
   returns an error code.

   .. method:: open(fullurl, data=None)

      Open *fullurl* using the appropriate protocol. This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments. If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.


   .. method:: open_unknown(fullurl, data=None)

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url, filename=None, reporthook=None, data=None)

      Retrieve the contents of *url* and place them in *filename*. The return value
      is a tuple consisting of a local filename and either a
      :class:`email.message.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs). The caller must then open and read the
      contents of *filename*. If *filename* is not given and the URL refers to a
      local file, the input filename is returned. If the URL is non-local and
      *filename* is not given, the filename is the output of :func:`tempfile.mktemp`
      with a suffix that matches the suffix of the last path component of the input
      URL. If *reporthook* is given, it must be a function accepting three numeric
      parameters. It will be called after each chunk of data is read from the
      network. *reporthook* is ignored for local URLs.

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``). The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the
      :func:`urllib.parse.urlencode` function.


   .. attribute:: version

      Variable that specifies the user agent of the opener object. To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.


.. class:: FancyURLopener(...)

   :class:`FancyURLopener` subclasses :class:`URLopener` providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL. For 401 response codes (authentication required), basic HTTP
   authentication is performed. For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`http_error_default` is called,
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user. In
      reality, browsers do allow automatic redirection of these responses, changing
      the POST to a GET, and :mod:`urllib` reproduces this behaviour.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method. The default implementation asks the
      user for the required information on the controlling terminal. A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overridden to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm. The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The implementation prompts for this information on the terminal; an application
      should override this method to use an appropriate interaction model in the local
      environment.


.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor(cookiejar=None)

   A class to handle HTTP Cookies.


.. class:: ProxyHandler(proxies=None)

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read the
   list of proxies from the environment variables :envvar:`<protocol>_proxy`.
   If no proxy environment variables are set, in a Windows environment, proxy
   settings are obtained from the registry's Internet Settings section and in a
   Mac OS X environment, proxy information is retrieved from the OS X System
   Configuration Framework.

   To disable autodetected proxies, pass an empty dictionary.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.
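
The password manager and basic authentication handler described above could be combined roughly as follows. The credentials, realm and URLs are placeholders, and no request is actually sent; the lookup at the end only mimics what the handler would do after a 401 response.

```python
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPPasswordMgrWithDefaultRealm,
    build_opener,
)

# Credentials and URLs below are placeholders for illustration.
password_mgr = HTTPPasswordMgrWithDefaultRealm()
# A realm of None acts as the catch-all realm.
password_mgr.add_password(None, "http://www.example.com/private/", "alice", "s3cret")

auth_handler = HTTPBasicAuthHandler(password_mgr)
opener = build_opener(auth_handler)

# Any realm falls back to the catch-all entry for URLs below the registered one.
found = password_mgr.find_user_password(
    "Some Realm", "http://www.example.com/private/report.html"
)
missing = password_mgr.find_user_password("Some Realm", "http://other.example.org/")
print(found, missing)
```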


.. class:: ProxyBasicAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. _request-objects:

Request Objects
---------------

The following methods describe :class:`Request`'s public interface,
and so all may be overridden in subclasses. The class also defines several
public attributes that can be used by clients to inspect the parsed
request.

.. attribute:: Request.full_url

   The original URL passed to the constructor.

.. attribute:: Request.type

   The URI scheme.

.. attribute:: Request.host

   The URI authority, typically a host, but may also contain a port
   separated by a colon.

.. attribute:: Request.origin_req_host

   The original host for the request, without port.

.. attribute:: Request.selector

   The URI path. If the :class:`Request` uses a proxy, then selector
   will be the full URL that is passed to the proxy.

.. attribute:: Request.data

   The entity body for the request, or ``None`` if not specified.

.. attribute:: Request.unverifiable

   A boolean indicating whether the request is unverifiable as defined
   by :rfc:`2965`.

.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).


.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by :rfc:`2965`. See the
   documentation for the :class:`Request` constructor.
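
To make the parsed attributes and the effect of :meth:`Request.set_proxy` more concrete, here is a small sketch. The URL, header and proxy address are invented, and only the attributes are used because they are the part of the interface that works the same way in current Python 3 releases.

```python
from urllib.request import Request

request = Request("http://www.example.com:8080/docs/index.html?lang=en")
request.add_unredirected_header("X-Example-Token", "abc123")  # made-up header

before = (request.type, request.host, request.selector)

# Route the request through a hypothetical proxy: host and type are replaced,
# and the selector becomes the original full URL.
request.set_proxy("proxy.example.com:3128", "http")
after = (request.type, request.host, request.selector)

print(before)
print(after)
print(request.origin_req_host, request.unverifiable)
```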
572
573
574.. _opener-director-objects:
575
576OpenerDirector Objects
577----------------------
578
579:class:`OpenerDirector` instances have the following methods:
580
581
582.. method:: OpenerDirector.add_handler(handler)
583
584 *handler* should be an instance of :class:`BaseHandler`. The following methods
585 are searched, and added to the possible chains (note that HTTP errors are a
586 special case).
587
588 * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
589 URLs.
590
591 * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
592 errors with HTTP error code *type*.
593
594 * :meth:`protocol_error` --- signal that the handler knows how to handle errors
595 from (non-\ ``http``) *protocol*.
596
597 * :meth:`protocol_request` --- signal that the handler knows how to pre-process
598 *protocol* requests.
599
600 * :meth:`protocol_response` --- signal that the handler knows how to
601 post-process *protocol* responses.
602
603
Georg Brandlb044b2a2009-09-16 16:05:59 +0000604.. method:: OpenerDirector.open(url, data=None[, timeout])
Georg Brandl116aa622007-08-15 14:28:22 +0000605
606 Open the given *url* (which can be a request object or a string), optionally
Alexandre Vassalotti5f8ced22008-05-16 00:03:33 +0000607 passing the given *data*. Arguments, return values and exceptions raised are
608 the same as those of :func:`urlopen` (which simply calls the :meth:`open`
609 method on the currently installed global :class:`OpenerDirector`). The
610 optional *timeout* parameter specifies a timeout in seconds for blocking
Georg Brandlf78e02b2008-06-10 17:40:04 +0000611 operations like the connection attempt (if not specified, the global default
Georg Brandl1bb061d2010-05-21 21:01:43 +0000612 timeout setting will be used). The timeout feature actually works only for
Senthil Kumaranf066e272010-10-05 18:41:01 +0000613 HTTP, HTTPS and FTP connections).
Georg Brandl116aa622007-08-15 14:28:22 +0000614
Georg Brandl116aa622007-08-15 14:28:22 +0000615
Georg Brandlb044b2a2009-09-16 16:05:59 +0000616.. method:: OpenerDirector.error(proto, *args)
Georg Brandl116aa622007-08-15 14:28:22 +0000617
618 Handle an error of the given protocol. This will call the registered error
619 handlers for the given protocol with the given arguments (which are protocol
620 specific). The HTTP protocol is a special case which uses the HTTP response
621 code to determine the specific error handler; refer to the :meth:`http_error_\*`
622 methods of the handler classes.
623
624 Return values and exceptions raised are the same as those of :func:`urlopen`.
625
OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`.  If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`protocol_open`.  If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :meth:`protocol_response` has that
   method called to post-process the response.
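
The three stages can be observed with a small self-contained sketch. It assumes
a made-up ``demo`` scheme and two hypothetical classes, ``TraceProcessor`` and
``DemoHandler``; nothing here touches the network, and an in-memory
``io.BytesIO`` stands in for a real response object.

```python
import io
import urllib.request

calls = []  # records the order in which the opener invokes handler methods


class TraceProcessor(urllib.request.BaseHandler):
    """Hypothetical processor for the made-up ``demo`` scheme."""

    def demo_request(self, req):
        calls.append("demo_request")      # stage 1: pre-process the request
        return req

    def demo_response(self, req, response):
        calls.append("demo_response")     # stage 3: post-process the response
        return response


class DemoHandler(urllib.request.BaseHandler):
    """Hypothetical handler that answers ``demo:`` URLs from memory."""

    def demo_open(self, req):
        calls.append("demo_open")         # stage 2: produce a response
        return io.BytesIO(b"hello from " + req.full_url.encode("ascii"))


opener = urllib.request.OpenerDirector()
opener.add_handler(TraceProcessor())
opener.add_handler(DemoHandler())

response = opener.open("demo://example/resource")
body = response.read()
print(calls)
```

A bare :class:`OpenerDirector` is used instead of :func:`build_opener` so that
only the two illustrative handlers take part.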
651
652
653.. _base-handler-objects:
654
655BaseHandler Objects
656-------------------
657
658:class:`BaseHandler` objects provide a couple of methods that are directly
659useful, and others that are meant to be used by derived classes. These are
660intended for direct use:
661
662
663.. method:: BaseHandler.add_parent(director)
664
665 Add a director as parent.
666
667
668.. method:: BaseHandler.close()
669
670 Remove any parents.
671
672The following members and methods should only be used by classes derived from
673:class:`BaseHandler`.
674
675.. note::
676
677 The convention has been adopted that subclasses defining
678 :meth:`protocol_request` or :meth:`protocol_response` methods are named
679 :class:`\*Processor`; all others are named :class:`\*Handler`.
680
681
682.. attribute:: BaseHandler.parent
683
684 A valid :class:`OpenerDirector`, which can be used to open using a different
685 protocol, or handle errors.
686
687
688.. method:: BaseHandler.default_open(req)
689
690 This method is *not* defined in :class:`BaseHandler`, but subclasses should
691 define it if they want to catch all URLs.
692
693 This method, if implemented, will be called by the parent
694 :class:`OpenerDirector`. It should return a file-like object as described in
695 the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
696 It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
697 example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).
698
699 This method will be called before any protocol-specific open method.
700
701
702.. method:: BaseHandler.protocol_open(req)
703 :noindex:
704
705 This method is *not* defined in :class:`BaseHandler`, but subclasses should
706 define it if they want to handle URLs with the given protocol.
707
708 This method, if defined, will be called by the parent :class:`OpenerDirector`.
709 Return values should be the same as for :meth:`default_open`.
710
711
712.. method:: BaseHandler.unknown_open(req)
713
714 This method is *not* defined in :class:`BaseHandler`, but subclasses should
715 define it if they want to catch all URLs with no specific registered handler to
716 open it.
717
718 This method, if implemented, will be called by the :attr:`parent`
719 :class:`OpenerDirector`. Return values should be the same as for
720 :meth:`default_open`.
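
As a rough illustration of how the three kinds of open methods relate, the
sketch below registers a ``default_open`` that declines (returns ``None``) and
an ``unknown_open`` that answers. The class names and the ``nosuch`` scheme are
invented for the example, and the reply is an in-memory stand-in.

```python
import io
import urllib.request

order = []


class CatchAll(urllib.request.BaseHandler):
    def default_open(self, req):
        order.append("default_open")
        return None                      # decline, so the search continues


class Fallback(urllib.request.BaseHandler):
    def unknown_open(self, req):
        order.append("unknown_open")
        return io.BytesIO(b"fallback for " + req.type.encode("ascii"))


opener = urllib.request.OpenerDirector()
opener.add_handler(CatchAll())
opener.add_handler(Fallback())

# No handler defines nosuch_open, so the protocol-specific step finds nothing.
reply = opener.open("nosuch://example/").read()
```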
721
722
723.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)
724
725 This method is *not* defined in :class:`BaseHandler`, but subclasses should
726 override it if they intend to provide a catch-all for otherwise unhandled HTTP
727 errors. It will be called automatically by the :class:`OpenerDirector` getting
728 the error, and should not normally be called in other circumstances.
729
730 *req* will be a :class:`Request` object, *fp* will be a file-like object with
731 the HTTP error body, *code* will be the three-digit code of the error, *msg*
732 will be the user-visible explanation of the code and *hdrs* will be a mapping
733 object with the headers of the error.
734
735 Return values and exceptions raised should be the same as those of
736 :func:`urlopen`.
737
738
739.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)
740
741 *nnn* should be a three-digit HTTP error code. This method is also not defined
742 in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
743 subclass, when an HTTP error with code *nnn* occurs.
744
745 Subclasses should override this method to handle specific HTTP errors.
746
747 Arguments, return values and exceptions raised should be the same as for
748 :meth:`http_error_default`.
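
The dispatch on the status code can be sketched without any network access by
calling :meth:`OpenerDirector.error` directly. ``NotFoundHandler`` is a
hypothetical class, and returning substitute pages is just one possible policy;
a real handler might raise :exc:`HTTPError` instead.

```python
import email.message
import io
import urllib.request

seen = []


class NotFoundHandler(urllib.request.BaseHandler):
    def http_error_404(self, req, fp, code, msg, hdrs):
        seen.append(("http_error_404", code))
        return io.BytesIO(b"substitute page")

    def http_error_default(self, req, fp, code, msg, hdrs):
        seen.append(("http_error_default", code))
        return io.BytesIO(b"generic error page")


director = urllib.request.OpenerDirector()
director.add_handler(NotFoundHandler())

req = urllib.request.Request("http://www.example.com/missing")
hdrs = email.message.Message()

# OpenerDirector.error() picks the method from the HTTP status code.
specific = director.error("http", req, io.BytesIO(b""), 404, "Not Found", hdrs)
generic = director.error("http", req, io.BytesIO(b""), 500, "Server Error", hdrs)
```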
749
750
751.. method:: BaseHandler.protocol_request(req)
752 :noindex:
753
754 This method is *not* defined in :class:`BaseHandler`, but subclasses should
755 define it if they want to pre-process requests of the given protocol.
756
757 This method, if defined, will be called by the parent :class:`OpenerDirector`.
758 *req* will be a :class:`Request` object. The return value should be a
759 :class:`Request` object.
760
761
762.. method:: BaseHandler.protocol_response(req, response)
763 :noindex:
764
765 This method is *not* defined in :class:`BaseHandler`, but subclasses should
766 define it if they want to post-process responses of the given protocol.
767
768 This method, if defined, will be called by the parent :class:`OpenerDirector`.
769 *req* will be a :class:`Request` object. *response* will be an object
770 implementing the same interface as the return value of :func:`urlopen`. The
771 return value should implement the same interface as the return value of
772 :func:`urlopen`.
773
774
775.. _http-redirect-handler:
776
777HTTPRedirectHandler Objects
778---------------------------
779
.. note::

   Some HTTP redirections require action from this module's client code.  If this
   is the case, :exc:`HTTPError` is raised.  See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.

   An :class:`HTTPError` exception is raised as a security consideration if the
   HTTPRedirectHandler is presented with a redirected URL which is not an HTTP,
   HTTPS or FTP URL.
789
Georg Brandl116aa622007-08-15 14:28:22 +0000790
.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server.  If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*.  Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if you can't but another
   handler might.
Georg Brandl116aa622007-08-15 14:28:22 +0000800
801 .. note::
802
803 The default implementation of this method does not strictly follow :rfc:`2616`,
804 which says that 301 and 302 responses to ``POST`` requests must not be
805 automatically redirected without confirmation by the user. In reality, browsers
806 do allow automatic redirection of these responses, changing the POST to a
807 ``GET``, and the default implementation reproduces this behavior.
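
A subclass can tighten the default policy. The sketch below, with the
hypothetical class ``NoPostRedirect``, refuses to follow any redirect of a
``POST`` request and defers to the default implementation otherwise. The method
is called directly here, with ``None`` for *fp* and an empty dict for *hdrs*,
purely so the example runs without a server.

```python
import urllib.error
import urllib.request


class NoPostRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: follow redirects for GET/HEAD, refuse for POST."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        if req.get_method() == "POST":
            # Stop here: no other handler should try to follow this redirect.
            raise urllib.error.HTTPError(req.full_url, code, msg, hdrs, fp)
        return super().redirect_request(req, fp, code, msg, hdrs, newurl)


handler = NoPostRedirect()
new_url = "http://www.example.com/new"

get_req = urllib.request.Request("http://www.example.com/old")
followed = handler.redirect_request(get_req, None, 302, "Found", {}, new_url)

post_req = urllib.request.Request("http://www.example.com/old", data=b"x=1")
try:
    handler.redirect_request(post_req, None, 302, "Found", {}, new_url)
    refused = False
except urllib.error.HTTPError:
    refused = True
```

In normal use such a handler would be passed to :func:`build_opener` rather
than called by hand.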
808
809
.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL.  This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.
Georg Brandl116aa622007-08-15 14:28:22 +0000814
815
816.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)
817
818 The same as :meth:`http_error_301`, but called for the 'found' response.
819
820
821.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)
822
823 The same as :meth:`http_error_301`, but called for the 'see other' response.
824
825
826.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)
827
828 The same as :meth:`http_error_301`, but called for the 'temporary redirect'
829 response.
830
831
832.. _http-cookie-processor:
833
834HTTPCookieProcessor Objects
835---------------------------
836
:class:`HTTPCookieProcessor` instances have one attribute:

.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`http.cookiejar.CookieJar` in which cookies are stored.
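
A minimal sketch of how the attribute is typically reached: the jar passed to
the constructor is the same object later exposed as ``cookiejar``. No request
is made here, so the jar stays empty.

```python
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()
processor = urllib.request.HTTPCookieProcessor(jar)
opener = urllib.request.build_opener(processor)

# The processor keeps a reference to the jar it was given; cookies set by
# servers during opener.open() calls would accumulate there.
same_jar = processor.cookiejar is jar
stored = len(jar)          # no requests made yet, so the jar is empty
```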
Georg Brandl116aa622007-08-15 14:28:22 +0000842
843
844.. _proxy-handler:
845
846ProxyHandler Objects
847--------------------
848
849
850.. method:: ProxyHandler.protocol_open(request)
851 :noindex:
852
853 The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
854 *protocol* which has a proxy in the *proxies* dictionary given in the
855 constructor. The method will modify requests to go through the proxy, by
856 calling ``request.set_proxy()``, and call the next handler in the chain to
857 actually execute the protocol.
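
The per-protocol methods can be seen by inspecting a handler built from an
explicit dictionary. The proxy address below is a placeholder chosen for the
example.

```python
import urllib.request

# Hypothetical proxy address, used only for illustration.
proxy_handler = urllib.request.ProxyHandler(
    {"http": "http://proxy.example.com:3128/"})

has_http = hasattr(proxy_handler, "http_open")   # a proxy was configured
has_ftp = hasattr(proxy_handler, "ftp_open")     # none given for FTP
```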
858
859
860.. _http-password-mgr:
861
862HTTPPasswordMgr Objects
863-----------------------
864
865These methods are available on :class:`HTTPPasswordMgr` and
866:class:`HTTPPasswordMgrWithDefaultRealm` objects.
867
868
869.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)
870
871 *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
872 *passwd* must be strings. This causes ``(user, passwd)`` to be used as
873 authentication tokens when authentication for *realm* and a super-URI of any of
874 the given URIs is given.
875
876
877.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)
878
879 Get user/password for given realm and URI, if any. This method will return
880 ``(None, None)`` if there is no matching user/password.
881
882 For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
883 searched if the given *realm* has no matching user/password.
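
The lookup rules can be tried out directly on the password managers, without
any handler or network access. The realm names, URLs and credentials below are
invented for the example.

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgr()
mgr.add_password("Members", "http://www.example.com/private/", "klem", "secret")

# A URI below the registered one matches ...
inside = mgr.find_user_password("Members", "http://www.example.com/private/a.html")
# ... while a different realm does not.
other = mgr.find_user_password("Staff", "http://www.example.com/private/a.html")

default_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
default_mgr.add_password(None, "http://www.example.com/", "guest", "guest")
# No entry exists for the "Staff" realm, so the ``None`` realm is consulted.
fallback = default_mgr.find_user_password("Staff", "http://www.example.com/x")
```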
884
885
886.. _abstract-basic-auth-handler:
887
888AbstractBasicAuthHandler Objects
889--------------------------------
890
891
892.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
893
894 Handle an authentication request by getting a user/password pair, and re-trying
895 the request. *authreq* should be the name of the header where the information
896 about the realm is included in the request, *host* specifies the URL and path to
897 authenticate for, *req* should be the (failed) :class:`Request` object, and
898 *headers* should be the error headers.
899
900 *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
901 authority component (e.g. ``"http://python.org/"``). In either case, the
902 authority must not contain a userinfo component (so, ``"python.org"`` and
903 ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).
904
905
906.. _http-basic-auth-handler:
907
908HTTPBasicAuthHandler Objects
909----------------------------
910
911
912.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)
913
914 Retry the request with authentication information, if available.
915
916
917.. _proxy-basic-auth-handler:
918
919ProxyBasicAuthHandler Objects
920-----------------------------
921
922
923.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)
924
925 Retry the request with authentication information, if available.
926
927
928.. _abstract-digest-auth-handler:
929
930AbstractDigestAuthHandler Objects
931---------------------------------
932
933
934.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
935
936 *authreq* should be the name of the header where the information about the realm
937 is included in the request, *host* should be the host to authenticate to, *req*
938 should be the (failed) :class:`Request` object, and *headers* should be the
939 error headers.
940
941
942.. _http-digest-auth-handler:
943
944HTTPDigestAuthHandler Objects
945-----------------------------
946
947
948.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)
949
950 Retry the request with authentication information, if available.
951
952
953.. _proxy-digest-auth-handler:
954
955ProxyDigestAuthHandler Objects
956------------------------------
957
958
959.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)
960
961 Retry the request with authentication information, if available.
962
963
964.. _http-handler-objects:
965
966HTTPHandler Objects
967-------------------
968
969
970.. method:: HTTPHandler.http_open(req)
971
972 Send an HTTP request, which can be either GET or POST, depending on
973 ``req.has_data()``.
974
975
976.. _https-handler-objects:
977
978HTTPSHandler Objects
979--------------------
980
981
982.. method:: HTTPSHandler.https_open(req)
983
984 Send an HTTPS request, which can be either GET or POST, depending on
985 ``req.has_data()``.
986
987
988.. _file-handler-objects:
989
990FileHandler Objects
991-------------------
992
993
994.. method:: FileHandler.file_open(req)
995
996 Open the file locally, if there is no host name, or the host name is
997 ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
998 using :attr:`parent`.
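
The local case can be sketched with a temporary file. The file name and its
contents are arbitrary, and ``pathlib.Path.as_uri()`` is used here only as a
convenient way to build a ``file:`` URL without a host name.

```python
import pathlib
import tempfile
import urllib.request

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "hello.txt"
    path.write_bytes(b"hello, file handler\n")

    opener = urllib.request.build_opener(urllib.request.FileHandler)
    # Path.as_uri() yields a file:// URL with no host part.
    with opener.open(path.as_uri()) as f:
        content = f.read()
```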
999
1000
1001.. _ftp-handler-objects:
1002
1003FTPHandler Objects
1004------------------
1005
1006
1007.. method:: FTPHandler.ftp_open(req)
1008
1009 Open the FTP file indicated by *req*. The login is always done with empty
1010 username and password.
1011
1012
1013.. _cacheftp-handler-objects:
1014
1015CacheFTPHandler Objects
1016-----------------------
1017
1018:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
1019following additional methods:
1020
1021
1022.. method:: CacheFTPHandler.setTimeout(t)
1023
1024 Set timeout of connections to *t* seconds.
1025
1026
1027.. method:: CacheFTPHandler.setMaxConns(m)
1028
1029 Set maximum number of cached connections to *m*.
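
A short sketch of configuring the cache before building an opener; the values
30 and 4 are arbitrary choices for illustration.

```python
import urllib.request

ftp_handler = urllib.request.CacheFTPHandler()
ftp_handler.setTimeout(30)    # set timeout of connections to 30 seconds
ftp_handler.setMaxConns(4)    # cache at most four connections

opener = urllib.request.build_opener(ftp_handler)
```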
1030
1031
1032.. _unknown-handler-objects:
1033
1034UnknownHandler Objects
1035----------------------
1036
1037
1038.. method:: UnknownHandler.unknown_open()
1039
1040 Raise a :exc:`URLError` exception.
1041
1042
1043.. _http-error-processor-objects:
1044
1045HTTPErrorProcessor Objects
1046--------------------------
1047
.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For 200 error codes, the response object is returned immediately.

   For non-200 error codes, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
   :exc:`HTTPError` if no other handler handles the error.
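
The behaviour described above can be approximated offline. This sketch assumes
that the processor reads ``code``, ``msg`` and ``info()`` from the response, as
the CPython implementation does; ``fake_response`` is a hypothetical helper
that builds an in-memory stand-in, not something the module provides.

```python
import email.message
import io
import urllib.error
import urllib.request
import urllib.response


def fake_response(code, msg):
    """Build an in-memory stand-in for an HTTP response (illustration only)."""
    resp = urllib.response.addinfourl(
        io.BytesIO(b"body"), email.message.Message(),
        "http://www.example.com/", code)
    resp.msg = msg       # set by hand; real HTTP responses carry this already
    return resp


opener = urllib.request.OpenerDirector()
processor = urllib.request.HTTPErrorProcessor()
opener.add_handler(processor)
opener.add_handler(urllib.request.HTTPDefaultErrorHandler())

req = urllib.request.Request("http://www.example.com/")

ok = fake_response(200, "OK")
passed_through = processor.http_response(req, ok) is ok

try:
    processor.http_response(req, fake_response(404, "Not Found"))
    raised = None
except urllib.error.HTTPError as err:
    raised = err.code
```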
1058
Georg Brandl0f7ede42008-06-23 11:23:31 +00001059
1060.. _urllib-request-examples:
Georg Brandl116aa622007-08-15 14:28:22 +00001061
1062Examples
1063--------
1064
This example gets the python.org main page and displays the first 300 bytes of
it. ::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(300))
   b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
   xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
   <title>Python Programming '
Senthil Kumaran0e3e4852010-04-15 17:21:29 +00001076
Note that the data read from the object returned by urlopen is a bytes object.
This is because there is no way for urlopen to automatically determine the
encoding of the byte stream it receives from the HTTP server. In general, a
program will decode the returned bytes object to a string once it determines or
guesses the appropriate encoding.

The following W3C document, http://www.w3.org/International/O-charset , lists
the various ways in which an (X)HTML or an XML document could have specified its
encoding information.

As the python.org website uses *utf-8* encoding as specified in its meta tag, we
will use the same for decoding the bytes object. ::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(100).decode('utf-8'))
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm
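
When the server declares a charset in its :mailheader:`Content-Type` header,
that value can be used instead of a hard-coded one. The sketch below is one
possible approach: ``decode_body`` is a hypothetical helper, the ``utf-8``
fallback is an assumption, and a hand-built message object stands in for the
headers that ``info()`` would return for an HTTP response.

```python
import email.message


def decode_body(raw, headers, default="utf-8"):
    """Decode *raw* using the charset named in *headers*, else *default*."""
    charset = headers.get_content_charset() or default
    return raw.decode(charset)


# Stand-in for the object returned by response.info(); no network needed.
headers = email.message.Message()
headers["Content-Type"] = "text/html; charset=iso-8859-1"

text = decode_body(b"caf\xe9", headers)
```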
1095
Georg Brandl116aa622007-08-15 14:28:22 +00001096
In the following example, we are sending a data-stream to the stdin of a CGI
and reading the data it returns to us. Note that this example will only work
when the Python installation supports SSL. ::

   >>> import urllib.request
   >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data=b'This data is passed to stdin of the CGI')
   >>> f = urllib.request.urlopen(req)
   >>> print(f.read().decode('utf-8'))
   Got Data: "This data is passed to stdin of the CGI"
1107
The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)
Georg Brandl116aa622007-08-15 14:28:22 +00001114
Use of Basic HTTP Authentication::

   import urllib.request
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib.request.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib.request.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib.request.install_opener(opener)
   urllib.request.urlopen('http://www.example.com/login.html')
Georg Brandl116aa622007-08-15 14:28:22 +00001128
1129:func:`build_opener` provides many handlers by default, including a
1130:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
1131variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
1132involved. For example, the :envvar:`http_proxy` environment variable is read to
1133obtain the HTTP proxy's URL.
1134
This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')
1146
1147Adding HTTP headers:
1148
Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib.request
   req = urllib.request.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib.request.urlopen(req)
Georg Brandl116aa622007-08-15 14:28:22 +00001155
:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`.  To change this::

   import urllib.request
   opener = urllib.request.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')
1163
1164Also, remember that a few standard headers (:mailheader:`Content-Length`,
1165:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
1166:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
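
Both ways of setting headers can be checked on the :class:`Request` object
before anything is sent. This sketch only builds the request; the URLs and the
:mailheader:`Accept-Language` value are illustrative.

```python
import urllib.request

req = urllib.request.Request(
    "http://www.example.com/",
    headers={"Referer": "http://www.python.org/"})
req.add_header("Accept-Language", "en")

referer = req.get_header("Referer")
has_lang = req.has_header("Accept-language")   # names are stored capitalized
```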
1167
.. _urllib-examples:

Here is an example session that uses the ``GET`` method to retrieve a URL
containing parameters::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
   >>> print(f.read().decode('utf-8'))
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001178
The following example uses the ``POST`` method instead. Note that the output of
urlencode is encoded to bytes before it is passed to urlopen as data::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> params = params.encode('utf-8')
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
   >>> print(f.read().decode('utf-8'))
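
The difference between the two sessions can be seen without contacting a
server by building the :class:`Request` objects explicitly. The host
``www.example.com`` is a placeholder.

```python
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({"spam": 1, "eggs": 2, "bacon": 0})

# GET: the encoded parameters become the query string of the URL.
get_req = urllib.request.Request("http://www.example.com/cgi-bin/query?" + params)

# POST: the same parameters travel in the request body, as bytes.
post_req = urllib.request.Request("http://www.example.com/cgi-bin/query",
                                  data=params.encode("ascii"))

methods = (get_req.get_method(), post_req.get_method())
```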
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001186
The following example uses an explicitly specified HTTP proxy, overriding
environment settings::

   >>> import urllib.request
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.request.FancyURLopener(proxies)
   >>> f = opener.open("http://www.python.org")
   >>> f.read().decode('utf-8')
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001195
The following example uses no proxies at all, overriding environment settings::

   >>> import urllib.request
   >>> opener = urllib.request.FancyURLopener({})
   >>> f = opener.open("http://www.python.org/")
   >>> f.read().decode('utf-8')
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001202
1203
1204:mod:`urllib.request` Restrictions
1205----------------------------------
1206
1207 .. index::
1208 pair: HTTP; protocol
1209 pair: FTP; protocol
1210
* Currently, only the following protocols are supported: HTTP (versions 0.9 and
  1.0), FTP, and local files.
1213
1214* The caching feature of :func:`urlretrieve` has been disabled until I find the
1215 time to hack proper processing of Expiration time headers.
1216
1217* There should be a function to query whether a particular URL is in the cache.
1218
1219* For backward compatibility, if a URL appears to point to a local file but the
1220 file can't be opened, the URL is re-interpreted using the FTP protocol. This
1221 can sometimes cause confusing error messages.
1222
1223* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
1224 long delays while waiting for a network connection to be set up. This means
1225 that it is difficult to build an interactive Web client using these functions
1226 without using threads.
1227
1228 .. index::
1229 single: HTML
1230 pair: HTTP; protocol
1231
1232* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
1233 returned by the server. This may be binary data (such as an image), plain text
1234 or (for example) HTML. The HTTP protocol provides type information in the reply
1235 header, which can be inspected by looking at the :mailheader:`Content-Type`
1236 header. If the returned data is HTML, you can use the module
1237 :mod:`html.parser` to parse it.
1238
1239 .. index:: single: FTP
1240
1241* The code handling the FTP protocol cannot differentiate between a file and a
1242 directory. This can lead to unexpected behavior when attempting to read a URL
1243 that points to a file that is not accessible. If the URL ends in a ``/``, it is
1244 assumed to refer to a directory and will be handled accordingly. But if an
1245 attempt to read a file leads to a 550 error (meaning the URL cannot be found or
1246 is not accessible, often for permission reasons), then the path is treated as a
1247 directory in order to handle the case when a directory is specified by a URL but
1248 the trailing ``/`` has been left off. This can cause misleading results when
1249 you try to fetch a file whose read permissions make it inaccessible; the FTP
1250 code will try to read it, fail with a 550 error, and then perform a directory
1251 listing for the unreadable file. If fine-grained control is needed, consider
  using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
  *_urlopener* to meet your needs.
1254
Georg Brandl0f7ede42008-06-23 11:23:31 +00001255
1256
:mod:`urllib.response` --- Response classes used by urllib
==========================================================

.. module:: urllib.response
   :synopsis: Response classes used by urllib.

The :mod:`urllib.response` module defines functions and classes which define a
minimal file-like interface, including ``read()`` and ``readline()``.  The
typical response object is an addinfourl instance, which defines an ``info()``
method that returns headers and a ``geturl()`` method that returns the URL.
Functions defined by this module are used internally by the
:mod:`urllib.request` module.
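
The interface can be explored by constructing an addinfourl instance by hand.
This is only a sketch: application code normally receives such objects from
:func:`urlopen` rather than building them, the body and URL below are invented,
and later Python versions mark ``geturl()`` as deprecated in favour of the
``url`` attribute.

```python
import email.message
import io
import urllib.response

headers = email.message.Message()
headers["Content-Type"] = "text/plain"

resp = urllib.response.addinfourl(
    io.BytesIO(b"first line\nsecond line\n"), headers,
    "http://www.example.com/data.txt")

first = resp.readline()          # file-like access to the body
content_type = resp.info()["Content-Type"]
url = resp.geturl()
resp.close()
```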
1269