:mod:`urllib.request` --- Extensible library for opening URLs
=============================================================

.. module:: urllib.request
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


The :mod:`urllib.request` module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.


The :mod:`urllib.request` module defines the following functions:


.. function:: urlopen(url, data=None[, timeout])

   Open the URL *url*, which can be either a string or a
   :class:`Request` object.

   .. warning::

      HTTPS requests do not do any verification of the server's certificate.

   *data* may be a bytes object specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*; the HTTP request will
   be a POST instead of a GET when the *data* parameter is provided.
   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns a string in this format, which must be encoded
   to bytes before it is passed as *data*.

   The optional *timeout* parameter specifies a timeout in seconds for
   blocking operations like the connection attempt (if not specified,
   the global default timeout setting will be used). This only works
   for HTTP, HTTPS and FTP connections.

   This function returns a file-like object with two additional methods from
   the :mod:`urllib.response` module:

   * :meth:`geturl` --- return the URL of the resource retrieved,
     commonly used to determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers,
     in the form of an :class:`email.message.Message` instance (see
     `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though
   the default installed global :class:`OpenerDirector` uses
   :class:`UnknownHandler` to ensure this never happens).

   In addition, the default installed :class:`ProxyHandler` makes sure that
   requests are routed through a proxy when proxy settings are present.

   The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
   discontinued; :func:`urlopen` corresponds to the old ``urllib2.urlopen``.
   Proxy handling, which was done by passing a dictionary parameter to
   ``urllib.urlopen``, can be obtained by using :class:`ProxyHandler` objects.
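
The following is a minimal, network-free sketch of the points above, assuming only the standard library. It builds POST data with :func:`urllib.parse.urlencode`, encodes it to bytes, and checks that the resulting request is a POST; it then opens a temporary local file through a ``file:`` URL so that :meth:`geturl` and :meth:`info` can be shown without a server. The ``www.example.com`` URL and the file contents are placeholders.

```python
import os
import tempfile
import urllib.parse
import urllib.request

# urlencode() returns a str; request data must be bytes, so encode it.
form = urllib.parse.urlencode({"spam": 1, "eggs": 2})
data = form.encode("ascii")

# Supplying data makes the HTTP request a POST. Building the Request does
# not touch the network; the URL is only a placeholder.
req = urllib.request.Request("http://www.example.com/cgi-bin/query", data)
method = req.get_method()

# urlopen() also handles file: URLs, which lets us inspect a response
# object without contacting any server.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello, urllib")
try:
    response = urllib.request.urlopen("file:" + urllib.request.pathname2url(path))
    body = response.read()
    final_url = response.geturl()                   # URL of the resource retrieved
    content_type = response.info()["Content-Type"]  # header meta-information
    response.close()
finally:
    os.remove(path)

print(form, method, body)  # spam=1&eggs=2 POST b'hello, urllib'
```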

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use
   that opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.
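
As a sketch of when :func:`install_opener` matters, assume a made-up ``demo:`` URL scheme and a hypothetical ``DemoHandler`` class (neither is part of :mod:`urllib.request`). Once an opener containing the handler is installed, plain :func:`urlopen` calls are routed through it. Note that installing an opener changes process-wide state.

```python
import io
import urllib.request
from email.message import Message
from urllib.response import addinfourl


class DemoHandler(urllib.request.BaseHandler):
    """Hypothetical handler for an invented ``demo:`` URL scheme."""

    def demo_open(self, req):
        # A method named "<scheme>_open" is what OpenerDirector looks for.
        body = ("you asked for " + req.selector).encode("utf-8")
        headers = Message()
        headers["Content-Type"] = "text/plain"
        return addinfourl(io.BytesIO(body), headers, req.full_url)


opener = urllib.request.build_opener(DemoHandler)
urllib.request.install_opener(opener)

# urlopen() now uses the installed opener, so the demo: scheme works globally.
with urllib.request.urlopen("demo:hello") as response:
    text = response.read().decode("utf-8")

print(text)  # you asked for hello
```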


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler`,
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
   attribute to modify its position in the handlers list.
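
A sketch of the replacement rule, using a hypothetical ``QuietRedirectHandler`` subclass: because a subclass of :class:`HTTPRedirectHandler` is passed in, :func:`build_opener` does not add its own instance. The check inspects the opener's ``handlers`` list, which is a CPython implementation detail rather than documented API.

```python
import urllib.request


class QuietRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical redirect handler used only to illustrate build_opener()."""

    # Lower values sort earlier in the chain; BaseHandler's default is 500.
    handler_order = 400


# Passing the class (not an instance) is allowed: build_opener() calls it
# with no arguments.
opener = urllib.request.build_opener(QuietRedirectHandler)

# The default HTTPRedirectHandler was skipped, so only our subclass remains.
redirect_handlers = [h for h in opener.handlers
                     if isinstance(h, urllib.request.HTTPRedirectHandler)]
print([type(h).__name__ for h in redirect_handlers])  # ['QuietRedirectHandler']
```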


.. function:: urlretrieve(url, filename=None, reporthook=None, data=None)

   Copy a network object denoted by a URL to a local file, if necessary. If the URL
   points to a local file and no *filename* is given, or a valid cached copy of the
   object exists, the object is not copied. Return a tuple ``(filename, headers)``
   where *filename* is the local file name under which the object can be found,
   and *headers* is whatever the :meth:`info` method of the object returned by
   :func:`urlopen` returned (for a remote object, possibly cached). Exceptions are
   the same as for :func:`urlopen`.

   The second argument, if present, specifies the file location to copy to (if
   absent, the location will be a tempfile with a generated name). The third
   argument, if present, is a hook function that will be called once on
   establishment of the network connection and once after each block read
   thereafter. The hook will be passed three arguments: a count of blocks
   transferred so far, a block size in bytes, and the total size of the file. The
   total size may be ``-1`` on older FTP servers which do not return a file
   size in response to a retrieval request.

   If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   argument may be given to specify a ``POST`` request (normally the request type
   is ``GET``). The *data* argument must be in standard
   :mimetype:`application/x-www-form-urlencoded` format; see the
   :func:`urllib.parse.urlencode` function.

   :func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that
   the amount of data available was less than the expected amount (which is the
   size reported by a *Content-Length* header). This can occur, for example, when
   the download is interrupted.

   The *Content-Length* is treated as a lower bound: if there's more data to read,
   :func:`urlretrieve` reads more data, but if less data is available, it raises the
   exception.

   You can still retrieve the downloaded data in this case; it is stored in the
   :attr:`content` attribute of the exception instance.

   If no *Content-Length* header was supplied, :func:`urlretrieve` cannot check the
   size of the data it has downloaded, and just returns it. In this case you just
   have to assume that the download was successful.
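
A network-free sketch of the progress hook: it copies a temporary local file through a ``file:`` URL, passing an explicit *filename* so that a copy is actually made. The file names and sizes are arbitrary, and the block size reported to the hook is an implementation detail.

```python
import os
import tempfile
import urllib.request

payload = b"x" * 20000   # large enough to be read in several blocks
progress = []            # (block_count, block_size, total_size) per hook call


def report(block_count, block_size, total_size):
    # Called once when the connection is established, then after each block.
    progress.append((block_count, block_size, total_size))


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "source.bin")
    dst = os.path.join(workdir, "copy.bin")
    with open(src, "wb") as f:
        f.write(payload)

    # An explicit filename makes urlretrieve() copy even a local file.
    url = "file:" + urllib.request.pathname2url(src)
    filename, headers = urllib.request.urlretrieve(url, dst, reporthook=report)
    with open(filename, "rb") as f:
        copied = f.read()

print(len(copied), progress[0][0], progress[-1][2])  # 20000 0 20000
```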

.. function:: urlcleanup()

   Clear the cache that may have been built up by previous calls to
   :func:`urlretrieve`.

.. function:: pathname2url(path)

   Convert the pathname *path* from the local syntax for a path to the form used in
   the path component of a URL. This does not produce a complete URL. The return
   value will already be quoted using the :func:`~urllib.parse.quote` function.


.. function:: url2pathname(path)

   Convert the path component *path* from a percent-encoded URL to the local syntax
   for a path. This does not accept a complete URL. This function uses
   :func:`~urllib.parse.unquote` to decode *path*.
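
A short sketch of the round trip between the two functions, using an illustrative relative path that contains a space; the commented URL form is what POSIX systems produce.

```python
import os
import urllib.request

# Illustrative relative path; the space must be percent-encoded in a URL.
path = os.path.join("reports", "q1 summary.txt")

url_path = urllib.request.pathname2url(path)   # quoted URL path component
back = urllib.request.url2pathname(url_path)   # local path syntax again

print(url_path)      # reports/q1%20summary.txt (on POSIX systems)
print(back == path)  # True
```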

.. function:: getproxies()

   This helper function returns a dictionary of scheme to proxy server URL
   mappings. On all operating systems it first scans the environment for
   variables named ``<scheme>_proxy``; if none are found, it looks for proxy
   information in the System Configuration on Mac OS X and in the Windows
   Registry on Windows.
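
A sketch showing the environment lookup. It uses :func:`unittest.mock.patch.dict` to set an ``http_proxy`` variable only for the duration of the call, so the result does not depend on (or permanently change) the real environment; the proxy address is a placeholder.

```python
import os
import urllib.request
from unittest import mock

# Temporarily define a <scheme>_proxy variable with a placeholder address.
with mock.patch.dict(os.environ, {"http_proxy": "http://proxy.example.com:3128"}):
    proxies = urllib.request.getproxies()

print(proxies.get("http"))  # http://proxy.example.com:3128
```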


The following classes are provided:

.. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False)

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a bytes object specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*; the HTTP request will
   be a POST instead of a GET when the *data* parameter is provided.
   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns a string in this format, which must be encoded
   to bytes before it is passed as *data*.

   *headers* should be a dictionary, and will be treated as if
   :meth:`add_header` was called with each key and value as arguments.
   This is often used to "spoof" the ``User-Agent`` header, which is
   used by a browser to identify itself; some HTTP servers only
   allow requests coming from common browsers as opposed to scripts.
   For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while
   :mod:`urllib`'s default user agent string is ``"Python-urllib/X.Y"``,
   where *X.Y* is the major and minor version of Python (for example,
   ``"Python-urllib/3.2"``).

   The final two arguments are only of interest for correct handling
   of third-party HTTP cookies:

   *origin_req_host* should be the request-host of the origin
   transaction, as defined by :rfc:`2965`. It defaults to
   ``http.cookiejar.request_host(self)``. This is the host name or IP
   address of the original request that was initiated by the user.
   For example, if the request is for an image in an HTML document,
   this should be the request-host of the request for the page
   containing the image.

   *unverifiable* should indicate whether the request is unverifiable,
   as defined by :rfc:`2965`. It defaults to ``False``. An unverifiable
   request is one whose URL the user did not have the option to
   approve. For example, if the request is for an image in an HTML
   document, and the user had no option to approve the automatic
   fetching of the image, this should be ``True``.
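
A short sketch of the constructor arguments with placeholder host names: the image request below carries a custom ``User-Agent`` and is marked as an unverifiable third-party fetch made on behalf of the page at ``www.example.com``. Building the :class:`Request` does not contact any server; ``get_header`` (a :class:`Request` method not described in this section) is used only to read the value back.

```python
import urllib.request

req = urllib.request.Request(
    "http://images.example.com/logo.png",
    headers={"User-Agent": "ExampleBot/1.0"},  # placeholder agent string
    origin_req_host="www.example.com",         # host of the embedding page
    unverifiable=True,                         # the user never approved this fetch
)

# add_header() stores names via str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))           # ExampleBot/1.0
print(req.origin_req_host, req.unverifiable)  # www.example.com True
```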


.. class:: URLopener(proxies=None, **x509)

   Base class for opening and reading URLs. Unless you need to support opening
   objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   you probably want to use :class:`FancyURLopener`.

   By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   of ``Python-urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   Applications can define their own :mailheader:`User-Agent` header by subclassing
   :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   :attr:`version` to an appropriate string value in the subclass definition.

   The optional *proxies* parameter should be a dictionary mapping scheme names to
   proxy URLs, where an empty dictionary turns proxies off completely. Its default
   value is ``None``, in which case environmental proxy settings will be used if
   present, as discussed in the definition of :func:`urlopen`, above.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme. The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`IOError` exception if the server
   returns an error code.

   .. method:: open(fullurl, data=None)

      Open *fullurl* using the appropriate protocol. This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments. If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.


   .. method:: open_unknown(fullurl, data=None)

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url, filename=None, reporthook=None, data=None)

      Retrieves the contents of *url* and places them in *filename*. The return value
      is a tuple consisting of a local filename and either a
      :class:`email.message.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs). The caller must then open and read the
      contents of *filename*. If *filename* is not given and the URL refers to a
      local file, the input filename is returned. If the URL is non-local and
      *filename* is not given, the filename is the output of :func:`tempfile.mktemp`
      with a suffix that matches the suffix of the last path component of the input
      URL. If *reporthook* is given, it must be a function accepting three numeric
      parameters. It will be called after each chunk of data is read from the
      network. *reporthook* is ignored for local URLs.

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``). The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the
      :func:`urllib.parse.urlencode` function.


   .. attribute:: version

      Variable that specifies the user agent of the opener object. To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.


.. class:: FancyURLopener(...)

   :class:`FancyURLopener` subclasses :class:`URLopener` providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL. For 401 response codes (authentication required), basic HTTP
   authentication is performed. For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`http_error_default` is called,
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user. In
      reality, browsers do allow automatic redirection of these responses, changing
      the POST to a GET, and :mod:`urllib` reproduces this behaviour.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method. The default implementation asks the
      user for the required information on the controlling terminal. A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overridden to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm. The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The implementation prompts for this information on the terminal; an application
      should override this method to use an appropriate interaction model in the local
      environment.


.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor(cookiejar=None)

   A class to handle HTTP Cookies.


.. class:: ProxyHandler(proxies=None)

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read the
   list of proxies from the environment variables :envvar:`<protocol>_proxy`.
   If no proxy environment variables are set, proxy settings are obtained from
   the registry's Internet Settings section in a Windows environment, and from
   the OS X System Configuration Framework in a Mac OS X environment.

   To disable autodetected proxies, pass an empty dictionary.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.
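
A minimal sketch of wiring a password manager into :class:`HTTPBasicAuthHandler`. The URL, user name and password are placeholders, and the lookups use the ``add_password`` and ``find_user_password`` methods described in :ref:`http-password-mgr`. Nothing is sent over the network here.

```python
import urllib.request

# A realm of None is the catch-all realm, so these placeholder credentials
# apply to any realm the server announces for URIs under this prefix.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://www.example.com/private/",
                          "alice", "s3cret")

# A URI below the registered prefix falls back to the default realm;
# a URI outside it finds nothing.
creds = password_mgr.find_user_password(
    "Members Only", "https://www.example.com/private/page")
missing = password_mgr.find_user_password(
    "Members Only", "https://www.example.com/public/")

# The handler consults the manager when a server answers with 401.
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(auth_handler)

print(creds, missing)  # ('alice', 's3cret') (None, None)
```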


.. class:: ProxyBasicAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. _request-objects:

Request Objects
---------------

The following methods describe :class:`Request`'s public interface,
and so all may be overridden in subclasses. It also defines several
public attributes that can be used by clients to inspect the parsed
request.

.. attribute:: Request.full_url

   The original URL passed to the constructor.

.. attribute:: Request.type

   The URI scheme.

.. attribute:: Request.host

   The URI authority, typically a host, but may also contain a port
   separated by a colon.

.. attribute:: Request.origin_req_host

   The original host for the request, without port.

.. attribute:: Request.selector

   The URI path, including any query string. If the :class:`Request` uses a
   proxy, then selector will be the full URL that is passed to the proxy.

.. attribute:: Request.data

   The entity body for the request, or ``None`` if not specified.

.. attribute:: Request.unverifiable

   Boolean indicating whether the request is unverifiable as defined
   by :rfc:`2965`.
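
To make the attributes concrete, a sketch that parses a placeholder URL with a port and a query string; the comments show the values CPython produces. No connection is made.

```python
import urllib.request

# Parsing happens in the constructor; nothing is sent.
req = urllib.request.Request("http://www.example.com:8080/docs/index.html?lang=en")

print(req.full_url)         # http://www.example.com:8080/docs/index.html?lang=en
print(req.type)             # http
print(req.host)             # www.example.com:8080
print(req.selector)         # /docs/index.html?lang=en
print(req.origin_req_host)  # www.example.com
print(req.data, req.unverifiable)  # None False
```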

.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.
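
A minimal sketch of how the presence of data switches the method. It passes *data* to the constructor rather than calling :meth:`add_data`, which later Python versions removed; the URL is a placeholder.

```python
import urllib.parse
import urllib.request

url = "http://www.example.com/search"  # placeholder URL

plain = urllib.request.Request(url)
form = urllib.parse.urlencode({"q": "urllib"}).encode("ascii")
posted = urllib.request.Request(url, data=form)

print(plain.get_method(), posted.get_method())  # GET POST
```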


.. method:: Request.has_data()

   Return whether the instance has non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).
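
A sketch of the header rules above with placeholder values. It relies on a CPython detail worth knowing: header names are stored in ``str.capitalize()`` form, so :meth:`has_header` lookups must use that form. ``get_header`` is a further :class:`Request` method not described in this section.

```python
import urllib.request

req = urllib.request.Request("http://www.example.com/")  # placeholder URL

# Names are stored via str.capitalize(), so "X-Trace-ID" becomes "X-trace-id".
# Adding the same name again overwrites the earlier value.
req.add_header("X-Trace-ID", "first")
req.add_header("X-Trace-ID", "second")

# Unredirected headers live in a separate mapping and are not copied
# onto a redirected request. The credentials are placeholders.
req.add_unredirected_header("Authorization", "Basic dXNlcjpwYXNz")

print(req.get_header("X-trace-id"))     # second
print(req.has_header("X-trace-id"))     # True
print(req.has_header("X-Trace-ID"))     # False: lookups use the stored form
print(req.has_header("Authorization"))  # True (found among unredirected headers)
```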


.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.set_proxy(host, type)

   Prepare the request for sending through a proxy server. The *host* and *type*
   will replace those of the instance, and the instance's selector will be the
   original URL given in the constructor.
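
A sketch of what :meth:`set_proxy` changes on an ``http:`` request, using a hypothetical proxy at ``proxy.example.com:3128``; no connection is opened.

```python
import urllib.request

req = urllib.request.Request("http://www.example.com/docs/")  # placeholder URL

# Route the request through a hypothetical proxy; nothing is sent yet.
req.set_proxy("proxy.example.com:3128", "http")

print(req.host)      # proxy.example.com:3128
print(req.selector)  # http://www.example.com/docs/
```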


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by :rfc:`2965`. See the
   documentation for the :class:`Request` constructor.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The handler is
   searched for the following methods, which are added to the possible chains
   (note that HTTP errors are a special case).

   * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
     URLs.

   * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

   * :meth:`protocol_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`protocol_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

   * :meth:`protocol_response` --- signal that the handler knows how to
     post-process *protocol* responses.
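
A sketch of the naming rules, assuming a hypothetical ``NamingDemoHandler`` and an invented ``demo:`` scheme. A bare :class:`OpenerDirector` is used so that no default handlers interfere: ``demo_open`` is registered as a :meth:`protocol_open`-style method, and ``http_error_404`` as an HTTP error handler that :meth:`OpenerDirector.error` dispatches to by status code.

```python
import io
import urllib.request


class NamingDemoHandler(urllib.request.BaseHandler):
    """Hypothetical handler; only its method names matter to add_handler()."""

    # "<protocol>_open": opens URLs of the invented "demo:" scheme.
    def demo_open(self, req):
        return io.BytesIO(b"opened " + req.selector.encode("ascii"))

    # "http_error_<code>": handles HTTP 404 errors passed to error().
    def http_error_404(self, req, fp, code, msg, hdrs):
        return "recovered from %d %s" % (code, msg)


director = urllib.request.OpenerDirector()  # no default handlers at all
director.add_handler(NamingDemoHandler())

body = director.open("demo:thing").read()
recovered = director.error("http",
                           urllib.request.Request("http://www.example.com/"),
                           None, 404, "Not Found", {})

print(body, recovered)  # b'opened thing' recovered from 404 Not Found
```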
602
603
Georg Brandlb044b2a2009-09-16 16:05:59 +0000604.. method:: OpenerDirector.open(url, data=None[, timeout])
Georg Brandl116aa622007-08-15 14:28:22 +0000605
606 Open the given *url* (which can be a request object or a string), optionally
Alexandre Vassalotti5f8ced22008-05-16 00:03:33 +0000607 passing the given *data*. Arguments, return values and exceptions raised are
608 the same as those of :func:`urlopen` (which simply calls the :meth:`open`
609 method on the currently installed global :class:`OpenerDirector`). The
610 optional *timeout* parameter specifies a timeout in seconds for blocking
Georg Brandlf78e02b2008-06-10 17:40:04 +0000611 operations like the connection attempt (if not specified, the global default
Georg Brandl1bb061d2010-05-21 21:01:43 +0000612 timeout setting will be used). The timeout feature actually works only for
Senthil Kumaranf066e272010-10-05 18:41:01 +0000613 HTTP, HTTPS and FTP connections).
Georg Brandl116aa622007-08-15 14:28:22 +0000614
Georg Brandl116aa622007-08-15 14:28:22 +0000615
Georg Brandlb044b2a2009-09-16 16:05:59 +0000616.. method:: OpenerDirector.error(proto, *args)
Georg Brandl116aa622007-08-15 14:28:22 +0000617
618 Handle an error of the given protocol. This will call the registered error
619 handlers for the given protocol with the given arguments (which are protocol
620 specific). The HTTP protocol is a special case which uses the HTTP response
621 code to determine the specific error handler; refer to the :meth:`http_error_\*`
622 methods of the handler classes.
623
624 Return values and exceptions raised are the same as those of :func:`urlopen`.
625
OpenerDirector objects open URLs in three stages. The order in which these
methods are called within each stage is determined by sorting the handler
instances. In the method names below, *protocol* stands for the URL scheme, so
for an ``http:`` URL the methods are :meth:`http_request`, :meth:`http_open` and
:meth:`http_response`.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`protocol_open`. If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :meth:`protocol_response` has that
   method called to post-process the response.
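
The three stages can be traced with a small handler. The sketch below is illustrative only: the ``demo`` URL scheme and the ``TraceHandler`` class are hypothetical, invented here so that the example runs without a network connection. The handler defines one method per stage, plus a :meth:`default_open` that declines by returning ``None``, and records the order in which :class:`OpenerDirector` calls them.

```python
import io
import urllib.request
import urllib.response
from email.message import Message

calls = []  # names of handler methods, in the order the opener invokes them


class TraceHandler(urllib.request.BaseHandler):
    """Hypothetical handler for a made-up "demo" URL scheme."""

    def demo_request(self, req):
        # Stage 1: pre-process the request; must return a Request.
        calls.append("demo_request")
        req.add_header("X-Trace", "1")
        return req

    def default_open(self, req):
        # Stage 2a: tried first for every URL; returning None declines.
        calls.append("default_open")
        return None

    def demo_open(self, req):
        # Stage 2b: protocol-specific open; returns a file-like response.
        calls.append("demo_open")
        body = io.BytesIO(b"hello from " + req.full_url.encode("ascii"))
        return urllib.response.addinfourl(body, Message(), req.full_url, code=200)

    def demo_response(self, req, response):
        # Stage 3: post-process the response; must return a response.
        calls.append("demo_response")
        return response


opener = urllib.request.OpenerDirector()  # bare director, no default handlers
opener.add_handler(TraceHandler())
response = opener.open("demo://example/path")
body = response.read()
print(calls)
print(body)
```

Running it prints the four method names in stage order, followed by the body produced by ``demo_open``. A handler for a real scheme would use that scheme as the prefix instead (``http_request``, ``http_open``, ``http_response`` and so on).
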


.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following members and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described for
   the return value of :meth:`OpenerDirector.open`, or ``None``. On failure it
   should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.


.. method:: BaseHandler.protocol_open(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open them.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should define this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.

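As a sketch of how :meth:`http_error_nnn` and :meth:`http_error_default` fit together, the hypothetical ``NotFoundFallbackHandler`` below answers every 404 with a placeholder body instead of letting :class:`HTTPDefaultErrorHandler` raise :exc:`HTTPError`. To stay runnable without a network, it drives the error chain by calling :meth:`OpenerDirector.error` directly with a made-up request and an empty error body.

```python
import io
import urllib.error
import urllib.request
import urllib.response
from email.message import Message


class NotFoundFallbackHandler(urllib.request.BaseHandler):
    """Hypothetical handler: turn HTTP 404 into a placeholder response."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        body = io.BytesIO(b"placeholder for " + req.full_url.encode("ascii"))
        # Same interface as the return value of urlopen().
        return urllib.response.addinfourl(body, hdrs, req.full_url, code=code)


# build_opener() also installs HTTPDefaultErrorHandler as the catch-all.
opener = urllib.request.build_opener(NotFoundFallbackHandler())
req = urllib.request.Request("http://www.example.com/missing")

# A 404 is dispatched to http_error_404 and handled there.
resp = opener.error("http", req, io.BytesIO(b""), 404, "Not Found", Message())
status, body = resp.getcode(), resp.read()
print(status, body)

# Any other code falls through to http_error_default, which raises.
raised_code = None
try:
    opener.error("http", req, io.BytesIO(b""), 500, "Server Error", Message())
except urllib.error.HTTPError as exc:
    raised_code = exc.code
print("500 still raises HTTPError:", raised_code)
```

In normal use the opener reaches :meth:`OpenerDirector.error` on its own, through :class:`HTTPErrorProcessor`, when a server answers with a non-2xx status.
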

.. method:: BaseHandler.protocol_request(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.

.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if this handler cannot
   handle the redirect but another handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the POST to a
      ``GET``, and the default implementation reproduces this behavior.


.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


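A subclass can override :meth:`redirect_request` to impose its own policy. The sketch below assumes a hypothetical rule, not part of the library, that a redirect is only followed when it stays on the original host; anything else is refused with :exc:`HTTPError`, as described above. The method is called directly with made-up URLs so that the example runs offline.

```python
import urllib.error
import urllib.parse
import urllib.request


class SameHostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical policy: follow a redirect only within the same host."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        old_host = urllib.parse.urlsplit(req.full_url).hostname
        new_host = urllib.parse.urlsplit(newurl).hostname
        if new_host != old_host:
            # Refuse outright: no other handler should follow this redirect.
            raise urllib.error.HTTPError(newurl, code, msg, hdrs, fp)
        # Otherwise defer to the default behaviour, which builds the new Request.
        return super().redirect_request(req, fp, code, msg, hdrs, newurl)


handler = SameHostRedirectHandler()
req = urllib.request.Request("http://www.example.com/old")

followed = handler.redirect_request(req, None, 302, "Found", {},
                                    "http://www.example.com/new")
print(followed.full_url)

refused_code = None
try:
    handler.redirect_request(req, None, 302, "Found", {},
                             "http://other.example.net/")
except urllib.error.HTTPError as exc:
    refused_code = exc.code
print(refused_code)
```

To use it, pass an instance to :func:`build_opener`; because it subclasses :class:`HTTPRedirectHandler`, it replaces the default redirect handler rather than running alongside it.
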
.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

:class:`HTTPCookieProcessor` instances have one attribute:

.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`http.cookiejar.CookieJar` in which cookies are stored.


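:class:`HTTPCookieProcessor` does its work in the ``http_request`` and ``http_response`` stages: responses feed :attr:`cookiejar`, and later requests receive a matching :mailheader:`Cookie` header. The sketch below calls those two methods directly with a fabricated response carrying a ``Set-Cookie`` header (the URLs and cookie value are made up), so no server is needed; in normal use the opener built by :func:`build_opener` calls them for you.

```python
import io
import http.cookiejar
import urllib.request
import urllib.response
from email.message import Message

jar = http.cookiejar.CookieJar()
processor = urllib.request.HTTPCookieProcessor(jar)

# Fabricated response to http://www.example.com/login that sets a cookie.
login = urllib.request.Request("http://www.example.com/login")
headers = Message()
headers["Set-Cookie"] = "session=abc; Path=/"
response = urllib.response.addinfourl(io.BytesIO(b""), headers,
                                      login.full_url, code=200)

# Post-processing stage: cookies are extracted into processor.cookiejar.
processor.http_response(login, response)
print([(c.name, c.value) for c in processor.cookiejar])

# Pre-processing stage: a later request to the same site gets a Cookie header.
later = urllib.request.Request("http://www.example.com/account")
processor.http_request(later)
print(later.get_header("Cookie"))
```
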
.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.


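The per-protocol methods are created from the *proxies* mapping when the handler is constructed. The sketch below uses a made-up proxy address, checks that only ``http_open`` exists, and then calls it on a request to show the effect of ``set_proxy()``: the request's host becomes the proxy and its selector becomes the full original URL. It assumes that no ``no_proxy`` environment setting exempts ``www.example.com``.

```python
import urllib.request

# Hypothetical proxy address; only HTTP traffic is routed through it.
proxy_handler = urllib.request.ProxyHandler(
    {"http": "http://proxy.example.com:3128/"})

print(hasattr(proxy_handler, "http_open"))  # a proxy is configured for http
print(hasattr(proxy_handler, "ftp_open"))   # no proxy configured for ftp

req = urllib.request.Request("http://www.example.com/index.html")
result = proxy_handler.http_open(req)       # rewrites req in place
print(result)        # None: let the next handler perform the request
print(req.host)      # now the proxy's host and port
print(req.selector)  # now the full original URL
```

Returning ``None`` is what lets the next ``http_open`` handler in the chain (normally :class:`HTTPHandler`) actually send the rewritten request.
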
.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* is requested for any of
   the given URIs, or for a URI nested below one of them.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.


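A short sketch of these lookup rules, reusing the realm, URI and credentials from the Basic HTTP Authentication example in the Examples section. The second, realm-less entry and the ``elsewhere.example`` host are made up to show the ``None`` fallback of :class:`HTTPPasswordMgrWithDefaultRealm` and a failed lookup.

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(realm="PDQ Application",
                 uri="https://mahler:8092/site-updates.py",
                 user="klem", passwd="kadidd!ehopper")
# Hypothetical fallback credentials for any realm on this host.
mgr.add_password(None, "https://mahler:8092/", "fallback", "secret")

# Exact realm and URI: the realm-specific entry is used.
exact = mgr.find_user_password("PDQ Application",
                               "https://mahler:8092/site-updates.py")
# Unknown realm: the realm None is searched instead.
fallback = mgr.find_user_password("Other realm",
                                  "https://mahler:8092/site-updates.py")
# Different host: nothing matches.
missing = mgr.find_user_password("PDQ Application", "https://elsewhere.example/")

print(exact)     # ('klem', 'kadidd!ehopper')
print(fallback)  # ('fallback', 'secret')
print(missing)   # (None, None)
```
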
.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and retrying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the server's error response, *host* specifies
   the URL and path to authenticate for, *req* should be the (failed)
   :class:`Request` object, and *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the server's error response, *host* should be the host to
   authenticate to, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


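Because the Basic and Digest handlers both take their credentials from a password manager, one manager can be shared between them, so that an opener copes with whichever scheme the server asks for. This is a construction-only sketch: the host and credentials are placeholders, and actually retrying a 401 response would require a real server.

```python
import urllib.request

# Placeholder credentials; realm None makes them apply to any realm on the host.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "http://www.example.com/", "klem", "secret")

# Both handlers consult the same manager when a 401 response arrives.
basic_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
digest_handler = urllib.request.HTTPDigestAuthHandler(password_mgr)
opener = urllib.request.build_opener(basic_handler, digest_handler)

# The opener now holds both handlers alongside the default ones.
names = {type(h).__name__ for h in opener.handlers}
print(sorted(n for n in names if n.endswith("AuthHandler")))
```
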
.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.


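Whether a request goes out as GET or POST depends on whether it carries data, and :meth:`Request.get_method` reports which method will be used. A minimal offline check; the URL and form fields are placeholders, and note that *data* must be bytes:

```python
import urllib.parse
import urllib.request

url = "http://www.example.com/cgi-bin/query"  # placeholder URL

get_req = urllib.request.Request(url)
# POST data must be bytes, e.g. an encoded application/x-www-form-urlencoded string.
form = urllib.parse.urlencode({"spam": 1, "eggs": 2}).encode("ascii")
post_req = urllib.request.Request(url, data=form)

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```
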
.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.


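For example, a ``file:`` URL without a host name is opened locally. The sketch writes a throwaway file (its name and contents are arbitrary) into a temporary directory and reads it back through :func:`urlopen`; :meth:`pathlib.Path.as_uri` is used only to build a well-formed ``file:`` URL for the current platform.

```python
import pathlib
import tempfile
import urllib.request

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp, "hello.txt")   # arbitrary example file
    path.write_bytes(b"hello, FileHandler\n")

    url = path.as_uri()                     # a file: URL with no host name
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
        content_type = resp.info()["Content-type"]

print(url)
print(body)
print(content_type)
```
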
.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. If the URL does not include a user name
   and password, the login is done with an empty user name and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.


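A sketch of configuring the cache before building an opener; the timeout and connection limit are arbitrary values. Because :class:`CacheFTPHandler` is a subclass of :class:`FTPHandler`, :func:`build_opener` uses it instead of the default :class:`FTPHandler` rather than alongside it. No FTP server is contacted here.

```python
import urllib.request

ftp_handler = urllib.request.CacheFTPHandler()
ftp_handler.setTimeout(30)   # timeout for cached connections, in seconds
ftp_handler.setMaxConns(4)   # keep at most 4 connections in the cache

opener = urllib.request.build_opener(ftp_handler)
handler_types = [type(h) for h in opener.handlers]
print(urllib.request.CacheFTPHandler in handler_types)  # True
print(urllib.request.FTPHandler in handler_types)       # False: replaced
```
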
.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open(req)

   Raise a :exc:`URLError` exception.


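:func:`build_opener` installs :class:`UnknownHandler` by default, so a URL whose scheme no handler recognises ends in this :exc:`URLError`. The ``spam:`` scheme below is simply one that no standard handler supports; the check runs without network access.

```python
import urllib.error
import urllib.request

opener = urllib.request.build_opener()
reason = None
try:
    opener.open("spam://www.example.com/")
except urllib.error.URLError as exc:
    reason = exc.reason
print(reason)  # unknown url type: spam
```
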
.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For status codes in the 2xx (success) range, the response object is returned
   immediately.

   For any other status code, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
   :exc:`HTTPError` if no other handler handles the error.


.. _urllib-request-examples:

Examples
--------

This example gets the python.org main page and displays the first 300 bytes of
it. ::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(300))
   b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
   xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
   <title>Python Programming '

Note that the ``read()`` method of the object returned by :func:`urlopen`
returns a bytes object. This is because there is no way for :func:`urlopen` to
automatically determine the encoding of the byte stream it receives from the
HTTP server. In general, a program will decode the returned bytes object to a
string once it determines or guesses the appropriate encoding.

The following W3C document, http://www.w3.org/International/O-charset, lists
the various ways in which an (X)HTML or an XML document could have specified its
encoding information.

As the python.org website uses *utf-8* encoding as specified in its meta tag, we
will use the same encoding for decoding the bytes object. ::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(100).decode('utf-8'))
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm


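Rather than hard-coding *utf-8*, a program can often take the encoding from the charset parameter of the :mailheader:`Content-Type` response header, which the object returned by ``info()`` exposes through :meth:`email.message.Message.get_content_charset`. The helper below is a sketch: the name ``decode_body`` and the *utf-8* fallback are assumptions, and to keep it runnable offline it is exercised on a fabricated response rather than a live one.

```python
import io
import urllib.response
from email.message import Message


def decode_body(response, default="utf-8"):
    """Hypothetical helper: decode a urlopen()-style response body using the
    charset from its Content-Type header, falling back to *default*."""
    charset = response.info().get_content_charset() or default
    return response.read().decode(charset)


# Fabricated response standing in for urlopen('http://www.python.org/').
headers = Message()
headers["Content-Type"] = "text/html; charset=iso-8859-1"
fake = urllib.response.addinfourl(io.BytesIO("caf\u00e9".encode("iso-8859-1")),
                                  headers, "http://www.python.org/")
text = decode_body(fake)
print(text)  # café
```
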
In the following example, we are sending a data stream to the stdin of a CGI
and reading the data it returns to us. Note that *data* must be a bytes object,
and that this example will only work when the Python installation supports
SSL. ::

   >>> import urllib.request
   >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                              data=b'This data is passed to stdin of the CGI')
   >>> f = urllib.request.urlopen(req)
   >>> print(f.read().decode('utf-8'))
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)

Use of Basic HTTP Authentication::

   import urllib.request
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib.request.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib.request.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib.request.install_opener(opener)
   urllib.request.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib.request
   proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib.request
   req = urllib.request.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib.request.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib.request
   opener = urllib.request.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).

.. _urllib-examples:

Here is an example session that uses the ``GET`` method to retrieve a URL
containing parameters::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
   >>> print(f.read().decode('utf-8'))

The following example uses the ``POST`` method instead. The encoded parameters
must be converted to bytes before they are passed as *data*::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> data = params.encode('ascii')
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query", data)
   >>> print(f.read().decode('utf-8'))

The following example uses an explicitly specified HTTP proxy, overriding
environment settings::

   >>> import urllib.request
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.request.FancyURLopener(proxies)
   >>> f = opener.open("http://www.python.org")
   >>> f.read().decode('utf-8')

The following example uses no proxies at all, overriding environment settings::

   >>> import urllib.request
   >>> opener = urllib.request.FancyURLopener({})
   >>> f = opener.open("http://www.python.org/")
   >>> f.read().decode('utf-8')


:mod:`urllib.request` Restrictions
----------------------------------

  .. index::
     pair: HTTP; protocol
     pair: FTP; protocol

* Currently, only the following protocols are supported: HTTP (versions 0.9 and
  1.0), FTP, and local files.

* The caching feature of :func:`urlretrieve` has been disabled until proper
  processing of expiration time headers is implemented.

* There should be a function to query whether a particular URL is in the cache.

* For backward compatibility, if a URL appears to point to a local file but the
  file can't be opened, the URL is re-interpreted using the FTP protocol. This
  can sometimes cause confusing error messages.

* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
  long delays while waiting for a network connection to be set up. This means
  that it is difficult to build an interactive Web client using these functions
  without using threads.

  .. index::
     single: HTML
     pair: HTTP; protocol

* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
  returned by the server. This may be binary data (such as an image), plain text
  or (for example) HTML. The HTTP protocol provides type information in the reply
  header, which can be inspected by looking at the :mailheader:`Content-Type`
  header. If the returned data is HTML, you can use the module
  :mod:`html.parser` to parse it.

  .. index:: single: FTP

* The code handling the FTP protocol cannot differentiate between a file and a
  directory. This can lead to unexpected behavior when attempting to read a URL
  that points to a file that is not accessible. If the URL ends in a ``/``, it is
  assumed to refer to a directory and will be handled accordingly. But if an
  attempt to read a file leads to a 550 error (meaning the URL cannot be found or
  is not accessible, often for permission reasons), then the path is treated as a
  directory in order to handle the case when a directory is specified by a URL but
  the trailing ``/`` has been left off. This can cause misleading results when
  you try to fetch a file whose read permissions make it inaccessible; the FTP
  code will try to read it, fail with a 550 error, and then perform a directory
  listing for the unreadable file. If fine-grained control is needed, consider
  using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
  *_urlopener* to meet your needs.



:mod:`urllib.response` --- Response classes used by urllib
==========================================================

.. module:: urllib.response
   :synopsis: Response classes used by urllib.

The :mod:`urllib.response` module defines functions and classes which define a
minimal file-like interface, including ``read()`` and ``readline()``. The
typical response object is an addinfourl instance, which defines an ``info()``
method that returns headers and a ``geturl()`` method that returns the URL.
Functions defined by this module are used internally by the
:mod:`urllib.request` module.
