:mod:`urllib.request` --- extensible library for opening URLs
=============================================================

.. module:: urllib.request
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


The :mod:`urllib.request` module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.


The :mod:`urllib.request` module defines the following functions:


.. function:: urlopen(url, data=None[, timeout])

   Open the URL *url*, which can be either a string or a
   :class:`Request` object.

   .. warning::
      HTTPS requests do not do any verification of the server's certificate.

   *data* may be a string specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*; the HTTP request will
   be a POST instead of a GET when the *data* parameter is provided.
   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns a string in this format. The
   :mod:`urllib.request` module uses HTTP/1.1 and includes a
   ``Connection: close`` header in its HTTP requests.

   The optional *timeout* parameter specifies a timeout in seconds for
   blocking operations like the connection attempt (if not specified,
   the global default timeout setting will be used). This only works
   for HTTP, HTTPS and FTP connections.

   This function returns a file-like object with two additional methods from
   the :mod:`urllib.response` module:

   * :meth:`geturl` --- return the URL of the resource retrieved,
     commonly used to determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers,
     in the form of an :func:`email.message_from_string` instance (see
     `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though
   the default installed global :class:`OpenerDirector` uses
   :class:`UnknownHandler` to ensure this never happens).

   In addition, the default installed :class:`ProxyHandler` makes sure that
   requests are sent through a proxy when proxy settings are present.

   The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
   discontinued; :func:`urlopen` corresponds to the old ``urllib2.urlopen``.
   Proxy handling, which was done by passing a dictionary parameter to
   ``urllib.urlopen``, can be obtained by using :class:`ProxyHandler` objects.

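As a minimal sketch of how :func:`urlopen` and the two extra response methods
might be used, the example below opens a temporary local file through a
``file:`` URL so that it does not depend on a network. The file name and its
contents are made up for illustration.

```python
import pathlib
import tempfile
from urllib.request import urlopen

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical local file standing in for a remote resource.
    path = pathlib.Path(tmp) / "hello.txt"
    path.write_bytes(b"hello from a local file\n")

    # urlopen() accepts a URL string (or a Request object).
    response = urlopen(path.as_uri())
    try:
        body = response.read()          # ordinary file-like interface
        final_url = response.geturl()   # URL of the resource retrieved
        headers = response.info()       # message object holding the headers
    finally:
        response.close()

print(final_url)
print(headers["Content-Length"], len(body))
```

For an HTTP URL the same calls apply; only the handler chosen by the opener
differs.
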
.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.


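The difference between calling an opener directly and installing it globally
can be sketched as follows. The example assumes a throwaway local file (a
hypothetical ``note.txt``) so that no network access is needed.

```python
import pathlib
import tempfile
from urllib.request import ProxyHandler, build_opener, install_opener, urlopen

# An opener that ignores any autodetected proxies (empty mapping).
opener = build_opener(ProxyHandler({}))

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "note.txt"      # hypothetical file
    path.write_bytes(b"installed opener\n")
    url = path.as_uri()

    # Option 1: use the opener directly, leaving the global state alone.
    with opener.open(url) as response:
        direct = response.read()

    # Option 2: install it, so that plain urlopen() calls go through it.
    install_opener(opener)
    with urlopen(url) as response:
        via_global = response.read()
```

Installing is a process-wide change, so calling the opener directly is often
the less surprising choice in library code.
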
.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler`,
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
   member variable to modify its position in the handlers list.


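The replacement and ordering rules of :func:`build_opener` can be illustrated
with two hypothetical handler classes. The sketch inspects the ``handlers``
list of the returned object, which is an attribute of CPython's
:class:`OpenerDirector` rather than something described in this section, so
treat that part as an assumption about the implementation.

```python
from urllib.request import BaseHandler, HTTPRedirectHandler, build_opener


class QuietRedirectHandler(HTTPRedirectHandler):
    """Hypothetical subclass; passing it replaces the default handler."""


class EarlyHandler(BaseHandler):
    """Hypothetical handler that asks to be sorted before most others."""

    handler_order = 100   # lower values sort earlier

    def default_open(self, req):
        return None       # decline every request; present for illustration


# A class is instantiated without arguments; an instance is used as given.
opener = build_opener(QuietRedirectHandler, EarlyHandler())

# Implementation detail: list of the handlers that were registered.
handler_types = [type(h) for h in opener.handlers]
print(HTTPRedirectHandler in handler_types)
print(handler_types.index(EarlyHandler) < handler_types.index(QuietRedirectHandler))
```
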
.. function:: urlretrieve(url, filename=None, reporthook=None, data=None)

   Copy a network object denoted by a URL to a local file, if necessary. If the URL
   points to a local file, or a valid cached copy of the object exists, the object
   is not copied. Return a tuple ``(filename, headers)`` where *filename* is the
   local file name under which the object can be found, and *headers* is whatever
   the :meth:`info` method of the object returned by :func:`urlopen` returned (for
   a remote object, possibly cached). Exceptions are the same as for
   :func:`urlopen`.

   The second argument, if present, specifies the file location to copy to (if
   absent, the location will be a tempfile with a generated name). The third
   argument, if present, is a hook function that will be called once on
   establishment of the network connection and once after each block read
   thereafter. The hook will be passed three arguments: a count of blocks
   transferred so far, a block size in bytes, and the total size of the file. The
   total size passed to the hook may be ``-1`` on older FTP servers which do not
   return a file size in response to a retrieval request.

   If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   argument may be given to specify a ``POST`` request (normally the request type
   is ``GET``). The *data* argument must be in standard
   :mimetype:`application/x-www-form-urlencoded` format; see the
   :func:`urllib.parse.urlencode` function.

   :func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that
   the amount of data available was less than the expected amount (which is the
   size reported by a *Content-Length* header). This can occur, for example, when
   the download is interrupted.

   The *Content-Length* is treated as a lower bound: if there's more data to read,
   :func:`urlretrieve` reads more data, but if less data is available, it raises
   the exception.

   You can still retrieve the downloaded data in this case; it is stored in the
   :attr:`content` attribute of the exception instance.

   If no *Content-Length* header was supplied, :func:`urlretrieve` cannot check
   the size of the data it has downloaded, and just returns it. In this case you
   just have to assume that the download was successful.

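A report hook for :func:`urlretrieve` might look like the sketch below. It
copies a temporary local file through a ``file:`` URL so that it runs without
a network; the file names and the ``report`` function are hypothetical.

```python
import pathlib
import tempfile
from urllib.error import ContentTooShortError
from urllib.request import urlretrieve

progress = []


def report(block_count, block_size, total_size):
    """Record each call; total_size is -1 when the size is unknown."""
    progress.append((block_count, block_size, total_size))


with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "source.bin"   # hypothetical file names
    target = pathlib.Path(tmp) / "copy.bin"
    source.write_bytes(b"x" * 20000)

    try:
        filename, headers = urlretrieve(source.as_uri(), str(target), report)
    except ContentTooShortError as exc:
        # The partial download is still available on the exception.
        partial = exc.content
        raise

    copied = target.read_bytes()

print(filename)
print(progress[0], progress[-1])
```

The first hook call happens before any block is read, so its block count is
zero; later calls carry the running count.
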
.. function:: urlcleanup()

   Clear the cache that may have been built up by previous calls to
   :func:`urlretrieve`.

.. function:: pathname2url(path)

   Convert the pathname *path* from the local syntax for a path to the form used in
   the path component of a URL. This does not produce a complete URL. The return
   value will already be quoted using the :func:`quote` function.


.. function:: url2pathname(path)

   Convert the path component *path* from a percent-encoded URL to the local syntax for a
   path. This does not accept a complete URL. This function uses :func:`unquote`
   to decode *path*.

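The two conversion functions are inverses of each other for ordinary paths. A
small round trip, using a hypothetical relative path that contains spaces,
could be written like this:

```python
import os
from urllib.request import pathname2url, url2pathname

# Hypothetical relative path containing characters that need quoting.
local_path = os.path.join("my docs", "report 1.txt")

url_path = pathname2url(local_path)   # path component only, no scheme
round_trip = url2pathname(url_path)   # back to the local syntax

print(url_path)
print(round_trip)
```
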
.. function:: getproxies()

   This helper function returns a dictionary of scheme to proxy server URL
   mappings. It first scans the environment for variables named
   ``<scheme>_proxy``, on all operating systems. If it finds none, it looks for
   proxy information in the Mac OS X System Configuration on Mac OS X, and in
   the Windows Systems Registry on Windows.


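The environment lookup can be observed by setting a ``<scheme>_proxy``
variable before calling :func:`getproxies`. The proxy address below is
hypothetical, and the variable is restored afterwards.

```python
import os
from urllib.request import getproxies

# Hypothetical proxy URL, set only for the duration of this sketch.
saved = os.environ.get("https_proxy")
os.environ["https_proxy"] = "http://proxy.example.com:3128"
try:
    proxies = getproxies()   # environment variables are consulted first
finally:
    # Restore the previous environment.
    if saved is None:
        del os.environ["https_proxy"]
    else:
        os.environ["https_proxy"] = saved

print(proxies.get("https"))
```
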
The following classes are provided:

.. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False)

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*; the HTTP request will
   be a POST instead of a GET when the *data* parameter is provided.
   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns a string in this format.

   *headers* should be a dictionary, and will be treated as if
   :meth:`add_header` was called with each key and value as arguments.
   This is often used to "spoof" the ``User-Agent`` header, which is
   used by a browser to identify itself -- some HTTP servers only
   allow requests coming from common browsers as opposed to scripts.
   For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while
   :mod:`urllib`'s default user agent string is
   ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling
   of third-party HTTP cookies:

   *origin_req_host* should be the request-host of the origin
   transaction, as defined by :rfc:`2965`. It defaults to
   ``http.cookiejar.request_host(self)``. This is the host name or IP
   address of the original request that was initiated by the user.
   For example, if the request is for an image in an HTML document,
   this should be the request-host of the request for the page
   containing the image.

   *unverifiable* should indicate whether the request is unverifiable,
   as defined by :rfc:`2965`. It defaults to ``False``. An unverifiable
   request is one whose URL the user did not have the option to
   approve. For example, if the request is for an image in an HTML
   document, and the user had no option to approve the automatic
   fetching of the image, this should be true.


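Constructing a :class:`Request` with form data and a custom ``User-Agent``
could look like the following sketch. The URL and agent string are
hypothetical, nothing is sent over the network, and the ``encode()`` step
reflects the assumption that current Python 3 releases expect the body as
bytes.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form fields encoded as application/x-www-form-urlencoded, then to bytes.
payload = urlencode({"q": "python docs", "page": 2}).encode("ascii")

request = Request(
    "http://www.example.com/search",              # hypothetical URL
    data=payload,
    headers={"User-Agent": "ExampleClient/1.0"},  # hypothetical agent string
)

print(request.get_method())              # POST, because data was supplied
print(request.has_header("User-agent"))  # header names are stored capitalized
```
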
.. class:: URLopener(proxies=None, **x509)

   Base class for opening and reading URLs. Unless you need to support opening
   objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   you probably want to use :class:`FancyURLopener`.

   By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   Applications can define their own :mailheader:`User-Agent` header by subclassing
   :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   :attr:`version` to an appropriate string value in the subclass definition.

   The optional *proxies* parameter should be a dictionary mapping scheme names to
   proxy URLs, where an empty dictionary turns proxies off completely. Its default
   value is ``None``, in which case environmental proxy settings will be used if
   present, as discussed in the definition of :func:`urlopen`, above.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme. The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`IOError` exception if the server
   returns an error code.

   .. method:: open(fullurl, data=None)

      Open *fullurl* using the appropriate protocol. This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments. If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.


   .. method:: open_unknown(fullurl, data=None)

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url, filename=None, reporthook=None, data=None)

      Retrieves the contents of *url* and places it in *filename*. The return value
      is a tuple consisting of a local filename and either a
      :class:`email.message.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs). The caller must then open and read the
      contents of *filename*. If *filename* is not given and the URL refers to a
      local file, the input filename is returned. If the URL is non-local and
      *filename* is not given, the filename is the output of :func:`tempfile.mktemp`
      with a suffix that matches the suffix of the last path component of the input
      URL. If *reporthook* is given, it must be a function accepting three numeric
      parameters. It will be called after each chunk of data is read from the
      network. *reporthook* is ignored for local URLs.

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``). The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the
      :func:`urllib.parse.urlencode` function.


   .. attribute:: version

      Variable that specifies the user agent of the opener object. To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.


.. class:: FancyURLopener(...)

   :class:`FancyURLopener` subclasses :class:`URLopener` providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL. For 401 response codes (authentication required), basic HTTP
   authentication is performed. For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`http_error_default` is called
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user. In
      reality, browsers do allow automatic redirection of these responses, changing
      the POST to a GET, and :mod:`urllib` reproduces this behaviour.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method. The default implementation asks the
      users for the required information on the controlling terminal. A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overloaded to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm. The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The implementation prompts for this information on the terminal; an application
      should override this method to use an appropriate interaction model in the local
      environment.

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor(cookiejar=None)

   A class to handle HTTP Cookies.


.. class:: ProxyHandler(proxies=None)

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read the
   list of proxies from the environment variables :envvar:`<protocol>_proxy`.
   If no proxy environment variables are set, in a Windows environment, proxy
   settings are obtained from the registry's Internet Settings section and in a
   Mac OS X environment, proxy information is retrieved from the OS X System
   Configuration Framework.

   To disable autodetected proxies, pass an empty dictionary.


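The two ways of configuring :class:`ProxyHandler` described above, an explicit
mapping and an empty dictionary, can be sketched as follows. The proxy address
is hypothetical and no request is made.

```python
from urllib.request import ProxyHandler, build_opener

# Explicit mapping from scheme to proxy URL (hypothetical proxy address).
explicit = ProxyHandler({"http": "http://proxy.example.com:3128"})

# An empty dictionary disables autodetected proxies.
no_proxy = ProxyHandler({})

opener_with_proxy = build_opener(explicit)
opener_without_proxy = build_opener(no_proxy)
```

Passing either handler to :func:`build_opener` replaces the default
:class:`ProxyHandler` that would otherwise read the environment.
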
.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


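The catch-all behaviour of a ``None`` realm is easiest to see by comparing the
two password managers side by side. The realm names, URI and credentials below
are hypothetical.

```python
from urllib.request import HTTPPasswordMgr, HTTPPasswordMgrWithDefaultRealm

uri = "http://www.example.com/private/"    # hypothetical protected area

plain = HTTPPasswordMgr()
plain.add_password("Members", uri, "alice", "secret")

catch_all = HTTPPasswordMgrWithDefaultRealm()
catch_all.add_password(None, uri, "alice", "secret")  # realm None: any realm

# Exact realm match.
print(plain.find_user_password("Members", uri + "page.html"))
# Unknown realm: the plain manager finds nothing.
print(plain.find_user_password("Other", uri + "page.html"))
# The default-realm manager falls back to the entry stored under None.
print(catch_all.find_user_password("Other", uri + "page.html"))
```
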
.. class:: AbstractBasicAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


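Wiring a password manager into one of the authentication handlers might be
done as in this sketch. The site and credentials are hypothetical, and the
final request is left as a comment so that nothing is sent.

```python
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPPasswordMgrWithDefaultRealm,
    build_opener,
)

password_mgr = HTTPPasswordMgrWithDefaultRealm()
# Hypothetical site and credentials, stored under the catch-all realm.
password_mgr.add_password(None, "http://www.example.com/", "alice", "secret")

auth_handler = HTTPBasicAuthHandler(password_mgr)
opener = build_opener(auth_handler)

# opener.open("http://www.example.com/private/") would now consult
# password_mgr when the server asks for basic authentication.
```

The digest and proxy variants take a password manager in the same way.
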
.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. _request-objects:

Request Objects
---------------

The following methods describe :class:`Request`'s public interface,
and so all may be overridden in subclasses. The class also defines several
public attributes that can be used by clients to inspect the parsed
request.

.. attribute:: Request.full_url

   The original URL passed to the constructor.

.. attribute:: Request.type

   The URI scheme.

.. attribute:: Request.host

   The URI authority, typically a host, but may also contain a port
   separated by a colon.

.. attribute:: Request.origin_req_host

   The original host for the request, without port.

.. attribute:: Request.selector

   The URI path. If the :class:`Request` uses a proxy, then selector
   will be the full url that is passed to the proxy.

.. attribute:: Request.data

   The entity body for the request, or ``None`` if not specified.

.. attribute:: Request.unverifiable

   A boolean indicating whether the request is unverifiable as defined
   by :rfc:`2965`.

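The public attributes can be inspected directly after construction. The sketch
below parses a hypothetical URL with an explicit port and a query string.

```python
from urllib.request import Request

request = Request("http://www.example.com:8080/docs/index.html?lang=en")

print(request.full_url)         # the URL as passed in
print(request.type)             # scheme
print(request.host)             # authority, including the port
print(request.origin_req_host)  # host without the port
print(request.selector)         # path and query sent to the server
print(request.data)             # None: no entity body was given
print(request.unverifiable)
```
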
.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has a non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).


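The overwrite rule for :meth:`Request.add_header` and the two kinds of headers
can be sketched as below. The URL and the ``X-Token`` header are hypothetical,
and ``get_header`` is a :class:`Request` method that is not described in this
section.

```python
from urllib.request import Request

request = Request("http://www.example.com/")          # hypothetical URL

request.add_header("Accept", "text/html")
request.add_header("accept", "application/json")      # same name: overwrites
request.add_unredirected_header("X-Token", "abc123")  # hypothetical header

print(request.get_header("Accept"))
print(request.has_header("X-token"))   # unredirected headers are checked too
```

Header names are normalised with ``str.capitalize()`` when stored, which is
why the lookups use ``"Accept"`` and ``"X-token"``.
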
.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


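The effect of :meth:`Request.set_proxy` on a plain ``http`` request can be
observed through the public attributes. The URL and proxy address in this
sketch are hypothetical.

```python
from urllib.request import Request

request = Request("http://www.example.com/index.html")   # hypothetical URL
request.set_proxy("proxy.example.com:3128", "http")      # hypothetical proxy

print(request.host)       # now the proxy's host and port
print(request.selector)   # the full original URL, as sent to the proxy
```
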
.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by RFC 2965. See the
   documentation for the :class:`Request` constructor.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following methods
   are searched, and added to the possible chains (note that HTTP errors are a
   special case).

   * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
     URLs.

   * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

   * :meth:`protocol_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`protocol_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

   * :meth:`protocol_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url, data=None[, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature only works for
   HTTP, HTTPS and FTP connections.


Georg Brandl7f01a132009-09-16 15:58:14 +0000616.. method:: OpenerDirector.error(proto, *args)
Georg Brandl116aa622007-08-15 14:28:22 +0000617
618 Handle an error of the given protocol. This will call the registered error
619 handlers for the given protocol with the given arguments (which are protocol
620 specific). The HTTP protocol is a special case which uses the HTTP response
621 code to determine the specific error handler; refer to the :meth:`http_error_\*`
622 methods of the handler classes.
623
624 Return values and exceptions raised are the same as those of :func:`urlopen`.
625
OpenerDirector objects open URLs in three stages.  The order in which these
methods are called within each stage is determined by sorting the handler
instances.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request.  This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`.  If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`protocol_open`.  If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :meth:`protocol_response` has that
   method called to post-process the response.
651
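The three stages can be observed without any network access.  The following is
a minimal sketch, not part of the library: the ``demo`` scheme, the
``DemoHandler`` class and the ``calls`` list are made up for illustration, and
the response is built from an in-memory buffer.

```python
import io
import urllib.request
import urllib.response
from email.message import Message

calls = []  # records the order in which the stages run


class DemoHandler(urllib.request.BaseHandler):
    """Handles the made-up ``demo`` scheme without touching the network."""

    def demo_request(self, req):
        # Stage 1: pre-process the request.
        calls.append("request")
        return req

    def demo_open(self, req):
        # Stage 2: produce a response (any non-None value ends the stage).
        calls.append("open")
        return urllib.response.addinfourl(
            io.BytesIO(b"hello"), Message(), req.full_url)

    def demo_response(self, req, response):
        # Stage 3: post-process the response.
        calls.append("response")
        return response


opener = urllib.request.OpenerDirector()
opener.add_handler(DemoHandler())
body = opener.open("demo://example/").read()
print(calls, body)
```

A bare :class:`OpenerDirector` is used here instead of :func:`build_opener` so
that only the illustrative handler takes part in the call chain.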
652
653.. _base-handler-objects:
654
655BaseHandler Objects
656-------------------
657
658:class:`BaseHandler` objects provide a couple of methods that are directly
659useful, and others that are meant to be used by derived classes. These are
660intended for direct use:
661
662
663.. method:: BaseHandler.add_parent(director)
664
665 Add a director as parent.
666
667
668.. method:: BaseHandler.close()
669
670 Remove any parents.
671
672The following members and methods should only be used by classes derived from
673:class:`BaseHandler`.
674
675.. note::
676
677 The convention has been adopted that subclasses defining
678 :meth:`protocol_request` or :meth:`protocol_response` methods are named
679 :class:`\*Processor`; all others are named :class:`\*Handler`.
680
681
682.. attribute:: BaseHandler.parent
683
684 A valid :class:`OpenerDirector`, which can be used to open using a different
685 protocol, or handle errors.
686
687
688.. method:: BaseHandler.default_open(req)
689
690 This method is *not* defined in :class:`BaseHandler`, but subclasses should
691 define it if they want to catch all URLs.
692
693 This method, if implemented, will be called by the parent
694 :class:`OpenerDirector`. It should return a file-like object as described in
695 the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
696 It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
697 example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).
698
699 This method will be called before any protocol-specific open method.
700
701
702.. method:: BaseHandler.protocol_open(req)
703 :noindex:
704
705 This method is *not* defined in :class:`BaseHandler`, but subclasses should
706 define it if they want to handle URLs with the given protocol.
707
708 This method, if defined, will be called by the parent :class:`OpenerDirector`.
709 Return values should be the same as for :meth:`default_open`.
710
711
712.. method:: BaseHandler.unknown_open(req)
713
714 This method is *not* defined in :class:`BaseHandler`, but subclasses should
715 define it if they want to catch all URLs with no specific registered handler to
716 open it.
717
718 This method, if implemented, will be called by the :attr:`parent`
719 :class:`OpenerDirector`. Return values should be the same as for
720 :meth:`default_open`.
721
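The relationship between :meth:`default_open`, :meth:`protocol_open` and
:meth:`unknown_open` can be sketched as follows.  All class names, the ``demo``
and ``other`` schemes and the canned payloads are hypothetical; each handler
returns an in-memory response so the example runs offline.

```python
import io
import urllib.request
import urllib.response
from email.message import Message


def make_response(payload, url):
    # Build a minimal file-like response around an in-memory buffer.
    return urllib.response.addinfourl(io.BytesIO(payload), Message(), url)


class CannedHandler(urllib.request.BaseHandler):
    """Answers known URLs from memory; returns None for everything else."""

    def __init__(self, canned):
        self.canned = canned

    def default_open(self, req):
        payload = self.canned.get(req.full_url)
        if payload is None:
            return None  # let the protocol-specific handlers try
        return make_response(payload, req.full_url)


class DemoHandler(urllib.request.BaseHandler):
    def demo_open(self, req):
        return make_response(b"from demo_open", req.full_url)


class LastResortHandler(urllib.request.BaseHandler):
    def unknown_open(self, req):
        return make_response(b"from unknown_open", req.full_url)


opener = urllib.request.OpenerDirector()
for handler in (CannedHandler({"demo://cached/": b"from default_open"}),
                DemoHandler(), LastResortHandler()):
    opener.add_handler(handler)

results = [opener.open(url).read()
           for url in ("demo://cached/", "demo://other/", "other://x/")]
print(results)
```

The first URL is answered by ``default_open``, the second falls through to
``demo_open``, and the third, which has no protocol-specific handler, reaches
``unknown_open``.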
722
723.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)
724
725 This method is *not* defined in :class:`BaseHandler`, but subclasses should
726 override it if they intend to provide a catch-all for otherwise unhandled HTTP
727 errors. It will be called automatically by the :class:`OpenerDirector` getting
728 the error, and should not normally be called in other circumstances.
729
730 *req* will be a :class:`Request` object, *fp* will be a file-like object with
731 the HTTP error body, *code* will be the three-digit code of the error, *msg*
732 will be the user-visible explanation of the code and *hdrs* will be a mapping
733 object with the headers of the error.
734
735 Return values and exceptions raised should be the same as those of
736 :func:`urlopen`.
737
738
739.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)
740
741 *nnn* should be a three-digit HTTP error code. This method is also not defined
742 in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
743 subclass, when an HTTP error with code *nnn* occurs.
744
745 Subclasses should override this method to handle specific HTTP errors.
746
747 Arguments, return values and exceptions raised should be the same as for
748 :meth:`http_error_default`.
749
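As an illustration of :meth:`http_error_nnn`, the sketch below defines a
handler for code 404 and drives it through :meth:`OpenerDirector.error`
directly, so no request is actually sent.  The ``NotFoundFallback`` class, the
URL and the fallback body are assumptions made for the example.

```python
import io
import urllib.request
import urllib.response
from email.message import Message


class NotFoundFallback(urllib.request.BaseHandler):
    """Replaces 404 errors with a canned response (illustrative only)."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        return urllib.response.addinfourl(
            io.BytesIO(b"fallback page"), hdrs, req.full_url, code)


opener = urllib.request.OpenerDirector()
opener.add_handler(NotFoundFallback())

req = urllib.request.Request("http://example.invalid/missing")

# Simulate the errors that an HTTP handler would normally report.
response = opener.error("http", req, io.BytesIO(b""), 404,
                        "Not Found", Message())
unhandled = opener.error("http", req, io.BytesIO(b""), 500,
                         "Server Error", Message())
```

With this bare opener nothing handles code 500, so the second call returns
``None``.  An opener created by :func:`build_opener` would instead end up
raising :exc:`HTTPError` for it.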
750
751.. method:: BaseHandler.protocol_request(req)
752 :noindex:
753
754 This method is *not* defined in :class:`BaseHandler`, but subclasses should
755 define it if they want to pre-process requests of the given protocol.
756
757 This method, if defined, will be called by the parent :class:`OpenerDirector`.
758 *req* will be a :class:`Request` object. The return value should be a
759 :class:`Request` object.
760
761
762.. method:: BaseHandler.protocol_response(req, response)
763 :noindex:
764
765 This method is *not* defined in :class:`BaseHandler`, but subclasses should
766 define it if they want to post-process responses of the given protocol.
767
768 This method, if defined, will be called by the parent :class:`OpenerDirector`.
769 *req* will be a :class:`Request` object. *response* will be an object
770 implementing the same interface as the return value of :func:`urlopen`. The
771 return value should implement the same interface as the return value of
772 :func:`urlopen`.
773
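A request pre-processor following the ``*Processor`` naming convention might
look like the sketch below.  The ``TraceIdProcessor`` class and the
``X-Trace-Id`` header are hypothetical; the method is called by hand here so
that the effect can be inspected without opening a connection.

```python
import urllib.request


class TraceIdProcessor(urllib.request.BaseHandler):
    """Adds a (hypothetical) X-Trace-Id header to outgoing HTTP requests."""

    def __init__(self, trace_id):
        self.trace_id = trace_id

    def http_request(self, req):
        # Unredirected headers are not copied onto redirected requests.
        req.add_unredirected_header("X-Trace-Id", self.trace_id)
        return req

    # Apply the same pre-processing to HTTPS requests.
    https_request = http_request


req = urllib.request.Request("http://example.invalid/")
processed = TraceIdProcessor("abc123").http_request(req)
```

In real use the processor would be passed to :func:`build_opener`, for example
``urllib.request.build_opener(TraceIdProcessor("abc123"))``, and the parent
:class:`OpenerDirector` would call ``http_request`` itself.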
774
775.. _http-redirect-handler:
776
777HTTPRedirectHandler Objects
778---------------------------
779
780.. note::
781
782 Some HTTP redirections require action from this module's client code. If this
783 is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
784 precise meanings of the various redirection codes.
785
786
.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect.  This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server.  If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*.  Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if you can't but another
   handler might.
Georg Brandl116aa622007-08-15 14:28:22 +0000796
797 .. note::
798
799 The default implementation of this method does not strictly follow :rfc:`2616`,
800 which says that 301 and 302 responses to ``POST`` requests must not be
801 automatically redirected without confirmation by the user. In reality, browsers
802 do allow automatic redirection of these responses, changing the POST to a
803 ``GET``, and the default implementation reproduces this behavior.
804
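One possible use of :meth:`redirect_request` is to restrict which redirects are
followed.  The sketch below is an illustration under assumed names
(``SameHostRedirectHandler`` and the ``.invalid`` URLs are made up): it refuses
redirects to a different host and otherwise defers to the default
implementation.

```python
import io
import urllib.error
import urllib.parse
import urllib.request
from email.message import Message


class SameHostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follows redirects only when they stay on the same host."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        old_host = urllib.parse.urlsplit(req.full_url).hostname
        new_host = urllib.parse.urlsplit(newurl).hostname
        if new_host != old_host:
            # No other handler should try to follow this redirect.
            raise urllib.error.HTTPError(req.full_url, code, msg, hdrs, fp)
        return super().redirect_request(req, fp, code, msg, hdrs, newurl)


handler = SameHostRedirectHandler()
req = urllib.request.Request("http://example.invalid/old")

same = handler.redirect_request(req, io.BytesIO(b""), 302, "Found", Message(),
                                "http://example.invalid/new")
try:
    handler.redirect_request(req, io.BytesIO(b""), 302, "Found", Message(),
                             "http://elsewhere.invalid/new")
    blocked = False
except urllib.error.HTTPError:
    blocked = True
```

Passing an instance to :func:`build_opener` replaces the default
:class:`HTTPRedirectHandler`, because handlers that subclass a default handler
take its place.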
805
.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL.  This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.
Georg Brandl116aa622007-08-15 14:28:22 +0000810
811
812.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)
813
814 The same as :meth:`http_error_301`, but called for the 'found' response.
815
816
817.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)
818
819 The same as :meth:`http_error_301`, but called for the 'see other' response.
820
821
822.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)
823
824 The same as :meth:`http_error_301`, but called for the 'temporary redirect'
825 response.
826
827
828.. _http-cookie-processor:
829
830HTTPCookieProcessor Objects
831---------------------------
832
:class:`HTTPCookieProcessor` instances have one attribute:

.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`http.cookiejar.CookieJar` in which cookies are stored.
Georg Brandl116aa622007-08-15 14:28:22 +0000838
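A short sketch of how the attribute is typically reached: the jar is created by
the caller, handed to the processor, and can later be inspected through
``cookiejar``.  No request is made here, so the jar stays empty.

```python
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()
processor = urllib.request.HTTPCookieProcessor(jar)
opener = urllib.request.build_opener(processor)

# Cookies set by servers during opener.open(...) calls would be collected
# in the jar; nothing has been opened yet, so it is still empty.
names = [cookie.name for cookie in processor.cookiejar]
```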
839
840.. _proxy-handler:
841
842ProxyHandler Objects
843--------------------
844
845
846.. method:: ProxyHandler.protocol_open(request)
847 :noindex:
848
849 The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
850 *protocol* which has a proxy in the *proxies* dictionary given in the
851 constructor. The method will modify requests to go through the proxy, by
852 calling ``request.set_proxy()``, and call the next handler in the chain to
853 actually execute the protocol.
854
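The per-protocol methods described above can be seen by inspecting an
instance.  This is only a sketch; the proxy URL is a placeholder and no
connection is attempted.

```python
import urllib.request

handler = urllib.request.ProxyHandler(
    {"http": "http://proxy.example.com:3128/"})

# An http_open method exists because 'http' is in the proxies dictionary;
# there is no ftp_open method because no FTP proxy was given.
has_http = hasattr(handler, "http_open")
has_ftp = hasattr(handler, "ftp_open")
```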
855
856.. _http-password-mgr:
857
858HTTPPasswordMgr Objects
859-----------------------
860
861These methods are available on :class:`HTTPPasswordMgr` and
862:class:`HTTPPasswordMgrWithDefaultRealm` objects.
863
864
.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs.  *realm*, *user* and
   *passwd* must be strings.  This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication is requested for *realm* and for a
   URI at or below any of the given URIs.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any.  This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.
880
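The lookup rules can be tried out directly on a password manager.  The realm
names, URIs and credentials below are placeholders chosen for the example.

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Registering under the realm None makes the credentials a fallback for
# any realm on URIs at or below the given one.
mgr.add_password(None, "http://example.com/private/", "klem", "secret")

match = mgr.find_user_password("Some Realm",
                               "http://example.com/private/page.html")
no_match = mgr.find_user_password("Some Realm", "http://other.example/")
print(match, no_match)
```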
881
882.. _abstract-basic-auth-handler:
883
884AbstractBasicAuthHandler Objects
885--------------------------------
886
887
888.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
889
890 Handle an authentication request by getting a user/password pair, and re-trying
891 the request. *authreq* should be the name of the header where the information
892 about the realm is included in the request, *host* specifies the URL and path to
893 authenticate for, *req* should be the (failed) :class:`Request` object, and
894 *headers* should be the error headers.
895
896 *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
897 authority component (e.g. ``"http://python.org/"``). In either case, the
898 authority must not contain a userinfo component (so, ``"python.org"`` and
899 ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).
900
901
902.. _http-basic-auth-handler:
903
904HTTPBasicAuthHandler Objects
905----------------------------
906
907
908.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)
909
910 Retry the request with authentication information, if available.
911
912
913.. _proxy-basic-auth-handler:
914
915ProxyBasicAuthHandler Objects
916-----------------------------
917
918
919.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)
920
921 Retry the request with authentication information, if available.
922
923
924.. _abstract-digest-auth-handler:
925
926AbstractDigestAuthHandler Objects
927---------------------------------
928
929
930.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
931
932 *authreq* should be the name of the header where the information about the realm
933 is included in the request, *host* should be the host to authenticate to, *req*
934 should be the (failed) :class:`Request` object, and *headers* should be the
935 error headers.
936
937
938.. _http-digest-auth-handler:
939
940HTTPDigestAuthHandler Objects
941-----------------------------
942
943
944.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)
945
946 Retry the request with authentication information, if available.
947
948
949.. _proxy-digest-auth-handler:
950
951ProxyDigestAuthHandler Objects
952------------------------------
953
954
955.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)
956
957 Retry the request with authentication information, if available.
958
959
960.. _http-handler-objects:
961
962HTTPHandler Objects
963-------------------
964
965
966.. method:: HTTPHandler.http_open(req)
967
968 Send an HTTP request, which can be either GET or POST, depending on
969 ``req.has_data()``.
970
971
972.. _https-handler-objects:
973
974HTTPSHandler Objects
975--------------------
976
977
978.. method:: HTTPSHandler.https_open(req)
979
980 Send an HTTPS request, which can be either GET or POST, depending on
981 ``req.has_data()``.
982
983
984.. _file-handler-objects:
985
986FileHandler Objects
987-------------------
988
989
990.. method:: FileHandler.file_open(req)
991
992 Open the file locally, if there is no host name, or the host name is
993 ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
994 using :attr:`parent`.
995
996
997.. _ftp-handler-objects:
998
999FTPHandler Objects
1000------------------
1001
1002
1003.. method:: FTPHandler.ftp_open(req)
1004
1005 Open the FTP file indicated by *req*. The login is always done with empty
1006 username and password.
1007
1008
1009.. _cacheftp-handler-objects:
1010
1011CacheFTPHandler Objects
1012-----------------------
1013
1014:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
1015following additional methods:
1016
1017
1018.. method:: CacheFTPHandler.setTimeout(t)
1019
1020 Set timeout of connections to *t* seconds.
1021
1022
1023.. method:: CacheFTPHandler.setMaxConns(m)
1024
1025 Set maximum number of cached connections to *m*.
1026
1027
1028.. _unknown-handler-objects:
1029
1030UnknownHandler Objects
1031----------------------
1032
1033
1034.. method:: UnknownHandler.unknown_open()
1035
1036 Raise a :exc:`URLError` exception.
1037
1038
1039.. _http-error-processor-objects:
1040
1041HTTPErrorProcessor Objects
1042--------------------------
1043
.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For responses with a 2xx (success) status code, the response object is
   returned immediately.

   For all other status codes, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
   :exc:`HTTPError` if no other handler handles the error.
1054
Georg Brandl0f7ede42008-06-23 11:23:31 +00001055
.. _urllib-request-examples:

Examples
--------

This example gets the python.org main page and displays the first 300 bytes of
it. ::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(300))
   b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
   xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
   <title>Python Programming '
Senthil Kumaranb213ee32010-04-15 17:18:22 +00001072
Note that the ``read()`` method of the object returned by urlopen returns a
bytes object.  This is because there is no way for urlopen to automatically
determine the encoding of the byte stream it receives from the HTTP server.  In
general, a program will decode the returned bytes object to a string once it
determines or guesses the appropriate encoding.

The following W3C document, http://www.w3.org/International/O-charset , lists
the various ways in which an (X)HTML or an XML document could have specified its
encoding information.

As the python.org website uses *utf-8* encoding as specified in its meta tag, we
will use the same for decoding the bytes object. ::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(100).decode('utf-8'))
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm
1091
Georg Brandl116aa622007-08-15 14:28:22 +00001092
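When the server declares the encoding in its :mailheader:`Content-Type` header,
that declaration can be used instead of a hard-coded value.  The helper below is
a sketch only: ``decode_body`` is not part of the library, and the headers are
built by hand so the example runs offline.  For an HTTP response the same kind
of message object is available as ``f.headers``.

```python
from email.message import Message


def decode_body(body, headers, default="utf-8"):
    """Decode *body* using the charset named in *headers*, if any."""
    charset = headers.get_content_charset() or default
    return body.decode(charset, errors="replace")


headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"
text = decode_body(b"caf\xe9", headers)

# Without a declared charset the default is used.
plain = decode_body(b"ok", Message())
```

A charset declared only inside the document (for example in a ``meta`` tag) is
not visible to this helper and still has to be detected separately.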
In the following example, we are sending a data-stream to the stdin of a CGI
and reading the data it returns to us. Note that this example will only work
when the Python installation supports SSL. ::

   >>> import urllib.request
   >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data=b'This data is passed to stdin of the CGI')
   >>> f = urllib.request.urlopen(req)
   >>> print(f.read().decode('utf-8'))
   Got Data: "This data is passed to stdin of the CGI"
1103
The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)
Georg Brandl116aa622007-08-15 14:28:22 +00001110
Use of Basic HTTP Authentication::

   import urllib.request
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib.request.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib.request.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib.request.install_opener(opener)
   urllib.request.urlopen('http://www.example.com/login.html')
Georg Brandl116aa622007-08-15 14:28:22 +00001124
:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`.  By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved.  For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')
1142
Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib.request
   req = urllib.request.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib.request.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`.  To change this::

   import urllib.request
   opener = urllib.request.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
1163
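The two ways of setting headers can be combined, and the result inspected
before anything is sent.  This is a small sketch; the ``Accept-Language`` header
is simply an extra example value.

```python
import urllib.request

# Headers given to the constructor...
req = urllib.request.Request(
    "http://www.example.com/",
    headers={"Referer": "http://www.python.org/"})

# ...and headers added afterwards end up in the same place.
req.add_header("Accept-Language", "en")

items = dict(req.header_items())
print(items)
```

Header names are stored in capitalized form (only the first letter in upper
case), which is why the second key reads ``Accept-language`` when inspected.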
.. _urllib-examples:

Here is an example session that uses the ``GET`` method to retrieve a URL
containing parameters::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
   >>> print(f.read().decode('utf-8'))

The following example uses the ``POST`` method instead::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> params = params.encode('ascii')   # POST data must be bytes
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
   >>> print(f.read().decode('utf-8'))
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001182
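The difference between the two sessions above comes down to whether *data* is
supplied.  The sketch below builds both requests without sending them, using a
placeholder URL, and shows what the encoded parameters look like.

```python
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
url = "http://www.example.com/cgi-bin/query"

# Parameters in the query string: the request will be a GET.
get_req = urllib.request.Request(url + "?" + params)

# Parameters passed as data (bytes): the request will be a POST.
post_req = urllib.request.Request(url, data=params.encode("ascii"))

print(params, get_req.get_method(), post_req.get_method())
```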
The following example uses an explicitly specified HTTP proxy, overriding
environment settings::

   >>> import urllib.request
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.request.FancyURLopener(proxies)
   >>> f = opener.open("http://www.python.org")
   >>> f.read().decode('utf-8')

The following example uses no proxies at all, overriding environment settings::

   >>> import urllib.request
   >>> opener = urllib.request.FancyURLopener({})
   >>> f = opener.open("http://www.python.org/")
   >>> f.read().decode('utf-8')
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001198
1199
1200:mod:`urllib.request` Restrictions
1201----------------------------------
1202
1203 .. index::
1204 pair: HTTP; protocol
1205 pair: FTP; protocol
1206
* Currently, only the following protocols are supported: HTTP (versions 0.9 and
  1.0), FTP, and local files.
1209
1210* The caching feature of :func:`urlretrieve` has been disabled until I find the
1211 time to hack proper processing of Expiration time headers.
1212
1213* There should be a function to query whether a particular URL is in the cache.
1214
1215* For backward compatibility, if a URL appears to point to a local file but the
1216 file can't be opened, the URL is re-interpreted using the FTP protocol. This
1217 can sometimes cause confusing error messages.
1218
1219* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
1220 long delays while waiting for a network connection to be set up. This means
1221 that it is difficult to build an interactive Web client using these functions
1222 without using threads.
1223
1224 .. index::
1225 single: HTML
1226 pair: HTTP; protocol
1227
1228* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
1229 returned by the server. This may be binary data (such as an image), plain text
1230 or (for example) HTML. The HTTP protocol provides type information in the reply
1231 header, which can be inspected by looking at the :mailheader:`Content-Type`
1232 header. If the returned data is HTML, you can use the module
1233 :mod:`html.parser` to parse it.
1234
1235 .. index:: single: FTP
1236
1237* The code handling the FTP protocol cannot differentiate between a file and a
1238 directory. This can lead to unexpected behavior when attempting to read a URL
1239 that points to a file that is not accessible. If the URL ends in a ``/``, it is
1240 assumed to refer to a directory and will be handled accordingly. But if an
1241 attempt to read a file leads to a 550 error (meaning the URL cannot be found or
1242 is not accessible, often for permission reasons), then the path is treated as a
1243 directory in order to handle the case when a directory is specified by a URL but
1244 the trailing ``/`` has been left off. This can cause misleading results when
1245 you try to fetch a file whose read permissions make it inaccessible; the FTP
  code will try to read it, fail with a 550 error, and then perform a directory
  listing for the unreadable file. If fine-grained control is needed, consider
  using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
  *_urlopener* to meet your needs.
1250
Georg Brandl0f7ede42008-06-23 11:23:31 +00001251
1252
:mod:`urllib.response` --- Response classes used by urllib
==========================================================

.. module:: urllib.response
   :synopsis: Response classes used by urllib.

The :mod:`urllib.response` module defines functions and classes which define a
minimal file-like interface, including ``read()`` and ``readline()``.  The
typical response object is an addinfourl instance, which defines an ``info()``
method that returns headers and a ``geturl()`` method that returns the URL.
Functions defined by this module are used internally by the
:mod:`urllib.request` module.
1265
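The interface described above can be exercised by constructing an addinfourl
instance by hand.  This is a sketch for illustration: normal code receives such
objects from :func:`urllib.request.urlopen` instead of building them, and the
headers, body and URL below are made up.

```python
import io
import urllib.response
from email.message import Message

headers = Message()
headers["Content-Type"] = "text/plain"

response = urllib.response.addinfourl(
    io.BytesIO(b"first line\nsecond line\n"),
    headers,
    "http://www.example.com/")

first = response.readline()   # file-like reading, one line at a time
rest = response.read()        # the remainder of the body
info = response.info()        # the headers object passed in above
url = response.geturl()       # the URL passed in above
```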