:mod:`urllib.request` --- extensible library for opening URLs
=============================================================

.. module:: urllib.request
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


The :mod:`urllib.request` module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.

The :mod:`urllib.request` module defines the following functions:


.. function:: urlopen(url, data=None[, timeout])

   Open the URL *url*, which can be either a string or a
   :class:`Request` object.

   *data* may be a string specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*; the HTTP request will
   be a POST instead of a GET when the *data* parameter is provided.
   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns a string in this format.

   The optional *timeout* parameter specifies a timeout in seconds for
   blocking operations like the connection attempt (if not specified,
   the global default timeout setting will be used). The timeout applies
   only to HTTP, HTTPS, FTP and FTPS connections.

   This function returns a file-like object with two additional methods from
   the :mod:`urllib.response` module:

   * :meth:`geturl` --- return the URL of the resource retrieved,
     commonly used to determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers,
     in the form of an :class:`http.client.HTTPMessage` instance (see `Quick
     Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though
   the default installed global :class:`OpenerDirector` uses
   :class:`UnknownHandler` to ensure this never happens).

   In addition, the default installed :class:`ProxyHandler` makes sure that
   requests are handled through the proxy when proxy settings are present.

   The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
   discontinued; :func:`urlopen` corresponds to the old ``urllib2.urlopen``.
   Proxy handling, which was done by passing a dictionary parameter to
   ``urllib.urlopen``, can be obtained by using :class:`ProxyHandler` objects.

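The return value of :func:`urlopen` can be explored without network access by
opening a ``file:`` URL. The following is a minimal sketch under that
assumption; the temporary file and the form field names are made up for
illustration. It also prepares POST data, assuming a current Python 3 release
where *data* has to be bytes, so the string from
:func:`urllib.parse.urlencode` is encoded first.

```python
import os
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path

# Create a small local file so the example needs no network access.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(b"hello from a local file\n")
local_path = Path(tmp.name).resolve()

# urlopen() accepts a URL string; the file-like result is closed by "with".
with urllib.request.urlopen(local_path.as_uri()) as response:
    body = response.read()
    final_url = response.geturl()       # URL of the resource retrieved
    headers = response.info()           # meta-information (headers)
    content_length = headers["Content-Length"]

os.unlink(local_path)

# Preparing POST data: urlencode() returns a str, which is encoded to bytes
# here before it would be passed as *data*.
form_fields = {"q": "urllib request", "page": 2}   # hypothetical form fields
post_data = urllib.parse.urlencode(form_fields).encode("ascii")
```

Passing ``post_data`` as the second argument of :func:`urlopen` for an HTTP
URL would turn the request into a POST.
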

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler`,
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
   member variable to modify its position in the handlers list.

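The replacement rule for default handlers can be sketched as follows.
``LoggingRedirectHandler`` is a hypothetical subclass written for this
example, and the ``handlers`` list it inspects is an implementation detail of
CPython's :class:`OpenerDirector` rather than a documented attribute.

```python
import urllib.request


class LoggingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical subclass that records every redirect it is asked to follow."""

    def __init__(self):
        self.seen = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.seen.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# A class (not an instance) is acceptable because its constructor
# can be called without parameters.
opener = urllib.request.build_opener(LoggingRedirectHandler)

# "handlers" is a CPython implementation detail, read here only to show
# what build_opener() assembled.
redirect_handlers = [h for h in opener.handlers
                     if isinstance(h, urllib.request.HTTPRedirectHandler)]
handler_names = [type(h).__name__ for h in opener.handlers]
```

Because the subclass stands in for :class:`HTTPRedirectHandler`, only one
redirect handler ends up in the chain. Passing ``opener`` to
:func:`install_opener` would make :func:`urlopen` use it as well.
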

.. function:: urlretrieve(url, filename=None, reporthook=None, data=None)

   Copy a network object denoted by a URL to a local file, if necessary. If the URL
   points to a local file, or a valid cached copy of the object exists, the object
   is not copied. Return a tuple ``(filename, headers)`` where *filename* is the
   local file name under which the object can be found, and *headers* is whatever
   the :meth:`info` method of the object returned by :func:`urlopen` returned (for
   a remote object, possibly cached). Exceptions are the same as for
   :func:`urlopen`.

   The second argument, if present, specifies the file location to copy to (if
   absent, the location will be a tempfile with a generated name). The third
   argument, if present, is a hook function that will be called once on
   establishment of the network connection and once after each block read
   thereafter. The hook will be passed three arguments: a count of blocks
   transferred so far, a block size in bytes, and the total size of the file. The
   third argument may be ``-1`` on older FTP servers which do not return a file
   size in response to a retrieval request.

   If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   argument may be given to specify a ``POST`` request (normally the request type
   is ``GET``). The *data* argument must be in standard
   :mimetype:`application/x-www-form-urlencoded` format; see the
   :func:`urllib.parse.urlencode` function.

   :func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that
   the amount of data available was less than the expected amount (which is the
   size reported by a *Content-Length* header). This can occur, for example, when
   the download is interrupted.

   The *Content-Length* is treated as a lower bound: if there's more data to read,
   :func:`urlretrieve` reads more data, but if less data is available, it raises
   the exception.

   You can still retrieve the downloaded data in this case; it is stored in the
   :attr:`content` attribute of the exception instance.

   If no *Content-Length* header was supplied, :func:`urlretrieve` cannot check
   the size of the data it has downloaded, and just returns it. In this case you
   just have to assume that the download was successful.

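The reporting hook described above can be tried offline. This is a sketch
that assumes a local ``file:`` URL as a stand-in for a remote object; the
file names and the ``report`` function are hypothetical. The block size
passed to the hook is chosen by the implementation, so the example does not
rely on a particular value.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path

workdir = Path(tempfile.mkdtemp()).resolve()
source = workdir / "source.bin"
target = workdir / "copy.bin"
source.write_bytes(b"x" * 20000)    # stand-in for a remote object

progress = []


def report(block_count, block_size, total_size):
    """Record each call; total_size may be -1 when the size is unknown."""
    progress.append((block_count, block_size, total_size))


# With an explicit file name the data is copied even for a file: URL.
filename, headers = urllib.request.urlretrieve(
    source.as_uri(), filename=str(target), reporthook=report)

copied = target.read_bytes()
shutil.rmtree(workdir)              # remove the temporary files again
```

The first call to the hook reports a block count of zero; later calls allow a
progress display to be computed from ``block_count * block_size`` and
``total_size``.
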

.. data:: _urlopener

   The public functions :func:`urlopen` and :func:`urlretrieve` create an instance
   of the :class:`FancyURLopener` class and use it to perform their requested
   actions. To override this functionality, programmers can create a subclass of
   :class:`URLopener` or :class:`FancyURLopener`, then assign an instance of that
   class to the ``urllib.request._urlopener`` variable before calling the desired
   function. For example, applications may want to specify a different
   :mailheader:`User-Agent` header than :class:`URLopener` defines. This can be
   accomplished with the following code::

      import urllib.request

      class AppURLopener(urllib.request.FancyURLopener):
          # Sent as the User-Agent header by this opener.
          version = "App/1.7"

      # The variable lives in the urllib.request module.
      urllib.request._urlopener = AppURLopener()


.. function:: urlcleanup()

   Clear the cache that may have been built up by previous calls to
   :func:`urlretrieve`.


.. function:: pathname2url(path)

   Convert the pathname *path* from the local syntax for a path to the form used in
   the path component of a URL. This does not produce a complete URL. The return
   value will already be quoted using the :func:`quote` function.


.. function:: url2pathname(path)

   Convert the path component *path* from an encoded URL to the local syntax for a
   path. This does not accept a complete URL. This function uses :func:`unquote`
   to decode *path*.

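A short round trip illustrates the two conversion functions. The relative
path is hypothetical; the exact quoted form depends on the platform, so only
the quoting of the space is relied upon here.

```python
import os
from urllib.request import pathname2url, url2pathname

# A relative path containing a space; os.path.join keeps it portable.
local_path = os.path.join("docs", "my file.txt")

url_path = pathname2url(local_path)    # quoted: the space becomes %20
round_trip = url2pathname(url_path)    # unquoted, back in local syntax
```
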

The following classes are provided:

.. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False)

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*; the HTTP request will
   be a POST instead of a GET when the *data* parameter is provided.
   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns a string in this format.

   *headers* should be a dictionary, and will be treated as if
   :meth:`add_header` was called with each key and value as arguments.
   This is often used to "spoof" the ``User-Agent`` header, which is
   used by a browser to identify itself -- some HTTP servers only
   allow requests coming from common browsers as opposed to scripts.
   For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while
   :mod:`urllib`'s default user agent string is
   ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling
   of third-party HTTP cookies:

   *origin_req_host* should be the request-host of the origin
   transaction, as defined by :rfc:`2965`. It defaults to
   ``http.cookiejar.request_host(self)``. This is the host name or IP
   address of the original request that was initiated by the user.
   For example, if the request is for an image in an HTML document,
   this should be the request-host of the request for the page
   containing the image.

   *unverifiable* should indicate whether the request is unverifiable,
   as defined by RFC 2965. It defaults to ``False``. An unverifiable
   request is one whose URL the user did not have the option to
   approve. For example, if the request is for an image in an HTML
   document, and the user had no option to approve the automatic
   fetching of the image, this should be true.

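The constructor arguments can be illustrated without sending anything. In
this sketch the URL, the form field and the user agent string are
hypothetical, and ``get_header`` is a method of current Python 3 releases
that is not described in this section. The data is encoded to bytes, as
current releases require.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and form field, used only for illustration.
data = urllib.parse.urlencode({"name": "value"}).encode("ascii")
req = urllib.request.Request(
    "http://www.example.com/submit",
    data=data,
    headers={"User-Agent": "ExampleClient/1.0"},
)

method = req.get_method()                    # POST, because data was given
user_agent = req.get_header("User-agent")    # header names are stored capitalized

# Without data the request stays a GET and the cookie-related defaults apply.
plain = urllib.request.Request("http://www.example.com/")
```
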

.. class:: URLopener(proxies=None, **x509)

   Base class for opening and reading URLs. Unless you need to support opening
   objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   you probably want to use :class:`FancyURLopener`.

   By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   Applications can define their own :mailheader:`User-Agent` header by subclassing
   :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   :attr:`version` to an appropriate string value in the subclass definition.

   The optional *proxies* parameter should be a dictionary mapping scheme names to
   proxy URLs, where an empty dictionary turns proxies off completely. Its default
   value is ``None``, in which case environmental proxy settings will be used if
   present, as discussed in the definition of :func:`urlopen`, above.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme. The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`IOError` exception if the server
   returns an error code.

   .. method:: open(fullurl, data=None)

      Open *fullurl* using the appropriate protocol. This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments. If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.


   .. method:: open_unknown(fullurl, data=None)

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url, filename=None, reporthook=None, data=None)

      Retrieves the contents of *url* and places it in *filename*. The return value
      is a tuple consisting of a local filename and either a
      :class:`email.message.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs). The caller must then open and read the
      contents of *filename*. If *filename* is not given and the URL refers to a
      local file, the input filename is returned. If the URL is non-local and
      *filename* is not given, the filename is the output of :func:`tempfile.mktemp`
      with a suffix that matches the suffix of the last path component of the input
      URL. If *reporthook* is given, it must be a function accepting three numeric
      parameters. It will be called after each chunk of data is read from the
      network. *reporthook* is ignored for local URLs.

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``). The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the
      :func:`urllib.parse.urlencode` function.


   .. attribute:: version

      Variable that specifies the user agent of the opener object. To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.


.. class:: FancyURLopener(...)

   :class:`FancyURLopener` subclasses :class:`URLopener` providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL. For 401 response codes (authentication required), basic HTTP
   authentication is performed. For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`http_error_default` is called
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user. In
      reality, browsers do allow automatic redirection of these responses, changing
      the POST to a GET, and :mod:`urllib` reproduces this behaviour.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method. The default implementation asks the
      users for the required information on the controlling terminal. A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overloaded to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm. The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The implementation prompts for this information on the terminal; an application
      should override this method to use an appropriate interaction model in the local
      environment.

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor(cookiejar=None)

   A class to handle HTTP Cookies.


.. class:: ProxyHandler(proxies=None)

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read the
   list of proxies from the environment variables :envvar:`<protocol>_proxy`.
   If no proxy environment variables are set, in a Windows environment, proxy
   settings are obtained from the registry's Internet Settings section and in a
   Mac OS X environment, proxy information is retrieved from the OS X System
   Configuration Framework.

   To disable an autodetected proxy, pass an empty dictionary.

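Both uses of *proxies* can be sketched as follows. The proxy address is
hypothetical, no connection is made, and the ``handlers`` and ``proxies``
attributes read at the end are CPython implementation details that are
inspected only to show the effect.

```python
import urllib.request

# Hypothetical proxy address, used only for illustration.
proxy = urllib.request.ProxyHandler({"http": "http://proxy.example.com:3128"})
via_proxy = urllib.request.build_opener(proxy)

# An empty mapping turns proxy handling off, whatever the environment says.
direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def proxy_handlers(opener):
    # "handlers" is a CPython implementation detail, read here for inspection.
    return [h for h in opener.handlers
            if isinstance(h, urllib.request.ProxyHandler)]


configured = [h.proxies for h in proxy_handlers(via_proxy)]
disabled = [h.proxies for h in proxy_handlers(direct)]
```
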

.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.

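The catch-all realm can be tried directly on the password manager. In this
sketch the URLs, realm name and credentials are made up, and
``add_password`` and ``find_user_password`` are the manager methods as found
in current Python 3 releases.

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# A realm of None acts as the catch-all; URL and credentials are made up.
mgr.add_password(None, "http://www.example.com/private/", "alice", "s3cret")

# Any realm under the registered URL prefix resolves to the same credentials.
found = mgr.find_user_password(
    "Some Realm", "http://www.example.com/private/page")
missing = mgr.find_user_password("Some Realm", "http://other.example.org/")

# The manager is then handed to an authentication handler.
auth_handler = urllib.request.HTTPBasicAuthHandler(mgr)
opener = urllib.request.build_opener(auth_handler)
```
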

.. class:: AbstractBasicAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. _request-objects:

Request Objects
---------------

The following methods describe :class:`Request`'s public interface,
and so all may be overridden in subclasses. The class also defines several
public attributes that can be used by clients to inspect the parsed
request.

.. attribute:: Request.full_url

   The original URL passed to the constructor.

.. attribute:: Request.type

   The URI scheme.

.. attribute:: Request.host

   The URI authority, typically a host, but may also contain a port
   separated by a colon.

.. attribute:: Request.origin_req_host

   The original host for the request, without port.

.. attribute:: Request.selector

   The URI path. If the :class:`Request` uses a proxy, then selector
   will be the full url that is passed to the proxy.

.. attribute:: Request.data

   The entity body for the request, or ``None`` if not specified.

.. attribute:: Request.unverifiable

   Boolean that indicates whether the request is unverifiable as defined
   by RFC 2965.

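The attributes listed above can be read straight from a :class:`Request`
instance; nothing is sent over the network. The URL in this sketch is
hypothetical and chosen so that the port and query string are visible.

```python
import urllib.request

# Hypothetical URL chosen to exercise every attribute.
req = urllib.request.Request("http://www.example.com:8080/search?q=python")

parts = {
    "full_url": req.full_url,                  # URL as given
    "type": req.type,                          # URI scheme
    "host": req.host,                          # authority, port included
    "origin_req_host": req.origin_req_host,    # host without the port
    "selector": req.selector,                  # path and query
    "data": req.data,                          # no entity body supplied
    "unverifiable": req.unverifiable,          # constructor default
}
```
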

.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).

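The header methods interact as sketched below. The URL and header names are
hypothetical, and the ``headers`` dictionary read at the end is an attribute
of current Python 3 releases that is not described in this section.

```python
import urllib.request

req = urllib.request.Request("http://www.example.com/")   # hypothetical URL

# A second call with a colliding key replaces the first value.
req.add_header("X-Example", "first")
req.add_header("x-example", "second")

# This header is sent with the original request only, not with redirects.
req.add_unredirected_header("X-Once", "only-here")

regular = dict(req.headers)                    # regular headers only
has_regular = req.has_header("X-example")      # names are stored capitalized
has_unredirected = req.has_header("X-once")    # unredirected headers count too
```
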

.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.

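The effect of :meth:`Request.set_proxy` on a plain HTTP request can be
observed through the public attributes. The target URL and proxy address in
this sketch are hypothetical, and the call itself opens no connection.

```python
import urllib.request

url = "http://www.example.com/index.html"      # hypothetical target
req = urllib.request.Request(url)
before = (req.type, req.host, req.selector)

# Hypothetical proxy address; no connection is made by this call.
req.set_proxy("proxy.example.com:3128", "http")
after = (req.type, req.host, req.selector)
```

Afterwards the host is that of the proxy and the selector is the complete
original URL, which is what an HTTP proxy expects on the request line.
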

.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by RFC 2965. See the
   documentation for the :class:`Request` constructor.


581.. _opener-director-objects:
582
583OpenerDirector Objects
584----------------------
585
586:class:`OpenerDirector` instances have the following methods:
587
588
589.. method:: OpenerDirector.add_handler(handler)
590
591 *handler* should be an instance of :class:`BaseHandler`. The following methods
592 are searched, and added to the possible chains (note that HTTP errors are a
593 special case).
594
595 * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
596 URLs.
597
598 * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
599 errors with HTTP error code *type*.
600
601 * :meth:`protocol_error` --- signal that the handler knows how to handle errors
602 from (non-\ ``http``) *protocol*.
603
604 * :meth:`protocol_request` --- signal that the handler knows how to pre-process
605 *protocol* requests.
606
607 * :meth:`protocol_response` --- signal that the handler knows how to
608 post-process *protocol* responses.
609
610
Georg Brandl7f01a132009-09-16 15:58:14 +0000611.. method:: OpenerDirector.open(url, data=None[, timeout])
Georg Brandl116aa622007-08-15 14:28:22 +0000612
613 Open the given *url* (which can be a request object or a string), optionally
Alexandre Vassalotti5f8ced22008-05-16 00:03:33 +0000614 passing the given *data*. Arguments, return values and exceptions raised are
615 the same as those of :func:`urlopen` (which simply calls the :meth:`open`
616 method on the currently installed global :class:`OpenerDirector`). The
617 optional *timeout* parameter specifies a timeout in seconds for blocking
Georg Brandlf78e02b2008-06-10 17:40:04 +0000618 operations like the connection attempt (if not specified, the global default
619 timeout setting will be usedi). The timeout feature actually works only for
620 HTTP, HTTPS, FTP and FTPS connections).
Georg Brandl116aa622007-08-15 14:28:22 +0000621
Georg Brandl116aa622007-08-15 14:28:22 +0000622
Georg Brandl7f01a132009-09-16 15:58:14 +0000623.. method:: OpenerDirector.error(proto, *args)
Georg Brandl116aa622007-08-15 14:28:22 +0000624
625 Handle an error of the given protocol. This will call the registered error
626 handlers for the given protocol with the given arguments (which are protocol
627 specific). The HTTP protocol is a special case which uses the HTTP response
628 code to determine the specific error handler; refer to the :meth:`http_error_\*`
629 methods of the handler classes.
630
631 Return values and exceptions raised are the same as those of :func:`urlopen`.
632
:class:`OpenerDirector` objects open URLs in three stages. Within each stage,
the order in which the handler methods are called is determined by sorting the
handler instances.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`protocol_open`. If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`.open` and :meth:`.error` methods.

#. Every handler with a method named like :meth:`protocol_response` has that
   method called to post-process the response.
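
The three stages can be observed with a small offline sketch. It assumes a
made-up ``demo`` URL scheme; ``DemoProcessor``, ``DemoHandler`` and the ``calls``
list are hypothetical names introduced only for illustration, and the response
is built in memory rather than fetched from a server.

```python
import io
import email.message
import urllib.request
import urllib.response

calls = []  # records the order in which the stages run


class DemoProcessor(urllib.request.BaseHandler):
    """Pre- and post-processes requests for the hypothetical ``demo`` scheme."""

    def demo_request(self, req):
        calls.append("demo_request")
        req.add_header("X-Stage", "pre-processed")
        return req

    def demo_response(self, req, response):
        calls.append("demo_response")
        return response


class DemoHandler(urllib.request.BaseHandler):
    """Answers ``demo:`` URLs from memory instead of the network."""

    def demo_open(self, req):
        calls.append("demo_open")
        data = io.BytesIO(b"hello from " + req.full_url.encode("ascii"))
        return urllib.response.addinfourl(data, email.message.Message(), req.full_url)


# A bare OpenerDirector is used so that only the two handlers above take part.
opener = urllib.request.OpenerDirector()
opener.add_handler(DemoProcessor())
opener.add_handler(DemoHandler())

response = opener.open("demo://example/resource")
body = response.read()
print(calls)
print(body)
```

The recorded order should match the numbered stages: request pre-processing,
then opening, then response post-processing.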
657
658
659.. _base-handler-objects:
660
661BaseHandler Objects
662-------------------
663
664:class:`BaseHandler` objects provide a couple of methods that are directly
665useful, and others that are meant to be used by derived classes. These are
666intended for direct use:
667
668
669.. method:: BaseHandler.add_parent(director)
670
671 Add a director as parent.
672
673
674.. method:: BaseHandler.close()
675
676 Remove any parents.
677
678The following members and methods should only be used by classes derived from
679:class:`BaseHandler`.
680
681.. note::
682
683 The convention has been adopted that subclasses defining
684 :meth:`protocol_request` or :meth:`protocol_response` methods are named
685 :class:`\*Processor`; all others are named :class:`\*Handler`.
686
687
688.. attribute:: BaseHandler.parent
689
690 A valid :class:`OpenerDirector`, which can be used to open using a different
691 protocol, or handle errors.
692
693
694.. method:: BaseHandler.default_open(req)
695
696 This method is *not* defined in :class:`BaseHandler`, but subclasses should
697 define it if they want to catch all URLs.
698
   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of :meth:`OpenerDirector.open`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).
704
705 This method will be called before any protocol-specific open method.
706
707
708.. method:: BaseHandler.protocol_open(req)
709 :noindex:
710
711 This method is *not* defined in :class:`BaseHandler`, but subclasses should
712 define it if they want to handle URLs with the given protocol.
713
714 This method, if defined, will be called by the parent :class:`OpenerDirector`.
715 Return values should be the same as for :meth:`default_open`.
716
717
718.. method:: BaseHandler.unknown_open(req)
719
720 This method is *not* defined in :class:`BaseHandler`, but subclasses should
721 define it if they want to catch all URLs with no specific registered handler to
722 open it.
723
724 This method, if implemented, will be called by the :attr:`parent`
725 :class:`OpenerDirector`. Return values should be the same as for
726 :meth:`default_open`.
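
The fall-through from :meth:`default_open` to :meth:`protocol_open` to
:meth:`unknown_open` can be sketched as follows. The ``demo`` scheme and the
``Declines`` and ``DemoHandler`` classes are hypothetical; the handlers decline
by returning ``None`` until the last one supplies an empty in-memory response.

```python
import io
import email.message
import urllib.request
import urllib.response

tried = []  # names of the open methods, in the order they were tried


class Declines(urllib.request.BaseHandler):
    """Sees every request first, and every unhandled request last."""

    def default_open(self, req):
        tried.append("default_open")
        return None  # decline, so protocol-specific handlers get a turn

    def unknown_open(self, req):
        tried.append("unknown_open")
        # Last resort: answer with an empty in-memory response.
        return urllib.response.addinfourl(
            io.BytesIO(b""), email.message.Message(), req.full_url)


class DemoHandler(urllib.request.BaseHandler):
    """Handles the hypothetical ``demo`` scheme, but declines here as well."""

    def demo_open(self, req):
        tried.append("demo_open")
        return None


opener = urllib.request.OpenerDirector()
opener.add_handler(Declines())
opener.add_handler(DemoHandler())

response = opener.open("demo://example/")
print(tried)
```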
727
728
729.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)
730
731 This method is *not* defined in :class:`BaseHandler`, but subclasses should
732 override it if they intend to provide a catch-all for otherwise unhandled HTTP
733 errors. It will be called automatically by the :class:`OpenerDirector` getting
734 the error, and should not normally be called in other circumstances.
735
736 *req* will be a :class:`Request` object, *fp* will be a file-like object with
737 the HTTP error body, *code* will be the three-digit code of the error, *msg*
738 will be the user-visible explanation of the code and *hdrs* will be a mapping
739 object with the headers of the error.
740
741 Return values and exceptions raised should be the same as those of
742 :func:`urlopen`.
743
744
745.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)
746
747 *nnn* should be a three-digit HTTP error code. This method is also not defined
748 in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
749 subclass, when an HTTP error with code *nnn* occurs.
750
751 Subclasses should override this method to handle specific HTTP errors.
752
753 Arguments, return values and exceptions raised should be the same as for
754 :meth:`http_error_default`.
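
As a rough illustration, a handler for one specific status code might look like
the sketch below. ``NotFoundFallback`` is a hypothetical class, and the error is
simulated by calling :meth:`OpenerDirector.error` directly so that nothing is
sent over the network.

```python
import io
import email.message
import urllib.request
import urllib.response


class NotFoundFallback(urllib.request.BaseHandler):
    """Turns HTTP 404 errors into a canned placeholder response."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        placeholder = io.BytesIO(b"no such page: " + req.full_url.encode("ascii"))
        return urllib.response.addinfourl(placeholder, hdrs, req.full_url, code)


opener = urllib.request.OpenerDirector()
opener.add_handler(NotFoundFallback())

# Simulate a server answering 404; the arguments mirror http_error_nnn().
req = urllib.request.Request("http://www.example.com/missing")
result = opener.error("http", req, io.BytesIO(b""), 404, "Not Found",
                      email.message.Message())
body = result.read()
print(result.code, body)
```

Because the handler returns a response instead of raising, the caller receives
the placeholder rather than an :exc:`HTTPError`.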
755
756
757.. method:: BaseHandler.protocol_request(req)
758 :noindex:
759
760 This method is *not* defined in :class:`BaseHandler`, but subclasses should
761 define it if they want to pre-process requests of the given protocol.
762
763 This method, if defined, will be called by the parent :class:`OpenerDirector`.
764 *req* will be a :class:`Request` object. The return value should be a
765 :class:`Request` object.
766
767
768.. method:: BaseHandler.protocol_response(req, response)
769 :noindex:
770
771 This method is *not* defined in :class:`BaseHandler`, but subclasses should
772 define it if they want to post-process responses of the given protocol.
773
774 This method, if defined, will be called by the parent :class:`OpenerDirector`.
775 *req* will be a :class:`Request` object. *response* will be an object
776 implementing the same interface as the return value of :func:`urlopen`. The
777 return value should implement the same interface as the return value of
778 :func:`urlopen`.
779
780
781.. _http-redirect-handler:
782
783HTTPRedirectHandler Objects
784---------------------------
785
786.. note::
787
788 Some HTTP redirections require action from this module's client code. If this
789 is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
790 precise meanings of the various redirection codes.
791
792
.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if this handler cannot
   handle the redirect but another handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the ``POST`` to a
      ``GET``, and the default implementation reproduces this behavior.
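
One possible way to customize redirect policy is to override
:meth:`redirect_request` in a subclass. The sketch below assumes a policy of
following redirects only when they stay on the same host;
``SameHostRedirectHandler`` is a hypothetical name, and the method is called
directly with hand-built arguments rather than through a live connection.

```python
import io
import email.message
import urllib.error
import urllib.parse
import urllib.request


class SameHostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects only when they stay on the original host."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        old_host = urllib.parse.urlsplit(req.full_url).hostname
        new_host = urllib.parse.urlsplit(newurl).hostname
        if old_host != new_host:
            # Refuse: no other handler should try this URL either.
            raise urllib.error.HTTPError(req.full_url, code, msg, hdrs, fp)
        # Otherwise defer to the default policy.
        return super().redirect_request(req, fp, code, msg, hdrs, newurl)


handler = SameHostRedirectHandler()
req = urllib.request.Request("http://www.example.com/old")
hdrs = email.message.Message()

followed = handler.redirect_request(
    req, io.BytesIO(b""), 302, "Found", hdrs, "http://www.example.com/new")
print(followed.full_url)

try:
    handler.redirect_request(
        req, io.BytesIO(b""), 302, "Found", hdrs, "http://elsewhere.example.org/")
    refused = False
except urllib.error.HTTPError as exc:
    refused = True
    print("refused redirect, status", exc.code)
```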
810
811
.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.
Georg Brandl116aa622007-08-15 14:28:22 +0000816
817
818.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)
819
820 The same as :meth:`http_error_301`, but called for the 'found' response.
821
822
823.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)
824
825 The same as :meth:`http_error_301`, but called for the 'see other' response.
826
827
828.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)
829
830 The same as :meth:`http_error_301`, but called for the 'temporary redirect'
831 response.
832
833
834.. _http-cookie-processor:
835
836HTTPCookieProcessor Objects
837---------------------------
838
:class:`HTTPCookieProcessor` instances have one attribute:

.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`http.cookiejar.CookieJar` in which cookies are stored.
Georg Brandl116aa622007-08-15 14:28:22 +0000844
845
846.. _proxy-handler:
847
848ProxyHandler Objects
849--------------------
850
851
852.. method:: ProxyHandler.protocol_open(request)
853 :noindex:
854
855 The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
856 *protocol* which has a proxy in the *proxies* dictionary given in the
857 constructor. The method will modify requests to go through the proxy, by
858 calling ``request.set_proxy()``, and call the next handler in the chain to
859 actually execute the protocol.
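
The following sketch illustrates both halves of that description without
contacting anything. The proxy address is a placeholder, and the direct call to
``set_proxy()`` merely imitates what the handler does on the request's behalf.

```python
import urllib.request

# A placeholder proxy address; nothing is contacted in this sketch.
handler = urllib.request.ProxyHandler({"http": "http://proxy.example.com:3128/"})

# One <protocol>_open method exists per scheme in the proxies dictionary.
has_http = hasattr(handler, "http_open")
has_ftp = hasattr(handler, "ftp_open")
print(has_http, has_ftp)

# The effect of set_proxy() on a request: the connection target becomes the
# proxy, and the full original URL is what is requested from it.
req = urllib.request.Request("http://www.python.org/")
req.set_proxy("proxy.example.com:3128", "http")
print(req.host)
print(req.selector)
```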
860
861
862.. _http-password-mgr:
863
864HTTPPasswordMgr Objects
865-----------------------
866
867These methods are available on :class:`HTTPPasswordMgr` and
868:class:`HTTPPasswordMgrWithDefaultRealm` objects.
869
870
871.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)
872
873 *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
874 *passwd* must be strings. This causes ``(user, passwd)`` to be used as
875 authentication tokens when authentication for *realm* and a super-URI of any of
876 the given URIs is given.
877
878
879.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)
880
881 Get user/password for given realm and URI, if any. This method will return
882 ``(None, None)`` if there is no matching user/password.
883
884 For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
885 searched if the given *realm* has no matching user/password.
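
A short sketch of how lookups behave, assuming made-up credentials and hosts.
It registers one password for a named realm and one under the realm ``None``,
then queries the manager directly.

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Credentials for one named realm...
mgr.add_password("PDQ Application", "https://mahler:8092/site-updates.py",
                 "klem", "kadidd!ehopper")
# ...and a catch-all (realm None) for everything below /private/ on another host.
mgr.add_password(None, "http://www.example.com/private/", "guest", "guest-pw")

# Exact realm and URI match.
exact = mgr.find_user_password("PDQ Application",
                               "https://mahler:8092/site-updates.py")
# Unknown realm, but the URI lies below the one registered for realm None.
fallback = mgr.find_user_password("Some Other Realm",
                                  "http://www.example.com/private/report.html")
# Nothing registered for this host at all.
missing = mgr.find_user_password("PDQ Application", "http://unrelated.example.org/")

print(exact, fallback, missing)
```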
886
887
888.. _abstract-basic-auth-handler:
889
890AbstractBasicAuthHandler Objects
891--------------------------------
892
893
894.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
895
896 Handle an authentication request by getting a user/password pair, and re-trying
897 the request. *authreq* should be the name of the header where the information
898 about the realm is included in the request, *host* specifies the URL and path to
899 authenticate for, *req* should be the (failed) :class:`Request` object, and
900 *headers* should be the error headers.
901
902 *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
903 authority component (e.g. ``"http://python.org/"``). In either case, the
904 authority must not contain a userinfo component (so, ``"python.org"`` and
905 ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).
906
907
908.. _http-basic-auth-handler:
909
910HTTPBasicAuthHandler Objects
911----------------------------
912
913
914.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)
915
916 Retry the request with authentication information, if available.
917
918
919.. _proxy-basic-auth-handler:
920
921ProxyBasicAuthHandler Objects
922-----------------------------
923
924
925.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)
926
927 Retry the request with authentication information, if available.
928
929
930.. _abstract-digest-auth-handler:
931
932AbstractDigestAuthHandler Objects
933---------------------------------
934
935
936.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
937
938 *authreq* should be the name of the header where the information about the realm
939 is included in the request, *host* should be the host to authenticate to, *req*
940 should be the (failed) :class:`Request` object, and *headers* should be the
941 error headers.
942
943
944.. _http-digest-auth-handler:
945
946HTTPDigestAuthHandler Objects
947-----------------------------
948
949
950.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)
951
952 Retry the request with authentication information, if available.
953
954
955.. _proxy-digest-auth-handler:
956
957ProxyDigestAuthHandler Objects
958------------------------------
959
960
961.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)
962
963 Retry the request with authentication information, if available.
964
965
966.. _http-handler-objects:
967
968HTTPHandler Objects
969-------------------
970
971
972.. method:: HTTPHandler.http_open(req)
973
974 Send an HTTP request, which can be either GET or POST, depending on
975 ``req.has_data()``.
976
977
978.. _https-handler-objects:
979
980HTTPSHandler Objects
981--------------------
982
983
984.. method:: HTTPSHandler.https_open(req)
985
986 Send an HTTPS request, which can be either GET or POST, depending on
987 ``req.has_data()``.
988
989
990.. _file-handler-objects:
991
992FileHandler Objects
993-------------------
994
995
996.. method:: FileHandler.file_open(req)
997
998 Open the file locally, if there is no host name, or the host name is
999 ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
1000 using :attr:`parent`.
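
The local case can be tried with a temporary file, as in this sketch. It uses a
bare :class:`OpenerDirector` with only a :class:`FileHandler` installed; the file
name and contents are arbitrary.

```python
import pathlib
import tempfile
import urllib.request

# Write a small file, then read it back through a file: URL.
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "hello.txt"
    path.write_bytes(b"hello, local file\n")

    opener = urllib.request.OpenerDirector()
    opener.add_handler(urllib.request.FileHandler())

    # as_uri() yields a URL with an empty host name, so the file is opened locally.
    with opener.open(path.as_uri()) as response:
        content = response.read()
        length = response.headers["Content-length"]

print(content, length)
```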
1001
1002
1003.. _ftp-handler-objects:
1004
1005FTPHandler Objects
1006------------------
1007
1008
1009.. method:: FTPHandler.ftp_open(req)
1010
1011 Open the FTP file indicated by *req*. The login is always done with empty
1012 username and password.
1013
1014
1015.. _cacheftp-handler-objects:
1016
1017CacheFTPHandler Objects
1018-----------------------
1019
1020:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
1021following additional methods:
1022
1023
1024.. method:: CacheFTPHandler.setTimeout(t)
1025
1026 Set timeout of connections to *t* seconds.
1027
1028
1029.. method:: CacheFTPHandler.setMaxConns(m)
1030
1031 Set maximum number of cached connections to *m*.
1032
1033
1034.. _unknown-handler-objects:
1035
1036UnknownHandler Objects
1037----------------------
1038
1039
1040.. method:: UnknownHandler.unknown_open()
1041
1042 Raise a :exc:`URLError` exception.
1043
1044
1045.. _http-error-processor-objects:
1046
1047HTTPErrorProcessor Objects
1048--------------------------
1049
.. method:: HTTPErrorProcessor.http_response(req, response)

   Process HTTP error responses.

   For 200 error codes, the response object is returned immediately.

   For non-200 error codes, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
   :exc:`HTTPError` if no other handler handles the error.
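
The hand-off can be sketched offline as follows. This assumes that the
processing method is named ``http_response``, as it is in the CPython
implementation, and uses a hypothetical ``fake_response`` helper that builds an
in-memory stand-in for a server reply.

```python
import io
import email.message
import urllib.error
import urllib.request
import urllib.response


def fake_response(code, msg):
    """Build an in-memory stand-in for an HTTP response (not a real one)."""
    resp = urllib.response.addinfourl(
        io.BytesIO(b""), email.message.Message(), "http://www.example.com/", code)
    resp.msg = msg  # the processor reads .code, .msg and .info()
    return resp


opener = urllib.request.OpenerDirector()
processor = urllib.request.HTTPErrorProcessor()
opener.add_handler(processor)
opener.add_handler(urllib.request.HTTPDefaultErrorHandler())

req = urllib.request.Request("http://www.example.com/")

# A 200 response is handed back unchanged.
ok = fake_response(200, "OK")
passed_through = processor.http_response(req, ok) is ok

# A 404 response is routed through OpenerDirector.error(); with no specific
# handler installed, HTTPDefaultErrorHandler raises HTTPError.
try:
    processor.http_response(req, fake_response(404, "Not Found"))
    raised = None
except urllib.error.HTTPError as exc:
    raised = exc.code

print(passed_through, raised)
```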
1060
Georg Brandl0f7ede42008-06-23 11:23:31 +00001061
.. _urllib-request-examples:

Examples
--------

This example gets the python.org main page and displays the first 100 bytes of
it. The data is returned as bytes, so it is decoded before printing::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(100).decode('utf-8'))
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. The data is passed as bytes. Note that this example will only
work when the Python installation supports SSL. ::

   >>> import urllib.request
   >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data=b'This data is passed to stdin of the CGI')
   >>> f = urllib.request.urlopen(req)
   >>> print(f.read().decode('utf-8'))
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)

Use of Basic HTTP Authentication::

   import urllib.request
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib.request.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib.request.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib.request.install_opener(opener)
   urllib.request.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib.request
   proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib.request
   req = urllib.request.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib.request.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib.request
   opener = urllib.request.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
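
Headers set on a :class:`Request` can be inspected before anything is sent, as
in this small sketch. The URL and header values are placeholders; no request is
actually made.

```python
import urllib.request

req = urllib.request.Request(
    "http://www.example.com/",
    headers={"Referer": "http://www.python.org/"},
)
req.add_header("Accept-Language", "en")

# Header names are stored via str.capitalize(), so look them up in that form.
referer = req.get_header("Referer")
language = req.get_header("Accept-language")

# User-agent is supplied later by the opener, so it is not present yet.
has_user_agent = req.has_header("User-agent")

print(referer, language, has_user_agent)
```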
1146
.. _urllib-examples:

Here is an example session that uses the ``GET`` method to retrieve a URL
containing parameters::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
   >>> print(f.read().decode('utf-8'))

The following example uses the ``POST`` method instead. The parameters are
encoded to bytes before being passed as *data*::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> params = params.encode('ascii')
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
   >>> print(f.read().decode('utf-8'))
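
The difference between the two sessions can be checked without a network
connection by building the :class:`Request` objects and inspecting them. The
host name below is a placeholder.

```python
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({"spam": 1, "eggs": 2, "bacon": 0})

# GET: the parameters travel in the URL itself.
get_req = urllib.request.Request("http://www.example.com/cgi-bin/query?" + params)

# POST: the same parameters travel in the body, which must be bytes.
post_req = urllib.request.Request("http://www.example.com/cgi-bin/query",
                                  data=params.encode("ascii"))

print(params)
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```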
1165
The following example uses an explicitly specified HTTP proxy, overriding
environment settings::

   >>> import urllib.request
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.request.FancyURLopener(proxies)
   >>> f = opener.open("http://www.python.org")
   >>> f.read()

The following example uses no proxies at all, overriding environment settings::

   >>> import urllib.request
   >>> opener = urllib.request.FancyURLopener({})
   >>> f = opener.open("http://www.python.org/")
   >>> f.read()
1181
1182
1183:mod:`urllib.request` Restrictions
1184----------------------------------
1185
1186 .. index::
1187 pair: HTTP; protocol
1188 pair: FTP; protocol
1189
* Currently, only the following protocols are supported: HTTP (versions 0.9 and
  1.0), FTP, and local files.

* The caching feature of :func:`urlretrieve` has been disabled until someone
  finds the time to hack proper processing of Expiration time headers.
1195
1196* There should be a function to query whether a particular URL is in the cache.
1197
1198* For backward compatibility, if a URL appears to point to a local file but the
1199 file can't be opened, the URL is re-interpreted using the FTP protocol. This
1200 can sometimes cause confusing error messages.
1201
1202* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
1203 long delays while waiting for a network connection to be set up. This means
1204 that it is difficult to build an interactive Web client using these functions
1205 without using threads.
1206
1207 .. index::
1208 single: HTML
1209 pair: HTTP; protocol
1210
1211* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
1212 returned by the server. This may be binary data (such as an image), plain text
1213 or (for example) HTML. The HTTP protocol provides type information in the reply
1214 header, which can be inspected by looking at the :mailheader:`Content-Type`
1215 header. If the returned data is HTML, you can use the module
1216 :mod:`html.parser` to parse it.
1217
1218 .. index:: single: FTP
1219
1220* The code handling the FTP protocol cannot differentiate between a file and a
1221 directory. This can lead to unexpected behavior when attempting to read a URL
1222 that points to a file that is not accessible. If the URL ends in a ``/``, it is
1223 assumed to refer to a directory and will be handled accordingly. But if an
1224 attempt to read a file leads to a 550 error (meaning the URL cannot be found or
1225 is not accessible, often for permission reasons), then the path is treated as a
1226 directory in order to handle the case when a directory is specified by a URL but
1227 the trailing ``/`` has been left off. This can cause misleading results when
1228 you try to fetch a file whose read permissions make it inaccessible; the FTP
1229 code will try to read it, fail with a 550 error, and then perform a directory
  listing for the unreadable file. If fine-grained control is needed, consider
  using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
  *_urlopener* to meet your needs.
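
As an illustration of the :mailheader:`Content-Type` item in the list above,
the sketch below checks the reply header before handing the data to
:mod:`html.parser`. To stay offline it assumes a ``data:`` URL served by
``DataHandler`` (available in newer Python 3 releases) in place of a real
server; ``TitleParser`` is a hypothetical helper class.

```python
import html.parser
import urllib.request


class TitleParser(html.parser.HTMLParser):
    """Collects the text inside <title> elements."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


# A data: URL stands in for a real server so the sketch runs offline.
opener = urllib.request.OpenerDirector()
opener.add_handler(urllib.request.DataHandler())
url = "data:text/html;charset=utf-8,%3Ctitle%3EHello%3C%2Ftitle%3E%3Cp%3EHi%3C%2Fp%3E"

with opener.open(url) as response:
    content_type = response.headers.get_content_type()
    charset = response.headers.get_content_charset() or "utf-8"
    raw = response.read()  # always bytes, whatever the content type

parser = TitleParser()
if content_type == "text/html":
    parser.feed(raw.decode(charset))
    parser.close()
print(content_type, parser.title)
```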
1233
Georg Brandl0f7ede42008-06-23 11:23:31 +00001234
1235
:mod:`urllib.response` --- Response classes used by urllib.
===========================================================

.. module:: urllib.response
   :synopsis: Response classes used by urllib.

The :mod:`urllib.response` module defines functions and classes which define a
minimal file-like interface, including ``read()`` and ``readline()``. The
typical response object is an ``addinfourl`` instance, which defines an
``info()`` method that returns headers and a ``geturl()`` method that returns
the URL. Functions defined by this module are used internally by the
:mod:`urllib.request` module.
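
A minimal sketch of that interface, assuming an in-memory byte stream in place
of a network connection. The header value and URL are placeholders.

```python
import io
import email.message
import urllib.response

headers = email.message.Message()
headers["Content-Type"] = "text/plain"

# Wrap an in-memory byte stream the way a handler would wrap a socket file.
response = urllib.response.addinfourl(
    io.BytesIO(b"first line\nsecond line\n"), headers, "http://www.example.com/")

first = response.readline()
rest = response.read()
content_type = response.info()["Content-Type"]
url = response.geturl()
response.close()

print(first, rest, content_type, url)
```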
1248