:mod:`urllib.request` --- extensible library for opening URLs
=============================================================

.. module:: urllib.request
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


The :mod:`urllib.request` module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.

The :mod:`urllib.request` module defines the following functions:


.. function:: urlopen(url, data=None[, timeout])

   Open the URL *url*, which can be either a string or a
   :class:`Request` object.

   *data* may be a string specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*; the HTTP request will
   be a POST instead of a GET when the *data* parameter is provided.
   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns a string in this format.

   The optional *timeout* parameter specifies a timeout in seconds for
   blocking operations like the connection attempt (if not specified,
   the global default timeout setting will be used). The timeout
   only works for HTTP, HTTPS, FTP and FTPS connections.

   This function returns a file-like object with two additional methods from
   the :mod:`urllib.response` module:

   * :meth:`geturl` --- return the URL of the resource retrieved,
     commonly used to determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers,
     in the form of an :class:`http.client.HTTPMessage` instance (see `Quick
     Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though
   the default installed global :class:`OpenerDirector` uses
   :class:`UnknownHandler` to ensure this never happens).

   The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
   discontinued; :func:`urlopen` corresponds to the old ``urllib2.urlopen``.
   Proxy handling, which was done by passing a dictionary parameter to
   ``urllib.urlopen``, can be obtained by using :class:`ProxyHandler` objects.

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler`,
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
   member variable to modify its position in the handlers list.

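As a rough sketch of the replacement rule that :func:`build_opener` applies,
the following builds an opener with an explicit :class:`ProxyHandler` and checks
that no second, default proxy handler was added. The proxy address is a
placeholder, and the ``handlers`` attribute inspected here is an undocumented
detail of CPython's :class:`OpenerDirector`, assumed only for illustration. No
network request is made.

```python
from urllib.request import HTTPHandler, ProxyHandler, build_opener

# Placeholder proxy address, used only for illustration.
proxy = ProxyHandler({"http": "http://proxy.example.com:3128"})

# Passing a ProxyHandler instance suppresses the default ProxyHandler.
opener = build_opener(proxy)

# ``handlers`` is an undocumented attribute; it is inspected here only to
# show which handler instances ended up in the chain.
proxy_handlers = [h for h in opener.handlers if isinstance(h, ProxyHandler)]
has_http = any(isinstance(h, HTTPHandler) for h in opener.handlers)
print(len(proxy_handlers), has_http)
```

The other default handlers, such as :class:`HTTPHandler`, are still installed
because nothing passed in replaces them.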

.. function:: urlretrieve(url, filename=None, reporthook=None, data=None)

   Copy a network object denoted by a URL to a local file, if necessary. If the URL
   points to a local file, or a valid cached copy of the object exists, the object
   is not copied. Return a tuple ``(filename, headers)`` where *filename* is the
   local file name under which the object can be found, and *headers* is whatever
   the :meth:`info` method of the object returned by :func:`urlopen` returned (for
   a remote object, possibly cached). Exceptions are the same as for
   :func:`urlopen`.

   The second argument, if present, specifies the file location to copy to (if
   absent, the location will be a tempfile with a generated name). The third
   argument, if present, is a hook function that will be called once on
   establishment of the network connection and once after each block read
   thereafter. The hook will be passed three arguments: a count of blocks
   transferred so far, a block size in bytes, and the total size of the file. The
   hook's third argument may be ``-1`` on older FTP servers which do not return a
   file size in response to a retrieval request.

   If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   argument may be given to specify a ``POST`` request (normally the request type
   is ``GET``). The *data* argument must be in standard
   :mimetype:`application/x-www-form-urlencoded` format; see the
   :func:`urllib.parse.urlencode` function.

   :func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that
   the amount of data available was less than the expected amount (which is the
   size reported by a *Content-Length* header). This can occur, for example, when
   the download is interrupted.

   The *Content-Length* is treated as a lower bound: if there's more data to read,
   :func:`urlretrieve` reads more data, but if less data is available, it raises
   the exception.

   You can still retrieve the downloaded data in this case; it is stored in the
   :attr:`content` attribute of the exception instance.

   If no *Content-Length* header was supplied, :func:`urlretrieve` cannot check
   the size of the data it has downloaded, and just returns it. In this case you
   just have to assume that the download was successful.

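The *reporthook* contract for :func:`urlretrieve` can be illustrated with a
small progress hook. This is only a sketch: ``make_reporthook`` and the ``log``
list are hypothetical names, and the hook is called by hand here with made-up
sizes instead of being driven by a real download.

```python
def make_reporthook(log):
    """Return a hook that appends a progress string to *log* on each call."""
    def reporthook(block_count, block_size, total_size):
        transferred = block_count * block_size
        if total_size <= 0:
            # Size unknown, e.g. an older FTP server reported -1.
            log.append(f"{transferred} bytes")
        else:
            # The last block is usually partial, so cap the value at 100.
            percent = min(100, transferred * 100 // total_size)
            log.append(f"{percent}%")
    return reporthook

log = []
hook = make_reporthook(log)
hook(0, 8192, 20000)  # called once when the connection is established
hook(1, 8192, 20000)  # then once after each block read
hook(3, 8192, 20000)
hook(2, 8192, -1)     # total size not reported by the server
print(log)
```

Such a hook would be passed as the third argument, for example
``urlretrieve(url, filename, hook)``.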

.. data:: _urlopener

   The public functions :func:`urlopen` and :func:`urlretrieve` create an instance
   of the :class:`FancyURLopener` class and use it to perform their requested
   actions. To override this functionality, programmers can create a subclass of
   :class:`URLopener` or :class:`FancyURLopener`, then assign an instance of that
   class to the ``urllib.request._urlopener`` variable before calling the desired
   function. For example, applications may want to specify a different
   :mailheader:`User-Agent` header than :class:`URLopener` defines. This can be
   accomplished with the following code::

      import urllib.request

      class AppURLopener(urllib.request.FancyURLopener):
          version = "App/1.7"

      urllib.request._urlopener = AppURLopener()


.. function:: urlcleanup()

   Clear the cache that may have been built up by previous calls to
   :func:`urlretrieve`.

.. function:: pathname2url(path)

   Convert the pathname *path* from the local syntax for a path to the form used in
   the path component of a URL. This does not produce a complete URL. The return
   value will already be quoted using the :func:`quote` function.


.. function:: url2pathname(path)

   Convert the path component *path* from an encoded URL to the local syntax for a
   path. This does not accept a complete URL. This function uses :func:`unquote`
   to decode *path*.

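A short round trip shows how :func:`pathname2url` and :func:`url2pathname`
relate to each other. The relative path below is a made-up example; absolute
paths are converted in a platform-dependent way and are left out of this
sketch.

```python
import os.path
from urllib.request import pathname2url, url2pathname

# Hypothetical relative path containing a space.
local_path = os.path.join("docs", "my file.txt")

url_path = pathname2url(local_path)   # the space is percent-encoded
round_trip = url2pathname(url_path)   # decoded back to the local syntax
print(url_path, round_trip)
```

Note that ``url_path`` is only a path component, not a complete URL.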

The following classes are provided:

.. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False)

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP
   requests are the only ones that use *data*; the HTTP request will
   be a POST instead of a GET when the *data* parameter is provided.
   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns a string in this format.

   *headers* should be a dictionary, and will be treated as if
   :meth:`add_header` was called with each key and value as arguments.
   This is often used to "spoof" the ``User-Agent`` header, which is
   used by a browser to identify itself -- some HTTP servers only
   allow requests coming from common browsers as opposed to scripts.
   For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while
   :mod:`urllib`'s default user agent string is
   ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling
   of third-party HTTP cookies:

   *origin_req_host* should be the request-host of the origin
   transaction, as defined by :rfc:`2965`. It defaults to
   ``http.cookiejar.request_host(self)``. This is the host name or IP
   address of the original request that was initiated by the user.
   For example, if the request is for an image in an HTML document,
   this should be the request-host of the request for the page
   containing the image.

   *unverifiable* should indicate whether the request is unverifiable,
   as defined by :rfc:`2965`. It defaults to ``False``. An unverifiable
   request is one whose URL the user did not have the option to
   approve. For example, if the request is for an image in an HTML
   document, and the user had no option to approve the automatic
   fetching of the image, this should be true.

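The *data* and *headers* arguments of :class:`Request` might be combined as in
the following sketch. The URL, form fields and user agent string are
placeholders. The encoding step reflects the fact that current Python 3
releases expect *data* to be a bytes object, which may differ from the release
this text describes. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields, encoded as application/x-www-form-urlencoded.
form = urlencode({"q": "python", "page": 2})

req = Request(
    "http://www.example.com/search",            # placeholder URL
    data=form.encode("ascii"),                  # bytes in current Python 3
    headers={"User-Agent": "ExampleApp/1.0"},   # placeholder user agent
)

# Supplying data switches the request method from GET to POST.
print(form, req.get_method(), req.get_header("User-agent"))
```

Header names are normalized by :meth:`add_header`, which is why the lookup uses
the capitalized form ``"User-agent"``.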

.. class:: URLopener(proxies=None, **x509)

   Base class for opening and reading URLs. Unless you need to support opening
   objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   you probably want to use :class:`FancyURLopener`.

   By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   Applications can define their own :mailheader:`User-Agent` header by subclassing
   :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   :attr:`version` to an appropriate string value in the subclass definition.

   The optional *proxies* parameter should be a dictionary mapping scheme names to
   proxy URLs, where an empty dictionary turns proxies off completely. Its default
   value is ``None``, in which case environmental proxy settings will be used if
   present, as discussed in the definition of :func:`urlopen`, above.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme. The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`IOError` exception if the server
   returns an error code.

   .. method:: open(fullurl, data=None)

      Open *fullurl* using the appropriate protocol. This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments. If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.


   .. method:: open_unknown(fullurl, data=None)

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url, filename=None, reporthook=None, data=None)

      Retrieves the contents of *url* and places it in *filename*. The return value
      is a tuple consisting of a local filename and either a
      :class:`email.message.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs). The caller must then open and read the
      contents of *filename*. If *filename* is not given and the URL refers to a
      local file, the input filename is returned. If the URL is non-local and
      *filename* is not given, the filename is the output of :func:`tempfile.mktemp`
      with a suffix that matches the suffix of the last path component of the input
      URL. If *reporthook* is given, it must be a function accepting three numeric
      parameters. It will be called after each chunk of data is read from the
      network. *reporthook* is ignored for local URLs.

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``). The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the
      :func:`urllib.parse.urlencode` function.


   .. attribute:: version

      Variable that specifies the user agent of the opener object. To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.


.. class:: FancyURLopener(...)

   :class:`FancyURLopener` subclasses :class:`URLopener` providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL. For 401 response codes (authentication required), basic HTTP
   authentication is performed. For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`http_error_default` is called
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user. In
      reality, browsers do allow automatic redirection of these responses, changing
      the POST to a GET, and :mod:`urllib` reproduces this behaviour.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method. The default implementation asks the
      users for the required information on the controlling terminal. A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overridden to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm. The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The implementation prompts for this information on the terminal; an application
      should override this method to use an appropriate interaction model in the local
      environment.

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor(cookiejar=None)

   A class to handle HTTP Cookies.


.. class:: ProxyHandler(proxies=None)

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read the
   list of proxies from the environment variables :envvar:`<protocol>_proxy`.
   To disable autodetected proxies, pass an empty dictionary.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.

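The catch-all behaviour of :class:`HTTPPasswordMgrWithDefaultRealm` can be
sketched as follows. The credentials, realm name and URLs are placeholders.
The ``add_password`` and ``find_user_password`` methods belong to the password
manager interface referred to as :ref:`http-password-mgr`.

```python
from urllib.request import HTTPPasswordMgrWithDefaultRealm

mgr = HTTPPasswordMgrWithDefaultRealm()

# A realm of None registers the credentials under the catch-all realm.
mgr.add_password(None, "http://www.example.com/", "alice", "secret")

# No entry exists for "Some Realm", so the catch-all realm is searched.
found = mgr.find_user_password("Some Realm", "http://www.example.com/private/")

# A URI outside the registered one yields no credentials at all.
missing = mgr.find_user_password("Some Realm", "http://other.example.org/")
print(found, missing)
```

A manager prepared this way is typically handed to one of the authentication
handler classes through their *password_mgr* argument.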

.. class:: AbstractBasicAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. _request-objects:

Request Objects
---------------

The following methods describe :class:`Request`'s public interface,
and so all may be overridden in subclasses. It also defines several
public attributes that can be used by clients to inspect the parsed
request.

.. attribute:: Request.full_url

   The original URL passed to the constructor.

.. attribute:: Request.type

   The URI scheme.

.. attribute:: Request.host

   The URI authority, typically a host, but may also contain a port
   separated by a colon.

.. attribute:: Request.origin_req_host

   The original host for the request, without port.

.. attribute:: Request.selector

   The URI path. If the :class:`Request` uses a proxy, then selector
   will be the full URL that is passed to the proxy.

.. attribute:: Request.data

   The entity body for the request, or ``None`` if not specified.

.. attribute:: Request.unverifiable

   Boolean indicating whether the request is unverifiable as defined
   by :rfc:`2965`.

.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).


.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by :rfc:`2965`. See the
   documentation for the :class:`Request` constructor.

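The public attributes of :class:`Request` and the effect of
:meth:`Request.set_proxy` can be seen in the short sketch below. The URL, proxy
address and user agent are placeholders. The example reads the attributes
rather than calling the ``get_*`` accessor methods, since several of those
accessors are no longer available in newer Python 3 releases.

```python
from urllib.request import Request

req = Request("http://www.example.com:8080/path?x=1")  # placeholder URL
req.add_header("user-agent", "ExampleApp/1.0")

# Parsed pieces of the request before any proxy is configured.
before = (req.type, req.host, req.selector, req.origin_req_host)

# Header names are stored in capitalized form.
has_agent = req.has_header("User-agent")

# Route the request through a hypothetical proxy: the host is replaced and
# the selector becomes the full original URL.
req.set_proxy("proxy.example.com:3128", "http")
after = (req.type, req.host, req.selector)
print(before, has_agent, after)
```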

.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following methods
   are searched, and added to the possible chains (note that HTTP errors are a
   special case).

   * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
     URLs.

   * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

   * :meth:`protocol_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`protocol_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

   * :meth:`protocol_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url, data=None[, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature only works for
   HTTP, HTTPS, FTP and FTPS connections.


.. method:: OpenerDirector.error(proto, *args)

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`protocol_open`. If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`.open` and :meth:`.error` methods.

#. Every handler with a method named like :meth:`protocol_response` has that
   method called to post-process the response.

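The three stages can be traced with a handler for a made-up ``demo`` scheme.
Everything named ``demo`` or ``Demo`` below is hypothetical, and wrapping an
in-memory buffer in ``urllib.response.addinfourl`` is just one convenient way
to produce a file-like response in this sketch. No network access is involved.

```python
import io
from email.message import Message
from urllib.request import BaseHandler, OpenerDirector
from urllib.response import addinfourl

calls = []  # records the order in which the stages run


class DemoHandler(BaseHandler):
    """Handler for the hypothetical ``demo`` URL scheme."""

    def demo_request(self, req):
        # Stage 1: pre-process the request.
        calls.append("request")
        req.add_header("X-Demo", "1")
        return req

    def demo_open(self, req):
        # Stage 2: produce a response (a non-None return ends the stage).
        calls.append("open")
        body = io.BytesIO(b"hello from " + req.selector.encode("ascii"))
        return addinfourl(body, Message(), req.full_url)

    def demo_response(self, req, response):
        # Stage 3: post-process the response.
        calls.append("response")
        return response


opener = OpenerDirector()
opener.add_handler(DemoHandler())

response = opener.open("demo://host/resource")
payload = response.read()
print(calls, payload, response.geturl())
```

Because the method names start with the scheme name, :meth:`add_handler`
registers each of them for the matching stage without further configuration.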

.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following members and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.

699
700.. method:: BaseHandler.protocol_open(req)
701 :noindex:
702
703 This method is *not* defined in :class:`BaseHandler`, but subclasses should
704 define it if they want to handle URLs with the given protocol.
705
706 This method, if defined, will be called by the parent :class:`OpenerDirector`.
707 Return values should be the same as for :meth:`default_open`.
708
709
710.. method:: BaseHandler.unknown_open(req)
711
712 This method is *not* defined in :class:`BaseHandler`, but subclasses should
713 define it if they want to catch all URLs with no specific registered handler to
714 open it.
715
716 This method, if implemented, will be called by the :attr:`parent`
717 :class:`OpenerDirector`. Return values should be the same as for
718 :meth:`default_open`.
719
720
721.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)
722
723 This method is *not* defined in :class:`BaseHandler`, but subclasses should
724 override it if they intend to provide a catch-all for otherwise unhandled HTTP
725 errors. It will be called automatically by the :class:`OpenerDirector` getting
726 the error, and should not normally be called in other circumstances.
727
728 *req* will be a :class:`Request` object, *fp* will be a file-like object with
729 the HTTP error body, *code* will be the three-digit code of the error, *msg*
730 will be the user-visible explanation of the code and *hdrs* will be a mapping
731 object with the headers of the error.
732
733 Return values and exceptions raised should be the same as those of
734 :func:`urlopen`.
735
736
737.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)
738
739 *nnn* should be a three-digit HTTP error code. This method is also not defined
740 in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
741 subclass, when an HTTP error with code *nnn* occurs.
742
743 Subclasses should override this method to handle specific HTTP errors.
744
745 Arguments, return values and exceptions raised should be the same as for
746 :meth:`http_error_default`.
747
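As a rough sketch of how :meth:`http_error_nnn` and
:meth:`http_error_default` cooperate, the following handler answers a 404 with
a substitute body and turns every other error into an :exc:`HTTPError`. The
``FallbackHandler`` name and the example URL are assumptions, and the errors
are simulated by calling :meth:`OpenerDirector.error` directly rather than by
contacting a server.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


class FallbackHandler(urllib.request.BaseHandler):
    """Hypothetical handler used only to illustrate error dispatch."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        # Handle "not found" by returning a substitute file-like object.
        return io.BytesIO(b"fallback page")

    def http_error_default(self, req, fp, code, msg, hdrs):
        # Catch-all for HTTP error codes that no http_error_nnn method handled.
        raise urllib.error.HTTPError(req.full_url, code, msg, hdrs, fp)


opener = urllib.request.OpenerDirector()
opener.add_handler(FallbackHandler())
req = urllib.request.Request("http://www.example.com/missing")

# Simulate a 404: http_error_404 supplies the result.
result = opener.error("http", req, io.BytesIO(b""), 404, "Not Found", Message())
fallback_body = result.read()

# Simulate a 500: no http_error_500 exists, so http_error_default runs.
try:
    opener.error("http", req, io.BytesIO(b""), 500, "Server Error", Message())
except urllib.error.HTTPError as exc:
    raised_code = exc.code
else:
    raised_code = None
```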

.. method:: BaseHandler.protocol_request(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if you can't but another
   handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the ``POST`` to a
      ``GET``, and the default implementation reproduces this behavior.

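One possible use of :meth:`redirect_request` is to restrict which redirects
are followed. The sketch below refuses redirects that leave the original host.
The ``SameHostRedirectHandler`` name, the policy itself and the URLs are
assumptions for illustration, and the method is called directly so that no
server is needed.

```python
import io
import urllib.error
import urllib.parse
import urllib.request
from email.message import Message


class SameHostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical policy: follow redirects only within the same host."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        old_host = urllib.parse.urlsplit(req.full_url).hostname
        new_host = urllib.parse.urlsplit(newurl).hostname
        if old_host != new_host:
            # No other handler should try this URL, so raise HTTPError.
            raise urllib.error.HTTPError(
                req.full_url, code, "cross-host redirect refused", hdrs, fp)
        # Otherwise defer to the default implementation.
        return super().redirect_request(req, fp, code, msg, hdrs, newurl)


handler = SameHostRedirectHandler()
req = urllib.request.Request("http://www.example.com/old")

followed = handler.redirect_request(
    req, io.BytesIO(b""), 302, "Found", Message(), "http://www.example.com/new")

try:
    handler.redirect_request(
        req, io.BytesIO(b""), 302, "Found", Message(), "http://other.example.org/")
except urllib.error.HTTPError:
    refused = True
else:
    refused = False
```

In real use the handler would be passed to :func:`build_opener`, where it
replaces the default :class:`HTTPRedirectHandler`.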

.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

:class:`HTTPCookieProcessor` instances have one attribute:

.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`http.cookiejar.CookieJar` in which cookies are stored.

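A minimal sketch of supplying your own jar and reading it back through the
:attr:`cookiejar` attribute follows. No request is made here, so the jar stays
empty; in real use, cookies set by servers during ``opener.open(...)`` calls
would accumulate in it.

```python
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()
processor = urllib.request.HTTPCookieProcessor(jar)
opener = urllib.request.build_opener(processor)

# The processor exposes the very jar it was constructed with.
same_jar = processor.cookiejar is jar

# Iterating over the jar yields the stored cookies (none yet).
cookie_names = [cookie.name for cookie in processor.cookiejar]
```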

.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.

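The per-protocol methods can be observed on a handler built from an explicit
*proxies* dictionary, and the effect of ``set_proxy()`` can be seen on a
:class:`Request`. The proxy address below is a placeholder, and
``set_proxy()`` is called by hand here only to show what it changes; this is a
sketch, not a trace of the handler's full logic.

```python
import urllib.request

handler = urllib.request.ProxyHandler({"http": "http://proxy.example.com:8080/"})

# One <protocol>_open method exists per key in the proxies dictionary.
has_http = hasattr(handler, "http_open")
has_ftp = hasattr(handler, "ftp_open")

# set_proxy() redirects the connection to the proxy host and makes the
# request line carry the full original URL.
req = urllib.request.Request("http://www.python.org/")
req.set_proxy("proxy.example.com:8080", "http")
proxy_host = req.host
request_target = req.selector
```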

.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.

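The lookup rules above can be tried without any network access. The realm
names, URLs and credentials in this sketch are placeholders.

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgr()
mgr.add_password("PDQ Application", "http://www.example.com/private/",
                 "klem", "secret")

# A URI below the registered one matches.
inside = mgr.find_user_password(
    "PDQ Application", "http://www.example.com/private/report.html")
# A URI outside the registered one does not.
outside = mgr.find_user_password(
    "PDQ Application", "http://www.example.com/public/")
# A different realm does not match either.
other_realm = mgr.find_user_password(
    "Other Realm", "http://www.example.com/private/")

# With the default-realm variant, entries stored under realm None are used
# when the requested realm has no match.
default_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
default_mgr.add_password(None, "http://www.example.com/private/",
                         "klem", "secret")
any_realm = default_mgr.find_user_password(
    "Other Realm", "http://www.example.com/private/")
```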

.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.

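A short sketch of configuring a :class:`CacheFTPHandler` and building an
opener around it follows. The timeout and connection limit are arbitrary
example values, and no FTP connection is opened.

```python
import urllib.request

ftp_handler = urllib.request.CacheFTPHandler()
ftp_handler.setTimeout(30)   # keep cached connections for 30 seconds
ftp_handler.setMaxConns(4)   # cache at most 4 connections

# Passing the instance to build_opener() uses it in place of the default
# FTPHandler, because CacheFTPHandler is a subclass of FTPHandler.
opener = urllib.request.build_opener(ftp_handler)
handler_types = [type(h).__name__ for h in opener.handlers]
```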

.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For 200 status codes, the response object is returned immediately.

   For non-200 status codes, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
   :exc:`HTTPError` if no other handler handles the error.


.. _urllib-request-examples:

Examples
--------

This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(100).decode('utf-8'))
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. The data must be a bytes object. Note that this example will
only work when the Python installation supports SSL. ::

   >>> import urllib.request
   >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data=b'This data is passed to stdin of the CGI')
   >>> f = urllib.request.urlopen(req)
   >>> print(f.read().decode('utf-8'))
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)

Use of Basic HTTP Authentication::

   import urllib.request
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib.request.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib.request.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib.request.install_opener(opener)
   urllib.request.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib.request
   proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib.request
   req = urllib.request.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib.request.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib.request
   opener = urllib.request.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).

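The *headers* argument mentioned above can be sketched as follows, without
sending the request. The ``ExampleClient/1.0`` value is a made-up placeholder.
The sketch also shows that :class:`Request` stores header names in capitalized
form, so lookups need to use that spelling.

```python
import urllib.request

req = urllib.request.Request(
    "http://www.example.com/",
    headers={"Referer": "http://www.python.org/"},
)
req.add_header("User-Agent", "ExampleClient/1.0")

referer = req.get_header("Referer")
# The name is stored as 'User-agent' (only the first letter capitalized).
user_agent = req.get_header("User-agent")
has_exact_spelling = req.has_header("User-Agent")
```
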

.. _urllib-examples:

Here is an example session that uses the ``GET`` method to retrieve a URL
containing parameters::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
   >>> print(f.read())

The following example uses the ``POST`` method instead. The encoded parameters
must be converted to bytes before they are sent as data::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> data = params.encode('ascii')
   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query", data)
   >>> print(f.read())

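The difference between the two sessions can be checked without a server by
building :class:`Request` objects and inspecting them. The
``www.example.com`` URL is a placeholder.

```python
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({"spam": 1, "eggs": 2, "bacon": 0})

# Parameters in the query string: the request will be a GET.
get_req = urllib.request.Request(
    "http://www.example.com/cgi-bin/query?" + params)

# Parameters passed as bytes in *data*: the request will be a POST.
post_req = urllib.request.Request(
    "http://www.example.com/cgi-bin/query", data=params.encode("ascii"))

methods = (get_req.get_method(), post_req.get_method())
```
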

The following example uses an explicitly specified HTTP proxy, overriding
environment settings::

   >>> import urllib.request
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.request.FancyURLopener(proxies)
   >>> f = opener.open("http://www.python.org")
   >>> f.read()

The following example uses no proxies at all, overriding environment settings::

   >>> import urllib.request
   >>> opener = urllib.request.FancyURLopener({})
   >>> f = opener.open("http://www.python.org/")
   >>> f.read()


:mod:`urllib.request` Restrictions
----------------------------------

  .. index::
     pair: HTTP; protocol
     pair: FTP; protocol

* Currently, only the following protocols are supported: HTTP (versions 0.9 and
  1.0), FTP, and local files.

* The caching feature of :func:`urlretrieve` has been disabled until someone
  finds the time to hack proper processing of Expiration time headers.

* There should be a function to query whether a particular URL is in the cache.

* For backward compatibility, if a URL appears to point to a local file but the
  file can't be opened, the URL is re-interpreted using the FTP protocol. This
  can sometimes cause confusing error messages.

* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
  long delays while waiting for a network connection to be set up. This means
  that it is difficult to build an interactive Web client using these functions
  without using threads.

  .. index::
     single: HTML
     pair: HTTP; protocol

* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
  returned by the server. This may be binary data (such as an image), plain text
  or (for example) HTML. The HTTP protocol provides type information in the reply
  header, which can be inspected by looking at the :mailheader:`Content-Type`
  header. If the returned data is HTML, you can use the module
  :mod:`html.parser` to parse it.

  .. index:: single: FTP

* The code handling the FTP protocol cannot differentiate between a file and a
  directory. This can lead to unexpected behavior when attempting to read a URL
  that points to a file that is not accessible. If the URL ends in a ``/``, it is
  assumed to refer to a directory and will be handled accordingly. But if an
  attempt to read a file leads to a 550 error (meaning the URL cannot be found or
  is not accessible, often for permission reasons), then the path is treated as a
  directory in order to handle the case when a directory is specified by a URL but
  the trailing ``/`` has been left off. This can cause misleading results when
  you try to fetch a file whose read permissions make it inaccessible; the FTP
  code will try to read it, fail with a 550 error, and then perform a directory
  listing for the unreadable file. If fine-grained control is needed, consider
  using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
  *_urlopener* to meet your needs.

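Regarding the :mailheader:`Content-Type` restriction above, the following
sketch inspects the header before decoding the raw bytes. To stay runnable
offline it opens a ``data:`` URL, which is an assumption of this example:
``data:`` URLs are only handled by newer Python versions and are not part of
the protocol list above. With an HTTP URL the same calls apply to the server's
reply headers.

```python
import urllib.request

# A data: URL keeps the example self-contained (no server required).
with urllib.request.urlopen(
        "data:text/plain;charset=utf-8,Hello%20world") as response:
    headers = response.info()
    content_type = headers.get_content_type()
    charset = headers.get_content_charset() or "utf-8"
    raw = response.read()

# Only decode when the reply is declared to be text.
text = raw.decode(charset) if content_type.startswith("text/") else None
```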


:mod:`urllib.response` --- Response classes used by urllib.
===========================================================

.. module:: urllib.response
   :synopsis: Response classes used by urllib.

The :mod:`urllib.response` module defines functions and classes which define a
minimal file-like interface, including ``read()`` and ``readline()``. The
typical response object is an addinfourl instance, which defines an ``info()``
method that returns headers and a ``geturl()`` method that returns the URL.
Functions defined by this module are used internally by the
:mod:`urllib.request` module.

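The interface described above can be explored by constructing an addinfourl
instance by hand. This is a sketch for illustration only: the body, header and
URL are made up, and in normal use such objects are created by
:mod:`urllib.request` rather than by client code.

```python
import io
import urllib.response
from email.message import Message

headers = Message()
headers["Content-Type"] = "text/plain"

response = urllib.response.addinfourl(
    io.BytesIO(b"first line\nsecond line\n"),
    headers,
    "http://www.example.com/",
)

first = response.readline()                   # file-like readline()
rest = response.read()                        # file-like read()
content_type = response.info()["Content-Type"]
url = response.geturl()
response.close()
```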