:mod:`urllib.request` --- Extensible library for opening URLs
=============================================================

.. module:: urllib.request
   :synopsis: Extensible library for opening URLs.
.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
.. sectionauthor:: Senthil Kumaran <senthil@uthcode.com>


The :mod:`urllib.request` module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.


The :mod:`urllib.request` module defines the following functions:


.. function:: urlopen(url, data=None[, timeout], *, cafile=None, capath=None, cadefault=False, context=None)

   Open the URL *url*, which can be either a string or a
   :class:`Request` object.

   *data* must be a bytes object specifying additional data to be sent to the
   server, or ``None`` if no such data is needed. *data* may also be an
   iterable object, in which case a ``Content-Length`` value must be specified
   in the headers. Currently HTTP requests are the only ones that use *data*;
   the HTTP request will be a POST instead of a GET when the *data* parameter
   is provided.

   *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence of
   2-tuples and returns a string in this format. It should be encoded to bytes
   before being used as the *data* parameter. The charset parameter in the
   ``Content-Type`` header may be used to specify the encoding. If the charset
   parameter is not sent with the ``Content-Type`` header, a server following
   the HTTP 1.1 recommendation may assume that the data is encoded in
   ISO-8859-1. It is advisable to state the encoding used via the charset
   parameter of the ``Content-Type`` header passed to the :class:`Request`.

   The :mod:`urllib.request` module uses HTTP/1.1 and includes a
   ``Connection: close`` header in its HTTP requests.

   The optional *timeout* parameter specifies a timeout in seconds for
   blocking operations like the connection attempt (if not specified,
   the global default timeout setting will be used). This only works for
   HTTP, HTTPS and FTP connections.

   If *context* is specified, it must be a :class:`ssl.SSLContext` instance
   describing the various SSL options. See :class:`~http.client.HTTPSConnection`
   for more details.

   The optional *cafile* and *capath* parameters specify a set of trusted
   CA certificates for HTTPS requests. *cafile* should point to a single
   file containing a bundle of CA certificates, whereas *capath* should
   point to a directory of hashed certificate files. More information can
   be found in :meth:`ssl.SSLContext.load_verify_locations`.

   The *cadefault* parameter is ignored.

   For http and https URLs, this function returns a
   :class:`http.client.HTTPResponse` object, which has the methods described
   in :ref:`httpresponse-objects`.

   For ftp, file, and data URLs, and for requests explicitly handled by the
   legacy :class:`URLopener` and :class:`FancyURLopener` classes, this function
   returns a :class:`urllib.response.addinfourl` object, which can work as a
   :term:`context manager` and has methods such as:

   * :meth:`~urllib.response.addinfourl.geturl` --- return the URL of the resource retrieved,
     commonly used to determine if a redirect was followed

   * :meth:`~urllib.response.addinfourl.info` --- return the meta-information of the page, such as headers,
     in the form of an :func:`email.message_from_string` instance (see
     `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   * :meth:`~urllib.response.addinfourl.getcode` --- return the HTTP status code of the response.

   Raises :exc:`~urllib.error.URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though
   the default installed global :class:`OpenerDirector` uses
   :class:`UnknownHandler` to ensure this never happens).

   In addition, if proxy settings are detected (for example, when a ``*_proxy``
   environment variable like :envvar:`http_proxy` is set),
   :class:`ProxyHandler` is installed by default and makes sure the requests
   are handled through the proxy.

   The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
   discontinued; :func:`urllib.request.urlopen` corresponds to the old
   ``urllib2.urlopen``. Proxy handling, which was done by passing a dictionary
   parameter to ``urllib.urlopen``, can be obtained by using
   :class:`ProxyHandler` objects.

   .. versionchanged:: 3.2
      *cafile* and *capath* were added.

   .. versionchanged:: 3.2
      HTTPS virtual hosts are now supported if possible (that is, if
      :data:`ssl.HAS_SNI` is true).

   .. versionchanged:: 3.2
      *data* can be an iterable object.

   .. versionchanged:: 3.3
      *cadefault* was added.

   .. versionchanged:: 3.4.3
      *context* was added.
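
As a minimal sketch of the calling pattern described above, the following
opens a ``data:`` URL, which carries its payload inline and therefore needs no
network access. It assumes Python 3.4 or later (where ``data:`` URLs are
handled); the URL and the variable names are illustrative only.

```python
from urllib.request import urlopen

# A data: URL embeds its content, so nothing is sent over the network.
url = "data:text/plain;charset=utf-8,Hello%20World"

# For data URLs the result is an addinfourl object usable as a context manager.
with urlopen(url, timeout=10) as response:
    body = response.read()                          # payload as bytes
    final_url = response.geturl()                   # URL that was opened
    content_type = response.info()["Content-Type"]  # header lookup

text = body.decode("utf-8")
```

The same ``with`` pattern applies to http and https URLs, where the returned
object is an :class:`http.client.HTTPResponse` instead.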

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use
   that opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`~urllib.request.urlopen`. The code does not check for a real
   :class:`OpenerDirector`, and any class with the appropriate interface will
   work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler` (if proxy
   settings are detected), :class:`UnknownHandler`, :class:`HTTPHandler`,
   :class:`HTTPDefaultErrorHandler`, :class:`HTTPRedirectHandler`,
   :class:`FTPHandler`, :class:`FileHandler`, :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
   attribute to modify its position in the handlers list.
Georg Brandl116aa622007-08-15 14:28:22 +0000140
Georg Brandl7f01a132009-09-16 15:58:14 +0000141
Senthil Kumaranaca8fd72008-06-23 04:41:59 +0000142.. function:: pathname2url(path)
Christian Heimes292d3512008-02-03 16:51:08 +0000143
Senthil Kumaranaca8fd72008-06-23 04:41:59 +0000144 Convert the pathname *path* from the local syntax for a path to the form used in
145 the path component of a URL. This does not produce a complete URL. The return
Serhiy Storchaka5e1c0532013-10-13 20:06:50 +0300146 value will already be quoted using the :func:`~urllib.parse.quote` function.
Christian Heimes292d3512008-02-03 16:51:08 +0000147
148
Senthil Kumaranaca8fd72008-06-23 04:41:59 +0000149.. function:: url2pathname(path)
150
Senthil Kumaranf0769e82010-08-09 19:53:52 +0000151 Convert the path component *path* from a percent-encoded URL to the local syntax for a
Serhiy Storchaka5e1c0532013-10-13 20:06:50 +0300152 path. This does not accept a complete URL. This function uses
153 :func:`~urllib.parse.unquote` to decode *path*.
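
A short sketch of the two conversions, using a hypothetical relative path
built with :func:`os.path.join` so that the example uses the local separator
on any platform:

```python
import os
from urllib.request import pathname2url, url2pathname

local_path = os.path.join("docs", "my file.txt")   # hypothetical path

url_path = pathname2url(local_path)    # quoted: the space becomes %20
round_trip = url2pathname(url_path)    # unquoted, back in local syntax
```

Neither function adds or removes a scheme such as ``file:``; both operate on
the path component only.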

.. function:: getproxies()

   This helper function returns a dictionary of scheme to proxy server URL
   mappings. On all operating systems it first scans the environment for
   variables named ``<scheme>_proxy``, matching the names case-insensitively.
   If it finds none, it looks for proxy information in the Mac OS X System
   Configuration on Mac OS X and in the Windows Systems Registry on Windows.
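
The environment lookup can be illustrated with a sketch like the one below.
The proxy address is a made-up placeholder, and the variable is set only for
the current process.

```python
import os
from urllib.request import getproxies

# Hypothetical proxy address, visible only to this process.
os.environ["http_proxy"] = "http://proxy.example.com:3128"

proxies = getproxies()             # mapping of scheme -> proxy URL
http_proxy = proxies.get("http")   # entry derived from http_proxy
```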


The following classes are provided:

.. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* must be a bytes object specifying additional data to send to the
   server, or ``None`` if no such data is needed. Currently HTTP requests are
   the only ones that use *data*; the HTTP request will be a POST instead of a
   GET when the *data* parameter is provided. *data* should be a buffer in the
   standard :mimetype:`application/x-www-form-urlencoded` format.

   The :func:`urllib.parse.urlencode` function takes a mapping or sequence of
   2-tuples and returns a string in this format. It should be encoded to bytes
   before being used as the *data* parameter. The charset parameter in the
   ``Content-Type`` header may be used to specify the encoding. If the charset
   parameter is not sent with the ``Content-Type`` header, a server following
   the HTTP 1.1 recommendation may assume that the data is encoded in
   ISO-8859-1. It is advisable to state the encoding used via the charset
   parameter of the ``Content-Type`` header.

   *headers* should be a dictionary, and will be treated as if
   :meth:`add_header` was called with each key and value as arguments.
   This is often used to "spoof" the ``User-Agent`` header, which is
   used by a browser to identify itself; some HTTP servers only
   allow requests coming from common browsers as opposed to scripts.
   For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while
   :mod:`urllib`'s default user agent string is
   ``"Python-urllib/2.6"`` (on Python 2.6).

   An example of using the ``Content-Type`` header with the *data* argument
   would be sending a dictionary like
   ``{"Content-Type": "application/x-www-form-urlencoded;charset=utf-8"}``.

   The next two arguments are only of interest for correct handling
   of third-party HTTP cookies:

   *origin_req_host* should be the request-host of the origin
   transaction, as defined by :rfc:`2965`. It defaults to
   ``http.cookiejar.request_host(self)``. This is the host name or IP
   address of the original request that was initiated by the user.
   For example, if the request is for an image in an HTML document,
   this should be the request-host of the request for the page
   containing the image.

   *unverifiable* should indicate whether the request is unverifiable,
   as defined by RFC 2965. It defaults to ``False``. An unverifiable
   request is one whose URL the user did not have the option to
   approve. For example, if the request is for an image in an HTML
   document, and the user had no option to approve the automatic
   fetching of the image, this should be true.

   *method* should be a string that indicates the HTTP request method that
   will be used (e.g. ``'HEAD'``). If provided, its value is stored in the
   :attr:`~Request.method` attribute and is used by :meth:`get_method()`.
   Subclasses may indicate a default method by setting the
   :attr:`~Request.method` attribute in the class itself.

   .. versionchanged:: 3.3
      The *method* argument was added to the :class:`Request` class.

   .. versionchanged:: 3.4
      Default :attr:`Request.method` may be indicated at the class level.
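
Putting the *data*, *headers* and *method* arguments together, a request
might be prepared as in the following sketch. The endpoint and form fields
are hypothetical, and nothing is sent until the request is passed to
:func:`urlopen` or an opener.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields; urlencode() returns str, but data must be bytes.
form = {"name": "Zoë", "q": "a b"}
body = urlencode(form).encode("ascii")

post = Request(
    "http://www.example.com/submit",      # placeholder URL
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded;charset=utf-8"},
)

# Without data the method would default to GET; *method* overrides that.
head = Request("http://www.example.com/", method="HEAD")
```

Because :func:`urllib.parse.urlencode` percent-encodes non-ASCII characters
(as UTF-8 by default), the resulting string is pure ASCII and can be encoded
with ``"ascii"``.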


.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers; it handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`~urllib.error.HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor(cookiejar=None)

   A class to handle HTTP Cookies.


.. class:: ProxyHandler(proxies=None)

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read
   the list of proxies from the environment variables
   :envvar:`<protocol>_proxy`. If no proxy environment variables are set, then
   in a Windows environment proxy settings are obtained from the registry's
   Internet Settings section, and in a Mac OS X environment proxy information
   is retrieved from the OS X System Configuration Framework.

   To disable autodetected proxies, pass an empty dictionary.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should
   be something that is compatible with :class:`HTTPPasswordMgr`; refer to
   section :ref:`http-password-mgr` for information on the interface that must
   be supported. :class:`HTTPBasicAuthHandler` will raise a :exc:`ValueError`
   when presented with a wrong authentication scheme.
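
A password manager and an authentication handler are typically combined as in
this sketch. The URL and credentials are placeholders; a realm of ``None`` is
used so that the entry acts as the catch-all described above.

```python
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPPasswordMgrWithDefaultRealm,
    build_opener,
)

password_mgr = HTTPPasswordMgrWithDefaultRealm()
# Realm None is the catch-all; URL, user name and password are made up.
password_mgr.add_password(None, "https://www.example.com/", "alice", "s3cret")

auth_handler = HTTPBasicAuthHandler(password_mgr)
opener = build_opener(auth_handler)

# The manager can be queried directly to check what would be sent for a
# given realm and URL below the registered one.
credentials = password_mgr.find_user_password(
    "Any Realm", "https://www.example.com/api"
)
```

Requests made through ``opener.open()`` would then answer Basic
authentication challenges for URLs under the registered prefix.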


.. class:: ProxyBasicAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should
   be something that is compatible with :class:`HTTPPasswordMgr`; refer to
   section :ref:`http-password-mgr` for information on the interface that must
   be supported. When both the Digest authentication handler and the Basic
   authentication handler are added, Digest authentication is always tried
   first. If Digest authentication returns a 40x response again, the response
   is passed to the Basic authentication handler. This handler will raise a
   :exc:`ValueError` when presented with an authentication scheme other than
   Digest or Basic.

   .. versionchanged:: 3.3
      Raise :exc:`ValueError` on unsupported authentication scheme.


.. class:: ProxyDigestAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler(debuglevel=0, context=None, check_hostname=None)

   A class to handle opening of HTTPS URLs. *context* and *check_hostname*
   have the same meaning as in :class:`http.client.HTTPSConnection`.

   .. versionchanged:: 3.2
      *context* and *check_hostname* were added.
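
One possible way to supply an SSL context is sketched below. It relies on
:func:`ssl.create_default_context` and assumes the interpreter was built with
SSL support; no connection is made by this code.

```python
import ssl
from urllib.request import HTTPSHandler, build_opener

# The default context verifies certificates and checks host names.
context = ssl.create_default_context()

https_handler = HTTPSHandler(context=context)
# Supplying an HTTPSHandler instance replaces the one build_opener()
# would otherwise add by default.
opener = build_opener(https_handler)
```

HTTPS URLs opened through ``opener.open()`` would then be negotiated with the
settings held by ``context``.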


.. class:: FileHandler()

   Open local files.


.. class:: DataHandler()

   Open data URLs.

   .. versionadded:: 3.4


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. class:: HTTPErrorProcessor()

   Process HTTP error responses.


.. _request-objects:

Request Objects
---------------

The following methods describe :class:`Request`'s public interface,
and so all may be overridden in subclasses. The class also defines several
public attributes that can be used by clients to inspect the parsed
request.

.. attribute:: Request.full_url

   The original URL passed to the constructor.

   .. versionchanged:: 3.4
      :attr:`Request.full_url` is a property with a setter, a getter and a
      deleter. Getting :attr:`~Request.full_url` returns the original request
      URL with the fragment, if it was present.

.. attribute:: Request.type

   The URI scheme.

.. attribute:: Request.host

   The URI authority, typically a host, but may also contain a port
   separated by a colon.

.. attribute:: Request.origin_req_host

   The original host for the request, without port.

.. attribute:: Request.selector

   The URI path. If the :class:`Request` uses a proxy, then selector
   will be the full URL that is passed to the proxy.

.. attribute:: Request.data

   The entity body for the request, or ``None`` if not specified.

   .. versionchanged:: 3.4
      Changing the value of :attr:`Request.data` now deletes the
      "Content-Length" header if it was previously set or calculated.

.. attribute:: Request.unverifiable

   A boolean indicating whether the request is unverifiable as defined
   by RFC 2965.
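
The attributes above can be inspected on a freshly constructed request, as in
this sketch. The URL is a placeholder, and the fragment behaviour of
``full_url`` assumes Python 3.4 or later.

```python
from urllib.request import Request

req = Request("http://www.example.com:8080/path?q=1#frag")   # placeholder URL

scheme = req.type               # URI scheme
authority = req.host            # host, including the port if one was given
origin = req.origin_req_host    # host only, without the port
selector = req.selector         # path and query, without the fragment
full_url = req.full_url         # original URL, fragment included
```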

.. attribute:: Request.method

   The HTTP request method to use. By default its value is :const:`None`,
   which means that :meth:`~Request.get_method` will do its normal computation
   of the method to be used. Its value can be set (thus overriding the default
   computation in :meth:`~Request.get_method`) either by providing a default
   value by setting it at the class level in a :class:`Request` subclass, or by
   passing a value in to the :class:`Request` constructor via the *method*
   argument.

   .. versionadded:: 3.3

   .. versionchanged:: 3.4
      A default value can now be set in subclasses; previously it could only
      be set via the constructor argument.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. If
   :attr:`Request.method` is not ``None``, return its value, otherwise return
   ``'GET'`` if :attr:`Request.data` is ``None``, or ``'POST'`` if it's not.
   This is only meaningful for HTTP requests.

   .. versionchanged:: 3.3
      :meth:`get_method` now looks at the value of :attr:`Request.method`.
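
The rules for choosing the method can be summarised in a short sketch. The
``PutRequest`` subclass and the URL are hypothetical, and the class-level
default assumes Python 3.4 or later.

```python
from urllib.request import Request

class PutRequest(Request):
    # Class-level default, picked up by get_method().
    method = "PUT"

url = "http://www.example.com/resource"   # placeholder URL

plain = Request(url).get_method()                    # no data, no method
with_data = Request(url, data=b"x=1").get_method()   # data present
explicit = Request(url, data=b"x=1", method="PATCH").get_method()
subclass = PutRequest(url, data=b"x=1").get_method()
```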


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).


.. method:: Request.remove_header(header)

   Remove named header from the request instance (both from regular and
   unredirected headers).

   .. versionadded:: 3.4


.. method:: Request.get_full_url()

   Return the URL given in the constructor.

   .. versionchanged:: 3.4
      Returns :attr:`Request.full_url`.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.
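
The effect of :meth:`Request.set_proxy` on an ``http`` request can be seen by
comparing the attributes before and after the call. The proxy address and the
URL in this sketch are placeholders.

```python
from urllib.request import Request

req = Request("http://www.example.com/index.html")   # placeholder URL
before = (req.host, req.selector)

# Hypothetical proxy; *type* is the scheme used to talk to the proxy.
req.set_proxy("proxy.example.com:3128", "http")
after = (req.host, req.selector)
```

After the call the request is addressed to the proxy, and the selector
carries the full original URL so that the proxy knows what to fetch.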


.. method:: Request.get_header(header_name, default=None)

   Return the value of the given header. If the header is not present, return
   the default value.


.. method:: Request.header_items()

   Return a list of tuples (header_name, header_value) of the Request headers.
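
The header methods can be exercised together as in the sketch below. One
detail that is easy to miss: the CPython implementation stores header names
via :meth:`str.capitalize`, so ``"User-Agent"`` is looked up as
``"User-agent"``. The URL and header values are placeholders.

```python
from urllib.request import Request

req = Request("http://www.example.com/")          # placeholder URL
req.add_header("User-Agent", "demo-client/1.0")
req.add_header("User-Agent", "demo-client/2.0")   # same key: replaces the value
req.add_unredirected_header("Authorization", "Bearer placeholder-token")

# Names are stored capitalized, so lookups use that spelling.
agent = req.get_header("User-agent")
missing = req.get_header("X-missing", "fallback")
items = sorted(req.header_items())

req.remove_header("User-agent")                   # Python 3.4 or later
still_there = req.has_header("User-agent")
```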

.. versionchanged:: 3.4
   The request methods add_data, has_data, get_data, get_type, get_host,
   get_selector, get_origin_req_host and is_unverifiable that were deprecated
   since 3.3 have been removed.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following methods
   are searched, and added to the possible chains (note that HTTP errors are a
   special case).

   * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
     URLs.

   * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

   * :meth:`protocol_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`protocol_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

   * :meth:`protocol_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url, data=None[, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature only works for
   HTTP, HTTPS and FTP connections.


.. method:: OpenerDirector.error(proto, *args)

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.
581
582OpenerDirector objects open URLs in three stages:
583
584The order in which these methods are called within each stage is determined by
585sorting the handler instances.
586
587#. Every handler with a method named like :meth:`protocol_request` has that
588 method called to pre-process the request.
589
#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually
   :exc:`~urllib.error.URLError`). Exceptions are allowed to propagate.
Georg Brandl116aa622007-08-15 14:28:22 +0000594
595 In fact, the above algorithm is first tried for methods named
596 :meth:`default_open`. If all such methods return :const:`None`, the algorithm
597 is repeated for methods named like :meth:`protocol_open`. If all such methods
598 return :const:`None`, the algorithm is repeated for methods named
599 :meth:`unknown_open`.
600
601 Note that the implementation of these methods may involve calls of the parent
Georg Brandla5eacee2010-07-23 16:55:26 +0000602 :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
603 :meth:`~OpenerDirector.error` methods.
Georg Brandl116aa622007-08-15 14:28:22 +0000604
605#. Every handler with a method named like :meth:`protocol_response` has that
606 method called to post-process the response.
607
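The three stages above can be observed with a small offline sketch. The
``demo`` scheme, the two class names and the ``events`` list are hypothetical
and exist only for illustration; the handler answers from memory, so no
network access is involved.

```python
import email.message
import io
import urllib.request
import urllib.response

events = []  # records the order in which the three stages run


class DemoProcessor(urllib.request.BaseHandler):
    """Pre- and post-processes requests for the made-up "demo" scheme."""

    def demo_request(self, req):
        events.append("request")
        req.add_header("X-Demo", "1")
        return req

    def demo_response(self, req, response):
        events.append("response")
        return response


class DemoHandler(urllib.request.BaseHandler):
    """Handles the request itself by returning an in-memory response."""

    def demo_open(self, req):
        events.append("open")
        headers = email.message.Message()
        headers["Content-Type"] = "text/plain"
        return urllib.response.addinfourl(io.BytesIO(b"hello"), headers, req.full_url)


opener = urllib.request.OpenerDirector()
opener.add_handler(DemoProcessor())
opener.add_handler(DemoHandler())

with opener.open("demo://example") as response:
    body = response.read()
```

After this runs, ``events`` lists the stages in the order they were executed,
and ``body`` holds the bytes produced by the ``demo_open`` method.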
608
609.. _base-handler-objects:
610
611BaseHandler Objects
612-------------------
613
614:class:`BaseHandler` objects provide a couple of methods that are directly
615useful, and others that are meant to be used by derived classes. These are
616intended for direct use:
617
618
619.. method:: BaseHandler.add_parent(director)
620
621 Add a director as parent.
622
623
624.. method:: BaseHandler.close()
625
626 Remove any parents.
627
Senthil Kumarana6bac952011-07-04 11:28:30 -0700628The following attribute and methods should only be used by classes derived from
Georg Brandl116aa622007-08-15 14:28:22 +0000629:class:`BaseHandler`.
630
631.. note::
632
633 The convention has been adopted that subclasses defining
634 :meth:`protocol_request` or :meth:`protocol_response` methods are named
635 :class:`\*Processor`; all others are named :class:`\*Handler`.
636
637
638.. attribute:: BaseHandler.parent
639
640 A valid :class:`OpenerDirector`, which can be used to open using a different
641 protocol, or handle errors.
642
643
644.. method:: BaseHandler.default_open(req)
645
646 This method is *not* defined in :class:`BaseHandler`, but subclasses should
647 define it if they want to catch all URLs.
648
649 This method, if implemented, will be called by the parent
650 :class:`OpenerDirector`. It should return a file-like object as described in
651 the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
Serhiy Storchaka5e1c0532013-10-13 20:06:50 +0300652 It should raise :exc:`~urllib.error.URLError`, unless a truly exceptional
653 thing happens (for example, :exc:`MemoryError` should not be mapped to
654 :exc:`URLError`).
Georg Brandl116aa622007-08-15 14:28:22 +0000655
656 This method will be called before any protocol-specific open method.
657
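As a minimal sketch of a catch-all handler, the following answers every URL
from memory, which can be handy in tests. ``CannedHandler`` and the
``example.invalid`` URL are assumptions made for this example, not part of the
module.

```python
import email.message
import io
import urllib.request
import urllib.response


class CannedHandler(urllib.request.BaseHandler):
    """Hypothetical handler that answers every URL without touching the network."""

    def default_open(self, req):
        headers = email.message.Message()
        headers["Content-Type"] = "text/plain"
        body = ("canned reply for " + req.full_url).encode("utf-8")
        return urllib.response.addinfourl(io.BytesIO(body), headers, req.full_url)


# Start from an empty OpenerDirector so that no default handlers are involved.
opener = urllib.request.OpenerDirector()
opener.add_handler(CannedHandler())

with opener.open("http://example.invalid/page") as response:
    text = response.read().decode("utf-8")
```

Because :meth:`default_open` is consulted before any protocol-specific open
method, the HTTP machinery is never reached here.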
658
659.. method:: BaseHandler.protocol_open(req)
660 :noindex:
661
662 This method is *not* defined in :class:`BaseHandler`, but subclasses should
663 define it if they want to handle URLs with the given protocol.
664
665 This method, if defined, will be called by the parent :class:`OpenerDirector`.
666 Return values should be the same as for :meth:`default_open`.
667
668
669.. method:: BaseHandler.unknown_open(req)
670
671 This method is *not* defined in :class:`BaseHandler`, but subclasses should
672 define it if they want to catch all URLs with no specific registered handler to
673 open it.
674
675 This method, if implemented, will be called by the :attr:`parent`
676 :class:`OpenerDirector`. Return values should be the same as for
677 :meth:`default_open`.
678
679
680.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)
681
682 This method is *not* defined in :class:`BaseHandler`, but subclasses should
683 override it if they intend to provide a catch-all for otherwise unhandled HTTP
684 errors. It will be called automatically by the :class:`OpenerDirector` getting
685 the error, and should not normally be called in other circumstances.
686
687 *req* will be a :class:`Request` object, *fp* will be a file-like object with
688 the HTTP error body, *code* will be the three-digit code of the error, *msg*
689 will be the user-visible explanation of the code and *hdrs* will be a mapping
690 object with the headers of the error.
691
692 Return values and exceptions raised should be the same as those of
693 :func:`urlopen`.
694
695
696.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)
697
698 *nnn* should be a three-digit HTTP error code. This method is also not defined
699 in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
700 subclass, when an HTTP error with code *nnn* occurs.
701
702 Subclasses should override this method to handle specific HTTP errors.
703
704 Arguments, return values and exceptions raised should be the same as for
705 :meth:`http_error_default`.
706
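The interplay between :meth:`http_error_nnn` and :meth:`http_error_default`
can be sketched offline by calling :meth:`OpenerDirector.error` directly, which
is roughly what happens when a server returns an error status. The
``FallbackOn404`` class and the URL are hypothetical.

```python
import email.message
import io
import urllib.error
import urllib.request
import urllib.response


class FallbackOn404(urllib.request.BaseHandler):
    """Hypothetical handler: replace a 404 with a placeholder response."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        return urllib.response.addinfourl(
            io.BytesIO(b"placeholder"), hdrs, req.full_url, code)


opener = urllib.request.OpenerDirector()
opener.add_handler(FallbackOn404())
opener.add_handler(urllib.request.HTTPDefaultErrorHandler())

req = urllib.request.Request("http://example.invalid/missing")
hdrs = email.message.Message()

# A 404 is routed to http_error_404, which returns a response object.
handled = opener.error("http", req, io.BytesIO(b""), 404, "Not Found", hdrs)
placeholder = handled.read()

# A code without a specific handler falls through to http_error_default,
# which HTTPDefaultErrorHandler implements by raising HTTPError.
try:
    opener.error("http", req, io.BytesIO(b""), 500, "Server Error", hdrs)
except urllib.error.HTTPError as exc:
    fallback_code = exc.code
```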
707
708.. method:: BaseHandler.protocol_request(req)
709 :noindex:
710
711 This method is *not* defined in :class:`BaseHandler`, but subclasses should
712 define it if they want to pre-process requests of the given protocol.
713
714 This method, if defined, will be called by the parent :class:`OpenerDirector`.
715 *req* will be a :class:`Request` object. The return value should be a
716 :class:`Request` object.
717
718
719.. method:: BaseHandler.protocol_response(req, response)
720 :noindex:
721
722 This method is *not* defined in :class:`BaseHandler`, but subclasses should
723 define it if they want to post-process responses of the given protocol.
724
725 This method, if defined, will be called by the parent :class:`OpenerDirector`.
726 *req* will be a :class:`Request` object. *response* will be an object
727 implementing the same interface as the return value of :func:`urlopen`. The
728 return value should implement the same interface as the return value of
729 :func:`urlopen`.
730
731
732.. _http-redirect-handler:
733
734HTTPRedirectHandler Objects
735---------------------------
736
737.. note::
738
739 Some HTTP redirections require action from this module's client code. If this
Serhiy Storchaka5e1c0532013-10-13 20:06:50 +0300740 is the case, :exc:`~urllib.error.HTTPError` is raised. See :rfc:`2616` for
741 details of the precise meanings of the various redirection codes.
Georg Brandl116aa622007-08-15 14:28:22 +0000742
   An :exc:`~urllib.error.HTTPError` exception is raised as a security
   consideration if the :class:`HTTPRedirectHandler` is presented with a
   redirected URL which is not an HTTP, HTTPS or FTP URL.
746
Georg Brandl116aa622007-08-15 14:28:22 +0000747
.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)
Georg Brandl116aa622007-08-15 14:28:22 +0000749
750 Return a :class:`Request` or ``None`` in response to a redirect. This is called
751 by the default implementations of the :meth:`http_error_30\*` methods when a
752 redirection is received from the server. If a redirection should take place,
753 return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
Serhiy Storchaka5e1c0532013-10-13 20:06:50 +0300754 redirect to *newurl*. Otherwise, raise :exc:`~urllib.error.HTTPError` if
755 no other handler should try to handle this URL, or return ``None`` if you
756 can't but another handler might.
Georg Brandl116aa622007-08-15 14:28:22 +0000757
758 .. note::
759
760 The default implementation of this method does not strictly follow :rfc:`2616`,
761 which says that 301 and 302 responses to ``POST`` requests must not be
762 automatically redirected without confirmation by the user. In reality, browsers
763 do allow automatic redirection of these responses, changing the POST to a
764 ``GET``, and the default implementation reproduces this behavior.
765
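A subclass can apply its own redirect policy by overriding
:meth:`redirect_request`. The sketch below assumes a policy of following
redirects only within the same host; ``SameHostRedirectHandler`` and the URLs
are hypothetical. The method is called directly here so that the example runs
without a server.

```python
import email.message
import urllib.error
import urllib.parse
import urllib.request


class SameHostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical policy: follow redirects only within the same host."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        old_host = urllib.parse.urlsplit(req.full_url).hostname
        new_host = urllib.parse.urlsplit(newurl).hostname
        if old_host != new_host:
            raise urllib.error.HTTPError(
                newurl, code, "cross-host redirect refused", hdrs, fp)
        return super().redirect_request(req, fp, code, msg, hdrs, newurl)


handler = SameHostRedirectHandler()
headers = email.message.Message()
original = urllib.request.Request("http://example.com/form", data=b"a=1")  # a POST

# Same host: the default implementation builds the follow-up request.
followed = handler.redirect_request(
    original, None, 302, "Found", headers, "http://example.com/done")
follow_method = followed.get_method()

# Different host: the subclass refuses by raising HTTPError.
try:
    handler.redirect_request(
        original, None, 302, "Found", headers, "http://other.example/done")
    refused = False
except urllib.error.HTTPError:
    refused = True
```

The follow-up request carries no body and uses ``GET``, which illustrates the
``POST`` to ``GET`` conversion described in the note above.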
766
767.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)
768
Georg Brandl9617a592009-02-13 10:40:43 +0000769 Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
770 parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.
Georg Brandl116aa622007-08-15 14:28:22 +0000771
772
773.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)
774
775 The same as :meth:`http_error_301`, but called for the 'found' response.
776
777
778.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)
779
780 The same as :meth:`http_error_301`, but called for the 'see other' response.
781
782
783.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)
784
785 The same as :meth:`http_error_301`, but called for the 'temporary redirect'
786 response.
787
788
789.. _http-cookie-processor:
790
791HTTPCookieProcessor Objects
792---------------------------
793
Georg Brandl116aa622007-08-15 14:28:22 +0000794:class:`HTTPCookieProcessor` instances have one attribute:
795
Georg Brandl116aa622007-08-15 14:28:22 +0000796.. attribute:: HTTPCookieProcessor.cookiejar
797
Georg Brandl24420152008-05-26 16:32:26 +0000798 The :class:`http.cookiejar.CookieJar` in which cookies are stored.
Georg Brandl116aa622007-08-15 14:28:22 +0000799
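A short sketch of sharing one cookie jar with an opener follows. The variable
names are illustrative only.

```python
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()
processor = urllib.request.HTTPCookieProcessor(jar)
opener = urllib.request.build_opener(processor)

# The processor keeps a reference to the jar it was given, so cookies set by
# responses opened through ``opener`` can be inspected on ``jar`` afterwards.
same_jar = processor.cookiejar is jar
```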
800
801.. _proxy-handler:
802
803ProxyHandler Objects
804--------------------
805
806
807.. method:: ProxyHandler.protocol_open(request)
808 :noindex:
809
810 The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
811 *protocol* which has a proxy in the *proxies* dictionary given in the
812 constructor. The method will modify requests to go through the proxy, by
813 calling ``request.set_proxy()``, and call the next handler in the chain to
814 actually execute the protocol.
815
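The per-scheme methods can be seen on an instance. In this sketch the proxy
address is a made-up placeholder and nothing is sent over the network.

```python
import urllib.request

proxies = {"http": "http://proxy.example.com:3128/"}  # hypothetical proxy
handler = urllib.request.ProxyHandler(proxies)

has_http = hasattr(handler, "http_open")  # created for the configured scheme
has_ftp = hasattr(handler, "ftp_open")    # no proxy was configured for FTP
```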
816
817.. _http-password-mgr:
818
819HTTPPasswordMgr Objects
820-----------------------
821
822These methods are available on :class:`HTTPPasswordMgr` and
823:class:`HTTPPasswordMgrWithDefaultRealm` objects.
824
825
826.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)
827
828 *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
829 *passwd* must be strings. This causes ``(user, passwd)`` to be used as
830 authentication tokens when authentication for *realm* and a super-URI of any of
831 the given URIs is given.
832
833
834.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)
835
836 Get user/password for given realm and URI, if any. This method will return
837 ``(None, None)`` if there is no matching user/password.
838
839 For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
840 searched if the given *realm* has no matching user/password.
841
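The lookup rules can be tried out without any server. The realm name, URLs and
credentials below are placeholders chosen for this sketch.

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgr()
mgr.add_password("PDQ Application", "http://example.com/private/", "klem", "secret")

# A URI below the registered one matches; an unrelated path does not.
inside = mgr.find_user_password("PDQ Application", "http://example.com/private/reports")
outside = mgr.find_user_password("PDQ Application", "http://example.com/public/")

# With the default-realm variant, the realm None acts as a fallback.
default_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
default_mgr.add_password(None, "http://example.com/", "guest", "guest-pw")
any_realm = default_mgr.find_user_password("Some Other Realm", "http://example.com/x")
```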
842
843.. _abstract-basic-auth-handler:
844
845AbstractBasicAuthHandler Objects
846--------------------------------
847
848
849.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
850
851 Handle an authentication request by getting a user/password pair, and re-trying
852 the request. *authreq* should be the name of the header where the information
853 about the realm is included in the request, *host* specifies the URL and path to
854 authenticate for, *req* should be the (failed) :class:`Request` object, and
855 *headers* should be the error headers.
856
857 *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
858 authority component (e.g. ``"http://python.org/"``). In either case, the
859 authority must not contain a userinfo component (so, ``"python.org"`` and
860 ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).
861
862
863.. _http-basic-auth-handler:
864
865HTTPBasicAuthHandler Objects
866----------------------------
867
868
869.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)
870
871 Retry the request with authentication information, if available.
872
873
874.. _proxy-basic-auth-handler:
875
876ProxyBasicAuthHandler Objects
877-----------------------------
878
879
880.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)
881
882 Retry the request with authentication information, if available.
883
884
885.. _abstract-digest-auth-handler:
886
887AbstractDigestAuthHandler Objects
888---------------------------------
889
890
891.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
892
893 *authreq* should be the name of the header where the information about the realm
894 is included in the request, *host* should be the host to authenticate to, *req*
895 should be the (failed) :class:`Request` object, and *headers* should be the
896 error headers.
897
898
899.. _http-digest-auth-handler:
900
901HTTPDigestAuthHandler Objects
902-----------------------------
903
904
905.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)
906
907 Retry the request with authentication information, if available.
908
909
910.. _proxy-digest-auth-handler:
911
912ProxyDigestAuthHandler Objects
913------------------------------
914
915
916.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)
917
918 Retry the request with authentication information, if available.
919
920
921.. _http-handler-objects:
922
923HTTPHandler Objects
924-------------------
925
926
927.. method:: HTTPHandler.http_open(req)
928
929 Send an HTTP request, which can be either GET or POST, depending on
930 ``req.has_data()``.
931
932
933.. _https-handler-objects:
934
935HTTPSHandler Objects
936--------------------
937
938
939.. method:: HTTPSHandler.https_open(req)
940
941 Send an HTTPS request, which can be either GET or POST, depending on
942 ``req.has_data()``.
943
944
945.. _file-handler-objects:
946
947FileHandler Objects
948-------------------
949
950
951.. method:: FileHandler.file_open(req)
952
953 Open the file locally, if there is no host name, or the host name is
Senthil Kumaran383c32d2010-10-14 11:57:35 +0000954 ``'localhost'``.
955
Georg Brandl61063cc2012-06-24 22:48:30 +0200956 .. versionchanged:: 3.2
957 This method is applicable only for local hostnames. When a remote
Serhiy Storchaka5e1c0532013-10-13 20:06:50 +0300958 hostname is given, an :exc:`~urllib.error.URLError` is raised.
Georg Brandl116aa622007-08-15 14:28:22 +0000959
960
Antoine Pitroudf204be2012-11-24 17:59:08 +0100961.. _data-handler-objects:
962
963DataHandler Objects
964-------------------
965
966.. method:: DataHandler.data_open(req)
967
968 Read a data URL. This kind of URL contains the content encoded in the URL
969 itself. The data URL syntax is specified in :rfc:`2397`. This implementation
970 ignores white spaces in base64 encoded data URLs so the URL may be wrapped
971 in whatever source file it comes from. But even though some browsers don't
972 mind about a missing padding at the end of a base64 encoded data URL, this
973 implementation will raise an :exc:`ValueError` in that case.
974
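Since a data URL carries its own content, it can be opened without network
access. A minimal sketch, using a short base64 payload picked for this example:

```python
import urllib.request

# "SGVsbG8=" is the base64 encoding of b"Hello".
with urllib.request.urlopen("data:text/plain;base64,SGVsbG8=") as response:
    content_type = response.headers.get_content_type()
    body = response.read()
```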
975
Georg Brandl116aa622007-08-15 14:28:22 +0000976.. _ftp-handler-objects:
977
978FTPHandler Objects
979------------------
980
981
982.. method:: FTPHandler.ftp_open(req)
983
984 Open the FTP file indicated by *req*. The login is always done with empty
985 username and password.
986
987
988.. _cacheftp-handler-objects:
989
990CacheFTPHandler Objects
991-----------------------
992
993:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
994following additional methods:
995
996
997.. method:: CacheFTPHandler.setTimeout(t)
998
999 Set timeout of connections to *t* seconds.
1000
1001
1002.. method:: CacheFTPHandler.setMaxConns(m)
1003
1004 Set maximum number of cached connections to *m*.
1005
1006
1007.. _unknown-handler-objects:
1008
1009UnknownHandler Objects
1010----------------------
1011
1012
1013.. method:: UnknownHandler.unknown_open()
1014
Serhiy Storchaka5e1c0532013-10-13 20:06:50 +03001015 Raise a :exc:`~urllib.error.URLError` exception.
Georg Brandl116aa622007-08-15 14:28:22 +00001016
1017
1018.. _http-error-processor-objects:
1019
1020HTTPErrorProcessor Objects
1021--------------------------
1022
.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For 2xx status codes, the response object is returned immediately.

   For all other status codes, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
   :exc:`~urllib.error.HTTPError` if no other handler handles the error.
Georg Brandl116aa622007-08-15 14:28:22 +00001033
Georg Brandl0f7ede42008-06-23 11:23:31 +00001034
.. method:: HTTPErrorProcessor.https_response(request, response)

   Process HTTPS error responses.

   The behavior is the same as :meth:`http_response`.
1040
1041
Georg Brandl0f7ede42008-06-23 11:23:31 +00001042.. _urllib-request-examples:
Georg Brandl116aa622007-08-15 14:28:22 +00001043
Examples
--------
1046
Senthil Kumaran0c2d8b82010-04-22 10:53:30 +00001047This example gets the python.org main page and displays the first 300 bytes of
Georg Brandlbdc55ab2010-04-20 18:15:54 +00001048it. ::
Georg Brandl116aa622007-08-15 14:28:22 +00001049
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001050 >>> import urllib.request
Berker Peksag9575e182015-04-12 13:52:49 +03001051 >>> with urllib.request.urlopen('http://www.python.org/') as f:
1052 ... print(f.read(300))
1053 ...
Senthil Kumaran0c2d8b82010-04-22 10:53:30 +00001054 b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
1055 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
1056 xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
1057 <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
1058 <title>Python Programming '
Senthil Kumaranb213ee32010-04-15 17:18:22 +00001059
Note that reading from the object returned by urlopen yields a bytes object.
This is because there is no way for urlopen to automatically determine the
encoding of the byte stream it receives from the HTTP server. In general, a
program will decode the returned bytes object to string once it determines or
guesses the appropriate encoding.
Senthil Kumaranb213ee32010-04-15 17:18:22 +00001065
The following W3C document, http://www.w3.org/International/O-charset\ , lists
the various ways in which an (X)HTML or an XML document could have specified its
encoding information.
1069
As the python.org website uses *utf-8* encoding as specified in its meta tag, we
will use the same for decoding the bytes object. ::
1072
1073 >>> with urllib.request.urlopen('http://www.python.org/') as f:
1074 ... print(f.read(100).decode('utf-8'))
1075 ...
1076 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
1077 "http://www.w3.org/TR/xhtml1/DTD/xhtm
1078
1079It is also possible to achieve the same result without using the
1080:term:`context manager` approach. ::
Senthil Kumaranb213ee32010-04-15 17:18:22 +00001081
1082 >>> import urllib.request
1083 >>> f = urllib.request.urlopen('http://www.python.org/')
Georg Brandlfe4fd832010-05-21 21:01:32 +00001084 >>> print(f.read(100).decode('utf-8'))
Senthil Kumaran0c2d8b82010-04-22 10:53:30 +00001085 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
1086 "http://www.w3.org/TR/xhtml1/DTD/xhtm
1087
Senthil Kumaranb213ee32010-04-15 17:18:22 +00001088In the following example, we are sending a data-stream to the stdin of a CGI
1089and reading the data it returns to us. Note that this example will only work
1090when the Python installation supports SSL. ::
Georg Brandl116aa622007-08-15 14:28:22 +00001091
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001092 >>> import urllib.request
1093 >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
Senthil Kumaran29333122011-02-11 11:25:47 +00001094 ... data=b'This data is passed to stdin of the CGI')
Berker Peksag9575e182015-04-12 13:52:49 +03001095 >>> with urllib.request.urlopen(req) as f:
1096 ... print(f.read().decode('utf-8'))
1097 ...
Georg Brandl116aa622007-08-15 14:28:22 +00001098 Got Data: "This data is passed to stdin of the CGI"
1099
1100The code for the sample CGI used in the above example is::
1101
   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)
Georg Brandl116aa622007-08-15 14:28:22 +00001106
Senthil Kumarane66cc812013-03-13 13:42:47 -07001107Here is an example of doing a ``PUT`` request using :class:`Request`::
1108
   import urllib.request
   DATA = b'some data'
   req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT')
   with urllib.request.urlopen(req) as f:
       # Report the outcome while the response is still open.
       print(f.status)
       print(f.reason)
1116
Georg Brandl116aa622007-08-15 14:28:22 +00001117Use of Basic HTTP Authentication::
1118
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001119 import urllib.request
Georg Brandl116aa622007-08-15 14:28:22 +00001120 # Create an OpenerDirector with support for Basic HTTP Authentication...
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001121 auth_handler = urllib.request.HTTPBasicAuthHandler()
Georg Brandl116aa622007-08-15 14:28:22 +00001122 auth_handler.add_password(realm='PDQ Application',
1123 uri='https://mahler:8092/site-updates.py',
1124 user='klem',
1125 passwd='kadidd!ehopper')
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001126 opener = urllib.request.build_opener(auth_handler)
Georg Brandl116aa622007-08-15 14:28:22 +00001127 # ...and install it globally so it can be used with urlopen.
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001128 urllib.request.install_opener(opener)
1129 urllib.request.urlopen('http://www.example.com/login.html')
Georg Brandl116aa622007-08-15 14:28:22 +00001130
1131:func:`build_opener` provides many handlers by default, including a
1132:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
1133variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
1134involved. For example, the :envvar:`http_proxy` environment variable is read to
1135obtain the HTTP proxy's URL.
1136
1137This example replaces the default :class:`ProxyHandler` with one that uses
Georg Brandl2ee470f2008-07-16 12:55:28 +00001138programmatically-supplied proxy URLs, and adds proxy authorization support with
Georg Brandl116aa622007-08-15 14:28:22 +00001139:class:`ProxyBasicAuthHandler`. ::
1140
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001141 proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
Senthil Kumaran037f8362009-12-24 02:24:37 +00001142 proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
Georg Brandl116aa622007-08-15 14:28:22 +00001143 proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
1144
Senthil Kumaran037f8362009-12-24 02:24:37 +00001145 opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
Georg Brandl116aa622007-08-15 14:28:22 +00001146 # This time, rather than install the OpenerDirector, we use it directly:
1147 opener.open('http://www.example.com/login.html')
1148
1149Adding HTTP headers:
1150
1151Use the *headers* argument to the :class:`Request` constructor, or::
1152
Georg Brandl029986a2008-06-23 11:44:14 +00001153 import urllib.request
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001154 req = urllib.request.Request('http://www.example.com/')
Georg Brandl116aa622007-08-15 14:28:22 +00001155 req.add_header('Referer', 'http://www.python.org/')
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001156 r = urllib.request.urlopen(req)
Georg Brandl116aa622007-08-15 14:28:22 +00001157
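For comparison, here is a sketch of the constructor form, which sets the
headers when the :class:`Request` is created. The ``demo/1.0`` agent string is
a placeholder.

```python
import urllib.request

req = urllib.request.Request(
    "http://www.example.com/",
    headers={"Referer": "http://www.python.org/", "User-Agent": "demo/1.0"},
)

referer = req.get_header("Referer")
# Header names are stored capitalized ("User-agent"), so look them up that way.
agent = req.get_header("User-agent")
```
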
1158:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
1159every :class:`Request`. To change this::
1160
Georg Brandl029986a2008-06-23 11:44:14 +00001161 import urllib.request
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001162 opener = urllib.request.build_opener()
Georg Brandl116aa622007-08-15 14:28:22 +00001163 opener.addheaders = [('User-agent', 'Mozilla/5.0')]
1164 opener.open('http://www.example.com/')
1165
1166Also, remember that a few standard headers (:mailheader:`Content-Length`,
Senthil Kumaran6b3434a2012-03-15 18:11:16 -07001167:mailheader:`Content-Type` without charset parameter and :mailheader:`Host`)
1168are added when the :class:`Request` is passed to :func:`urlopen` (or
1169:meth:`OpenerDirector.open`).
Georg Brandl116aa622007-08-15 14:28:22 +00001170
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001171.. _urllib-examples:
1172
1173Here is an example session that uses the ``GET`` method to retrieve a URL
1174containing parameters::
1175
1176 >>> import urllib.request
1177 >>> import urllib.parse
1178 >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
Berker Peksag9575e182015-04-12 13:52:49 +03001179 >>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
1180 >>> with urllib.request.urlopen(url) as f:
1181 ... print(f.read().decode('utf-8'))
1182 ...
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001183
Senthil Kumaran29333122011-02-11 11:25:47 +00001184The following example uses the ``POST`` method instead. Note that params output
1185from urlencode is encoded to bytes before it is sent to urlopen as data::
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001186
1187 >>> import urllib.request
1188 >>> import urllib.parse
Senthil Kumaran6b3434a2012-03-15 18:11:16 -07001189 >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
1190 >>> data = data.encode('utf-8')
1191 >>> request = urllib.request.Request("http://requestb.in/xrbl82xr")
1192 >>> # adding charset parameter to the Content-Type header.
1193 >>> request.add_header("Content-Type","application/x-www-form-urlencoded;charset=utf-8")
Berker Peksag9575e182015-04-12 13:52:49 +03001194 >>> with urllib.request.urlopen(request, data) as f:
1195 ... print(f.read().decode('utf-8'))
1196 ...
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001197
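The difference between the two sessions above comes down to whether *data* is
supplied. The following sketch builds both kinds of request without sending
them; the ``www.example.com`` endpoint is a placeholder.

```python
import urllib.parse
import urllib.request

query = urllib.parse.urlencode({"spam": 1, "eggs": 2, "bacon": 0})

# Parameters in the URL: the request method defaults to GET.
get_req = urllib.request.Request("http://www.example.com/cgi-bin/query?" + query)

# Parameters as a bytes body: the request method defaults to POST.
post_req = urllib.request.Request("http://www.example.com/cgi-bin/query",
                                  data=query.encode("utf-8"))

methods = (get_req.get_method(), post_req.get_method())
```
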
1198The following example uses an explicitly specified HTTP proxy, overriding
1199environment settings::
1200
1201 >>> import urllib.request
1202 >>> proxies = {'http': 'http://proxy.example.com:8080/'}
1203 >>> opener = urllib.request.FancyURLopener(proxies)
Berker Peksag9575e182015-04-12 13:52:49 +03001204 >>> with opener.open("http://www.python.org") as f:
1205 ... f.read().decode('utf-8')
1206 ...
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001207
1208The following example uses no proxies at all, overriding environment settings::
1209
1210 >>> import urllib.request
1211 >>> opener = urllib.request.FancyURLopener({})
Berker Peksag9575e182015-04-12 13:52:49 +03001212 >>> with opener.open("http://www.python.org/") as f:
1213 ... f.read().decode('utf-8')
1214 ...
Senthil Kumaranaca8fd72008-06-23 04:41:59 +00001215
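The same two proxy configurations can also be expressed with
:class:`ProxyHandler` and :func:`build_opener`, which avoids the legacy opener
classes. This is a sketch; the proxy address is a placeholder and no
connection is made.

```python
import urllib.request

# Hypothetical proxy address; replace it with a real one before use.
proxy_handler = urllib.request.ProxyHandler({"http": "http://proxy.example.com:8080/"})
proxied_opener = urllib.request.build_opener(proxy_handler)

# An empty mapping means "no proxies", overriding environment settings.
direct_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

uses_proxy = proxy_handler in proxied_opener.handlers
```

Either opener can then be used through its :meth:`~OpenerDirector.open` method
or installed globally with :func:`install_opener`.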
1216
Legacy interface
----------------
1219
1220The following functions and classes are ported from the Python 2 module
1221``urllib`` (as opposed to ``urllib2``). They might become deprecated at
1222some point in the future.
1223
.. function:: urlretrieve(url, filename=None, reporthook=None, data=None)
1225
Senthil Kumarane24f96a2012-03-13 19:29:33 -07001226 Copy a network object denoted by a URL to a local file. If the URL
1227 points to a local file, the object will not be copied unless filename is supplied.
1228 Return a tuple ``(filename, headers)`` where *filename* is the
Antoine Pitroub8eb9cb2010-12-15 19:07:26 +00001229 local file name under which the object can be found, and *headers* is whatever
1230 the :meth:`info` method of the object returned by :func:`urlopen` returned (for
Senthil Kumarane24f96a2012-03-13 19:29:33 -07001231 a remote object). Exceptions are the same as for :func:`urlopen`.
Antoine Pitroub8eb9cb2010-12-15 19:07:26 +00001232
1233 The second argument, if present, specifies the file location to copy to (if
1234 absent, the location will be a tempfile with a generated name). The third
1235 argument, if present, is a hook function that will be called once on
1236 establishment of the network connection and once after each block read
1237 thereafter. The hook will be passed three arguments; a count of blocks
1238 transferred so far, a block size in bytes, and the total size of the file. The
1239 third argument may be ``-1`` on older FTP servers which do not return a file
1240 size in response to a retrieval request.
1241
Senthil Kumarane24f96a2012-03-13 19:29:33 -07001242 The following example illustrates the most common usage scenario::
1243
1244 >>> import urllib.request
1245 >>> local_filename, headers = urllib.request.urlretrieve('http://python.org/')
1246 >>> html = open(local_filename)
1247 >>> html.close()
1248
Antoine Pitroub8eb9cb2010-12-15 19:07:26 +00001249 If the *url* uses the :file:`http:` scheme identifier, the optional *data*
Senthil Kumarane24f96a2012-03-13 19:29:33 -07001250 argument may be given to specify a ``POST`` request (normally the request
Senthil Kumaran87684e62012-03-14 18:08:13 -07001251 type is ``GET``). The *data* argument must be a bytes object in standard
Senthil Kumarane24f96a2012-03-13 19:29:33 -07001252 :mimetype:`application/x-www-form-urlencoded` format; see the
Serhiy Storchaka5e1c0532013-10-13 20:06:50 +03001253 :func:`urllib.parse.urlencode` function.
Antoine Pitroub8eb9cb2010-12-15 19:07:26 +00001254
1255 :func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that
1256 the amount of data available was less than the expected amount (which is the
1257 size reported by a *Content-Length* header). This can occur, for example, when
1258 the download is interrupted.
1259
1260 The *Content-Length* is treated as a lower bound: if there's more data to read,
Senthil Kumarane24f96a2012-03-13 19:29:33 -07001261 urlretrieve reads more data, but if less data is available, it raises the
1262 exception.
Antoine Pitroub8eb9cb2010-12-15 19:07:26 +00001263
   You can still retrieve the downloaded data in this case; it is stored in the
   :attr:`content` attribute of the exception instance.
1266
   If no *Content-Length* header was supplied, urlretrieve cannot check the size
   of the data it has downloaded, and just returns it. In this case you just have
   to assume that the download was successful.
Antoine Pitroub8eb9cb2010-12-15 19:07:26 +00001270
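The report hook and the handling of short downloads described for
:func:`urlretrieve` can be combined as in the following sketch. The helper
names ``record_progress`` and ``fetch`` are hypothetical, and a ``data:`` URL
is used only so that the example runs offline; a real call would normally pass
an ``http:`` URL.

```python
import os
import tempfile
import urllib.error
import urllib.request

progress = []  # (block_number, block_size, total_size) tuples seen so far


def record_progress(block_number, block_size, total_size):
    """Report hook: remember every call made by urlretrieve."""
    progress.append((block_number, block_size, total_size))


def fetch(url, filename):
    """Return True if the download completed, False if it was cut short."""
    try:
        urllib.request.urlretrieve(url, filename, reporthook=record_progress)
    except urllib.error.ContentTooShortError as exc:
        # As described above, the partial result is kept on the exception.
        print("incomplete download:", exc.content)
        return False
    return True


target = os.path.join(tempfile.mkdtemp(), "hello.txt")
complete = fetch("data:text/plain;base64,SGVsbG8=", target)
with open(target, "rb") as f:
    saved = f.read()
```

The hook is called once before the first block is read and once after each
block, so ``progress`` is never empty after a successful call.
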
1271.. function:: urlcleanup()
1272
Senthil Kumarane24f96a2012-03-13 19:29:33 -07001273 Cleans up temporary files that may have been left behind by previous
1274 calls to :func:`urlretrieve`.
Antoine Pitroub8eb9cb2010-12-15 19:07:26 +00001275
.. class:: URLopener(proxies=None, **x509)

   .. deprecated:: 3.3

   Base class for opening and reading URLs.  Unless you need to support opening
   objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   you probably want to use :class:`FancyURLopener`.

   By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   Applications can define their own :mailheader:`User-Agent` header by subclassing
   :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   :attr:`version` to an appropriate string value in the subclass definition.

   The optional *proxies* parameter should be a dictionary mapping scheme names to
   proxy URLs, where an empty dictionary turns proxies off completely.  Its default
   value is ``None``, in which case environmental proxy settings will be used if
   present, as discussed in the definition of :func:`urlopen`, above.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme.  The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`OSError` exception if the server
   returns an error code.

   .. method:: open(fullurl, data=None)

      Open *fullurl* using the appropriate protocol.  This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments.  If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.


   .. method:: open_unknown(fullurl, data=None)

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url, filename=None, reporthook=None, data=None)

      Retrieves the contents of *url* and places them in *filename*.  The return
      value is a tuple consisting of a local filename and either a
      :class:`email.message.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs).  The caller must then open and read the
      contents of *filename*.  If *filename* is not given and the URL refers to a
      local file, the input filename is returned.  If the URL is non-local and
      *filename* is not given, the filename is the output of :func:`tempfile.mktemp`
      with a suffix that matches the suffix of the last path component of the input
      URL.  If *reporthook* is given, it must be a function accepting three numeric
      parameters: a chunk number, the maximum size chunks are read in, and the
      total size of the download (-1 if unknown).  It will be called once at the
      start and after each chunk of data is read from the network.  *reporthook*
      is ignored for local URLs.

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``).  The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the
      :func:`urllib.parse.urlencode` function.


   .. attribute:: version

      Variable that specifies the user agent of the opener object.  To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.


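The *reporthook* protocol described for :meth:`URLopener.retrieve` can be
sketched with the module-level :func:`urlretrieve` function, which accepts a
hook with the same three arguments.  The function is used here instead of the
deprecated class; the hook name ``report`` and the ``data:`` URL are
assumptions chosen so that the example runs without network access.

```python
import os
import tempfile
import urllib.request

calls = []


def report(block_num, block_size, total_size):
    """Record each progress callback; total_size is -1 when unknown."""
    calls.append((block_num, block_size, total_size))


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "payload.txt")
    urllib.request.urlretrieve("data:text/plain,hello", target,
                               reporthook=report)
    with open(target, "rb") as f:
        payload = f.read()
```

The first recorded call has chunk number 0 and is made before any data is
read; later calls follow each chunk.  A progress display would typically
compute ``block_num * block_size / total_size`` whenever *total_size* is
positive.
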
.. class:: FancyURLopener(...)

   .. deprecated:: 3.3

   :class:`FancyURLopener` subclasses :class:`URLopener`, providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401.  For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL.  For 401 response codes (authentication required), basic HTTP
   authentication is performed.  For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`http_error_default` is called,
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user.  In
      reality, browsers do allow automatic redirection of these responses, changing
      the POST to a GET, and :mod:`urllib` reproduces this behaviour.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method.  The default implementation asks the
      user for the required information on the controlling terminal.  A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overloaded to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm.  The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The implementation prompts for this information on the terminal; an application
      should override this method to use an appropriate interaction model in the local
      environment.


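For non-interactive programs the terminal prompt can be avoided altogether.
The sketch below does not use the deprecated :class:`FancyURLopener`; it
relies instead on the handler-based API of the same module
(:class:`HTTPPasswordMgrWithDefaultRealm` and :class:`HTTPBasicAuthHandler`)
to supply the ``(user, password)`` pair.  The URL prefix and credentials are
placeholders invented for the example.

```python
import urllib.request

# Placeholder values, not real credentials.
PROTECTED_PREFIX = "https://example.com/private/"
USER, PASSWORD = "alice", "s3cret"

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# realm=None registers the credentials as the default for any realm.
password_mgr.add_password(None, PROTECTED_PREFIX, USER, PASSWORD)

auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(auth_handler)

# The manager answers the question that prompt_user_passwd() would
# otherwise put to the user on the terminal.
found = password_mgr.find_user_password(
    "Some Realm", PROTECTED_PREFIX + "report.html")
```

Requests made through ``opener.open(...)`` for URLs under the registered
prefix would then answer a 401 challenge with these credentials.
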
:mod:`urllib.request` Restrictions
----------------------------------

  .. index::
     pair: HTTP; protocol
     pair: FTP; protocol

* Currently, only the following protocols are supported: HTTP (versions 0.9 and
  1.0), FTP, local files, and data URLs.

  .. versionchanged:: 3.4 Added support for data URLs.

* The caching feature of :func:`urlretrieve` has been disabled until someone
  finds the time to hack proper processing of Expiration time headers.

* There should be a function to query whether a particular URL is in the cache.

* For backward compatibility, if a URL appears to point to a local file but the
  file can't be opened, the URL is re-interpreted using the FTP protocol.  This
  can sometimes cause confusing error messages.

* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
  long delays while waiting for a network connection to be set up.  This means
  that it is difficult to build an interactive Web client using these functions
  without using threads.

  .. index::
     single: HTML
     pair: HTTP; protocol

* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
  returned by the server.  This may be binary data (such as an image), plain text
  or (for example) HTML.  The HTTP protocol provides type information in the reply
  header, which can be inspected by looking at the :mailheader:`Content-Type`
  header.  If the returned data is HTML, you can use the module
  :mod:`html.parser` to parse it.

  .. index:: single: FTP

* The code handling the FTP protocol cannot differentiate between a file and a
  directory.  This can lead to unexpected behavior when attempting to read a URL
  that points to a file that is not accessible.  If the URL ends in a ``/``, it is
  assumed to refer to a directory and will be handled accordingly.  But if an
  attempt to read a file leads to a 550 error (meaning the URL cannot be found or
  is not accessible, often for permission reasons), then the path is treated as a
  directory in order to handle the case when a directory is specified by a URL but
  the trailing ``/`` has been left off.  This can cause misleading results when
  you try to fetch a file whose read permissions make it inaccessible; the FTP
  code will try to read it, fail with a 550 error, and then perform a directory
  listing for the unreadable file.  If fine-grained control is needed, consider
  using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
  *_urlopener* to meet your needs.



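The point above about raw data and the :mailheader:`Content-Type` header can
be illustrated with a small sketch.  It assumes a ``data:`` URL carrying a
percent-encoded HTML fragment so that no network access is needed;
``TagCollector`` is a hypothetical parser written for the example.

```python
import urllib.request
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical parser that records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# Percent-encoded "<p>hello <b>world</b></p>".
url = ("data:text/html;charset=utf-8,"
       "%3Cp%3Ehello%20%3Cb%3Eworld%3C%2Fb%3E%3C%2Fp%3E")

with urllib.request.urlopen(url) as response:
    content_type = response.headers.get_content_type()
    charset = response.headers.get_content_charset() or "utf-8"
    raw = response.read()  # always bytes, whatever the content type

tags = []
if content_type == "text/html":
    # Decode explicitly before handing the text to the HTML parser.
    parser = TagCollector()
    parser.feed(raw.decode(charset))
    parser.close()
    tags = parser.tags
```
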
:mod:`urllib.response` --- Response classes used by urllib
==========================================================

.. module:: urllib.response
   :synopsis: Response classes used by urllib.

The :mod:`urllib.response` module defines functions and classes which define a
minimal file-like interface, including ``read()`` and ``readline()``.  The
typical response object is an addinfourl instance, which defines an ``info()``
method that returns headers and a ``geturl()`` method that returns the URL.
Functions defined by this module are used internally by the
:mod:`urllib.request` module.

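The interface described above can be explored with a short sketch.  It
assumes CPython's behaviour of returning an ``addinfourl`` instance for
``data:`` URLs, which keeps the example free of network access; responses for
other schemes may be of a different class that offers the same methods.

```python
import urllib.request
import urllib.response

url = "data:text/plain,first%0Asecond%0A"
response = urllib.request.urlopen(url)
try:
    is_addinfourl = isinstance(response, urllib.response.addinfourl)
    headers = response.info()         # message object holding the headers
    final_url = response.geturl()     # URL the data was retrieved from
    first_line = response.readline()  # one line, as bytes
    rest = response.read()            # everything that remains, as bytes
finally:
    response.close()
```
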