:mod:`urllib.request` --- Extensible library for opening URLs
=============================================================

.. module:: urllib.request
   :synopsis: Extensible library for opening URLs.

.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
.. sectionauthor:: Senthil Kumaran <senthil@uthcode.com>

**Source code:** :source:`Lib/urllib/request.py`

--------------

The :mod:`urllib.request` module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.

.. seealso::

   The `Requests package <http://docs.python-requests.org/>`_
   is recommended for a higher-level HTTP client interface.


The :mod:`urllib.request` module defines the following functions:


.. function:: urlopen(url, data=None[, timeout], *, cafile=None, capath=None, cadefault=False, context=None)

   Open the URL *url*, which can be either a string or a
   :class:`Request` object.

   *data* must be an object specifying additional data to be sent to the
   server, or ``None`` if no such data is needed.  See :class:`Request`
   for details.

   The :mod:`urllib.request` module uses HTTP/1.1 and includes a
   ``Connection: close`` header in its HTTP requests.

   The optional *timeout* parameter specifies a timeout in seconds for
   blocking operations like the connection attempt (if not specified,
   the global default timeout setting will be used).  This only works
   for HTTP, HTTPS and FTP connections.

   If *context* is specified, it must be a :class:`ssl.SSLContext` instance
   describing the various SSL options. See :class:`~http.client.HTTPSConnection`
   for more details.

   The optional *cafile* and *capath* parameters specify a set of trusted
   CA certificates for HTTPS requests.  *cafile* should point to a single
   file containing a bundle of CA certificates, whereas *capath* should
   point to a directory of hashed certificate files.  More information can
   be found in :meth:`ssl.SSLContext.load_verify_locations`.

   The *cadefault* parameter is ignored.

   This function always returns an object which can work as a
   :term:`context manager` and has methods such as

   * :meth:`~urllib.response.addinfourl.geturl` --- return the URL of the resource retrieved,
     commonly used to determine if a redirect was followed

   * :meth:`~urllib.response.addinfourl.info` --- return the meta-information of the page, such as headers,
     in the form of an :func:`email.message_from_string` instance (see
     `Quick Reference to HTTP Headers <http://jkorpela.fi/http.html>`_)

   * :meth:`~urllib.response.addinfourl.getcode` --- return the HTTP status code of the response.

   For HTTP and HTTPS URLs, this function returns a slightly modified
   :class:`http.client.HTTPResponse` object.  In addition
   to the three new methods above, the ``msg`` attribute contains the
   same information as the :attr:`~http.client.HTTPResponse.reason`
   attribute --- the reason phrase returned by the server --- instead of
   the response headers as specified in the documentation for
   :class:`~http.client.HTTPResponse`.

   For FTP, file, and data URLs and requests explicitly handled by legacy
   :class:`URLopener` and :class:`FancyURLopener` classes, this function
   returns a :class:`urllib.response.addinfourl` object.

   Raises :exc:`~urllib.error.URLError` on protocol errors.

   Note that ``None`` may be returned if no handler handles the request (though
   the default installed global :class:`OpenerDirector` uses
   :class:`UnknownHandler` to ensure this never happens).

   In addition, if proxy settings are detected (for example, when a ``*_proxy``
   environment variable like :envvar:`http_proxy` is set),
   :class:`ProxyHandler` is installed by default and makes sure the requests are
   handled through the proxy.

   The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
   discontinued; :func:`urllib.request.urlopen` corresponds to the old
   ``urllib2.urlopen``.  Proxy handling, which was done by passing a dictionary
   parameter to ``urllib.urlopen``, can be obtained by using
   :class:`ProxyHandler` objects.

   .. audit-event:: urllib.Request "fullurl data headers method"

      The default opener raises an :func:`auditing event <sys.audit>`
      ``urllib.Request`` with arguments ``fullurl``, ``data``, ``headers``,
      ``method`` taken from the request object.

   .. versionchanged:: 3.2
      *cafile* and *capath* were added.

   .. versionchanged:: 3.2
      HTTPS virtual hosts are now supported if possible (that is, if
      :data:`ssl.HAS_SNI` is true).

   .. versionadded:: 3.2
      *data* can be an iterable object.

   .. versionchanged:: 3.3
      *cadefault* was added.

   .. versionchanged:: 3.4.3
      *context* was added.

   .. deprecated:: 3.6

      *cafile*, *capath* and *cadefault* are deprecated in favor of *context*.
      Please use :meth:`ssl.SSLContext.load_verify_locations` instead, or let
      :func:`ssl.create_default_context` select the system's trusted CA
      certificates for you.

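The return value described above can be explored without network access. The following is a minimal sketch, assuming a ``data:`` URL (served locally by :class:`DataHandler`) as a stand-in for a real resource; the variable names are illustrative only.

```python
import urllib.request

# A data: URL is handled locally by DataHandler, so no network access is needed.
url = "data:text/plain;charset=utf-8,hello%20world"

# The returned object works as a context manager and is closed on exit.
with urllib.request.urlopen(url, timeout=10) as response:
    body = response.read()            # raw bytes of the resource
    final_url = response.geturl()     # URL of the resource retrieved
    headers = response.info()         # message object holding the headers
    status = response.getcode()       # HTTP status code; None for non-HTTP URLs

text = body.decode("utf-8")
content_type = headers.get_content_type()
```

For an ``http:`` or ``https:`` URL the same calls apply, and ``getcode()`` then returns the numeric HTTP status.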

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want urlopen to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`~urllib.request.urlopen`.  The code does not check for a real
   :class:`OpenerDirector`, and any class with the appropriate interface will
   work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters).  Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler` (if proxy
   settings are detected), :class:`UnknownHandler`, :class:`HTTPHandler`,
   :class:`HTTPDefaultErrorHandler`, :class:`HTTPRedirectHandler`,
   :class:`FTPHandler`, :class:`FileHandler`, :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
   attribute to modify its position in the handlers list.

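As a rough illustration of passing a :class:`BaseHandler` subclass to :func:`build_opener`, the sketch below defines a hypothetical ``EchoHandler`` for a made-up ``echo:`` scheme. Neither the class nor the scheme is part of the library; they exist only to show how a handler is chained and invoked.

```python
import email.message
import io
import urllib.request
import urllib.response


class EchoHandler(urllib.request.BaseHandler):
    """Hypothetical handler for a made-up echo: scheme (illustration only)."""

    def echo_open(self, req):
        # The opener calls <scheme>_open() on handlers for URLs of that scheme.
        payload = req.selector.encode("utf-8")
        headers = email.message.Message()
        headers["Content-Length"] = str(len(payload))
        return urllib.response.addinfourl(io.BytesIO(payload), headers, req.full_url)


# A handler may be passed as a class (instantiated without arguments) or an instance.
opener = urllib.request.build_opener(EchoHandler)

with opener.open("echo:hello") as response:
    echoed = response.read()

# To make urlopen() use this opener as well, it could be installed globally:
# urllib.request.install_opener(opener)
```

Calling ``opener.open()`` directly keeps the custom handler local to this opener; installing it is only needed when :func:`urlopen` itself should use it.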

.. function:: pathname2url(path)

   Convert the pathname *path* from the local syntax for a path to the form used in
   the path component of a URL.  This does not produce a complete URL.  The return
   value will already be quoted using the :func:`~urllib.parse.quote` function.


.. function:: url2pathname(path)

   Convert the path component *path* from a percent-encoded URL to the local syntax for a
   path.  This does not accept a complete URL.  This function uses
   :func:`~urllib.parse.unquote` to decode *path*.

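A short sketch of the two conversions, assuming a hypothetical local path that contains a space. The exact strings produced depend on the platform's path syntax, so no literal output is shown.

```python
import os
from urllib.request import pathname2url, url2pathname

# Hypothetical local path containing a space; the separator is platform-specific.
local_path = os.path.join(os.sep, "tmp", "my file.txt")

url_path = pathname2url(local_path)     # quoted path component, not a full URL
round_trip = url2pathname(url_path)     # decoded back to the local syntax
```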

.. function:: getproxies()

   This helper function returns a dictionary of scheme to proxy server URL
   mappings.  On all operating systems it first scans the environment for
   variables named ``<scheme>_proxy``, matching the names case-insensitively.
   If it finds none, it looks for proxy information in the Mac OS X System
   Configuration on Mac OS X and in the Windows Systems Registry on Windows.
   If both lowercase and uppercase environment variables exist (and disagree),
   lowercase is preferred.

   .. note::

      If the environment variable ``REQUEST_METHOD`` is set, which usually
      indicates your script is running in a CGI environment, the environment
      variable ``HTTP_PROXY`` (uppercase ``_PROXY``) will be ignored. This is
      because that variable can be injected by a client using the "Proxy:" HTTP
      header. If you need to use an HTTP proxy in a CGI environment, either use
      ``ProxyHandler`` explicitly, or make sure the variable name is in
      lowercase (or at least the ``_proxy`` suffix).

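The environment lookup can be tried in isolation. This sketch temporarily sets a lowercase ``http_proxy`` variable with :func:`unittest.mock.patch.dict`; the proxy URL is a made-up placeholder, and the patching is only a convenience for the example.

```python
import os
from unittest import mock
from urllib.request import getproxies

# Temporarily set a lowercase proxy variable; the proxy URL is a placeholder.
with mock.patch.dict(os.environ, {"http_proxy": "http://proxy.example.com:3128"}):
    proxies = getproxies()

# The mapping is keyed by scheme name.
http_proxy = proxies.get("http")
```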

The following classes are provided:

.. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* must be an object specifying additional data to send to the
   server, or ``None`` if no such data is needed.  Currently HTTP
   requests are the only ones that use *data*.  The supported object
   types include bytes, file-like objects, and iterables of bytes-like objects.
   If neither a ``Content-Length`` nor a ``Transfer-Encoding`` header field
   has been provided, :class:`HTTPHandler` will set these headers according
   to the type of *data*.  ``Content-Length`` will be used to send
   bytes objects, while ``Transfer-Encoding: chunked`` as specified in
   :rfc:`7230`, Section 3.3.1 will be used to send files and other iterables.

   For an HTTP POST request method, *data* should be a buffer in the
   standard :mimetype:`application/x-www-form-urlencoded` format.  The
   :func:`urllib.parse.urlencode` function takes a mapping or sequence
   of 2-tuples and returns an ASCII string in this format. It should
   be encoded to bytes before being used as the *data* parameter.

   *headers* should be a dictionary, and will be treated as if
   :meth:`add_header` was called with each key and value as arguments.
   This is often used to "spoof" the ``User-Agent`` header value, which is
   used by a browser to identify itself; some HTTP servers only
   allow requests coming from common browsers as opposed to scripts.
   For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while
   :mod:`urllib`'s default user agent string is
   ``"Python-urllib/2.6"`` (on Python 2.6).

   An appropriate ``Content-Type`` header should be included if the *data*
   argument is present.  If this header has not been provided and *data*
   is not ``None``, ``Content-Type: application/x-www-form-urlencoded`` will
   be added as a default.

   The next two arguments are only of interest for correct handling
   of third-party HTTP cookies:

   *origin_req_host* should be the request-host of the origin
   transaction, as defined by :rfc:`2965`.  It defaults to
   ``http.cookiejar.request_host(self)``.  This is the host name or IP
   address of the original request that was initiated by the user.
   For example, if the request is for an image in an HTML document,
   this should be the request-host of the request for the page
   containing the image.

   *unverifiable* should indicate whether the request is unverifiable,
   as defined by :rfc:`2965`.  It defaults to ``False``.  An unverifiable
   request is one whose URL the user did not have the option to
   approve.  For example, if the request is for an image in an HTML
   document, and the user had no option to approve the automatic
   fetching of the image, this should be true.

   *method* should be a string that indicates the HTTP request method that
   will be used (e.g. ``'HEAD'``).  If provided, its value is stored in the
   :attr:`~Request.method` attribute and is used by :meth:`get_method`.
   The default is ``'GET'`` if *data* is ``None`` or ``'POST'`` otherwise.
   Subclasses may indicate a different default method by setting the
   :attr:`~Request.method` attribute in the class itself.

   .. note::
      The request will not work as expected if the data object is unable
      to deliver its content more than once (e.g. a file or an iterable
      that can produce the content only once) and the request is retried
      for HTTP redirects or authentication.  The *data* is sent to the
      HTTP server right away after the headers.  There is no support for
      a 100-continue expectation in the library.

   .. versionchanged:: 3.3
      The *method* argument was added to the :class:`Request` class.

   .. versionchanged:: 3.4
      Default :attr:`Request.method` may be indicated at the class level.

   .. versionchanged:: 3.6
      Do not raise an error if the ``Content-Length`` has not been
      provided and *data* is neither ``None`` nor a bytes object.
      Fall back to use chunked transfer encoding instead.

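The constructor arguments above can be sketched as follows. The URLs, form fields and ``User-Agent`` value are placeholders, and ``PutRequest`` is a hypothetical subclass used only to show a class-level default method. Nothing is sent over the network, because the requests are only constructed.

```python
import urllib.parse
import urllib.request

# Form fields are URL-encoded to an ASCII string, then encoded to bytes.
fields = {"name": "Ada Lovelace", "lang": "python"}
data = urllib.parse.urlencode(fields).encode("ascii")

# Passing data makes POST the default method; headers go through add_header().
post_req = urllib.request.Request(
    "http://www.example.com/submit",
    data=data,
    headers={"User-Agent": "example-client/1.0"},
)

# An explicit method overrides the GET/POST default derived from data.
head_req = urllib.request.Request("http://www.example.com/", method="HEAD")


class PutRequest(urllib.request.Request):
    # Class-level default method (supported since Python 3.4).
    method = "PUT"


put_req = PutRequest("http://www.example.com/item/1", data=b"payload")
```

Any of these objects could then be passed to :func:`urlopen` or :meth:`OpenerDirector.open` in place of a URL string.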

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`~urllib.error.HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor(cookiejar=None)

   A class to handle HTTP Cookies.


.. class:: ProxyHandler(proxies=None)

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read
   the list of proxies from the environment variables
   ``<protocol>_proxy``.  If no proxy environment variables are set, then
   in a Windows environment proxy settings are obtained from the registry's
   Internet Settings section, and in a Mac OS X environment proxy information
   is retrieved from the OS X System Configuration Framework.

   To disable autodetected proxies, pass an empty dictionary.

   The :envvar:`no_proxy` environment variable can be used to specify hosts
   which shouldn't be reached via proxy; if set, it should be a comma-separated
   list of hostname suffixes, optionally with ``:port`` appended, for example
   ``cern.ch,ncsa.uiuc.edu,some.host:8080``.

   .. note::

      ``HTTP_PROXY`` will be ignored if a variable ``REQUEST_METHOD`` is set;
      see the documentation on :func:`~urllib.request.getproxies`.

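The two ways of configuring :class:`ProxyHandler` explicitly might look like this. The proxy host is a placeholder, and the openers are only built, not used to fetch anything.

```python
import urllib.request

# An empty dictionary disables autodetected proxies for this opener.
no_proxy_handler = urllib.request.ProxyHandler({})
direct_opener = urllib.request.build_opener(no_proxy_handler)

# An explicit mapping of protocol name to proxy URL (placeholder host).
explicit_handler = urllib.request.ProxyHandler(
    {"http": "http://proxy.example.com:3128"}
)
proxied_opener = urllib.request.build_opener(explicit_handler)
```

Because a :class:`ProxyHandler` instance is passed in both cases, :func:`build_opener` does not add its own autodetecting one.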

.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: HTTPPasswordMgrWithPriorAuth()

   A variant of :class:`HTTPPasswordMgrWithDefaultRealm` that also has a
   database of ``uri -> is_authenticated`` mappings.  Can be used by a
   BasicAuth handler to determine when to send authentication credentials
   immediately instead of waiting for a ``401`` response first.

   .. versionadded:: 3.5

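A brief sketch of the catch-all realm and the prior-auth mapping, using placeholder URLs and credentials. It relies on the ``add_password``, ``find_user_password`` and ``is_authenticated`` methods of the password manager interface.

```python
import urllib.request

# Realm None acts as the catch-all realm; URL and credentials are placeholders.
mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, "https://www.example.com/api/", "alice", "s3cret")

# "Some Realm" has no entry of its own, so the None realm is searched.
found = mgr.find_user_password("Some Realm", "https://www.example.com/api/items")
missing = mgr.find_user_password("Some Realm", "https://other.example.org/")

# The prior-auth variant also records whether to send credentials up front.
prior = urllib.request.HTTPPasswordMgrWithPriorAuth()
prior.add_password(None, "https://www.example.com/api/", "alice", "s3cret",
                   is_authenticated=True)
send_immediately = prior.is_authenticated("https://www.example.com/api/items")
```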

.. class:: AbstractBasicAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.  If *password_mgr* also provides ``is_authenticated`` and
   ``update_authenticated`` methods (see
   :ref:`http-password-mgr-with-prior-auth`), then the handler will use the
   ``is_authenticated`` result for a given URI to determine whether or not to
   send authentication credentials with the request.  If ``is_authenticated``
   returns ``True`` for the URI, credentials are sent.  If ``is_authenticated``
   returns ``False``, credentials are not sent, and then if a ``401`` response is
   received the request is re-sent with the authentication credentials.  If
   authentication succeeds, ``update_authenticated`` is called to set
   ``is_authenticated`` to ``True`` for the URI, so that subsequent requests to
   the URI or any of its super-URIs will automatically include the
   authentication credentials.

   .. versionadded:: 3.5
      Added ``is_authenticated`` support.


.. class:: HTTPBasicAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should
   be something that is compatible with :class:`HTTPPasswordMgr`; refer to
   section :ref:`http-password-mgr` for information on the interface that must
   be supported. :class:`HTTPBasicAuthHandler` will raise a :exc:`ValueError` when
   presented with an unsupported authentication scheme.


.. class:: ProxyBasicAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if given, should
   be something that is compatible with :class:`HTTPPasswordMgr`; refer to
   section :ref:`http-password-mgr` for information on the interface that must
   be supported. When both the Digest Authentication handler and the Basic
   Authentication handler are added, Digest Authentication is always tried
   first. If Digest Authentication returns a 40x response again, the response
   is passed to the Basic Authentication handler.  This handler will raise a
   :exc:`ValueError` when presented with an authentication scheme other than
   Digest or Basic.

   .. versionchanged:: 3.3
      Raise :exc:`ValueError` on unsupported Authentication Scheme.


.. class:: ProxyDigestAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.

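The authentication handlers are normally combined with a password manager and passed to :func:`build_opener`. The sketch below uses placeholder credentials and only inspects the resulting handler chain; no request is made.

```python
import urllib.request

# One password manager can be shared by several handlers (placeholder values).
mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, "https://www.example.com/", "alice", "s3cret")

basic = urllib.request.HTTPBasicAuthHandler(mgr)
digest = urllib.request.HTTPDigestAuthHandler(mgr)
opener = urllib.request.build_opener(basic, digest)

# Names of the chained handlers, in the order the opener consults them.
handler_types = [type(h).__name__ for h in opener.handlers]
```

Although ``basic`` is passed first, the Digest handler sorts ahead of it in the chain, which matches the "Digest is tried first" behaviour described above.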

.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler(debuglevel=0, context=None, check_hostname=None)

   A class to handle opening of HTTPS URLs.  *context* and *check_hostname*
   have the same meaning as in :class:`http.client.HTTPSConnection`.

   .. versionchanged:: 3.2
      *context* and *check_hostname* were added.

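One way to supply an :class:`ssl.SSLContext` is sketched here, assuming the default context from :func:`ssl.create_default_context` is acceptable. The opener is only constructed, and the commented URL is a placeholder.

```python
import ssl
import urllib.request

# Default context: verifies certificates against the system's trusted CAs.
context = ssl.create_default_context()

https_handler = urllib.request.HTTPSHandler(context=context)
opener = urllib.request.build_opener(https_handler)

# For a one-off request the same context could be passed to urlopen() directly:
# urllib.request.urlopen("https://www.example.com/", context=context)
```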

.. class:: FileHandler()

   Open local files.

.. class:: DataHandler()

   Open data URLs.

   .. versionadded:: 3.4

.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. class:: HTTPErrorProcessor()

   Process HTTP error responses.


.. _request-objects:

Request Objects
---------------

The following methods describe :class:`Request`'s public interface,
and so all may be overridden in subclasses.  It also defines several
public attributes that can be used by clients to inspect the parsed
request.

.. attribute:: Request.full_url

   The original URL passed to the constructor.

   .. versionchanged:: 3.4
      :attr:`Request.full_url` is a property with a setter, a getter and a
      deleter.  Getting :attr:`~Request.full_url` returns the original
      request URL with the fragment, if it was present.

.. attribute:: Request.type

   The URI scheme.

.. attribute:: Request.host

   The URI authority, typically a host, but may also contain a port
   separated by a colon.

.. attribute:: Request.origin_req_host

   The original host for the request, without port.

.. attribute:: Request.selector

   The URI path.  If the :class:`Request` uses a proxy, then selector
   will be the full URL that is passed to the proxy.

.. attribute:: Request.data

   The entity body for the request, or ``None`` if not specified.

   .. versionchanged:: 3.4
      Changing the value of :attr:`Request.data` now deletes the
      "Content-Length" header if it was previously set or calculated.

.. attribute:: Request.unverifiable

   A boolean that indicates whether the request is unverifiable as defined
   by :rfc:`2965`.

.. attribute:: Request.method

   The HTTP request method to use.  By default its value is :const:`None`,
   which means that :meth:`~Request.get_method` will do its normal computation
   of the method to be used.  Its value can be set (thus overriding the default
   computation in :meth:`~Request.get_method`) either by providing a default
   value by setting it at the class level in a :class:`Request` subclass, or by
   passing a value in to the :class:`Request` constructor via the *method*
   argument.

   .. versionadded:: 3.3

   .. versionchanged:: 3.4
      A default value can now be set in subclasses; previously it could only
      be set via the constructor argument.

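The attributes above can be inspected on a freshly constructed request. This sketch uses a placeholder URL with a port, a query and a fragment so that each attribute shows a distinct part.

```python
import urllib.request

url = "http://www.example.com:8080/docs/index.html?q=1#intro"
req = urllib.request.Request(url)

scheme = req.type                # URI scheme
authority = req.host             # host, including the port when one is given
origin = req.origin_req_host     # original host, without the port
selector = req.selector          # path and query, without the fragment
full_url = req.full_url          # original URL, fragment included
```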

.. method:: Request.get_method()

   Return a string indicating the HTTP request method.  If
   :attr:`Request.method` is not ``None``, return its value, otherwise return
   ``'GET'`` if :attr:`Request.data` is ``None``, or ``'POST'`` if it's not.
   This is only meaningful for HTTP requests.

   .. versionchanged:: 3.3
      get_method now looks at the value of :attr:`Request.method`.


.. method:: Request.add_header(key, val)

   Add another header to the request.  Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server.  Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.


560.. method:: Request.has_header(header)
561
562 Return whether the instance has the named header (checks both regular and
563 unredirected).
564
Georg Brandl116aa622007-08-15 14:28:22 +0000565
Andrew Svetlovbff98fe2012-11-27 23:06:19 +0200566.. method:: Request.remove_header(header)
567
568 Remove named header from the request instance (both from regular and
569 unredirected headers).
570
Georg Brandlc0fc9582012-12-22 10:36:45 +0100571 .. versionadded:: 3.4
572
Andrew Svetlovbff98fe2012-11-27 23:06:19 +0200573
Georg Brandl116aa622007-08-15 14:28:22 +0000574.. method:: Request.get_full_url()
575
576 Return the URL given in the constructor.
577
Senthil Kumaran83070752013-05-24 09:14:12 -0700578 .. versionchanged:: 3.4
579
580 Returns :attr:`Request.full_url`
581
Georg Brandl116aa622007-08-15 14:28:22 +0000582
Georg Brandl116aa622007-08-15 14:28:22 +0000583.. method:: Request.set_proxy(host, type)
584
585 Prepare the request by connecting to a proxy server. The *host* and *type* will
586 replace those of the instance, and the instance's selector will be the original
587 URL given in the constructor.
588
589
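
The effect of :meth:`Request.set_proxy` on a plain ``http`` request can be
inspected directly. This is a small sketch; ``proxy.example.com:3128`` is a
made-up proxy address and nothing is actually contacted.

```python
import urllib.request

req = urllib.request.Request('http://www.example.com/index.html')

# 'proxy.example.com:3128' is a made-up proxy address.
req.set_proxy('proxy.example.com:3128', 'http')

print(req.host)      # the proxy replaces the original host
print(req.selector)  # the selector becomes the original full URL
```
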
.. method:: Request.get_header(header_name, default=None)

   Return the value of the given header. If the header is not present, return
   the default value.


.. method:: Request.header_items()

   Return a list of tuples (header_name, header_value) of the Request headers.

.. versionchanged:: 3.4
   The request methods add_data, has_data, get_data, get_type, get_host,
   get_selector, get_origin_req_host and is_unverifiable that were deprecated
   since 3.3 have been removed.


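
The header methods described above can be exercised on a request that is never
sent. In this sketch the URL and the ``X-token`` header are placeholders. In
CPython, header names are stored in ``str.capitalize()`` form, so the lookups
below use that spelling; treat this as an implementation detail rather than a
guarantee.

```python
import urllib.request

req = urllib.request.Request('http://www.example.com/')
req.add_header('Accept', 'text/html')
req.add_header('Accept', 'application/json')      # overwrites the first value
req.add_unredirected_header('X-token', 'secret')  # hypothetical header name

accept = req.get_header('Accept')
missing = req.get_header('Missing', 'fallback')
had_token = req.has_header('X-token')             # unredirected headers count

req.remove_header('X-token')
remaining = req.header_items()

print(accept, missing, had_token, remaining)
```
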
.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following methods
   are searched, and added to the possible chains (note that HTTP errors are a
   special case). Note that, in the following, *protocol* should be replaced
   with the actual protocol to handle, for example :meth:`http_response` would
   be the HTTP protocol response handler. Also *type* should be replaced with
   the actual HTTP code, for example :meth:`http_error_404` would handle HTTP
   404 errors.

   * :meth:`<protocol>_open` --- signal that the handler knows how to open *protocol*
     URLs.

     See |protocol_open|_ for more information.

   * :meth:`http_error_\<type\>` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

     See |http_error_nnn|_ for more information.

   * :meth:`<protocol>_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`<protocol>_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

     See |protocol_request|_ for more information.

   * :meth:`<protocol>_response` --- signal that the handler knows how to
     post-process *protocol* responses.

     See |protocol_response|_ for more information.

.. |protocol_open| replace:: :meth:`BaseHandler.<protocol>_open`
.. |http_error_nnn| replace:: :meth:`BaseHandler.http_error_\<nnn\>`
.. |protocol_request| replace:: :meth:`BaseHandler.<protocol>_request`
.. |protocol_response| replace:: :meth:`BaseHandler.<protocol>_response`

.. method:: OpenerDirector.open(url, data=None[, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature works only for
   HTTP, HTTPS and FTP connections.


.. method:: OpenerDirector.error(proto, *args)

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\<type\>`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances.

#. Every handler with a method named like :meth:`<protocol>_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`<protocol>_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually
   :exc:`~urllib.error.URLError`). Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`<protocol>_open`. If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :meth:`<protocol>_response` has that
   method called to post-process the response.


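
The three stages can be observed with a handler for a made-up scheme. The
sketch below assumes a hypothetical ``demo`` protocol served entirely from
memory; ``DemoHandler`` and the ``calls`` list exist only for illustration, and
the response is built with :class:`urllib.response.addinfourl`.

```python
import email.message
import io
import urllib.request
import urllib.response

calls = []  # records the order in which the opener invokes the methods


class DemoHandler(urllib.request.BaseHandler):
    """Handle the made-up ``demo`` scheme entirely in memory."""

    def demo_request(self, req):
        calls.append('request')      # stage 1: pre-process the request
        return req

    def demo_open(self, req):
        calls.append('open')         # stage 2: produce a response
        body = io.BytesIO(b'hello from ' + req.full_url.encode('ascii'))
        return urllib.response.addinfourl(body, email.message.Message(),
                                          req.full_url)

    def demo_response(self, req, response):
        calls.append('response')     # stage 3: post-process the response
        return response


opener = urllib.request.OpenerDirector()
opener.add_handler(DemoHandler())

with opener.open('demo://example/resource') as f:
    payload = f.read()

print(calls)
print(payload)
```

A bare :class:`OpenerDirector` is used here instead of :func:`build_opener`
so that only the illustrative handler takes part in the chain.
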
.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following attribute and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`<protocol>_request` or :meth:`<protocol>_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`~urllib.error.URLError`, unless a truly exceptional
   thing happens (for example, :exc:`MemoryError` should not be mapped to
   :exc:`URLError`).

   This method will be called before any protocol-specific open method.


.. _protocol_open:
.. method:: BaseHandler.<protocol>_open(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. _http_error_nnn:
.. method:: BaseHandler.http_error_<nnn>(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.


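
The dispatch between a code-specific error method and the catch-all can be
sketched by driving :meth:`OpenerDirector.error` by hand, which avoids
contacting a server. ``FallbackHandler`` and its stub bodies are hypothetical;
a real handler would more likely retry the request or raise
:exc:`~urllib.error.HTTPError`.

```python
import email.message
import io
import urllib.request
import urllib.response


class FallbackHandler(urllib.request.BaseHandler):
    """Hypothetical handler that substitutes stub bodies for HTTP errors."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        stub = io.BytesIO(b'not found, using fallback')
        return urllib.response.addinfourl(stub, hdrs, req.full_url, code=200)

    def http_error_default(self, req, fp, code, msg, hdrs):
        # Catch-all for every HTTP error code without a specific method.
        stub = io.BytesIO(b'unhandled error %d' % code)
        return urllib.response.addinfourl(stub, hdrs, req.full_url, code=code)


opener = urllib.request.OpenerDirector()
opener.add_handler(FallbackHandler())

req = urllib.request.Request('http://www.example.com/missing')
hdrs = email.message.Message()

# Invoke the error chain directly instead of performing a request.
specific = opener.error('http', req, io.BytesIO(b''), 404, 'Not Found', hdrs)
generic = opener.error('http', req, io.BytesIO(b''), 500, 'Server Error', hdrs)

specific_body = specific.read()
generic_body = generic.read()
print(specific_body)
print(generic_body, generic.code)
```
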
.. _protocol_request:
.. method:: BaseHandler.<protocol>_request(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. _protocol_response:
.. method:: BaseHandler.<protocol>_response(req, response)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`~urllib.error.HTTPError` is raised. See :rfc:`2616` for
   details of the precise meanings of the various redirection codes.

   An :class:`HTTPError` exception is raised as a security consideration if the
   HTTPRedirectHandler is presented with a redirected URL which is not an HTTP,
   HTTPS or FTP URL.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`~urllib.error.HTTPError` if
   no other handler should try to handle this URL, or return ``None`` if you
   can't but another handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the POST to a
      ``GET``, and the default implementation reproduces this behavior.


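
A custom redirect policy can be expressed by overriding
:meth:`~HTTPRedirectHandler.redirect_request`. The sketch below assumes a
hypothetical policy of refusing redirects that leave the original host;
``SameHostRedirectHandler`` is not part of the module, and the hook is called
directly here rather than through an opener.

```python
import email.message
import urllib.error
import urllib.request
from urllib.parse import urlsplit


class SameHostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical policy: follow redirects only within the original host."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        if urlsplit(newurl).hostname != urlsplit(req.full_url).hostname:
            # No other handler should try this URL, so raise HTTPError.
            raise urllib.error.HTTPError(
                req.full_url, code, 'cross-host redirect refused', hdrs, fp)
        return super().redirect_request(req, fp, code, msg, hdrs, newurl)


handler = SameHostRedirectHandler()
req = urllib.request.Request('http://www.example.com/old')
hdrs = email.message.Message()

# Same host: the default implementation builds the follow-up request.
followed = handler.redirect_request(
    req, None, 302, 'Found', hdrs, 'http://www.example.com/new')
print(followed.full_url)

# Different host: the override refuses the redirect.
refused = None
try:
    handler.redirect_request(
        req, None, 302, 'Found', hdrs, 'http://other.example.org/new')
except urllib.error.HTTPError as err:
    refused = err
    print(err.code, err.msg)
```

In real use the handler would be passed to :func:`build_opener`, where it
replaces the default :class:`HTTPRedirectHandler`.
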
.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

:class:`HTTPCookieProcessor` instances have one attribute:

.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`http.cookiejar.CookieJar` in which cookies are stored.


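
A brief sketch of wiring a cookie jar into an opener follows. No request is
made, so the jar stays empty; the point is only that the processor exposes the
jar it was given through :attr:`~HTTPCookieProcessor.cookiejar`.

```python
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()
processor = urllib.request.HTTPCookieProcessor(jar)
opener = urllib.request.build_opener(processor)

# Cookies set by responses fetched through ``opener`` accumulate in ``jar``.
print(processor.cookiejar is jar)
print(len(jar))  # nothing has been fetched yet
```
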
.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.<protocol>_open(request)
   :noindex:

   The :class:`ProxyHandler` will have a method :meth:`<protocol>_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.


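
The per-protocol methods can be observed on an instance built from an explicit
dictionary. This is a small sketch with a made-up proxy URL; passing a
dictionary means the environment is not consulted.

```python
import urllib.request

# Made-up proxy URL; only the 'http' scheme is configured.
handler = urllib.request.ProxyHandler(
    {'http': 'http://proxy.example.com:3128/'})

has_http = hasattr(handler, 'http_open')
has_ftp = hasattr(handler, 'ftp_open')
print(has_http, has_ftp)
```
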
.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.


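
The lookup rules, including the fallback to the ``None`` realm, can be sketched
as follows. Realm names, URLs and credentials are placeholders.

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Realm names, URLs and credentials below are placeholders.
mgr.add_password('Admin', 'http://www.example.com/admin/', 'root', 's3cret')
mgr.add_password(None, 'http://www.example.com/', 'guest', 'guest')

# Matches the 'Admin' realm because the URI lies below the registered one.
in_realm = mgr.find_user_password('Admin', 'http://www.example.com/admin/users')
# Unknown realm: the default (None) realm is searched instead.
fallback = mgr.find_user_password('Other', 'http://www.example.com/docs')
# No registered URI covers this host at all.
unknown = mgr.find_user_password('Other', 'http://elsewhere.example.org/')

print(in_realm, fallback, unknown)
```
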
.. _http-password-mgr-with-prior-auth:

HTTPPasswordMgrWithPriorAuth Objects
------------------------------------

This password manager extends :class:`HTTPPasswordMgrWithDefaultRealm` to support
tracking URIs for which authentication credentials should always be sent.


.. method:: HTTPPasswordMgrWithPriorAuth.add_password(realm, uri, user, \
            passwd, is_authenticated=False)

   *realm*, *uri*, *user*, *passwd* are as for
   :meth:`HTTPPasswordMgr.add_password`. *is_authenticated* sets the initial
   value of the ``is_authenticated`` flag for the given URI or list of URIs.
   If *is_authenticated* is specified as ``True``, *realm* is ignored.


.. method:: HTTPPasswordMgrWithPriorAuth.find_user_password(realm, authuri)

   Same as for :class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgrWithPriorAuth.update_authenticated(uri, \
            is_authenticated=False)

   Update the ``is_authenticated`` flag for the given *uri* or list
   of URIs.


.. method:: HTTPPasswordMgrWithPriorAuth.is_authenticated(authuri)

   Return the current state of the ``is_authenticated`` flag for
   the given URI.


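
The ``is_authenticated`` flag can be set, queried and updated without any
network traffic. The sketch below uses a placeholder URL and credentials.

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithPriorAuth()

# Placeholder URL and credentials; the flag starts out as True.
mgr.add_password(None, 'http://www.example.com/api/', 'user', 'pw',
                 is_authenticated=True)

before = mgr.is_authenticated('http://www.example.com/api/items')
mgr.update_authenticated('http://www.example.com/api/', False)
after = mgr.is_authenticated('http://www.example.com/api/items')
creds = mgr.find_user_password(None, 'http://www.example.com/api/items')

print(before, after, creds)
```

Such a manager is typically passed to :class:`HTTPBasicAuthHandler` so that
credentials for flagged URIs are sent with the first request rather than only
after a 401 response.
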
.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   whether ``req.data`` is ``None``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   whether ``req.data`` is ``None``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``.

   .. versionchanged:: 3.2
      This method is applicable only for local hostnames. When a remote
      hostname is given, an :exc:`~urllib.error.URLError` is raised.


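
Opening a local file through a ``file:`` URL can be sketched as follows. The
file name is a placeholder created in a temporary directory, and
:meth:`pathlib.Path.as_uri` is used to build a URL without a host name.

```python
import pathlib
import tempfile
import urllib.request

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / 'sample.txt'  # hypothetical file name
    path.write_bytes(b'local content')

    # as_uri() yields a file: URL with an empty host part.
    with urllib.request.urlopen(path.as_uri()) as f:
        content = f.read()

print(content)
```
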
.. _data-handler-objects:

DataHandler Objects
-------------------

.. method:: DataHandler.data_open(req)

   Read a data URL. This kind of URL contains the content encoded in the URL
   itself. The data URL syntax is specified in :rfc:`2397`. This implementation
   ignores white spaces in base64 encoded data URLs so the URL may be wrapped
   in whatever source file it comes from. But even though some browsers don't
   mind about a missing padding at the end of a base64 encoded data URL, this
   implementation will raise a :exc:`ValueError` in that case.


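
A data URL can be opened like any other URL, with no network access involved.
In this short sketch the payload ``SGVsbG8=`` is the base64 encoding of
``b'Hello'``.

```python
import urllib.request

# 'SGVsbG8=' is the base64 encoding of b'Hello'.
with urllib.request.urlopen('data:text/plain;base64,SGVsbG8=') as f:
    media_type = f.headers.get_content_type()
    body = f.read()

print(media_type, body)
```
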
.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.


.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`~urllib.error.URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For response codes in the 200 range, the response object is returned
   immediately.

   For codes outside the 200 range, this simply passes the job on to the
   :meth:`http_error_\<type\>` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
   :exc:`~urllib.error.HTTPError` if no other handler handles the error.


.. method:: HTTPErrorProcessor.https_response(request, response)

   Process HTTPS error responses.

   The behavior is the same as :meth:`http_response`.


.. _urllib-request-examples:

Examples
--------

In addition to the examples below, more examples are given in
:ref:`urllib-howto`.

This example gets the python.org main page and displays the first 300 bytes of
it. ::

   >>> import urllib.request
   >>> with urllib.request.urlopen('http://www.python.org/') as f:
   ...     print(f.read(300))
   ...
   b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
   xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
   <title>Python Programming '

Note that reading from the object returned by urlopen yields a bytes object.
This is because there is no way for urlopen to automatically determine the
encoding of the byte stream it receives from the HTTP server. In general, a
program will decode the returned bytes object to string once it determines or
guesses the appropriate encoding.

The following W3C document, https://www.w3.org/International/O-charset\ , lists
the various ways in which an (X)HTML or an XML document could have specified its
encoding information.

As the python.org website uses *utf-8* encoding as specified in its meta tag, we
will use the same for decoding the bytes object. ::

   >>> with urllib.request.urlopen('http://www.python.org/') as f:
   ...     print(f.read(100).decode('utf-8'))
   ...
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm

It is also possible to achieve the same result without using the
:term:`context manager` approach. ::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> print(f.read(100).decode('utf-8'))
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtm

In the following example, we are sending a data-stream to the stdin of a CGI
and reading the data it returns to us. Note that this example will only work
when the Python installation supports SSL. ::

   >>> import urllib.request
   >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data=b'This data is passed to stdin of the CGI')
   >>> with urllib.request.urlopen(req) as f:
   ...     print(f.read().decode('utf-8'))
   ...
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)

Here is an example of doing a ``PUT`` request using :class:`Request`::

   import urllib.request
   DATA = b'some data'
   req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT')
   with urllib.request.urlopen(req) as f:
       pass
   print(f.status)
   print(f.reason)

Use of Basic HTTP Authentication::

   import urllib.request
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib.request.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib.request.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib.request.install_opener(opener)
   urllib.request.urlopen('http://www.example.com/login.html')

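When the realm is not known in advance, one possible variation is to pass an
:class:`HTTPPasswordMgrWithDefaultRealm` to the handler and register the
credentials with a realm of ``None``. The following is a minimal sketch under
that assumption; the host name and credentials are the placeholders from the
example above, and the lookup is checked locally without contacting a server.

```python
import urllib.request

# Assumption: the URI and credentials are placeholders for illustration.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# A realm of None acts as a catch-all for any realm under the given URI.
password_mgr.add_password(realm=None,
                          uri='https://mahler:8092/',
                          user='klem',
                          passwd='kadidd!ehopper')

auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(auth_handler)

# The lookup can be verified without any network access.
credentials = password_mgr.find_user_password(
    'PDQ Application', 'https://mahler:8092/site-updates.py')
print(credentials)
```

Registering the parent URI rather than a single page lets the same credentials
apply to every URL below it.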
:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`.  By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved.  For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib.request
   req = urllib.request.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   # Customize the default User-Agent header value:
   req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)')
   r = urllib.request.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`.  To change this::

   import urllib.request
   opener = urllib.request.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`)
are added when the :class:`Request` is passed to :func:`urlopen` (or
:meth:`OpenerDirector.open`).

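The *headers* argument mentioned above can be sketched as follows. The URL and
header values are placeholders, and no request is actually sent; the example
only inspects the :class:`Request` object. Note that header names are stored
capitalized, which matters when reading them back.

```python
import urllib.request

# Assumption: the URL and header values are placeholders.
req = urllib.request.Request(
    'http://www.example.com/',
    headers={
        'Referer': 'http://www.python.org/',
        'User-Agent': 'urllib-example/0.1',
    },
)

# Request stores header names capitalized ('User-Agent' becomes 'User-agent').
print(req.header_items())
print(req.get_header('User-agent'))
```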
.. _urllib-examples:

Here is an example session that uses the ``GET`` method to retrieve a URL
containing parameters::

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
   >>> with urllib.request.urlopen(url) as f:
   ...     print(f.read().decode('utf-8'))
   ...

The following example uses the ``POST`` method instead. Note that the string
returned by :func:`~urllib.parse.urlencode` is encoded to bytes before it is
passed to :func:`urlopen` as *data*::

   >>> import urllib.request
   >>> import urllib.parse
   >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> data = data.encode('ascii')
   >>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f:
   ...     print(f.read().decode('utf-8'))
   ...

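The difference between the two sessions above can be checked without sending
anything. This sketch builds both requests against a placeholder host
(``www.example.com`` is an assumption, not one of the servers used above) and
shows that supplying *data* is what switches the method from ``GET`` to
``POST``.

```python
import urllib.parse
import urllib.request

params = {'spam': 1, 'eggs': 2, 'bacon': 0}
query = urllib.parse.urlencode(params)

# Assumption: www.example.com stands in for a real endpoint.
get_req = urllib.request.Request('http://www.example.com/query?' + query)
post_req = urllib.request.Request('http://www.example.com/query',
                                  data=query.encode('ascii'))

print(query)
print(get_req.get_method())
print(post_req.get_method())
```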
The following example uses an explicitly specified HTTP proxy, overriding
environment settings::

   >>> import urllib.request
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.request.FancyURLopener(proxies)
   >>> with opener.open("http://www.python.org") as f:
   ...     f.read().decode('utf-8')
   ...

The following example uses no proxies at all, overriding environment settings::

   >>> import urllib.request
   >>> opener = urllib.request.FancyURLopener({})
   >>> with opener.open("http://www.python.org/") as f:
   ...     f.read().decode('utf-8')
   ...


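The same two proxy configurations can also be expressed with
:func:`build_opener` and :class:`ProxyHandler`. This is a sketch rather than a
drop-in replacement for the sessions above: the proxy address is a placeholder,
and neither opener is used, so no network access takes place.

```python
import urllib.request

# Assumption: proxy.example.com is a placeholder proxy address.
proxy_handler = urllib.request.ProxyHandler(
    {'http': 'http://proxy.example.com:8080/'})
proxied_opener = urllib.request.build_opener(proxy_handler)

# An empty mapping disables proxies, ignoring the environment settings.
direct_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

# Neither opener has been used yet, so nothing has been fetched.
print(proxy_handler.proxies)
```

Passing a :class:`ProxyHandler` instance explicitly keeps :func:`build_opener`
from adding its default, environment-driven one.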
Legacy interface
----------------

The following functions and classes are ported from the Python 2 module
``urllib`` (as opposed to ``urllib2``).  They might become deprecated at
some point in the future.

.. function:: urlretrieve(url, filename=None, reporthook=None, data=None)

   Copy a network object denoted by a URL to a local file. If the URL
   points to a local file, the object will not be copied unless *filename* is supplied.
   Return a tuple ``(filename, headers)`` where *filename* is the
   local file name under which the object can be found, and *headers* is whatever
   the :meth:`info` method of the object returned by :func:`urlopen` returned (for
   a remote object). Exceptions are the same as for :func:`urlopen`.

   The second argument, if present, specifies the file location to copy to (if
   absent, the location will be a tempfile with a generated name). The third
   argument, if present, is a callable that will be called once on
   establishment of the network connection and once after each block read
   thereafter. The callable will be passed three arguments: a count of blocks
   transferred so far, a block size in bytes, and the total size of the file. The
   total size passed to the callable may be ``-1`` on older FTP servers which do
   not return a file size in response to a retrieval request.

   The following example illustrates the most common usage scenario::

      >>> import urllib.request
      >>> local_filename, headers = urllib.request.urlretrieve('http://python.org/')
      >>> html = open(local_filename)
      >>> html.close()

   If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   argument may be given to specify a ``POST`` request (normally the request
   type is ``GET``).  The *data* argument must be a bytes object in standard
   :mimetype:`application/x-www-form-urlencoded` format; see the
   :func:`urllib.parse.urlencode` function.

   :func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that
   the amount of data available was less than the expected amount (which is the
   size reported by a *Content-Length* header). This can occur, for example, when
   the download is interrupted.

   The *Content-Length* is treated as a lower bound: if there's more data to read,
   :func:`urlretrieve` reads more data, but if less data is available, it raises
   the exception.

   You can still retrieve the downloaded data in this case; it is stored in the
   :attr:`content` attribute of the exception instance.

   If no *Content-Length* header was supplied, :func:`urlretrieve` cannot check
   the size of the data it has downloaded, and just returns it. In this case you
   have to assume that the download was successful.

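The *reporthook* callback described for :func:`urlretrieve` can be sketched as
follows. To stay self-contained, the example copies a temporary local file
through a ``file:`` URL with an explicit *filename*; the file names, the
payload, and the ``report`` helper are all made up for illustration.

```python
import pathlib
import tempfile
import urllib.request

calls = []

def report(block_count, block_size, total_size):
    """Record each progress callback; total_size is -1 when unknown."""
    calls.append((block_count, block_size, total_size))

with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / 'source.bin'
    target = pathlib.Path(tmp) / 'copy.bin'
    source.write_bytes(b'x' * 20000)

    # Assumption: a file: URL stands in for a remote URL so no network is needed.
    filename, headers = urllib.request.urlretrieve(
        source.as_uri(), filename=str(target), reporthook=report)
    copied = target.read_bytes()

# The first call has block_count 0; later calls follow each block read.
print(len(calls), calls[0], calls[-1])
```

A progress display could compute ``block_count * block_size / total_size``,
taking care to skip the division when the total size is reported as ``-1``.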
.. function:: urlcleanup()

   Cleans up temporary files that may have been left behind by previous
   calls to :func:`urlretrieve`.

.. class:: URLopener(proxies=None, **x509)

   .. deprecated:: 3.3

   Base class for opening and reading URLs.  Unless you need to support opening
   objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   you probably want to use :class:`FancyURLopener`.

   By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   Applications can define their own :mailheader:`User-Agent` header by subclassing
   :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   :attr:`version` to an appropriate string value in the subclass definition.

   The optional *proxies* parameter should be a dictionary mapping scheme names to
   proxy URLs, where an empty dictionary turns proxies off completely.  Its default
   value is ``None``, in which case environmental proxy settings will be used if
   present, as discussed in the definition of :func:`urlopen`, above.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme.  The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`OSError` exception if the server
   returns an error code.

   .. method:: open(fullurl, data=None)

      Open *fullurl* using the appropriate protocol.  This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments.  If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.

      This method always quotes *fullurl* using :func:`~urllib.parse.quote`.

   .. method:: open_unknown(fullurl, data=None)

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url, filename=None, reporthook=None, data=None)

      Retrieves the contents of *url* and places it in *filename*.  The return value
      is a tuple consisting of a local filename and either an
      :class:`email.message.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs).  The caller must then open and read the
      contents of *filename*.  If *filename* is not given and the URL refers to a
      local file, the input filename is returned.  If the URL is non-local and
      *filename* is not given, the filename is the output of :func:`tempfile.mktemp`
      with a suffix that matches the suffix of the last path component of the input
      URL.  If *reporthook* is given, it must be a function accepting three numeric
      parameters: a chunk number, the maximum size chunks are read in, and the
      total size of the download (-1 if unknown).  It will be called once at the
      start and after each chunk of data is read from the network.  *reporthook*
      is ignored for local URLs.

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``).  The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the
      :func:`urllib.parse.urlencode` function.


   .. attribute:: version

      Variable that specifies the user agent of the opener object.  To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.


.. class:: FancyURLopener(...)

   .. deprecated:: 3.3

   :class:`FancyURLopener` subclasses :class:`URLopener`, providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401.  For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL.  For 401 response codes (authentication required), basic HTTP
   authentication is performed.  For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`http_error_default` is called,
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user.  In
      reality, browsers do allow automatic redirection of these responses, changing
      the POST to a GET, and :mod:`urllib` reproduces this behaviour.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method.  The default implementation asks the
      users for the required information on the controlling terminal.  A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overridden to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm.  The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The implementation prompts for this information on the terminal; an application
      should override this method to use an appropriate interaction model in the local
      environment.


:mod:`urllib.request` Restrictions
----------------------------------

  .. index::
     pair: HTTP; protocol
     pair: FTP; protocol

* Currently, only the following protocols are supported: HTTP (versions 0.9 and
  1.0), FTP, local files, and data URLs.

  .. versionchanged:: 3.4 Added support for data URLs.

* The caching feature of :func:`urlretrieve` has been disabled until someone
  finds the time to implement proper processing of expiration time headers.

* There should be a function to query whether a particular URL is in the cache.

* For backward compatibility, if a URL appears to point to a local file but the
  file can't be opened, the URL is re-interpreted using the FTP protocol.  This
  can sometimes cause confusing error messages.

* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
  long delays while waiting for a network connection to be set up.  This means
  that it is difficult to build an interactive Web client using these functions
  without using threads.

  .. index::
     single: HTML
     pair: HTTP; protocol

* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
  returned by the server.  This may be binary data (such as an image), plain text
  or (for example) HTML.  The HTTP protocol provides type information in the reply
  header, which can be inspected by looking at the :mailheader:`Content-Type`
  header.  If the returned data is HTML, you can use the module
  :mod:`html.parser` to parse it.

  .. index:: single: FTP

* The code handling the FTP protocol cannot differentiate between a file and a
  directory.  This can lead to unexpected behavior when attempting to read a URL
  that points to a file that is not accessible.  If the URL ends in a ``/``, it is
  assumed to refer to a directory and will be handled accordingly.  But if an
  attempt to read a file leads to a 550 error (meaning the URL cannot be found or
  is not accessible, often for permission reasons), then the path is treated as a
  directory in order to handle the case when a directory is specified by a URL but
  the trailing ``/`` has been left off.  This can cause misleading results when
  you try to fetch a file whose read permissions make it inaccessible; the FTP
  code will try to read it, fail with a 550 error, and then perform a directory
  listing for the unreadable file.  If fine-grained control is needed, consider
  using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
  *_urlopener* to meet your needs.



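Inspecting the :mailheader:`Content-Type` header before decoding, as suggested
in the list above, might look like the following sketch. A ``data:`` URL is used
so that the example needs no network access; its payload is purely illustrative,
and falling back to UTF-8 when no charset is declared is an assumption of this
sketch, not library behavior.

```python
import urllib.request

# Assumption: the data: URL payload is made up for illustration.
with urllib.request.urlopen('data:text/plain;charset=utf-8,Hello%20world') as resp:
    content_type = resp.headers.get_content_type()
    # Fall back to UTF-8 if the response does not declare a charset.
    charset = resp.headers.get_content_charset() or 'utf-8'
    raw = resp.read()

# Decode only when the response declares a text type.
text = raw.decode(charset) if content_type.startswith('text/') else None
print(content_type, charset, text)
```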
:mod:`urllib.response` --- Response classes used by urllib
==========================================================

.. module:: urllib.response
   :synopsis: Response classes used by urllib.

The :mod:`urllib.response` module defines functions and classes which define a
minimal file-like interface, including ``read()`` and ``readline()``.  The
typical response object is an ``addinfourl`` instance, which defines an ``info()``
method that returns headers and a ``geturl()`` method that returns the URL.
Functions defined by this module are used internally by the
:mod:`urllib.request` module.

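The interface described above can be exercised directly, although response
objects are normally created by :mod:`urllib.request` rather than by hand. In
this sketch the headers, body, and URL are invented for illustration, and an
in-memory buffer takes the place of a network connection.

```python
import email
import io
import urllib.response

# Assumption: headers, body and URL are made up for illustration.
headers = email.message_from_string('Content-Type: text/plain\n\n')
body = io.BytesIO(b'first line\nsecond line\n')

with urllib.response.addinfourl(body, headers, 'http://www.example.com/') as resp:
    first = resp.readline()   # file-like access, one line at a time
    rest = resp.read()        # the remainder of the body
    url = resp.geturl()       # the URL the response is associated with
    content_type = resp.info()['Content-Type']

print(first, rest, url, content_type)
```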