:mod:`urllib2` --- extensible library for opening URLs
======================================================

.. module:: urllib2
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


.. note::
   The :mod:`urllib2` module has been split across several modules in
   Python 3 named :mod:`urllib.request` and :mod:`urllib.error`.
   The :term:`2to3` tool will automatically adapt imports when converting
   your sources to Python 3.


The :mod:`urllib2` module defines functions and classes which help in opening
URLs (mostly HTTP) in a complex world --- basic and digest authentication,
redirections, cookies and more.

.. seealso::

   The `Requests package <http://requests.readthedocs.org/>`_
   is recommended for a higher-level HTTP client interface.


The :mod:`urllib2` module defines the following functions:


.. function:: urlopen(url[, data[, timeout[, cafile[, capath[, cadefault[, context]]]]])

   Open the URL *url*, which can be either a string or a :class:`Request` object.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format. The :mod:`urllib2` module sends HTTP/1.1
   requests with the ``Connection: close`` header included.

   The optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). This only works for HTTP, HTTPS and
   FTP connections.

   If *context* is specified, it must be a :class:`ssl.SSLContext` instance
   describing the various SSL options. See :class:`~httplib.HTTPSConnection` for
   more details.

   The optional *cafile* and *capath* parameters specify a set of trusted CA
   certificates for HTTPS requests. *cafile* should point to a single file
   containing a bundle of CA certificates, whereas *capath* should point to a
   directory of hashed certificate files. More information can be found in
   :meth:`ssl.SSLContext.load_verify_locations`.

   The *cadefault* parameter is ignored.

   This function returns a file-like object with three additional methods:

   * :meth:`geturl` --- return the URL of the resource retrieved, commonly used to
     determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers,
     in the form of an :class:`mimetools.Message` instance
     (see `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   * :meth:`getcode` --- return the HTTP status code of the response.

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though the
   default installed global :class:`OpenerDirector` uses :class:`UnknownHandler` to
   ensure this never happens).

   In addition, if proxy settings are detected (for example, when a ``*_proxy``
   environment variable like :envvar:`http_proxy` is set),
   :class:`ProxyHandler` is installed by default and makes sure the requests are
   handled through the proxy.

   .. versionchanged:: 2.6
      *timeout* was added.

   .. versionchanged:: 2.7.9
      *cafile*, *capath*, *cadefault*, and *context* were added.

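A minimal sketch of the response interface returned by :func:`urlopen`, using a
local ``file:`` URL so that no network access is needed. The temporary file, the
POSIX-style path, and the fallback import (which lets the sketch also run on
Python 3, where the same API lives in :mod:`urllib.request`) are assumptions of
this example, not part of the module.

```python
import os
import tempfile

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 name of the same API (assumption)

# Create a small local file to fetch (hypothetical content).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"hello from urllib2\n")

# Assumes a POSIX absolute path such as /tmp/tmpabc.txt.
url = "file://" + path

response = urllib2.urlopen(url)
try:
    body = response.read()          # ordinary file-like interface
    final_url = response.geturl()   # URL of the resource actually retrieved
    headers = response.info()       # header mapping for the response
finally:
    response.close()
    os.remove(path)

content_length = headers["Content-length"]
```
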

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler` (if proxy
   settings are detected),
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module can be imported),
   :class:`HTTPSHandler` will also be added.

   Beginning in Python 2.3, a :class:`BaseHandler` subclass may also change its
   :attr:`handler_order` attribute to modify its position in the handlers
   list.

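The replacement rule in :func:`build_opener` can be sketched as follows. Passing
:class:`CacheFTPHandler` (a subclass of :class:`FTPHandler`) should suppress the
default :class:`FTPHandler`. The ``handlers`` attribute read at the end is an
undocumented detail of the CPython implementation and is used here only to
inspect the result; the fallback import for Python 3 is likewise an assumption
of this sketch.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 name of the same API (assumption)

# An empty mapping disables proxy autodetection for this opener.
no_proxy = urllib2.ProxyHandler({})

# Handlers may be passed as instances or as classes that can be
# instantiated without arguments.
opener = urllib2.build_opener(no_proxy, urllib2.CacheFTPHandler)

# Implementation detail, inspected for illustration only.
handler_classes = [h.__class__ for h in opener.handlers]

# urllib2.install_opener(opener) would make urlopen() use this opener;
# calling opener.open(url) directly works without installing it.
```
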

The following exceptions are raised as appropriate:


.. exception:: URLError

   The handlers raise this exception (or derived exceptions) when they run into a
   problem. It is a subclass of :exc:`IOError`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
      URLs).


.. exception:: HTTPError

   Though being an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
   can also function as a non-exceptional file-like return value (the same thing
   that :func:`urlopen` returns). This is useful when handling exotic HTTP
   errors, such as requests for authentication.

   .. attribute:: code

      An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
      This numeric value corresponds to a value found in the dictionary of
      codes as found in :attr:`BaseHTTPServer.BaseHTTPRequestHandler.responses`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance.

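Because :exc:`HTTPError` is a subclass of :exc:`URLError`, the more specific
handler has to come first. The sketch below shows one way to arrange this. The
``fetch`` helper, the nonexistent ``file:`` path, and the Python 3 fallback
import are assumptions of this example.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 name of the same API (assumption)


def fetch(url):
    """Return a (status, detail) pair instead of raising (hypothetical helper)."""
    try:
        response = urllib2.urlopen(url)
    except urllib2.HTTPError as err:
        # Must come first: HTTPError is a subclass of URLError.
        return ("http-error", err.code)
    except urllib2.URLError as err:
        return ("url-error", err.reason)
    try:
        return ("ok", response.read())
    finally:
        response.close()


# Hypothetical path that is assumed not to exist on this machine.
status, detail = fetch("file:///nonexistent/urllib2-demo.txt")
```
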

The following classes are provided:


.. class:: Request(url[, data][, headers][, origin_req_host][, unverifiable])

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   *headers* should be a dictionary, and will be treated as if :meth:`add_header`
   was called with each key and value as arguments. This is often used to "spoof"
   the ``User-Agent`` header, which is used by a browser to identify itself --
   some HTTP servers only allow requests coming from common browsers as opposed
   to scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
   default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling of third-party
   HTTP cookies:

   *origin_req_host* should be the request-host of the origin transaction, as
   defined by :rfc:`2965`. It defaults to ``cookielib.request_host(self)``. This
   is the host name or IP address of the original request that was initiated by the
   user. For example, if the request is for an image in an HTML document, this
   should be the request-host of the request for the page containing the image.

   *unverifiable* should indicate whether the request is unverifiable, as defined
   by RFC 2965. It defaults to ``False``. An unverifiable request is one whose URL
   the user did not have the option to approve. For example, if the request is for
   an image in an HTML document, and the user had no option to approve the
   automatic fetching of the image, this should be true.

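As a rough illustration of the *data* and *headers* arguments, the sketch below
builds a :class:`Request` without sending it. The URL and the agent string are
placeholders, and the fallback imports for Python 3 are an assumption of this
example. Header keys are stored via ``str.capitalize()``, which is why the
lookup uses ``User-agent``.

```python
try:
    import urllib2                          # Python 2
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2        # Python 3 names (assumption)
    from urllib.parse import urlencode

# Encode form fields as application/x-www-form-urlencoded.
form = urlencode([("q", "urllib2"), ("page", "1")])

req = urllib2.Request(
    "http://www.example.com/search",            # placeholder URL, never contacted
    data=form.encode("ascii"),
    headers={"User-Agent": "demo-client/0.1"},  # hypothetical agent string
)

method = req.get_method()               # POST, because data was supplied
agent = req.get_header("User-agent")    # keys are stored capitalized

# Without data the same request would be a GET.
plain_method = urllib2.Request("http://www.example.com/").get_method()
```
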

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor([cookiejar])

   A class to handle HTTP Cookies.


.. class:: ProxyHandler([proxies])

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read
   the list of proxies from the environment variables
   :envvar:`<protocol>_proxy`. If no proxy environment variables are set, then
   in a Windows environment proxy settings are obtained from the registry's
   Internet Settings section, and in a Mac OS X environment proxy information
   is retrieved from the OS X System Configuration Framework.

   To disable autodetected proxies, pass an empty dictionary.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler([debuglevel[, context]])

   A class to handle opening of HTTPS URLs. *context* has the same meaning as
   for :class:`httplib.HTTPSConnection`.

   .. versionchanged:: 2.7.9
      *context* added.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. class:: HTTPErrorProcessor()

   Process HTTP error responses.


.. _request-objects:

Request Objects
---------------

The following methods describe all of :class:`Request`'s public interface, and
so all must be overridden in subclasses.


.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has a non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.

   .. versionadded:: 2.4


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).

   .. versionadded:: 2.4


.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.get_header(header_name, default=None)

   Return the value of the given header. If the header is not present, return
   the default value.


.. method:: Request.header_items()

   Return a list of tuples ``(header_name, header_value)`` of the Request headers.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by RFC 2965. See the
   documentation for the :class:`Request` constructor.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following
   methods are searched, and added to the possible chains (note that HTTP errors
   are a special case).

   * :samp:`{protocol}_open` --- signal that the handler knows how to open
     *protocol* URLs.

   * :samp:`http_error_{type}` --- signal that the handler knows how to handle
     HTTP errors with HTTP error code *type*.

   * :samp:`{protocol}_error` --- signal that the handler knows how to handle
     errors from (non-\ ``http``) *protocol*.

   * :samp:`{protocol}_request` --- signal that the handler knows how to
     pre-process *protocol* requests.

   * :samp:`{protocol}_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url[, data][, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature works only for
   HTTP, HTTPS and FTP connections.

   .. versionchanged:: 2.6
      *timeout* was added.


.. method:: OpenerDirector.error(proto[, arg[, ...]])

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances.

#. Every handler with a method named like :samp:`{protocol}_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :samp:`{protocol}_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the
   algorithm is repeated for methods named like :samp:`{protocol}_open`. If all
   such methods return :const:`None`, the algorithm is repeated for methods
   named :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :samp:`{protocol}_response` has that
   method called to post-process the response.

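The three stages can be observed with a handler for a made-up ``demo`` scheme
that records each call. The scheme name, the ``DemoHandler`` class, and the
``io.BytesIO`` object standing in for a real response are all hypothetical; the
Python 3 fallback import is also an assumption of this sketch.

```python
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 name of the same API (assumption)

calls = []


class DemoHandler(urllib2.BaseHandler):
    """Handle a made-up ``demo`` scheme and record each stage."""

    def demo_request(self, req):
        calls.append("request")         # stage 1: pre-process the request
        return req

    def demo_open(self, req):
        calls.append("open")            # stage 2: produce a response
        return io.BytesIO(b"demo payload")

    def demo_response(self, req, response):
        calls.append("response")        # stage 3: post-process the response
        return response


opener = urllib2.OpenerDirector()
opener.add_handler(DemoHandler())

result = opener.open("demo://host/path")
payload = result.read()
```
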

.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following attributes and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.

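One possible use of :meth:`default_open` is to serve canned bodies for known
URLs, for instance in tests. In the sketch below the ``CannedHandler`` class,
the ``example.invalid`` URLs, and the ``io.BytesIO`` stand-in (which lacks
:meth:`geturl` and :meth:`info`) are hypothetical, and the Python 3 fallback
import is an assumption. The opener holds no other handlers, so nothing is sent
over the network.

```python
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 name of the same API (assumption)


class CannedHandler(urllib2.BaseHandler):
    """Serve fixed bodies for known URLs; decline everything else."""

    def __init__(self, canned):
        self.canned = canned

    def default_open(self, req):
        body = self.canned.get(req.get_full_url())
        if body is None:
            return None                 # let other handlers try
        return io.BytesIO(body)


opener = urllib2.OpenerDirector()
opener.add_handler(CannedHandler({"http://example.invalid/ping": b"pong"}))

hit = opener.open("http://example.invalid/ping").read()

# No handler accepts this URL, so open() returns None.
miss = opener.open("http://example.invalid/other")
```
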

.. method:: BaseHandler.protocol_open(req)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.

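A sketch of an :meth:`http_error_nnn` method for code 404 follows. To keep the
example offline, it calls :meth:`OpenerDirector.error` directly with hand-made
arguments instead of waiting for a real server response. The
``NotFoundFallback`` class, the placeholder body, and the URL are hypothetical,
and the Python 3 fallback import is an assumption.

```python
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 name of the same API (assumption)


class NotFoundFallback(urllib2.BaseHandler):
    """Turn a 404 into a placeholder body instead of an exception."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        return io.BytesIO(b"placeholder for missing page")


opener = urllib2.OpenerDirector()
opener.add_handler(NotFoundFallback())

req = urllib2.Request("http://example.invalid/missing")

# Simulate the dispatch performed when a 404 arrives; no request is sent.
result = opener.error("http", req, io.BytesIO(b"gone"), 404, "Not Found", {})
fallback_body = result.read()
```
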

.. method:: BaseHandler.protocol_request(req)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if you can't but another
   handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the POST to a
      ``GET``, and the default implementation reproduces this behavior.


.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.

704
705.. _http-cookie-processor:
706
707HTTPCookieProcessor Objects
708---------------------------
709
710.. versionadded:: 2.4
711
712:class:`HTTPCookieProcessor` instances have one attribute:
713
714
715.. attribute:: HTTPCookieProcessor.cookiejar
716
717 The :class:`cookielib.CookieJar` in which cookies are stored.
718
719
720.. _proxy-handler:
721
722ProxyHandler Objects
723--------------------
724
725
726.. method:: ProxyHandler.protocol_open(request)
727 :noindex:
728
Georg Brandld0eb8f92009-01-01 11:53:55 +0000729 ("protocol" is to be replaced by the protocol name.)
730
731 The :class:`ProxyHandler` will have a method :samp:`{protocol}_open` for every
Georg Brandl8ec7f652007-08-15 14:28:01 +0000732 *protocol* which has a proxy in the *proxies* dictionary given in the
733 constructor. The method will modify requests to go through the proxy, by
734 calling ``request.set_proxy()``, and call the next handler in the chain to
735 actually execute the protocol.
736
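The per-protocol methods described above can be observed without any network
access. This minimal sketch assumes a made-up proxy URL and configures only the
``http`` scheme. The fallback import is an assumption for running the snippet
on Python 3, where :class:`ProxyHandler` lives in :mod:`urllib.request`.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 stand-in (assumption)

# Hypothetical proxy URL; only the 'http' scheme gets a proxy here.
proxy_handler = urllib2.ProxyHandler({'http': 'http://proxy.example.com:3128/'})

# A method is created for each scheme present in the proxies dictionary.
has_http_open = hasattr(proxy_handler, 'http_open')   # True
has_ftp_open = hasattr(proxy_handler, 'ftp_open')     # False: no 'ftp' entry
```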

.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is requested.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get the user/password pair for the given realm and URI, if any. This method
   will return ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.

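The lookup rules above can be sketched offline as follows. The realm name,
URLs and credentials are placeholders chosen for illustration, and the fallback
import is an assumption so that the snippet also runs on Python 3.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 stand-in (assumption)

mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

# Credentials for one specific realm, valid below /app/ (placeholder values).
mgr.add_password('PDQ Application', 'http://www.example.com/app/',
                 'klem', 'secret')
# Credentials stored under the realm None act as the fallback.
mgr.add_password(None, 'http://www.example.com/', 'guest', 'guest')

# Matching realm and URI: the realm-specific pair is returned.
specific = mgr.find_user_password('PDQ Application',
                                  'http://www.example.com/app/page.html')
# Unknown realm: the None realm is searched instead.
fallback = mgr.find_user_password('Other Realm',
                                  'http://www.example.com/index.html')
# Nothing registered for this host at all.
missing = mgr.find_user_password('Other Realm', 'http://other.example.org/')
```

Here ``specific`` is ``('klem', 'secret')``, ``fallback`` is
``('guest', 'guest')`` and ``missing`` is ``(None, None)``.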

.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.


.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. versionadded:: 2.4


.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For status codes in the 200 range (successful responses), the response
   object is returned immediately.

   For all other status codes, this simply passes the job on to the
   :samp:`{protocol}_error_code` handler methods, via
   :meth:`OpenerDirector.error`. Eventually,
   :class:`urllib2.HTTPDefaultErrorHandler` will raise an :exc:`HTTPError` if no
   other handler handles the error.

.. method:: HTTPErrorProcessor.https_response(request, response)

   Process HTTPS error responses.

   The behavior is the same as :meth:`http_response`.


.. _urllib2-examples:

Examples
--------

This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print f.read(100)
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print f.read()
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print 'Content-type: text/plain\n\nGot Data: "%s"' % data

Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   urllib2.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib2
   proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib2
   req = urllib2.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib2.urlopen(req)

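The *headers* argument mentioned above is not shown in the example, so here is
a minimal sketch of that variant. It only builds the :class:`Request` and
inspects it, without opening the URL. The fallback import is an assumption so
that the snippet also runs on Python 3.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 stand-in (assumption)

# Pass headers at construction time instead of calling add_header() later.
req = urllib2.Request('http://www.example.com/',
                      headers={'Referer': 'http://www.python.org/'})
req.add_header('Accept-Language', 'en')   # both styles can be combined

referer = req.get_header('Referer')
# Header names are stored capitalized, so 'Accept-Language' is kept
# as 'Accept-language'.
items = sorted(req.header_items())
```

Because of that normalization, ``req.get_header('Accept-language')`` finds the
value while ``req.get_header('Accept-Language')`` returns ``None``.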
:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib2
   opener = urllib2.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
