:mod:`urllib2` --- extensible library for opening URLs
======================================================

.. module:: urllib2
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


.. note::
   The :mod:`urllib2` module has been split across several modules in
   Python 3 named :mod:`urllib.request` and :mod:`urllib.error`.
   The :term:`2to3` tool will automatically adapt imports when converting
   your sources to Python 3.


The :mod:`urllib2` module defines functions and classes which help in opening
URLs (mostly HTTP) in a complex world --- basic and digest authentication,
redirections, cookies and more.


The :mod:`urllib2` module defines the following functions:


.. function:: urlopen(url[, data][, timeout])

   Open the URL *url*, which can be either a string or a :class:`Request` object.

   .. warning::
      HTTPS requests do not do any verification of the server's certificate.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format. The :mod:`urllib2` module sends HTTP/1.1
   requests with the ``Connection:close`` header included.

   The optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). This only works for HTTP, HTTPS and
   FTP connections.

   This function returns a file-like object with three additional methods:

   * :meth:`geturl` --- return the URL of the resource retrieved, commonly used to
     determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers,
     in the form of an :class:`mimetools.Message` instance
     (see `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   * :meth:`getcode` --- return the HTTP status code of the response.

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though the
   default installed global :class:`OpenerDirector` uses :class:`UnknownHandler` to
   ensure this never happens).

   In addition, if proxy settings are detected (for example, when a ``*_proxy``
   environment variable like :envvar:`http_proxy` is set),
   :class:`ProxyHandler` is installed by default and makes sure the requests are
   handled through the proxy.

   .. versionchanged:: 2.6
      *timeout* was added.

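As an illustration of the object returned by :func:`urlopen`, the following
minimal sketch opens a temporary local file through a ``file:`` URL, so it
needs no network access. The temporary file and the ``try``/``except`` import
fallback (which lets the same code run on Python 3, where the module was split
up) are assumptions made for this example only.

```python
import os
import tempfile

try:
    import urllib2                        # Python 2
    from urllib import pathname2url
except ImportError:                       # Python 3 fallback (assumption)
    import urllib.request as urllib2
    from urllib.request import pathname2url

# Create a small local file so the example does not touch the network.
fd, path = tempfile.mkstemp(suffix=".txt")
os.write(fd, b"hello from a local file\n")
os.close(fd)

url = "file:" + pathname2url(path)
response = urllib2.urlopen(url)
try:
    body = response.read()            # file-like interface
    final_url = response.geturl()     # URL of the resource actually retrieved
    headers = response.info()         # message object holding the headers
    status = response.getcode()       # only meaningful for HTTP responses
finally:
    response.close()
    os.remove(path)
```

For an HTTP URL the same three methods report the final URL after any
redirects, the response headers and the status code.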

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.

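The difference between calling :meth:`OpenerDirector.open` directly and
installing the opener can be sketched as follows. ``CannedHandler`` and the
``demo:`` scheme are hypothetical names invented for this example; the handler
answers every request locally, so nothing is fetched.

```python
import io

try:
    import urllib2                        # Python 2
except ImportError:                       # Python 3 fallback (assumption)
    import urllib.request as urllib2


class CannedHandler(urllib2.BaseHandler):
    """Hypothetical handler that answers every request with fixed bytes."""

    def default_open(self, req):
        return io.BytesIO(b"canned response")


opener = urllib2.OpenerDirector()
opener.add_handler(CannedHandler())

# Use the opener directly, without touching any global state ...
direct = opener.open("demo://example/resource").read()

# ... or install it, so that urlopen() routes through it as well.
urllib2.install_opener(opener)
via_urlopen = urllib2.urlopen("demo://example/resource").read()

# Put a default opener back so later urlopen() calls behave normally.
urllib2.install_opener(urllib2.build_opener())
```

Because :func:`install_opener` changes process-wide state, calling
:meth:`OpenerDirector.open` directly is usually the less intrusive choice in
library code.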

.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler` (if proxy
   settings are detected),
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module can be imported),
   :class:`HTTPSHandler` will also be added.

   Beginning in Python 2.3, a :class:`BaseHandler` subclass may also change its
   :attr:`handler_order` attribute to modify its position in the handlers
   list.

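A minimal sketch of how :attr:`handler_order` affects the position of a handler
follows. ``EarlyHandler``, ``LateHandler`` and the ``demo:`` scheme are
hypothetical; the sketch relies on :class:`BaseHandler` using a default
:attr:`handler_order` of 500, with lower values sorting first.

```python
import io

try:
    import urllib2                        # Python 2
except ImportError:                       # Python 3 fallback (assumption)
    import urllib.request as urllib2

calls = []


class LateHandler(urllib2.BaseHandler):
    handler_order = 900                   # sorts after the default of 500

    def default_open(self, req):
        calls.append("late")
        return io.BytesIO(b"answered by LateHandler")


class EarlyHandler(urllib2.BaseHandler):
    handler_order = 100                   # sorts before the default of 500

    def default_open(self, req):
        calls.append("early")
        return None                       # decline; the next handler is tried


# Classes (not instances) are passed, so build_opener() calls each
# constructor without arguments. The argument order does not decide the
# call order here; handler_order does.
opener = urllib2.build_opener(LateHandler, EarlyHandler)
body = opener.open("demo://example/resource").read()
```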

The following exceptions are raised as appropriate:


.. exception:: URLError

   The handlers raise this exception (or derived exceptions) when they run into a
   problem. It is a subclass of :exc:`IOError`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
      URLs).

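The :attr:`reason` attribute can be inspected as in the sketch below, which
provokes a :exc:`URLError` by opening a local file that does not exist. The
file name is made up for the example, and the import fallback is only there so
the code also runs on Python 3.

```python
import os
import tempfile

try:
    import urllib2                        # Python 2
    from urllib import pathname2url
except ImportError:                       # Python 3 fallback (assumption)
    import urllib.request as urllib2
    from urllib.request import pathname2url

# A path inside a fresh temporary directory that is known not to exist.
tmpdir = tempfile.mkdtemp()
missing = os.path.join(tmpdir, "no-such-file.txt")
url = "file:" + pathname2url(missing)

reason = None
try:
    urllib2.urlopen(url)
except urllib2.URLError as exc:
    # For local URLs the underlying OSError is kept in exc.reason.
    reason = exc.reason
finally:
    os.rmdir(tmpdir)
```

Catching :exc:`URLError` also catches its subclasses, so a handler for
:exc:`HTTPError` has to come first if both are handled separately.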

.. exception:: HTTPError

   Although it is an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
   can also function as a non-exceptional file-like return value (the same thing
   that :func:`urlopen` returns). This is useful when handling exotic HTTP
   errors, such as requests for authentication.

   .. attribute:: code

      An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
      This numeric value corresponds to a value found in the dictionary of
      codes as found in :attr:`BaseHTTPServer.BaseHTTPRequestHandler.responses`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance.

The following classes are provided:


.. class:: Request(url[, data][, headers][, origin_req_host][, unverifiable])

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   *headers* should be a dictionary, and will be treated as if :meth:`add_header`
   was called with each key and value as arguments. This is often used to "spoof"
   the ``User-Agent`` header, which is used by a browser to identify itself --
   some HTTP servers only allow requests coming from common browsers as opposed
   to scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
   default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling of third-party
   HTTP cookies:

   *origin_req_host* should be the request-host of the origin transaction, as
   defined by :rfc:`2965`. It defaults to ``cookielib.request_host(self)``. This
   is the host name or IP address of the original request that was initiated by the
   user. For example, if the request is for an image in an HTML document, this
   should be the request-host of the request for the page containing the image.

   *unverifiable* should indicate whether the request is unverifiable, as defined
   by RFC 2965. It defaults to ``False``. An unverifiable request is one whose URL
   the user did not have the option to approve. For example, if the request is for
   an image in an HTML document, and the user had no option to approve the
   automatic fetching of the image, this should be true.

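The effect of the *data* and *headers* arguments can be sketched without
sending anything. The URL, the form fields and the ``ExampleClient/1.0`` agent
string are placeholders chosen for this example.

```python
try:
    import urllib2                        # Python 2
except ImportError:                       # Python 3 fallback (assumption)
    import urllib.request as urllib2

url = "http://www.example.com/cgi-bin/register"      # placeholder URL

# Without data the request is a GET.
plain = urllib2.Request(url)

# With data (already in application/x-www-form-urlencoded form) it is a POST.
form = urllib2.Request(
    url,
    data=b"name=Ada&language=Python",
    headers={"User-Agent": "ExampleClient/1.0"},     # hypothetical agent
)

plain_method = plain.get_method()
form_method = form.get_method()

# Header names are stored capitalized, so the lookup key is "User-agent".
agent = form.get_header("User-agent")
```

Constructing a :class:`Request` performs no network activity; the request is
only sent once it is passed to :func:`urlopen` or to an opener.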

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor([cookiejar])

   A class to handle HTTP Cookies.


.. class:: ProxyHandler([proxies])

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read
   the list of proxies from the environment variables
   :envvar:`<protocol>_proxy`. If no proxy environment variables are set, then
   in a Windows environment proxy settings are obtained from the registry's
   Internet Settings section, and in a Mac OS X environment proxy information
   is retrieved from the OS X System Configuration Framework.

   To disable an autodetected proxy, pass an empty dictionary.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. class:: HTTPErrorProcessor()

   Process HTTP error responses.


.. _request-objects:

Request Objects
---------------

The following methods describe all of :class:`Request`'s public interface, and
so all must be overridden in subclasses.


.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has a non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.

   .. versionadded:: 2.4


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).

   .. versionadded:: 2.4


.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.get_header(header_name, default=None)

   Return the value of the given header. If the header is not present, return
   the default value.


.. method:: Request.header_items()

   Return a list of tuples (header_name, header_value) of the Request headers.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by RFC 2965. See the
   documentation for the :class:`Request` constructor.

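The header-related methods above can be exercised on a request that is never
sent, as in this small sketch. The URL and header values are placeholders, and
only methods that also exist in Python 3's :mod:`urllib.request` are used so
that the import fallback works.

```python
try:
    import urllib2                        # Python 2
except ImportError:                       # Python 3 fallback (assumption)
    import urllib.request as urllib2

req = urllib2.Request("http://www.example.com/index.html")   # placeholder

# A second add_header() call with the same key overwrites the first.
req.add_header("Accept", "text/html")
req.add_header("Accept", "application/json")

# This header is sent with the original request but not with redirects.
req.add_unredirected_header("Authorization", "Basic placeholder")

accept = req.get_header("Accept")
has_auth = req.has_header("Authorization")        # checks both kinds
missing = req.get_header("X-missing", "fallback")  # default when absent
items = dict(req.header_items())                  # regular and unredirected
full_url = req.get_full_url()
```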

.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following
   methods are searched, and added to the possible chains (note that HTTP errors
   are a special case).

   * :samp:`{protocol}_open` --- signal that the handler knows how to open
     *protocol* URLs.

   * :samp:`http_error_{type}` --- signal that the handler knows how to handle
     HTTP errors with HTTP error code *type*.

   * :samp:`{protocol}_error` --- signal that the handler knows how to handle
     errors from (non-\ ``http``) *protocol*.

   * :samp:`{protocol}_request` --- signal that the handler knows how to
     pre-process *protocol* requests.

   * :samp:`{protocol}_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url[, data][, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature works only for
   HTTP, HTTPS and FTP connections.

   .. versionchanged:: 2.6
      *timeout* was added.


.. method:: OpenerDirector.error(proto[, arg[, ...]])

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances.

#. Every handler with a method named like :samp:`{protocol}_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :samp:`{protocol}_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the
   algorithm is repeated for methods named like :samp:`{protocol}_open`. If all
   such methods return :const:`None`, the algorithm is repeated for methods
   named :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :samp:`{protocol}_response` has that
   method called to post-process the response.

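The three stages can be traced with a small sketch. ``TraceProcessor``,
``CannedHandler`` and the ``demo:`` scheme are hypothetical names used only
here; the handler returns fixed bytes, so no connection is made.

```python
import io

try:
    import urllib2                        # Python 2
except ImportError:                       # Python 3 fallback (assumption)
    import urllib.request as urllib2

calls = []


class TraceProcessor(urllib2.BaseHandler):
    """Hypothetical processor for the made-up "demo" protocol."""

    def demo_request(self, req):
        calls.append("demo_request")      # stage 1: pre-process the request
        return req

    def demo_response(self, req, response):
        calls.append("demo_response")     # stage 3: post-process the response
        return response


class CannedHandler(urllib2.BaseHandler):
    """Hypothetical handler that answers every URL locally."""

    def default_open(self, req):
        calls.append("default_open")      # stage 2: produce a response
        return io.BytesIO(b"hello")


director = urllib2.OpenerDirector()
director.add_handler(TraceProcessor())
director.add_handler(CannedHandler())

body = director.open("demo://example/resource").read()
```

After the call, ``calls`` holds the method names in the order the director
invoked them: the request processor, then the open method, then the response
processor.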

.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following attributes and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.


.. method:: BaseHandler.protocol_open(req)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.

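How an :class:`OpenerDirector` chooses between :meth:`http_error_nnn` and
:meth:`http_error_default` can be sketched by calling
:meth:`OpenerDirector.error` by hand. Normally the director makes this call
itself when an HTTP error response arrives; invoking it directly is done here
only to keep the example offline. ``FallbackHandler`` and the URL are
hypothetical.

```python
import io

try:
    import urllib2                        # Python 2
except ImportError:                       # Python 3 fallback (assumption)
    import urllib.request as urllib2


class FallbackHandler(urllib2.BaseHandler):
    """Hypothetical handler with one specific and one catch-all method."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        # Called only for HTTP status 404.
        return io.BytesIO(b"fallback page")

    def http_error_default(self, req, fp, code, msg, hdrs):
        # Called for any HTTP error no http_error_nnn method handled.
        return io.BytesIO(("unhandled %d %s" % (code, msg)).encode("ascii"))


director = urllib2.OpenerDirector()
director.add_handler(FallbackHandler())

req = urllib2.Request("http://www.example.com/missing")   # placeholder URL
error_body = io.BytesIO(b"")                               # stands in for fp

# Simulate the director receiving two different HTTP errors.
not_found = director.error(
    "http", req, error_body, 404, "Not Found", {}).read()
server_error = director.error(
    "http", req, error_body, 500, "Internal Server Error", {}).read()
```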

.. method:: BaseHandler.protocol_request(req)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if you can't but another
   handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the POST to a
      ``GET``, and the default implementation reproduces this behavior.


.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

.. versionadded:: 2.4

:class:`HTTPCookieProcessor` instances have one attribute:


.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`cookielib.CookieJar` in which cookies are stored.


.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   The :class:`ProxyHandler` will have a method :samp:`{protocol}_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.

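The per-protocol methods can be observed without contacting any proxy, as in
the sketch below. The proxy address is a placeholder, and passing an empty
dictionary is the documented way to switch off autodetected proxies.

```python
try:
    import urllib2                        # Python 2
except ImportError:                       # Python 3 fallback (assumption)
    import urllib.request as urllib2

# Explicit mapping of protocol name to proxy URL (placeholder address).
proxied = urllib2.ProxyHandler({"http": "http://proxy.example.com:3128"})

# An empty dictionary disables proxies, including autodetected ones.
no_proxy = urllib2.ProxyHandler({})

has_http = hasattr(proxied, "http_open")     # created for the "http" entry
has_ftp = hasattr(proxied, "ftp_open")       # no "ftp" entry, so no method
disabled = hasattr(no_proxy, "http_open")    # nothing configured at all

# Passing the instance keeps build_opener() from adding its own ProxyHandler.
opener = urllib2.build_opener(no_proxy)
```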

.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication is requested for *realm* and a
   super-URI of any of the given URIs.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get the user/password pair for the given realm and URI, if any. This method
   will return ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.

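A short sketch of both methods, including the default-realm fallback. The realm
names, URIs and credentials are made up for illustration, and the fallback
import for Python 3 is an assumption added for portability.

```python
try:
    import urllib2
except ImportError:
    # Assumption: on Python 3 the same classes live in urllib.request.
    import urllib.request as urllib2

page = 'http://www.example.com/app/page.html'

mgr = urllib2.HTTPPasswordMgr()
# Hypothetical realm, URI and credentials.
mgr.add_password('PDQ Application', 'http://www.example.com/app/',
                 'klem', 'secret')

# A URI below the registered one matches...
found = mgr.find_user_password('PDQ Application', page)
# ...while an unknown realm yields (None, None).
missing = mgr.find_user_password('Other Realm', page)

default_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# Registering under the realm None makes the entry a fallback for any realm.
default_mgr.add_password(None, 'http://www.example.com/app/',
                         'klem', 'secret')
fallback = default_mgr.find_user_password('Other Realm', page)
```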

.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.

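The effect of attaching data can be checked without sending anything. This
sketch uses :meth:`Request.get_method` to show which HTTP method a request
would use; the URL and form data are placeholders, and the fallback import for
Python 3 is an assumption added for portability.

```python
try:
    import urllib2
except ImportError:
    # Assumption: on Python 3 the same classes live in urllib.request.
    import urllib.request as urllib2

# Without data the request is sent as a GET.
get_req = urllib2.Request('http://www.example.com/')
get_method = get_req.get_method()

# Supplying data (already urlencoded here) turns it into a POST.
post_req = urllib2.Request('http://www.example.com/', data=b'spam=1&eggs=2')
post_method = post_req.get_method()
```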

.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.

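A sketch of configuring a :class:`CacheFTPHandler` and handing it to
:func:`build_opener`. The timeout and connection limit are arbitrary
illustrative values, and the fallback import for Python 3 is an assumption
added for portability.

```python
try:
    import urllib2
except ImportError:
    # Assumption: on Python 3 the same classes live in urllib.request.
    import urllib.request as urllib2

ftp_handler = urllib2.CacheFTPHandler()
# Illustrative values: 30-second timeout for connections,
# at most 4 cached connections.
ftp_handler.setTimeout(30)
ftp_handler.setMaxConns(4)

# build_opener() uses this instance in place of the default FTPHandler.
opener = urllib2.build_opener(ftp_handler)
```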

.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. versionadded:: 2.4


.. method:: HTTPErrorProcessor.http_response()

   Process HTTP error responses.

   For responses with a 200 status code, the response object is returned
   immediately.

   For non-200 error codes, this simply passes the job on to the
   :samp:`{protocol}_error_code` handler methods, via
   :meth:`OpenerDirector.error`. Eventually,
   :class:`urllib2.HTTPDefaultErrorHandler` will raise an :exc:`HTTPError` if no
   other handler handles the error.

.. method:: HTTPErrorProcessor.https_response()

   Process HTTPS error responses.

   The behavior is the same as :meth:`http_response`.

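From the caller's side, an unhandled error code surfaces as an :exc:`HTTPError`.
The helper below is a hypothetical sketch (``describe_fetch`` is not part of
this module) that distinguishes that case from other failures; the fallback
import for Python 3 is an assumption added for portability.

```python
try:
    import urllib2
except ImportError:
    # Assumption: on Python 3 the same names are reachable via urllib.request.
    import urllib.request as urllib2


def describe_fetch(url):
    """Return a short, human-readable outcome for opening *url*."""
    try:
        response = urllib2.urlopen(url)
    except urllib2.HTTPError as err:
        # Raised by HTTPDefaultErrorHandler for unhandled HTTP error codes.
        return 'HTTP error %d' % err.code
    except urllib2.URLError as err:
        # Other failures, such as an unreachable host or a missing file.
        return 'failed: %s' % err.reason
    else:
        response.close()
        return 'ok'
```

:exc:`HTTPError` is caught first because it is a subclass of :exc:`URLError`.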

.. _urllib2-examples:

Examples
--------

This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print f.read(100)
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print f.read()
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print 'Content-type: text/plain\n\nGot Data: "%s"' % data

Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   urllib2.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib2
   req = urllib2.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib2.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib2
   opener = urllib2.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).

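As a sketch of the *headers* constructor argument mentioned above, the
following builds a request with a :mailheader:`Referer` header and inspects it
before anything is sent. The URLs are placeholders, and the fallback import for
Python 3 is an assumption added for portability.

```python
try:
    import urllib2
except ImportError:
    # Assumption: on Python 3 the same classes live in urllib.request.
    import urllib.request as urllib2

# Pass headers to the constructor instead of calling add_header().
req = urllib2.Request('http://www.example.com/',
                      headers={'Referer': 'http://www.python.org/'})

# Headers can be inspected before the request is sent.
has_referer = req.has_header('Referer')
referer = req.get_header('Referer')
```

The standard headers listed above do not appear at this point, since they are
only added once the request is opened.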