:mod:`urllib2` --- extensible library for opening URLs
======================================================

.. module:: urllib2
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


.. note::
   The :mod:`urllib2` module has been split across several modules in
   Python 3.0 named :mod:`urllib.request` and :mod:`urllib.error`.
   The :term:`2to3` tool will automatically adapt imports when converting
   your sources to 3.0.


The :mod:`urllib2` module defines functions and classes which help in opening
URLs (mostly HTTP) in a complex world --- basic and digest authentication,
redirections, cookies and more.

The :mod:`urllib2` module defines the following functions:


.. function:: urlopen(url[, data][, timeout])

   Open the URL *url*, which can be either a string or a :class:`Request` object.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format. The :mod:`urllib2` module sends HTTP/1.1
   requests with a ``Connection: close`` header included.

   The optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). This only works for HTTP, HTTPS, FTP and FTPS
   connections.

   This function returns a file-like object with two additional methods:

   * :meth:`geturl` --- return the URL of the resource retrieved, commonly used to
     determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers,
     in the form of an :class:`mimetools.Message` instance
     (see `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though the
   default installed global :class:`OpenerDirector` uses :class:`UnknownHandler` to
   ensure this never happens).

   In addition, the default installed :class:`ProxyHandler` makes sure requests
   are routed through a proxy when proxy settings are present.

   .. versionchanged:: 2.6
      *timeout* was added.

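As a rough illustration of the object that :func:`urlopen` returns, the sketch
below opens a temporary local file through a ``file:`` URL, so it needs no
network access. The temporary file and the import fallback (which only lets the
same code run on Python 3, where the module is named :mod:`urllib.request`) are
assumptions made for this example.

```python
import os
import tempfile

try:                                  # Python 2: the module documented here
    import urllib2
    from urllib import pathname2url
except ImportError:                   # assumption: Python 3 fallback names
    import urllib.request as urllib2
    from urllib.request import pathname2url

# Write a small scratch file so the example stays offline.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"hello")

response = urllib2.urlopen("file:" + pathname2url(path))
try:
    body = response.read()            # ordinary file-like interface
    final_url = response.geturl()     # URL of the resource retrieved
    headers = response.info()         # header container for the response
finally:
    response.close()
    os.remove(path)
```

With an HTTP URL, comparing ``final_url`` against the requested URL is one way
to notice that a redirect was followed.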

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler`,
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module can
   be imported), :class:`HTTPSHandler` will also be added.

   Beginning in Python 2.3, a :class:`BaseHandler` subclass may also change its
   :attr:`handler_order` member variable to modify its position in the handlers
   list.


The following exceptions are raised as appropriate:


.. exception:: URLError

   The handlers raise this exception (or derived exceptions) when they run into a
   problem. It is a subclass of :exc:`IOError`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
      URLs).


.. exception:: HTTPError

   Although it is an exception (a subclass of :exc:`URLError`), an
   :exc:`HTTPError` can also function as a non-exceptional file-like return value
   (the same thing that :func:`urlopen` returns). This is useful when handling
   exotic HTTP errors, such as requests for authentication.

   .. attribute:: code

      An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
      This numeric value corresponds to a value found in the dictionary of
      codes as found in :attr:`BaseHTTPServer.BaseHTTPRequestHandler.responses`.


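Since :exc:`HTTPError` is a subclass of :exc:`URLError`, an ``except`` clause for
it has to come first. The sketch below shows one possible arrangement; the helper
name ``describe_fetch`` and the missing-file URL used to trigger the error are
made up for the example, and the import fallback only lets it run on Python 3.

```python
import os
import tempfile

try:                                  # Python 2: the module documented here
    import urllib2
    from urllib import pathname2url
except ImportError:                   # assumption: Python 3 fallback names
    import urllib.request as urllib2
    from urllib.request import pathname2url


def describe_fetch(url):
    """Return a short description of what happened when opening *url*."""
    try:
        response = urllib2.urlopen(url)
    except urllib2.HTTPError as err:      # subclass, so it is tested first
        return "server answered with HTTP status %d" % err.code
    except urllib2.URLError as err:
        return "could not open URL: %s" % (err.reason,)
    try:
        return "read %d bytes" % len(response.read())
    finally:
        response.close()


# A file: URL pointing at a path that does not exist (no network needed).
scratch_dir = tempfile.mkdtemp()
missing = "file:" + pathname2url(os.path.join(scratch_dir, "no-such-file"))
outcome = describe_fetch(missing)
os.rmdir(scratch_dir)
```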

The following classes are provided:


.. class:: Request(url[, data][, headers][, origin_req_host][, unverifiable])

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   *headers* should be a dictionary, and will be treated as if :meth:`add_header`
   was called with each key and value as arguments. This is often used to "spoof"
   the ``User-Agent`` header, which is used by a browser to identify itself;
   some HTTP servers only allow requests coming from common browsers as opposed
   to scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
   default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling of third-party
   HTTP cookies:

   *origin_req_host* should be the request-host of the origin transaction, as
   defined by :rfc:`2965`. It defaults to ``cookielib.request_host(self)``. This
   is the host name or IP address of the original request that was initiated by the
   user. For example, if the request is for an image in an HTML document, this
   should be the request-host of the request for the page containing the image.

   *unverifiable* should indicate whether the request is unverifiable, as defined
   by :rfc:`2965`. It defaults to ``False``. An unverifiable request is one whose
   URL the user did not have the option to approve. For example, if the request is
   for an image in an HTML document, and the user had no option to approve the
   automatic fetching of the image, this should be true.

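The effect of the *data* and *headers* arguments can be checked without sending
anything. In this sketch the endpoint, the form fields and the agent string are
hypothetical, and the import fallback is only there so that the snippet also runs
on Python 3.

```python
try:                                  # Python 2: the module documented here
    import urllib2
    from urllib import urlencode
except ImportError:                   # assumption: Python 3 fallback names
    import urllib.request as urllib2
    from urllib.parse import urlencode

# Form fields in application/x-www-form-urlencoded format, as a byte string.
form = urlencode({"q": "urllib2", "page": "1"}).encode("ascii")

request = urllib2.Request(
    "http://www.example.com/search",               # hypothetical endpoint
    data=form,                                     # having data makes it a POST
    headers={"User-Agent": "ExampleClient/0.1"},   # made-up agent string
)
method = request.get_method()
plain_method = urllib2.Request("http://www.example.com/").get_method()
```

Nothing is transmitted until the request object is passed to :func:`urlopen` or
to an opener.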

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all such
   responses are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor([cookiejar])

   A class to handle HTTP Cookies.


.. class:: ProxyHandler([proxies])

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read
   the list of proxies from the environment variables
   :envvar:`<protocol>_proxy`. If no proxy environment variables are set, then in
   a Windows environment proxy settings are obtained from the registry's
   Internet Settings section, and in a Mac OS X environment proxy information
   is retrieved from the OS X System Configuration Framework.

   To disable proxy autodetection, pass an empty dictionary.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. _request-objects:

Request Objects
---------------

The following methods describe all of :class:`Request`'s public interface, and
so all must be overridden in subclasses.


.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has data other than ``None``.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.

   .. versionadded:: 2.4


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).

   .. versionadded:: 2.4

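The header methods above can be exercised on a request that is never sent. This
is a small sketch with a hypothetical URL and made-up credentials; it also uses
:meth:`Request.get_header`, which exists on the class but is not described in
this section, and the import fallback is only there for Python 3.

```python
try:                                  # Python 2: the module documented here
    import urllib2
except ImportError:                   # assumption: Python 3 fallback name
    import urllib.request as urllib2

request = urllib2.Request("http://www.example.com/")   # hypothetical URL

# A second add_header() call with the same name replaces the first value.
request.add_header("Accept", "text/html")
request.add_header("Accept", "application/json")

# Sent with the original request only, not with a redirected follow-up.
# The value is the Base64 form of the made-up pair "user:pass".
request.add_unredirected_header("Authorization", "Basic dXNlcjpwYXNz")

accept = request.get_header("Accept")
knows_auth = request.has_header("Authorization")   # unredirected headers count
knows_cookie = request.has_header("Cookie")
```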

.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by :rfc:`2965`. See the
   documentation for the :class:`Request` constructor.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following
   methods are searched, and added to the possible chains (note that HTTP errors
   are a special case).

   * :samp:`{protocol}_open` --- signal that the handler knows how to open
     *protocol* URLs.

   * :samp:`http_error_{type}` --- signal that the handler knows how to handle
     HTTP errors with HTTP error code *type*.

   * :samp:`{protocol}_error` --- signal that the handler knows how to handle
     errors from (non-\ ``http``) *protocol*.

   * :samp:`{protocol}_request` --- signal that the handler knows how to
     pre-process *protocol* requests.

   * :samp:`{protocol}_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url[, data][, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature works only for HTTP,
   HTTPS, FTP and FTPS connections.

   .. versionchanged:: 2.6
      *timeout* was added.


.. method:: OpenerDirector.error(proto[, arg[, ...]])

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances.

#. Every handler with a method named like :samp:`{protocol}_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :samp:`{protocol}_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the
   algorithm is repeated for methods named like :samp:`{protocol}_open`. If all
   such methods return :const:`None`, the algorithm is repeated for methods
   named :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :samp:`{protocol}_response` has that
   method called to post-process the response.

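The three stages can be observed with a handler for an invented ``demo`` scheme.
The sketch below is only an illustration: the scheme, the class name
``DemoHandler`` and the in-memory response are assumptions, and an
:class:`io.BytesIO` object stands in for a full response object. The import
fallback lets the same code run on Python 3.

```python
import io

try:                                  # Python 2: the module documented here
    import urllib2
except ImportError:                   # assumption: Python 3 fallback name
    import urllib.request as urllib2


class DemoHandler(urllib2.BaseHandler):
    """Record the order in which the director calls each stage."""

    def __init__(self):
        self.calls = []

    def demo_request(self, req):              # stage 1: pre-process
        self.calls.append("request")
        return req

    def demo_open(self, req):                 # stage 2: produce a response
        self.calls.append("open")
        return io.BytesIO(b"payload")

    def demo_response(self, req, response):   # stage 3: post-process
        self.calls.append("response")
        return response


handler = DemoHandler()
opener = urllib2.OpenerDirector()
opener.add_handler(handler)       # registers the three demo_* methods
body = opener.open("demo://example/resource").read()
```

After the call, ``handler.calls`` lists the stages in the order they ran.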

.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following members and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.

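A catch-all :meth:`default_open` might look like the following sketch, which
answers a few known URLs from memory and declines the rest by returning
``None``. The class name, the ``demo`` scheme and the use of
:class:`io.BytesIO` as a stand-in response are assumptions; a real handler would
return an object that also offers :meth:`geturl` and :meth:`info`.

```python
import io

try:                                  # Python 2: the module documented here
    import urllib2
except ImportError:                   # assumption: Python 3 fallback name
    import urllib.request as urllib2


class CannedResponseHandler(urllib2.BaseHandler):
    """Answer known URLs from memory; decline everything else."""

    def __init__(self, pages):
        self.pages = pages                    # maps URL -> body bytes

    def default_open(self, req):
        body = self.pages.get(req.get_full_url())
        if body is None:
            return None                       # let later handlers try
        return io.BytesIO(body)


opener = urllib2.OpenerDirector()
opener.add_handler(CannedResponseHandler({"demo://site/index": b"<p>hi</p>"}))

hit = opener.open("demo://site/index").read()
miss = opener.open("demo://site/other")       # no handler accepted this URL
```

Because this bare :class:`OpenerDirector` has no :class:`UnknownHandler`, the
second call returns ``None`` instead of raising :exc:`URLError`.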

.. method:: BaseHandler.protocol_open(req)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.


.. method:: BaseHandler.protocol_request(req)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if you can't handle it but
   another handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the ``POST`` to a
      ``GET``, and the default implementation reproduces this behavior.

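One way to customize redirect handling is to override
:meth:`redirect_request` and defer to the default policy. The sketch below
records each redirect first; the class name and URLs are hypothetical, and the
method is invoked by hand here (with ``None`` for *fp* and an empty mapping for
*hdrs*) purely to show the ``POST`` to ``GET`` change without any network
traffic.

```python
try:                                  # Python 2: the module documented here
    import urllib2
except ImportError:                   # assumption: Python 3 fallback name
    import urllib.request as urllib2


class RecordingRedirectHandler(urllib2.HTTPRedirectHandler):
    """Remember each redirect, then defer to the default policy."""

    def __init__(self):
        self.seen = []

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        self.seen.append((code, newurl))
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, hdrs, newurl)


handler = RecordingRedirectHandler()
original = urllib2.Request("http://www.example.com/form", data=b"a=1")

# Call the hook directly, the way http_error_302() would after a 302 reply.
followup = handler.redirect_request(
    original, None, 302, "Found", {}, "http://www.example.com/done")
```

In normal use the handler would be passed to :func:`build_opener` so that the
director calls it automatically.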

.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

.. versionadded:: 2.4

:class:`HTTPCookieProcessor` instances have one attribute:


.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`cookielib.CookieJar` in which cookies are stored.


.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   The :class:`ProxyHandler` will have a method :samp:`{protocol}_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.

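The per-protocol methods can be seen by inspecting a handler after construction.
This sketch uses a hypothetical proxy address; building the handler does not
contact it, and the import fallback is only there for Python 3.

```python
try:                                  # Python 2: the module documented here
    import urllib2
except ImportError:                   # assumption: Python 3 fallback name
    import urllib.request as urllib2

# Hypothetical proxy address; nothing is contacted when the handler is built.
explicit = urllib2.ProxyHandler({"http": "http://proxy.example.com:3128"})
disabled = urllib2.ProxyHandler({})   # empty mapping: no proxies at all

has_http_hook = hasattr(explicit, "http_open")    # proxy configured for http
has_ftp_hook = hasattr(explicit, "ftp_open")      # none configured for ftp
disabled_has_hook = hasattr(disabled, "http_open")
```

Only protocols present in the mapping get a hook, so requests for other
protocols pass through the handler untouched.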

.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.

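The lookup rules can be tried out directly on a password manager. In this sketch
the host names, realm and credentials are all made up, and the import fallback
is only there so that it also runs on Python 3.

```python
try:                                  # Python 2: the module documented here
    import urllib2
except ImportError:                   # assumption: Python 3 fallback name
    import urllib.request as urllib2

manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
# Realm None acts as the catch-all; the credentials are invented.
manager.add_password(None, "http://www.example.com/private/", "alice", "s3cret")

# A URI below the registered one matches, whatever realm the server names.
inside = manager.find_user_password(
    "Members Only", "http://www.example.com/private/report.html")
# A different host has no entry at all.
outside = manager.find_user_password(
    "Members Only", "http://other.example.org/")

# The manager is normally handed to an auth handler and then to an opener.
opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(manager))
```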
714
715.. _abstract-basic-auth-handler:
716
717AbstractBasicAuthHandler Objects
718--------------------------------
719
720
721.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
722
723 Handle an authentication request by getting a user/password pair, and re-trying
724 the request. *authreq* should be the name of the header where the information
725 about the realm is included in the request, *host* specifies the URL and path to
726 authenticate for, *req* should be the (failed) :class:`Request` object, and
727 *headers* should be the error headers.
728
729 *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
730 authority component (e.g. ``"http://python.org/"``). In either case, the
731 authority must not contain a userinfo component (so, ``"python.org"`` and
732 ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).
733
734
735.. _http-basic-auth-handler:
736
737HTTPBasicAuthHandler Objects
738----------------------------
739
740
741.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)
742
743 Retry the request with authentication information, if available.
744

.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.

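A minimal sketch of how a digest handler might be wired into an opener is
shown below. It only constructs the objects and performs no network access.
The URL and credentials are hypothetical, and the import fallback exists
solely so that the snippet also runs on Python 3.

```python
try:                                  # Python 2
    import urllib2
except ImportError:                   # Python 3 fallback (assumption for portability)
    import urllib.request as urllib2

# Store credentials under the default realm (None) so that they apply
# whatever realm the server announces in its challenge.
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://www.example.com/', 'klem', 'secret')

# Hand the manager to the digest handler and build an opener around it.
digest_handler = urllib2.HTTPDigestAuthHandler(password_mgr)
opener = urllib2.build_opener(digest_handler)

# opener.open('http://www.example.com/protected/') would now retry a
# request that was rejected with a 401 digest challenge.
```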

.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.

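The effect of request data on the HTTP verb can be observed without sending
anything, as in the following sketch. It uses :meth:`Request.get_method`
rather than ``has_data()`` to report the verb. The URL and form data are
placeholders, and the import fallback is only there so the snippet also runs
on Python 3.

```python
try:                                  # Python 2
    import urllib2
except ImportError:                   # Python 3 fallback (assumption for portability)
    import urllib.request as urllib2

# Without data the request is a GET.
get_req = urllib2.Request('http://www.example.com/')
get_verb = get_req.get_method()

# Supplying data turns the request into a POST.
post_req = urllib2.Request('http://www.example.com/', data=b'spam=1&eggs=2')
post_verb = post_req.get_method()
```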

.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.

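The local case can be tried out with a temporary file, as sketched below. The
file contents are arbitrary, ``pathname2url`` is used to turn the path into
the path part of a ``file:`` URL, and the import fallback is only there so
the snippet also runs on Python 3.

```python
import os
import tempfile

try:                                  # Python 2
    import urllib2
    from urllib import pathname2url
except ImportError:                   # Python 3 fallback (assumption for portability)
    import urllib.request as urllib2
    from urllib.request import pathname2url

# Create a small scratch file to read back through a file: URL.
fd, path = tempfile.mkstemp()
os.write(fd, b'hello from a local file')
os.close(fd)

try:
    # The URL carries no host name, so the file is opened locally.
    f = urllib2.urlopen('file:' + pathname2url(path))
    content = f.read()
    f.close()
finally:
    os.remove(path)
```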

.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.

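One possible way to use these setters is to configure the handler before
passing it to :func:`build_opener`, as in the sketch below. The timeout and
connection limit are arbitrary example values, nothing is fetched, and the
import fallback is only there so the snippet also runs on Python 3.

```python
try:                                  # Python 2
    import urllib2
except ImportError:                   # Python 3 fallback (assumption for portability)
    import urllib.request as urllib2

# Configure the cache before handing the handler to an opener.
ftp_handler = urllib2.CacheFTPHandler()
ftp_handler.setTimeout(30)     # connection timeout: 30 seconds (example value)
ftp_handler.setMaxConns(4)     # keep at most 4 cached connections (example value)

# Because CacheFTPHandler is an FTPHandler, build_opener uses it in place
# of the default FTP handler.
opener = urllib2.build_opener(ftp_handler)
```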

.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.

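In practice this is what a caller sees when no handler knows the URL scheme.
The sketch below uses a deliberately made-up scheme, ``foo``, so no network
access takes place. The import fallback is only there so the snippet also
runs on Python 3.

```python
try:                                  # Python 2
    import urllib2
except ImportError:                   # Python 3 fallback (assumption for portability)
    import urllib.request as urllib2

try:
    # "foo" is a made-up scheme that no installed handler can open.
    urllib2.urlopen('foo://example/')
    error = None
except urllib2.URLError as exc:
    error = exc
```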

.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. versionadded:: 2.4


.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For responses with a 200 status code, the response object is returned
   immediately.

   For other status codes, this simply passes the job on to the
   :samp:`{protocol}_error_code` handler methods, via
   :meth:`OpenerDirector.error`. Eventually,
   :class:`urllib2.HTTPDefaultErrorHandler` will raise an :exc:`HTTPError` if no
   other handler handles the error.

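The error path described above can be exercised without a network connection
by substituting a handler that fabricates a response. In the sketch below,
``CannedHTTPHandler`` is a hypothetical class written for this illustration
and is not part of :mod:`urllib2`. The URL is a placeholder, and the import
fallback is only there so the snippet also runs on Python 3.

```python
import io

try:                                  # Python 2
    import urllib2
except ImportError:                   # Python 3 fallback (assumption for portability)
    import urllib.request as urllib2


class CannedHTTPHandler(urllib2.HTTPHandler):
    """Hypothetical handler that answers every HTTP request with a canned 404."""

    def http_open(self, req):
        resp = urllib2.addinfourl(io.BytesIO(b'not here'), {},
                                  req.get_full_url(), 404)
        resp.msg = 'Not Found'   # the error processor reads code and msg
        return resp


# An empty ProxyHandler keeps environment proxy settings out of the demo.
opener = urllib2.build_opener(urllib2.ProxyHandler({}), CannedHTTPHandler())

try:
    opener.open('http://www.example.com/missing')
    status = None
except urllib2.HTTPError as exc:
    # No handler claimed the 404, so the default error handler raised.
    status = exc.code
```

Registering a handler with an ``http_error_404`` method would be one way to
intercept the error before the default handler raises.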

.. _urllib2-examples:

Examples
--------

This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print f.read(100)
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print f.read()
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print 'Content-type: text/plain\n\nGot Data: "%s"' % data

Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   urllib2.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib2
   proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib2
   req = urllib2.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib2.urlopen(req)

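The constructor form mentioned above might look like the following sketch,
which sets the same header through the *headers* argument and reads it back
without sending the request. The URLs are placeholders, and the import
fallback is only there so the snippet also runs on Python 3.

```python
try:                                  # Python 2
    import urllib2
except ImportError:                   # Python 3 fallback (assumption for portability)
    import urllib.request as urllib2

# Pass the headers as a dictionary when the Request is created.
req = urllib2.Request('http://www.example.com/',
                      headers={'Referer': 'http://www.python.org/'})

# The header is stored on the request and can be inspected before sending.
referer = req.get_header('Referer')
```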
:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib2
   opener = urllib2.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
