:mod:`urllib2` --- extensible library for opening URLs
======================================================

.. module:: urllib2
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


The :mod:`urllib2` module defines functions and classes which help in opening
URLs (mostly HTTP) in a complex world --- basic and digest authentication,
redirections, cookies and more.

The :mod:`urllib2` module defines the following functions:


.. function:: urlopen(url[, data][, timeout])

   Open the URL *url*, which can be either a string or a :class:`Request` object.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   The optional *timeout* parameter specifies a timeout in seconds for the
   connection attempt (if not specified, or passed as ``None``, the global default
   timeout setting will be used). This only works for HTTP, HTTPS, FTP and
   FTPS connections.

   This function returns a file-like object with two additional methods:

   * :meth:`geturl` --- return the URL of the resource retrieved, commonly used to
     determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers, in
     the form of an ``httplib.HTTPMessage`` instance
     (see `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though the
   default installed global :class:`OpenerDirector` uses :class:`UnknownHandler` to
   ensure this never happens).

   .. versionchanged:: 2.6
      *timeout* was added.

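As a rough illustration of the object that :func:`urlopen` returns, the sketch
below opens a scratch file through a ``file:`` URL so that it runs without
network access. The scratch file and the ``try``/``except`` import fallback
(which lets the same lines run on Python 3, where this module became
:mod:`urllib.request`) are assumptions made for the example.

```python
import os
import tempfile

try:
    import urllib2                        # Python 2, the module described here
    from urllib import pathname2url
except ImportError:
    import urllib.request as urllib2      # assumption: Python 3 fallback
    from urllib.request import pathname2url

# Write a small scratch file so the example needs no network access.
fd, path = tempfile.mkstemp(suffix='.txt')
os.write(fd, b'hello from disk')
os.close(fd)

f = urllib2.urlopen('file:' + pathname2url(path))
try:
    body = f.read()                       # ordinary file-like read
    final_url = f.geturl()                # URL of the resource retrieved
    length = f.info()['Content-length']   # header-style meta-information
finally:
    f.close()
    os.remove(path)
```

The same three calls apply to an HTTP response; only the set of headers
reported by :meth:`info` differs.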

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler`,
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   Beginning in Python 2.3, a :class:`BaseHandler` subclass may also change its
   :attr:`handler_order` member variable to modify its position in the handlers
   list.

The following exceptions are raised as appropriate:


.. exception:: URLError

   The handlers raise this exception (or derived exceptions) when they run into a
   problem. It is a subclass of :exc:`IOError`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
      URLs).


.. exception:: HTTPError

   Although it is an exception (a subclass of :exc:`URLError`), an
   :exc:`HTTPError` can also function as a non-exceptional file-like return
   value (the same thing that :func:`urlopen` returns). This is useful when
   handling exotic HTTP errors, such as requests for authentication.

   .. attribute:: code

      An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
      This numeric value corresponds to a value found in the dictionary of
      codes as found in :attr:`BaseHTTPServer.BaseHTTPRequestHandler.responses`.

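Because :exc:`HTTPError` is a subclass of :exc:`URLError`, code that wants to
tell the two apart has to catch :exc:`HTTPError` first. A minimal sketch
follows; the helper name ``describe_failure`` and the made-up URL scheme are
hypothetical, and the import fallback is only there so the lines also run on
Python 3.

```python
try:
    import urllib2                      # Python 2, the module described here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback


def describe_failure(url):
    """Return a short status string for *url* instead of raising."""
    try:
        urllib2.urlopen(url).close()
    except urllib2.HTTPError as err:
        # Must come first: HTTPError is a subclass of URLError.
        return 'HTTP status %d' % err.code
    except urllib2.URLError as err:
        return 'failed: %s' % (err.reason,)
    return 'ok'


# No handler knows this made-up scheme, so UnknownHandler raises URLError
# before any network traffic happens.
outcome = describe_failure('nosuchscheme://example/')
```

Reversing the two ``except`` clauses would route every HTTP error into the
generic branch, and the status code would go unreported.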


The following classes are provided:


.. class:: Request(url[, data][, headers][, origin_req_host][, unverifiable])

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   *headers* should be a dictionary, and will be treated as if :meth:`add_header`
   was called with each key and value as arguments. This is often used to "spoof"
   the ``User-Agent`` header, which is used by a browser to identify itself --
   some HTTP servers only allow requests coming from common browsers as opposed
   to scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
   default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling of third-party
   HTTP cookies:

   *origin_req_host* should be the request-host of the origin transaction, as
   defined by :rfc:`2965`. It defaults to ``cookielib.request_host(self)``. This
   is the host name or IP address of the original request that was initiated by the
   user. For example, if the request is for an image in an HTML document, this
   should be the request-host of the request for the page containing the image.

   *unverifiable* should indicate whether the request is unverifiable, as defined
   by RFC 2965. It defaults to ``False``. An unverifiable request is one whose URL
   the user did not have the option to approve. For example, if the request is for
   an image in an HTML document, and the user had no option to approve the
   automatic fetching of the image, this should be true.

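The effect of the *data* and *headers* arguments can be checked without
sending anything. The sketch below only constructs :class:`Request` objects;
the URLs, form fields and ``ExampleClient/0.1`` agent string are placeholders,
and the import fallback is an assumption so the lines also run on Python 3.

```python
try:
    import urllib2                          # Python 2, the module described here
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2        # assumption: Python 3 fallback
    from urllib.parse import urlencode

# A sequence of 2-tuples keeps the field order predictable.
form = urlencode([('name', 'Ada'), ('lang', 'python')])

post = urllib2.Request('http://www.example.com/cgi-bin/register',
                       data=form,
                       headers={'User-Agent': 'ExampleClient/0.1'})
get = urllib2.Request('http://www.example.com/')

post_method = post.get_method()   # data was supplied, so POST
get_method = get.get_method()     # no data, so GET
```

Passing either object to :func:`urlopen` would then issue the corresponding
HTTP request.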

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor([cookiejar])

   A class to handle HTTP Cookies.


.. class:: ProxyHandler([proxies])

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read the
   list of proxies from the environment variables :envvar:`<protocol>_proxy`.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. _request-objects:

Request Objects
---------------

The following methods describe all of :class:`Request`'s public interface, and
so all must be overridden in subclasses.


.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has a non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.

   .. versionadded:: 2.4


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).

   .. versionadded:: 2.4

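The overwrite rule for :meth:`add_header` can be seen in a few lines. This is
only a sketch: the header values are placeholders, ``get_header`` is a lookup
method of the implementation that is not described in this section, and the
observation that names are stored in ``str.capitalize()`` form (so
``X-Token`` becomes ``X-token``) is an implementation detail rather than a
documented guarantee. The import fallback is an assumption for Python 3.

```python
try:
    import urllib2                      # Python 2, the module described here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback

req = urllib2.Request('http://www.example.com/')

req.add_header('Accept', 'text/html')
req.add_header('accept', 'text/plain')          # same name: replaces the value above
req.add_unredirected_header('X-Token', 'abc')   # not copied to a redirected request

accept = req.get_header('Accept')       # value left by the later call
has_token = req.has_header('X-token')   # checks regular and unredirected headers
```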

.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by RFC 2965. See the
   documentation for the :class:`Request` constructor.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following methods
   are searched, and added to the possible chains (note that HTTP errors are a
   special case).

   * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
     URLs.

   * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

   * :meth:`protocol_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`protocol_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

   * :meth:`protocol_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url[, data][, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are the
   same as those of :func:`urlopen` (which simply calls the :meth:`open` method on
   the currently installed global :class:`OpenerDirector`). The optional *timeout*
   parameter specifies a timeout in seconds for the connection attempt (if not
   specified, or passed as ``None``, the global default timeout setting will be
   used; this only works for HTTP, HTTPS, FTP and FTPS connections).

   .. versionchanged:: 2.6
      *timeout* was added.


.. method:: OpenerDirector.error(proto[, arg[, ...]])

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`protocol_open`. If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`.open` and :meth:`.error` methods.

#. Every handler with a method named like :meth:`protocol_response` has that
   method called to post-process the response.

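The three stages can be traced with a pair of toy handlers. In this sketch the
``memo`` URL scheme, the class names and the ``calls`` list are all
hypothetical; a made-up scheme is used so that nothing touches the network.
The import fallback is an assumption so the lines also run on Python 3.

```python
import io

try:
    import urllib2                      # Python 2, the module described here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback

calls = []   # records the order in which the opener invokes each method


class MemoHandler(urllib2.BaseHandler):
    """Opens URLs of the made-up ``memo`` scheme from memory."""

    def memo_open(self, req):
        calls.append('memo_open')
        return io.BytesIO(b'hello')


class TraceProcessor(urllib2.BaseHandler):
    """Pre- and post-processes ``memo`` requests without changing them."""

    def memo_request(self, req):
        calls.append('memo_request')
        return req

    def memo_response(self, req, response):
        calls.append('memo_response')
        return response


opener = urllib2.OpenerDirector()
opener.add_handler(MemoHandler())
opener.add_handler(TraceProcessor())

body = opener.open('memo://greeting').read()
```

After the call, ``calls`` lists the request hook, then the open method, then
the response hook, matching the numbered stages.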

.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following members and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.

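One possible use of :meth:`default_open` is a stand-in opener for tests. The
sketch below is an assumption-laden illustration: ``CannedHandler`` and its
``payload`` argument are hypothetical, and the opener is built by hand with no
other handlers so that no connection is ever attempted. The import fallback is
there only so the lines also run on Python 3.

```python
import io

try:
    import urllib2                      # Python 2, the module described here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback


class CannedHandler(urllib2.BaseHandler):
    """Answers every request from memory, whatever its scheme."""

    def __init__(self, payload):
        self.payload = payload

    def default_open(self, req):
        # Returning a non-None value ends the opening stage immediately.
        return io.BytesIO(self.payload)


opener = urllib2.OpenerDirector()
opener.add_handler(CannedHandler(b'<html>stub</html>'))

http_body = opener.open('http://www.example.com/').read()
ftp_body = opener.open('ftp://ftp.example.com/pub/README').read()
```

Both calls return the same canned bytes, because :meth:`default_open` is
consulted before any protocol-specific method.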

.. method:: BaseHandler.protocol_open(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.

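The dispatch on *nnn* can be exercised offline by calling
:meth:`OpenerDirector.error` by hand, which is something an opener normally
does itself after reading a real response. Everything in this sketch is
illustrative: ``NotFoundHandler`` and the placeholder body are hypothetical,
``None`` and an empty dictionary stand in for the error body and headers, and
the import fallback is an assumption for Python 3.

```python
import io

try:
    import urllib2                      # Python 2, the module described here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback


class NotFoundHandler(urllib2.BaseHandler):
    def http_error_404(self, req, fp, code, msg, hdrs):
        # Substitute a placeholder body instead of letting the 404 propagate.
        return io.BytesIO(b'placeholder page')


opener = urllib2.OpenerDirector()
opener.add_handler(NotFoundHandler())

req = urllib2.Request('http://www.example.com/missing.html')

# Code 404 is routed to http_error_404 above.
handled = opener.error('http', req, None, 404, 'Not Found', {})

# Code 500 has no specific handler and no http_error_default is registered.
unhandled = opener.error('http', req, None, 500, 'Internal Server Error', {})
```

With the handlers installed by :func:`build_opener`, the second case would
instead reach :class:`HTTPDefaultErrorHandler` and raise :exc:`HTTPError`.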

.. method:: BaseHandler.protocol_request(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. *newurl* is the target URL of the
   redirection. If a redirection should take place, return a new
   :class:`Request` to allow :meth:`http_error_30\*` to perform the redirect.
   Otherwise, raise :exc:`HTTPError` if no other handler should try to handle
   this URL, or return ``None`` if you can't but another handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the POST to a
      ``GET``, and the default implementation reproduces this behavior.

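A subclass can override :meth:`redirect_request` to apply its own policy. The
sketch below declines redirects that leave the original host; the class name
and URLs are hypothetical, and the method is called directly so that the
example stays offline. Note that the implementation passes the redirect target
as a final *newurl* argument. The import fallback is an assumption for
Python 3.

```python
try:
    import urllib2                          # Python 2, the module described here
    from urlparse import urlparse
except ImportError:
    import urllib.request as urllib2        # assumption: Python 3 fallback
    from urllib.parse import urlparse


class SameHostRedirectHandler(urllib2.HTTPRedirectHandler):
    """Follow redirects only when they stay on the original host."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        if urlparse(newurl).netloc != urlparse(req.get_full_url()).netloc:
            return None     # decline; another handler might still act
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, hdrs, newurl)


handler = SameHostRedirectHandler()
req = urllib2.Request('http://www.example.com/old')

same = handler.redirect_request(
    req, None, 302, 'Found', {}, 'http://www.example.com/new')
other = handler.redirect_request(
    req, None, 302, 'Found', {}, 'http://elsewhere.example.org/')
```

In normal use the handler would be passed to :func:`build_opener`, where it
takes the place of the default :class:`HTTPRedirectHandler`.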

.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` URL. This method is called by the parent
   :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

.. versionadded:: 2.4

:class:`HTTPCookieProcessor` instances have one attribute:


.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`cookielib.CookieJar` in which cookies are stored.


.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.


.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.

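The catch-all realm and the super-URI matching can be tried without any
server. In this sketch the realm name, URLs and credentials are placeholders,
and the import fallback is an assumption so the lines also run on Python 3.

```python
try:
    import urllib2                      # Python 2, the module described here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback

mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

# Register credentials under the catch-all realm (None) for one URI prefix.
mgr.add_password(None, 'http://www.example.com/private/', 'klem', 'secret')

# A URI below the registered prefix matches, whatever realm the server names.
found = mgr.find_user_password(
    'Some Realm', 'http://www.example.com/private/report.html')

# A URI on another host does not match.
missing = mgr.find_user_password('Some Realm', 'http://other.example.org/')
```

Here ``found`` is the registered ``(user, passwd)`` pair and ``missing`` is
``(None, None)``.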

.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.


.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. versionadded:: 2.4


.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For responses with a 200 status code, the response object is returned
   immediately.

   For responses with any other status code, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`urllib2.HTTPDefaultErrorHandler` will raise an
   :exc:`HTTPError` if no other handler handles the error.


.. _urllib2-examples:

Examples
--------

This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print f.read(100)
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print f.read()
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print 'Content-type: text/plain\n\nGot Data: "%s"' % data

Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   urllib2.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib2
   proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib2
   req = urllib2.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib2.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib2
   opener = urllib2.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).