:mod:`urllib2` --- extensible library for opening URLs
======================================================
3
4.. module:: urllib2
5 :synopsis: Next generation URL opening library.
6.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
7.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
8
9
10The :mod:`urllib2` module defines functions and classes which help in opening
11URLs (mostly HTTP) in a complex world --- basic and digest authentication,
12redirections, cookies and more.
13
14The :mod:`urllib2` module defines the following functions:
15
16
.. function:: urlopen(url[, data][, timeout])

   Open the URL *url*, which can be either a string or a :class:`Request` object.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   The optional *timeout* parameter specifies a timeout in seconds for the
   connection attempt (if not specified, or passed as ``None``, the global default
   timeout setting will be used). This only works for HTTP, HTTPS, FTP and
   FTPS connections.

   This function returns a file-like object with two additional methods:

   * :meth:`geturl` --- return the URL of the resource retrieved

   * :meth:`info` --- return the meta-information of the page, as a dictionary-like
     object

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though the
   default installed global :class:`OpenerDirector` uses :class:`UnknownHandler` to
   ensure this never happens).

   .. versionchanged:: 2.6
      *timeout* was added.
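
A minimal sketch of preparing *data* for a POST, assuming a hypothetical form
endpoint on ``www.example.com``. Nothing is sent over the network here; the
request is only constructed. The ``ImportError`` fallback is an assumption
added so the sketch also runs on Python 3, where these modules were renamed::

   try:                                   # Python 2 names used in this document
       import urllib2
       from urllib import urlencode
   except ImportError:                    # assumption: Python 3 equivalents
       import urllib.request as urllib2
       from urllib.parse import urlencode

   # A sequence of 2-tuples keeps the fields in a predictable order.
   fields = [('name', 'Guido'), ('lang', 'python')]
   body = urlencode(fields)
   if not isinstance(body, bytes):        # Python 3 expects a byte string
       body = body.encode('ascii')

   # Supplying data turns the request into a POST.
   req = urllib2.Request('http://www.example.com/cgi-bin/query', data=body)
   method = req.get_method()

Passing ``req`` to :func:`urlopen` would then send the encoded form to the
server.
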
49
50
51.. function:: install_opener(opener)
52
53 Install an :class:`OpenerDirector` instance as the default global opener.
54 Installing an opener is only necessary if you want urlopen to use that opener;
55 otherwise, simply call :meth:`OpenerDirector.open` instead of :func:`urlopen`.
56 The code does not check for a real :class:`OpenerDirector`, and any class with
57 the appropriate interface will work.
58
59
.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler`,
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   Beginning in Python 2.3, a :class:`BaseHandler` subclass may also change its
   :attr:`handler_order` member variable to modify its position in the handlers
   list.
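
As a rough illustration of the rules above, the following sketch passes one
handler as a class and one as an instance. Supplying a :class:`ProxyHandler`
built from an empty dictionary takes the place of the default one, so no
proxy is consulted. The ``handlers`` list inspected at the end is an
implementation detail rather than documented API, and the ``ImportError``
fallback is an assumption added so the sketch also runs on Python 3::

   try:
       import urllib2
   except ImportError:                    # assumption: Python 3 module name
       import urllib.request as urllib2

   opener = urllib2.build_opener(
       urllib2.HTTPCookieProcessor,       # a class: called without arguments
       urllib2.ProxyHandler({}),          # an instance: suppresses the default
   )

   # Implementation detail: the handlers the opener actually registered.
   cookie_handlers = [h for h in opener.handlers
                      if isinstance(h, urllib2.HTTPCookieProcessor)]
   proxy_handlers = [h for h in opener.handlers
                     if isinstance(h, urllib2.ProxyHandler)]

An empty :class:`ProxyHandler` defines no :meth:`protocol_open` methods, so it
does not show up among the registered handlers at all.
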
78
79The following exceptions are raised as appropriate:
80
81
82.. exception:: URLError
83
84 The handlers raise this exception (or derived exceptions) when they run into a
85 problem. It is a subclass of :exc:`IOError`.
86
87
88.. exception:: HTTPError
89
90 A subclass of :exc:`URLError`, it can also function as a non-exceptional
91 file-like return value (the same thing that :func:`urlopen` returns). This
92 is useful when handling exotic HTTP errors, such as requests for
93 authentication.
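
Because :exc:`HTTPError` is a subclass of :exc:`URLError`, an ``except`` clause
for it has to come first. The sketch below shows one way to tell the two
apart. ``describe_failure`` is a hypothetical helper, not part of the module,
and the URL deliberately uses a made-up scheme that no handler supports, so no
network access is involved. The ``ImportError`` fallback is an assumption added
so the sketch also runs on Python 3::

   try:
       import urllib2
   except ImportError:                    # assumption: Python 3 module name
       import urllib.request as urllib2

   def describe_failure(url):
       """Return a short tag describing how opening *url* went."""
       try:
           urllib2.urlopen(url).close()
       except urllib2.HTTPError as err:   # the server answered with an error status
           return 'http-error %d' % err.code
       except urllib2.URLError:           # e.g. no handler could open the URL
           return 'url-error'
       return 'ok'

   outcome = describe_failure('nosuchscheme://example/')
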
94
95The following classes are provided:
96
97
98.. class:: Request(url[, data][, headers] [, origin_req_host][, unverifiable])
99
100 This class is an abstraction of a URL request.
101
102 *url* should be a string containing a valid URL.
103
104 *data* may be a string specifying additional data to send to the server, or
105 ``None`` if no such data is needed. Currently HTTP requests are the only ones
106 that use *data*; the HTTP request will be a POST instead of a GET when the
107 *data* parameter is provided. *data* should be a buffer in the standard
108 :mimetype:`application/x-www-form-urlencoded` format. The
109 :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
110 returns a string in this format.
111
112 *headers* should be a dictionary, and will be treated as if :meth:`add_header`
113 was called with each key and value as arguments.
114
115 The final two arguments are only of interest for correct handling of third-party
116 HTTP cookies:
117
118 *origin_req_host* should be the request-host of the origin transaction, as
119 defined by :rfc:`2965`. It defaults to ``cookielib.request_host(self)``. This
120 is the host name or IP address of the original request that was initiated by the
121 user. For example, if the request is for an image in an HTML document, this
122 should be the request-host of the request for the page containing the image.
123
124 *unverifiable* should indicate whether the request is unverifiable, as defined
125 by RFC 2965. It defaults to False. An unverifiable request is one whose URL
126 the user did not have the option to approve. For example, if the request is for
127 an image in an HTML document, and the user had no option to approve the
128 automatic fetching of the image, this should be true.
129
130
131.. class:: OpenerDirector()
132
133 The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
134 together. It manages the chaining of handlers, and recovery from errors.
135
136
137.. class:: BaseHandler()
138
139 This is the base class for all registered handlers --- and handles only the
140 simple mechanics of registration.
141
142
143.. class:: HTTPDefaultErrorHandler()
144
145 A class which defines a default handler for HTTP error responses; all responses
146 are turned into :exc:`HTTPError` exceptions.
147
148
149.. class:: HTTPRedirectHandler()
150
151 A class to handle redirections.
152
153
154.. class:: HTTPCookieProcessor([cookiejar])
155
156 A class to handle HTTP Cookies.
157
158
159.. class:: ProxyHandler([proxies])
160
161 Cause requests to go through a proxy. If *proxies* is given, it must be a
162 dictionary mapping protocol names to URLs of proxies. The default is to read the
163 list of proxies from the environment variables :envvar:`<protocol>_proxy`.
164
165
166.. class:: HTTPPasswordMgr()
167
168 Keep a database of ``(realm, uri) -> (user, password)`` mappings.
169
170
171.. class:: HTTPPasswordMgrWithDefaultRealm()
172
173 Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
174 ``None`` is considered a catch-all realm, which is searched if no other realm
175 fits.
176
177
178.. class:: AbstractBasicAuthHandler([password_mgr])
179
180 This is a mixin class that helps with HTTP authentication, both to the remote
181 host and to a proxy. *password_mgr*, if given, should be something that is
182 compatible with :class:`HTTPPasswordMgr`; refer to section
183 :ref:`http-password-mgr` for information on the interface that must be
184 supported.
185
186
187.. class:: HTTPBasicAuthHandler([password_mgr])
188
189 Handle authentication with the remote host. *password_mgr*, if given, should be
190 something that is compatible with :class:`HTTPPasswordMgr`; refer to section
191 :ref:`http-password-mgr` for information on the interface that must be
192 supported.
193
194
195.. class:: ProxyBasicAuthHandler([password_mgr])
196
197 Handle authentication with the proxy. *password_mgr*, if given, should be
198 something that is compatible with :class:`HTTPPasswordMgr`; refer to section
199 :ref:`http-password-mgr` for information on the interface that must be
200 supported.
201
202
203.. class:: AbstractDigestAuthHandler([password_mgr])
204
205 This is a mixin class that helps with HTTP authentication, both to the remote
206 host and to a proxy. *password_mgr*, if given, should be something that is
207 compatible with :class:`HTTPPasswordMgr`; refer to section
208 :ref:`http-password-mgr` for information on the interface that must be
209 supported.
210
211
212.. class:: HTTPDigestAuthHandler([password_mgr])
213
214 Handle authentication with the remote host. *password_mgr*, if given, should be
215 something that is compatible with :class:`HTTPPasswordMgr`; refer to section
216 :ref:`http-password-mgr` for information on the interface that must be
217 supported.
218
219
220.. class:: ProxyDigestAuthHandler([password_mgr])
221
222 Handle authentication with the proxy. *password_mgr*, if given, should be
223 something that is compatible with :class:`HTTPPasswordMgr`; refer to section
224 :ref:`http-password-mgr` for information on the interface that must be
225 supported.
226
227
228.. class:: HTTPHandler()
229
230 A class to handle opening of HTTP URLs.
231
232
233.. class:: HTTPSHandler()
234
235 A class to handle opening of HTTPS URLs.
236
237
238.. class:: FileHandler()
239
240 Open local files.
241
242
243.. class:: FTPHandler()
244
245 Open FTP URLs.
246
247
248.. class:: CacheFTPHandler()
249
250 Open FTP URLs, keeping a cache of open FTP connections to minimize delays.
251
252
253.. class:: UnknownHandler()
254
255 A catch-all class to handle unknown URLs.
256
257
258.. _request-objects:
259
260Request Objects
261---------------
262
263The following methods describe all of :class:`Request`'s public interface, and
264so all must be overridden in subclasses.
265
266
267.. method:: Request.add_data(data)
268
269 Set the :class:`Request` data to *data*. This is ignored by all handlers except
270 HTTP handlers --- and there it should be a byte string, and will change the
271 request to be ``POST`` rather than ``GET``.
272
273
274.. method:: Request.get_method()
275
276 Return a string indicating the HTTP request method. This is only meaningful for
277 HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.
278
279
280.. method:: Request.has_data()
281
282 Return whether the instance has a non-\ ``None`` data.
283
284
285.. method:: Request.get_data()
286
287 Return the instance's data.
288
289
290.. method:: Request.add_header(key, val)
291
292 Add another header to the request. Headers are currently ignored by all
293 handlers except HTTP handlers, where they are added to the list of headers sent
294 to the server. Note that there cannot be more than one header with the same
295 name, and later calls will overwrite previous calls in case the *key* collides.
296 Currently, this is no loss of HTTP functionality, since all headers which have
297 meaning when used more than once have a (header-specific) way of gaining the
298 same functionality using only one header.
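
The overwriting behaviour can be observed without sending anything. This is a
small sketch against a placeholder URL; ``get_header`` is a :class:`Request`
method that is not described in this section and is used here only to read the
value back. The ``ImportError`` fallback is an assumption added so the sketch
also runs on Python 3::

   try:
       import urllib2
   except ImportError:                    # assumption: Python 3 module name
       import urllib.request as urllib2

   req = urllib2.Request('http://www.example.com/')
   req.add_header('Accept', 'text/html')
   req.add_header('Accept', 'text/plain')   # same key: replaces the first value

   has_accept = req.has_header('Accept')
   accept_value = req.get_header('Accept')
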
299
300
301.. method:: Request.add_unredirected_header(key, header)
302
303 Add a header that will not be added to a redirected request.
304
305 .. versionadded:: 2.4
306
307
308.. method:: Request.has_header(header)
309
310 Return whether the instance has the named header (checks both regular and
311 unredirected).
312
313 .. versionadded:: 2.4
314
315
316.. method:: Request.get_full_url()
317
318 Return the URL given in the constructor.
319
320
321.. method:: Request.get_type()
322
323 Return the type of the URL --- also known as the scheme.
324
325
326.. method:: Request.get_host()
327
328 Return the host to which a connection will be made.
329
330
331.. method:: Request.get_selector()
332
333 Return the selector --- the part of the URL that is sent to the server.
334
335
336.. method:: Request.set_proxy(host, type)
337
338 Prepare the request by connecting to a proxy server. The *host* and *type* will
339 replace those of the instance, and the instance's selector will be the original
340 URL given in the constructor.
341
342
343.. method:: Request.get_origin_req_host()
344
345 Return the request-host of the origin transaction, as defined by :rfc:`2965`.
346 See the documentation for the :class:`Request` constructor.
347
348
349.. method:: Request.is_unverifiable()
350
351 Return whether the request is unverifiable, as defined by RFC 2965. See the
352 documentation for the :class:`Request` constructor.
353
354
355.. _opener-director-objects:
356
357OpenerDirector Objects
358----------------------
359
360:class:`OpenerDirector` instances have the following methods:
361
362
363.. method:: OpenerDirector.add_handler(handler)
364
365 *handler* should be an instance of :class:`BaseHandler`. The following methods
366 are searched, and added to the possible chains (note that HTTP errors are a
367 special case).
368
369 * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
370 URLs.
371
372 * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
373 errors with HTTP error code *type*.
374
375 * :meth:`protocol_error` --- signal that the handler knows how to handle errors
376 from (non-\ ``http``) *protocol*.
377
378 * :meth:`protocol_request` --- signal that the handler knows how to pre-process
379 *protocol* requests.
380
381 * :meth:`protocol_response` --- signal that the handler knows how to
382 post-process *protocol* responses.
383
384
.. method:: OpenerDirector.open(url[, data][, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are the
   same as those of :func:`urlopen` (which simply calls the :meth:`open` method on
   the currently installed global :class:`OpenerDirector`). The optional *timeout*
   parameter specifies a timeout in seconds for the connection attempt (if not
   specified, or passed as ``None``, the global default timeout setting will be used;
   this only works for HTTP, HTTPS, FTP and FTPS connections).

   .. versionchanged:: 2.6
      *timeout* was added.
397
398
399.. method:: OpenerDirector.error(proto[, arg[, ...]])
400
401 Handle an error of the given protocol. This will call the registered error
402 handlers for the given protocol with the given arguments (which are protocol
403 specific). The HTTP protocol is a special case which uses the HTTP response
404 code to determine the specific error handler; refer to the :meth:`http_error_\*`
405 methods of the handler classes.
406
407 Return values and exceptions raised are the same as those of :func:`urlopen`.
408
OpenerDirector objects open URLs in three stages. Within each stage, the order
in which the methods are called is determined by sorting the handler instances.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`protocol_open`. If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`.open` and :meth:`.error` methods.

#. Every handler with a method named like :meth:`protocol_response` has that
   method called to post-process the response.
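
The second and third stages can be sketched with two toy handlers for a
made-up ``demo`` scheme. ``DemoHandler``, ``UpperCaseProcessor`` and the
``demo://`` URL are hypothetical names chosen for illustration. For brevity the
handlers return a bare ``io.BytesIO``; a real handler would return an object
that also offers :meth:`geturl` and :meth:`info`. The ``ImportError`` fallback
is an assumption added so the sketch also runs on Python 3::

   import io

   try:
       import urllib2
   except ImportError:                    # assumption: Python 3 module name
       import urllib.request as urllib2

   class DemoHandler(urllib2.BaseHandler):
       def demo_open(self, req):
           # Stage 2: produce a response for any demo:// URL.
           text = 'payload for ' + req.get_full_url()
           return io.BytesIO(text.encode('ascii'))

   class UpperCaseProcessor(urllib2.BaseHandler):
       def demo_response(self, req, response):
           # Stage 3: post-process whatever stage 2 returned.
           return io.BytesIO(response.read().upper())

   opener = urllib2.OpenerDirector()
   opener.add_handler(DemoHandler())
   opener.add_handler(UpperCaseProcessor())

   result = opener.open('demo://greeting').read()

The class names follow the naming convention used by the module: the class that
defines a :meth:`protocol_response` method is called a processor, the other one
a handler.
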
433
434
435.. _base-handler-objects:
436
437BaseHandler Objects
438-------------------
439
440:class:`BaseHandler` objects provide a couple of methods that are directly
441useful, and others that are meant to be used by derived classes. These are
442intended for direct use:
443
444
445.. method:: BaseHandler.add_parent(director)
446
447 Add a director as parent.
448
449
450.. method:: BaseHandler.close()
451
452 Remove any parents.
453
454The following members and methods should only be used by classes derived from
455:class:`BaseHandler`.
456
457.. note::
458
459 The convention has been adopted that subclasses defining
460 :meth:`protocol_request` or :meth:`protocol_response` methods are named
461 :class:`\*Processor`; all others are named :class:`\*Handler`.
462
463
464.. attribute:: BaseHandler.parent
465
466 A valid :class:`OpenerDirector`, which can be used to open using a different
467 protocol, or handle errors.
468
469
470.. method:: BaseHandler.default_open(req)
471
472 This method is *not* defined in :class:`BaseHandler`, but subclasses should
473 define it if they want to catch all URLs.
474
475 This method, if implemented, will be called by the parent
476 :class:`OpenerDirector`. It should return a file-like object as described in
477 the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
478 It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
479 example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).
480
481 This method will be called before any protocol-specific open method.
482
483
484.. method:: BaseHandler.protocol_open(req)
485 :noindex:
486
487 This method is *not* defined in :class:`BaseHandler`, but subclasses should
488 define it if they want to handle URLs with the given protocol.
489
490 This method, if defined, will be called by the parent :class:`OpenerDirector`.
491 Return values should be the same as for :meth:`default_open`.
492
493
494.. method:: BaseHandler.unknown_open(req)
495
496 This method is *not* defined in :class:`BaseHandler`, but subclasses should
497 define it if they want to catch all URLs with no specific registered handler to
498 open it.
499
500 This method, if implemented, will be called by the :attr:`parent`
501 :class:`OpenerDirector`. Return values should be the same as for
502 :meth:`default_open`.
503
504
505.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)
506
507 This method is *not* defined in :class:`BaseHandler`, but subclasses should
508 override it if they intend to provide a catch-all for otherwise unhandled HTTP
509 errors. It will be called automatically by the :class:`OpenerDirector` getting
510 the error, and should not normally be called in other circumstances.
511
512 *req* will be a :class:`Request` object, *fp* will be a file-like object with
513 the HTTP error body, *code* will be the three-digit code of the error, *msg*
514 will be the user-visible explanation of the code and *hdrs* will be a mapping
515 object with the headers of the error.
516
517 Return values and exceptions raised should be the same as those of
518 :func:`urlopen`.
519
520
521.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)
522
523 *nnn* should be a three-digit HTTP error code. This method is also not defined
524 in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
525 subclass, when an HTTP error with code *nnn* occurs.
526
527 Subclasses should override this method to handle specific HTTP errors.
528
529 Arguments, return values and exceptions raised should be the same as for
530 :meth:`http_error_default`.
531
532
533.. method:: BaseHandler.protocol_request(req)
534 :noindex:
535
536 This method is *not* defined in :class:`BaseHandler`, but subclasses should
537 define it if they want to pre-process requests of the given protocol.
538
539 This method, if defined, will be called by the parent :class:`OpenerDirector`.
540 *req* will be a :class:`Request` object. The return value should be a
541 :class:`Request` object.
542
543
544.. method:: BaseHandler.protocol_response(req, response)
545 :noindex:
546
547 This method is *not* defined in :class:`BaseHandler`, but subclasses should
548 define it if they want to post-process responses of the given protocol.
549
550 This method, if defined, will be called by the parent :class:`OpenerDirector`.
551 *req* will be a :class:`Request` object. *response* will be an object
552 implementing the same interface as the return value of :func:`urlopen`. The
553 return value should implement the same interface as the return value of
554 :func:`urlopen`.
555
556
557.. _http-redirect-handler:
558
559HTTPRedirectHandler Objects
560---------------------------
561
562.. note::
563
564 Some HTTP redirections require action from this module's client code. If this
565 is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
566 precise meanings of the various redirection codes.
567
568
569.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs)
570
571 Return a :class:`Request` or ``None`` in response to a redirect. This is called
572 by the default implementations of the :meth:`http_error_30\*` methods when a
573 redirection is received from the server. If a redirection should take place,
574 return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
575 redirect. Otherwise, raise :exc:`HTTPError` if no other handler should try to
576 handle this URL, or return ``None`` if you can't but another handler might.
577
578 .. note::
579
580 The default implementation of this method does not strictly follow :rfc:`2616`,
581 which says that 301 and 302 responses to ``POST`` requests must not be
582 automatically redirected without confirmation by the user. In reality, browsers
583 do allow automatic redirection of these responses, changing the POST to a
584 ``GET``, and the default implementation reproduces this behavior.
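
One possible use of this hook is to refuse redirects altogether. The sketch
below uses a hypothetical ``NoRedirectHandler`` and placeholder URLs. It
assumes that the library passes the redirect target as a trailing ``newurl``
argument, which the signature shown above does not list; the method is called
by hand here so that no network access is needed. The ``ImportError`` fallback
is an assumption added so the sketch also runs on Python 3::

   try:
       import urllib2
   except ImportError:                    # assumption: Python 3 module name
       import urllib.request as urllib2

   class NoRedirectHandler(urllib2.HTTPRedirectHandler):
       def redirect_request(self, req, fp, code, msg, hdrs, newurl):
           # Raising HTTPError stops other handlers from trying this URL.
           raise urllib2.HTTPError(req.get_full_url(), code, msg, hdrs, fp)

   # Passing the subclass replaces the default HTTPRedirectHandler.
   opener = urllib2.build_opener(NoRedirectHandler)

   # Call the hook directly to show what a 302 response would trigger.
   req = urllib2.Request('http://www.example.com/old')
   try:
       NoRedirectHandler().redirect_request(
           req, None, 302, 'Found', {}, 'http://www.example.com/new')
   except urllib2.HTTPError as err:
       refused_code = err.code
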
585
586
587.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)
588
589 Redirect to the ``Location:`` URL. This method is called by the parent
590 :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.
591
592
593.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)
594
595 The same as :meth:`http_error_301`, but called for the 'found' response.
596
597
598.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)
599
600 The same as :meth:`http_error_301`, but called for the 'see other' response.
601
602
603.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)
604
605 The same as :meth:`http_error_301`, but called for the 'temporary redirect'
606 response.
607
608
609.. _http-cookie-processor:
610
611HTTPCookieProcessor Objects
612---------------------------
613
614.. versionadded:: 2.4
615
616:class:`HTTPCookieProcessor` instances have one attribute:
617
618
619.. attribute:: HTTPCookieProcessor.cookiejar
620
621 The :class:`cookielib.CookieJar` in which cookies are stored.
622
623
624.. _proxy-handler:
625
626ProxyHandler Objects
627--------------------
628
629
630.. method:: ProxyHandler.protocol_open(request)
631 :noindex:
632
633 The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
634 *protocol* which has a proxy in the *proxies* dictionary given in the
635 constructor. The method will modify requests to go through the proxy, by
636 calling ``request.set_proxy()``, and call the next handler in the chain to
637 actually execute the protocol.
638
639
640.. _http-password-mgr:
641
642HTTPPasswordMgr Objects
643-----------------------
644
645These methods are available on :class:`HTTPPasswordMgr` and
646:class:`HTTPPasswordMgrWithDefaultRealm` objects.
647
648
.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication is requested for *realm* and for a
   URI located at or below any of the given URIs.
655
656
657.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)
658
659 Get user/password for given realm and URI, if any. This method will return
660 ``(None, None)`` if there is no matching user/password.
661
662 For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
663 searched if the given *realm* has no matching user/password.
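
A short sketch of the catch-all realm, using placeholder credentials and
URLs. The lookups run entirely in memory. The ``ImportError`` fallback is an
assumption added so the sketch also runs on Python 3::

   try:
       import urllib2
   except ImportError:                    # assumption: Python 3 module name
       import urllib.request as urllib2

   mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
   # A realm of None matches whatever realm the server announces.
   mgr.add_password(None, 'http://www.example.com/private/', 'klem', 'secret')

   # A URI below the registered one finds the stored pair.
   found = mgr.find_user_password(
       'Some Realm', 'http://www.example.com/private/page.html')

   # A URI on another host finds nothing.
   missing = mgr.find_user_password('Some Realm', 'http://other.example.org/')

   # The manager can then be handed to an authentication handler.
   auth_handler = urllib2.HTTPBasicAuthHandler(mgr)
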
664
665
666.. _abstract-basic-auth-handler:
667
668AbstractBasicAuthHandler Objects
669--------------------------------
670
671
672.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
673
674 Handle an authentication request by getting a user/password pair, and re-trying
675 the request. *authreq* should be the name of the header where the information
676 about the realm is included in the request, *host* specifies the URL and path to
677 authenticate for, *req* should be the (failed) :class:`Request` object, and
678 *headers* should be the error headers.
679
680 *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
681 authority component (e.g. ``"http://python.org/"``). In either case, the
682 authority must not contain a userinfo component (so, ``"python.org"`` and
683 ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).
684
685
686.. _http-basic-auth-handler:
687
688HTTPBasicAuthHandler Objects
689----------------------------
690
691
692.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)
693
694 Retry the request with authentication information, if available.
695
696
697.. _proxy-basic-auth-handler:
698
699ProxyBasicAuthHandler Objects
700-----------------------------
701
702
703.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)
704
705 Retry the request with authentication information, if available.
706
707
708.. _abstract-digest-auth-handler:
709
710AbstractDigestAuthHandler Objects
711---------------------------------
712
713
714.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
715
716 *authreq* should be the name of the header where the information about the realm
717 is included in the request, *host* should be the host to authenticate to, *req*
718 should be the (failed) :class:`Request` object, and *headers* should be the
719 error headers.
720
721
722.. _http-digest-auth-handler:
723
724HTTPDigestAuthHandler Objects
725-----------------------------
726
727
728.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)
729
730 Retry the request with authentication information, if available.
731
732
733.. _proxy-digest-auth-handler:
734
735ProxyDigestAuthHandler Objects
736------------------------------
737
738
739.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)
740
741 Retry the request with authentication information, if available.
742
743
744.. _http-handler-objects:
745
746HTTPHandler Objects
747-------------------
748
749
750.. method:: HTTPHandler.http_open(req)
751
752 Send an HTTP request, which can be either GET or POST, depending on
753 ``req.has_data()``.
754
755
756.. _https-handler-objects:
757
758HTTPSHandler Objects
759--------------------
760
761
762.. method:: HTTPSHandler.https_open(req)
763
764 Send an HTTPS request, which can be either GET or POST, depending on
765 ``req.has_data()``.
766
767
768.. _file-handler-objects:
769
770FileHandler Objects
771-------------------
772
773
774.. method:: FileHandler.file_open(req)
775
776 Open the file locally, if there is no host name, or the host name is
777 ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
778 using :attr:`parent`.
779
780
781.. _ftp-handler-objects:
782
783FTPHandler Objects
784------------------
785
786
787.. method:: FTPHandler.ftp_open(req)
788
789 Open the FTP file indicated by *req*. The login is always done with empty
790 username and password.
791
792
793.. _cacheftp-handler-objects:
794
795CacheFTPHandler Objects
796-----------------------
797
798:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
799following additional methods:
800
801
802.. method:: CacheFTPHandler.setTimeout(t)
803
804 Set timeout of connections to *t* seconds.
805
806
807.. method:: CacheFTPHandler.setMaxConns(m)
808
809 Set maximum number of cached connections to *m*.
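
A minimal configuration sketch; the values 30 and 4 are arbitrary examples, and
no FTP connection is opened. The ``ImportError`` fallback is an assumption
added so the sketch also runs on Python 3::

   try:
       import urllib2
   except ImportError:                    # assumption: Python 3 module name
       import urllib.request as urllib2

   ftp_handler = urllib2.CacheFTPHandler()
   ftp_handler.setTimeout(30)             # connection timeout: 30 seconds
   ftp_handler.setMaxConns(4)             # cache at most four connections

   # The instance takes the place of the default FTPHandler.
   opener = urllib2.build_opener(ftp_handler)
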
810
811
812.. _unknown-handler-objects:
813
814UnknownHandler Objects
815----------------------
816
817
818.. method:: UnknownHandler.unknown_open()
819
820 Raise a :exc:`URLError` exception.
821
822
823.. _http-error-processor-objects:
824
825HTTPErrorProcessor Objects
826--------------------------
827
828.. versionadded:: 2.4
829
830
.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For responses with status code 200, the response object is returned
   immediately.

   For responses with any other status code, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`urllib2.HTTPDefaultErrorHandler` will raise an
   :exc:`HTTPError` if no other handler handles the error.
840 :exc:`HTTPError` if no other handler handles the error.
841
842
843.. _urllib2-examples:
844
845Examples
846--------
847
This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print f.read(100)
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print f.read()
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print 'Content-type: text/plain\n\nGot Data: "%s"' % data

Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   urllib2.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib2
   proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib2
   req = urllib2.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib2.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib2
   opener = urllib2.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
927