:mod:`urllib2` --- extensible library for opening URLs
======================================================

.. module:: urllib2
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


The :mod:`urllib2` module defines functions and classes which help in opening
URLs (mostly HTTP) in a complex world --- basic and digest authentication,
redirections, cookies and more.

The :mod:`urllib2` module defines the following functions:


.. function:: urlopen(url[, data][, timeout])

   Open the URL *url*, which can be either a string or a :class:`Request` object.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   The optional *timeout* parameter specifies a timeout in seconds for the
   connection attempt (if not specified, or passed as ``None``, the global default
   timeout setting will be used). This only works for HTTP, HTTPS, FTP and FTPS
   connections.

   This function returns a file-like object with two additional methods:

   * :meth:`geturl` --- return the URL of the resource retrieved, commonly used to
     determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers, in
     the form of an ``httplib.HTTPMessage`` instance
     (see `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though the
   default installed global :class:`OpenerDirector` uses :class:`UnknownHandler` to
   ensure this never happens).

   .. versionchanged:: 2.6
      *timeout* was added.

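Here is a minimal sketch of :func:`urlopen` that needs no network access. It opens a temporary local file through a ``file://`` URL and then calls the two extra methods, :meth:`geturl` and :meth:`info`. It assumes a POSIX-style absolute path, and the file contents are made up. The ``ImportError`` fallback exists only so the sketch also runs on Python 3, where these names live in ``urllib.request``.

```python
import os
import tempfile

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 home of the same API

# Write a throwaway local file so that no network access is needed.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as out:
    out.write(b'Hello, urllib2!\n')

# 'file://' + path assumes a POSIX absolute path such as /tmp/tmpab12cd.txt.
response = urllib2.urlopen('file://' + path)
try:
    body = response.read()                          # ordinary file-like read()
    final_url = response.geturl()                   # URL actually retrieved
    content_type = response.info()['Content-type']  # meta-information (headers)
finally:
    response.close()
    os.remove(path)
```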

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.

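This sketch of :func:`install_opener` assumes an invented ``greeting:`` URL scheme. That scheme is served from memory by a hypothetical ``GreetingHandler``, which is not part of :mod:`urllib2`. Once the opener is installed, a plain :func:`urlopen` call is routed through it. The ``ImportError`` fallback only lets the sketch run on Python 3 too.

```python
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 home of the same API

class GreetingHandler(urllib2.BaseHandler):
    """Hypothetical handler for an invented 'greeting:' scheme; no network I/O."""

    def greeting_open(self, req):
        body = io.BytesIO(b'hello')
        # addinfourl(fp, headers, url, code) wraps a file object as a response.
        return urllib2.addinfourl(body, {}, req.get_full_url(), 200)

opener = urllib2.build_opener(GreetingHandler())
urllib2.install_opener(opener)          # urlopen() now delegates to `opener`
text = urllib2.urlopen('greeting://world').read()
```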

.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler`,
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module can
   be imported), :class:`HTTPSHandler` will also be added.

   Beginning in Python 2.3, a :class:`BaseHandler` subclass may also change its
   :attr:`handler_order` member variable to modify its position in the handlers
   list.

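The following small sketch shows both the replacement rule and :attr:`handler_order` in action. ``CautiousRedirectHandler`` is a hypothetical subclass. Because it derives from :class:`HTTPRedirectHandler`, :func:`build_opener` leaves out its default redirect handler. The sketch inspects the opener's ``handlers`` list, which is an internal attribute rather than documented API. The ``ImportError`` fallback only lets it run on Python 3 too.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 home of the same API

class CautiousRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical subclass; a real one would override redirect_request()."""
    handler_order = 400                 # sort ahead of handlers left at 500

# Passing the class is allowed because its constructor takes no arguments.
opener = urllib2.build_opener(CautiousRedirectHandler)

# `handlers` is an implementation detail, used here only to inspect the result.
redirectors = [h for h in opener.handlers
               if isinstance(h, urllib2.HTTPRedirectHandler)]
orders = [h.handler_order for h in opener.handlers]
```

Only one redirect handler ends up in the chain (the subclass), and the handlers stay sorted by :attr:`handler_order`.
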
The following exceptions are raised as appropriate:


.. exception:: URLError

   The handlers raise this exception (or derived exceptions) when they run into a
   problem. It is a subclass of :exc:`IOError`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
      URLs).


.. exception:: HTTPError

   Although it is an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
   can also function as a non-exceptional file-like return value (the same thing
   that :func:`urlopen` returns). This is useful when handling exotic HTTP
   errors, such as requests for authentication.

   .. attribute:: code

      An HTTP status code as defined in :rfc:`2616`. This numeric value
      corresponds to a value found in the dictionary of codes in
      :attr:`BaseHTTPServer.BaseHTTPRequestHandler.responses`.

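:exc:`HTTPError` is a subclass of :exc:`URLError`, so an ``except`` clause for :exc:`HTTPError` has to come first. The sketch below shows that ordering without touching the network. A made-up URL scheme makes the default :class:`UnknownHandler` raise :exc:`URLError`. A hand-built :exc:`HTTPError` with a hypothetical URL stands in for a server's 404 reply. The ``ImportError`` fallback only lets the sketch run on Python 3 too.

```python
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 re-exports URLError/HTTPError here

def classify(action):
    """Run `action` and report which urllib2 exception, if any, it raised."""
    try:
        action()
    except urllib2.HTTPError as e:      # must precede URLError: it is a subclass
        return ('http', e.code, e.read())   # the error doubles as a response
    except urllib2.URLError as e:
        return ('url', str(e.reason), None)
    return ('ok', None, None)

def unknown_scheme():
    urllib2.urlopen('nosuchscheme://www.example.com/')

def not_found():
    raise urllib2.HTTPError('http://www.example.com/missing', 404, 'Not Found',
                            {}, io.BytesIO(b'no such page'))

url_result = classify(unknown_scheme)
http_result = classify(not_found)
```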


The following classes are provided:


.. class:: Request(url[, data][, headers][, origin_req_host][, unverifiable])

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   *headers* should be a dictionary, and will be treated as if :meth:`add_header`
   was called with each key and value as arguments. This is often used to "spoof"
   the ``User-Agent`` header, which a browser uses to identify itself; some HTTP
   servers only allow requests coming from common browsers as opposed to
   scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
   default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling of third-party
   HTTP cookies:

   *origin_req_host* should be the request-host of the origin transaction, as
   defined by :rfc:`2965`. It defaults to ``cookielib.request_host(self)``. This
   is the host name or IP address of the original request that was initiated by the
   user. For example, if the request is for an image in an HTML document, this
   should be the request-host of the request for the page containing the image.

   *unverifiable* should indicate whether the request is unverifiable, as defined
   by :rfc:`2965`. It defaults to ``False``. An unverifiable request is one whose
   URL the user did not have the option to approve. For example, if the request is
   for an image in an HTML document, and the user had no option to approve the
   automatic fetching of the image, this should be ``True``.

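Here is a sketch of building a :class:`Request` with form data and a spoofed ``User-Agent``. Nothing is sent until the request is passed to :func:`urlopen`. The URL and form fields are hypothetical. :meth:`add_header`, and therefore the *headers* argument, stores each key in ``key.capitalize()`` form, so the header is afterwards known as ``User-agent``. The ``ImportError`` fallback only lets the sketch run on Python 3 too.

```python
try:
    import urllib2                      # Python 2
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2    # Python 3 homes of the same API
    from urllib.parse import urlencode

# A sequence of 2-tuples keeps the fields in the given order.
form = urlencode([('name', 'klem'), ('lang', 'python')])

req = urllib2.Request('http://www.example.com/cgi-bin/form',   # hypothetical URL
                      data=form.encode('ascii'),   # bytes, as Python 3 requires
                      headers={'User-Agent': 'Mozilla/5.0'})

method = req.get_method()               # data was given, so this is 'POST'
has_ua = req.has_header('User-agent')   # stored under key.capitalize()
```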

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers; it handles only the simple
   mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor([cookiejar])

   A class to handle HTTP cookies.


.. class:: ProxyHandler([proxies])

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read the
   list of proxies from the environment variables :envvar:`<protocol>_proxy`.
   To disable autodetected proxies, pass an empty dictionary.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. _request-objects:

Request Objects
---------------

The following methods describe all of :class:`Request`'s public interface, and
so all must be overridden in subclasses.


.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers, where it should be a byte string and will change the request to
   be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.

   .. versionadded:: 2.4


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).

   .. versionadded:: 2.4

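This short sketch exercises the header methods, using hypothetical URLs and credentials. A second :meth:`add_header` call with the same key replaces the first value. :meth:`has_header` sees both regular and unredirected headers. The ``headers`` attribute read at the end is an internal dictionary and is inspected here only for illustration. The ``ImportError`` fallback only lets the sketch run on Python 3 too.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 home of the same API

req = urllib2.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
req.add_header('Referer', 'http://www.example.org/')   # same key: overwrites
# Sent with this request, but not copied onto a redirected one.
req.add_unredirected_header('Authorization', 'Basic a2xlbTpzZWNyZXQ=')

both_seen = req.has_header('Referer') and req.has_header('Authorization')
referer = req.headers['Referer']        # internal dict, for illustration only
```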

.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.set_proxy(host, type)

   Prepare the request to be sent through a proxy server. The *host* and *type*
   will replace those of the instance, and the instance's selector will be the
   original URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by :rfc:`2965`. See the
   documentation for the :class:`Request` constructor.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The handler is searched
   for methods with the following names, and any it has are added to the possible
   chains (note that HTTP errors are a special case).

   * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
     URLs.

   * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

   * :meth:`protocol_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`protocol_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

   * :meth:`protocol_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url[, data][, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are the
   same as those of :func:`urlopen` (which simply calls the :meth:`open` method on
   the currently installed global :class:`OpenerDirector`). The optional *timeout*
   parameter specifies a timeout in seconds for the connection attempt (if not
   specified, or passed as ``None``, the global default timeout setting will be
   used; this only works for HTTP, HTTPS, FTP and FTPS connections).

   .. versionchanged:: 2.6
      *timeout* was added.


.. method:: OpenerDirector.error(proto[, arg[, ...]])

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances by their :attr:`handler_order` attribute.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`protocol_open`. If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`.open` and :meth:`.error` methods.

#. Every handler with a method named like :meth:`protocol_response` has that
   method called to post-process the response.

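The three stages can be traced with a sketch that assumes an invented ``canned:`` URL scheme. The hypothetical ``TracingHandler`` defines ``canned_request``, ``canned_open`` and ``canned_response``, so :meth:`OpenerDirector.add_handler` registers it for all three stages. The recorded calls then follow the order of the list above. The ``ImportError`` fallback only lets the sketch run on Python 3 too.

```python
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 home of the same API

calls = []

class TracingHandler(urllib2.BaseHandler):
    """Hypothetical handler for an invented 'canned:' scheme; no network I/O."""

    def canned_request(self, req):              # stage 1: pre-process
        calls.append('request')
        req.add_header('X-traced', 'yes')
        return req

    def canned_open(self, req):                 # stage 2: open
        calls.append('open')
        body = io.BytesIO(b'canned body')
        return urllib2.addinfourl(body, {}, req.get_full_url(), 200)

    def canned_response(self, req, response):   # stage 3: post-process
        calls.append('response')
        return response

opener = urllib2.build_opener(TracingHandler())
req = urllib2.Request('canned://example')
body = opener.open(req).read()
```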

.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following members and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.


.. method:: BaseHandler.protocol_open(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.

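The sketch below contrasts an :meth:`http_error_nnn` method with the :meth:`http_error_default` fallback. To stay offline, it assumes an invented ``canned:`` scheme. A hypothetical handler reports a 404 by calling :meth:`OpenerDirector.error` directly, the same entry point :class:`HTTPErrorProcessor` uses for real HTTP error responses. ``FallbackOn404`` is likewise made up for the example. The ``ImportError`` fallback only lets the sketch run on Python 3 too.

```python
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 home of the same API

class MissingPageSource(urllib2.BaseHandler):
    """Hypothetical handler: every 'canned:' URL fails with a 404, offline."""

    def canned_open(self, req):
        body = io.BytesIO(b'no such page')
        # The same call HTTPErrorProcessor makes for an HTTP error response.
        return self.parent.error('http', req, body, 404, 'Not Found', {})

class FallbackOn404(urllib2.BaseHandler):
    """Hypothetical handler that answers a 404 with a substitute page."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        page = io.BytesIO(b'fallback page')
        return urllib2.addinfourl(page, hdrs, req.get_full_url(), 200)

# With http_error_404 registered, the error becomes a normal response.
forgiving = urllib2.build_opener(MissingPageSource(), FallbackOn404())
recovered = forgiving.open('canned://report').read()

# Without it, HTTPDefaultErrorHandler.http_error_default raises HTTPError.
strict = urllib2.build_opener(MissingPageSource())
try:
    strict.open('canned://report')
    status = None
except urllib2.HTTPError as e:
    status = e.code
```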

.. method:: BaseHandler.protocol_request(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server; *newurl* is the URL being redirected
   to. If a redirection should take place, return a new :class:`Request` to allow
   :meth:`http_error_30\*` to perform the redirect. Otherwise, raise
   :exc:`HTTPError` if no other handler should try to handle this URL, or return
   ``None`` if you can't but another handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the POST to a
      ``GET``, and the default implementation reproduces this behavior.

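Here is a sketch of a redirect policy, assuming a hypothetical ``SameHostRedirectHandler`` and example URLs. In the implementation, :meth:`redirect_request` also receives a sixth argument, *newurl*, which is the URL being redirected to. The override uses it to decline cross-host redirects by returning ``None``. In that case the default error handling eventually raises :exc:`HTTPError`. The method is called directly here, so no connection is made. The ``ImportError`` fallback only lets the sketch run on Python 3 too.

```python
try:
    import urllib2                          # Python 2
    from urlparse import urlparse
except ImportError:
    import urllib.request as urllib2        # Python 3 homes of the same API
    from urllib.parse import urlparse

class SameHostRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical policy: follow redirects only within the original host."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if urlparse(newurl).netloc != urlparse(req.get_full_url()).netloc:
            return None     # decline; http_error_default will raise HTTPError
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)

handler = SameHostRedirectHandler()
post = urllib2.Request('http://www.example.com/form', data=b'a=1')

# A 302 after a POST is followed as a GET, as the note above describes.
same_host = handler.redirect_request(post, None, 302, 'Found', {},
                                     'http://www.example.com/done')
# A redirect to another host is declined.
other_host = handler.redirect_request(post, None, 302, 'Found', {},
                                      'http://elsewhere.example.org/')
```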

.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` URL. This method is called by the parent
   :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

.. versionadded:: 2.4

:class:`HTTPCookieProcessor` instances have one attribute:


.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`cookielib.CookieJar` in which cookies are stored.


.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.

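This sketch shows the per-protocol methods described above, using a hypothetical proxy URL. A :class:`ProxyHandler` built from an explicit dictionary gains an ``http_open`` method and nothing for ``ftp``. One built from an empty dictionary gains none, which is why passing ``{}`` disables the environment proxies. Nothing is opened, so no proxy needs to exist. The ``ImportError`` fallback only lets the sketch run on Python 3 too.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 home of the same API

explicit = urllib2.ProxyHandler({'http': 'http://proxy.example.com:3128/'})
disabled = urllib2.ProxyHandler({})     # ignore any <protocol>_proxy variables

http_proxied = hasattr(explicit, 'http_open')   # one method per listed protocol
ftp_proxied = hasattr(explicit, 'ftp_open')     # ftp was not listed
any_proxied = hasattr(disabled, 'http_open')    # empty mapping: no methods

# Passing the empty handler to build_opener() also drops the default
# ProxyHandler that would otherwise read the environment.
opener = urllib2.build_opener(disabled)
```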

.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication is requested for *realm* and for a
   URI at or below any of the given URIs.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.

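This sketch performs both lookups. It reuses the hypothetical credentials from the Basic authentication example at the end of this page, plus a made-up catch-all entry. :meth:`find_user_password` matches any URI at or below a registered one. :class:`HTTPPasswordMgrWithDefaultRealm` falls back to the ``None`` realm when the requested realm has no match. The ``ImportError`` fallback only lets the sketch run on Python 3 too.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 home of the same API

mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password('PDQ Application', 'https://mahler:8092/', 'klem', 'kadidd!ehopper')
mgr.add_password(None, 'http://www.example.com/', 'guest', 'guest')  # catch-all realm

# Any URI at or below a registered URI matches.
exact = mgr.find_user_password('PDQ Application',
                               'https://mahler:8092/site-updates.py')
# Unknown realm: the None realm is searched next.
fallback = mgr.find_user_password('Some Other Realm',
                                  'http://www.example.com/private/')
# Nothing registered for this host in either realm.
missing = mgr.find_user_password('PDQ Application', 'http://unrelated.example.org/')

# A plain HTTPPasswordMgr has no such fallback.
plain = urllib2.HTTPPasswordMgr()
plain.add_password(None, 'http://www.example.com/', 'guest', 'guest')
no_fallback = plain.find_user_password('Some Other Realm',
                                       'http://www.example.com/private/')
```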

.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.

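This sketch configures the cache before building an opener; the values are arbitrary. :class:`CacheFTPHandler` is a subclass of :class:`FTPHandler`, so :func:`build_opener` uses it in place of the default FTP handler. The check at the end reads the opener's internal ``handlers`` list and is for illustration only. No FTP connection is made until an ``ftp:`` URL is actually opened. The ``ImportError`` fallback only lets the sketch run on Python 3 too.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 home of the same API

ftp_handler = urllib2.CacheFTPHandler()
ftp_handler.setTimeout(30)      # timeout of cached connections: 30 seconds
ftp_handler.setMaxConns(4)      # keep at most 4 cached connections

opener = urllib2.build_opener(ftp_handler)

# Internal attribute, read only to confirm which FTP handler is in use.
ftp_handlers = [h for h in opener.handlers if isinstance(h, urllib2.FTPHandler)]
```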

.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. versionadded:: 2.4


.. method:: HTTPErrorProcessor.http_response(req, response)

   Process HTTP error responses.

   For 200 status codes, the response object is returned immediately.

   For non-200 status codes, this simply passes the job on to the
   :meth:`http_error_nnn` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`urllib2.HTTPDefaultErrorHandler` will raise an
   :exc:`HTTPError` if no other handler handles the error.


.. _urllib2-examples:

Examples
--------

This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print f.read(100)
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print f.read()
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print 'Content-type: text/plain\n\nGot Data: "%s"' % data

Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   urllib2.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib2
   proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib2
   req = urllib2.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib2.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib2
   opener = urllib2.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
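
To see those headers without sending anything, a sketch can run only the pre-processing stage by hand. It assumes a hypothetical URL and reads the opener's internal ``handlers`` list, so it is an illustration rather than supported API. The ``http_request`` method of :class:`HTTPHandler` adds :mailheader:`Host`, :mailheader:`Content-Type` and :mailheader:`Content-Length`. It also adds any ``addheaders`` entry the request does not already set. The ``ImportError`` fallback only lets the sketch run on Python 3 too.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 home of the same API

opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]

req = urllib2.Request('http://www.example.com/form', data=b'a=1')

# Run only the http_request pre-processors; no connection is opened.
for handler in opener.handlers:
    if hasattr(handler, 'http_request'):
        req = handler.http_request(req)

added = dict(req.header_items())
```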
949