:mod:`urllib2` --- extensible library for opening URLs
======================================================

.. module:: urllib2
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


The :mod:`urllib2` module defines functions and classes which help in opening
URLs (mostly HTTP) in a complex world --- basic and digest authentication,
redirections, cookies and more.
13
14The :mod:`urllib2` module defines the following functions:
15
16
17.. function:: urlopen(url[, data][, timeout])
18
19 Open the URL *url*, which can be either a string or a :class:`Request` object.
20
21 *data* may be a string specifying additional data to send to the server, or
22 ``None`` if no such data is needed. Currently HTTP requests are the only ones
23 that use *data*; the HTTP request will be a POST instead of a GET when the
24 *data* parameter is provided. *data* should be a buffer in the standard
25 :mimetype:`application/x-www-form-urlencoded` format. The
26 :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
27 returns a string in this format.
28
   The optional *timeout* parameter specifies a timeout in seconds for the
   connection attempt (if not specified, or passed as ``None``, the global default
   timeout setting will be used). This only works for HTTP, HTTPS, FTP and
   FTPS connections.
33
   This function returns a file-like object with two additional methods:

   * :meth:`geturl` --- return the URL of the resource retrieved, commonly used to
     determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers, in
     the form of an ``httplib.HTTPMessage`` instance
     (see `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   Raises :exc:`URLError` on errors.
44
45 Note that ``None`` may be returned if no handler handles the request (though the
46 default installed global :class:`OpenerDirector` uses :class:`UnknownHandler` to
47 ensure this never happens).
48
.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`, and
   any class with the appropriate interface will work.
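The remark about "any class with the appropriate interface" can be sketched as follows. ``RecordingOpener`` is a hypothetical stand-in that implements only an ``open()`` method; the ``try``/``except`` import is an assumption added so the sketch also runs where the module is available as ``urllib.request``.

```python
try:
    import urllib2                      # Python 2 name, as documented here
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3


class RecordingOpener(object):
    """Hypothetical stand-in for OpenerDirector: it only provides open()."""

    def __init__(self):
        self.seen = []

    def open(self, url, data=None, timeout=None):
        # Record the URL instead of touching the network.
        self.seen.append(url)
        return 'stub response for %s' % url


stub = RecordingOpener()
urllib2.install_opener(stub)

# urlopen() now delegates to the installed object's open() method.
result = urllib2.urlopen('http://www.example.com/')

# Put a regular opener back so later urlopen() calls behave normally.
urllib2.install_opener(urllib2.build_opener())
```

A stub like this is mainly useful in tests; real code would normally install the result of :func:`build_opener`.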
57
58
59.. function:: build_opener([handler, ...])
60
61 Return an :class:`OpenerDirector` instance, which chains the handlers in the
62 order given. *handler*\s can be either instances of :class:`BaseHandler`, or
63 subclasses of :class:`BaseHandler` (in which case it must be possible to call
64 the constructor without any parameters). Instances of the following classes
65 will be in front of the *handler*\s, unless the *handler*\s contain them,
66 instances of them or subclasses of them: :class:`ProxyHandler`,
67 :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
68 :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
69 :class:`HTTPErrorProcessor`.
70
   If the Python installation has SSL support (i.e., if the :mod:`ssl` module can
   be imported), :class:`HTTPSHandler` will also be added.
73
74 Beginning in Python 2.3, a :class:`BaseHandler` subclass may also change its
75 :attr:`handler_order` member variable to modify its position in the handlers
76 list.
77
78The following exceptions are raised as appropriate:
79
80
81.. exception:: URLError
82
83 The handlers raise this exception (or derived exceptions) when they run into a
84 problem. It is a subclass of :exc:`IOError`.
85
   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
      URLs).


.. exception:: HTTPError
94
   Although it is an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
   can also function as a non-exceptional file-like return value (the same thing
   that :func:`urlopen` returns). This is useful when handling exotic HTTP
   errors, such as requests for authentication.
99
100 .. attribute:: code
101
102 An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
103 This numeric value corresponds to a value found in the dictionary of
104 codes as found in :attr:`BaseHTTPServer.BaseHTTPRequestHandler.responses`.
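Because :exc:`HTTPError` is a subclass of :exc:`URLError`, it has to be caught first when both are handled. The following is a minimal sketch; ``describe_failure`` is a hypothetical helper, and the made-up ``nosuchscheme`` URL is used only so that the failure occurs without any network access.

```python
try:
    import urllib2                      # Python 2 name, as documented here
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3


def describe_failure(url):
    """Hypothetical helper: say why *url* could not be opened, or return None."""
    try:
        urllib2.urlopen(url).close()
    except urllib2.HTTPError as exc:
        # The server answered, but with an error status.
        return 'HTTP status %d' % exc.code
    except urllib2.URLError as exc:
        # No usable response at all; exc.reason is a string or an exception.
        return 'failed: %s' % (exc.reason,)
    return None


# No handler knows this scheme, so UnknownHandler raises URLError.
message = describe_failure('nosuchscheme://example.invalid/')
print(message)
```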
105
106
The following classes are provided:
109
110
.. class:: Request(url[, data][, headers][, origin_req_host][, unverifiable])

   This class is an abstraction of a URL request.
114
115 *url* should be a string containing a valid URL.
116
117 *data* may be a string specifying additional data to send to the server, or
118 ``None`` if no such data is needed. Currently HTTP requests are the only ones
119 that use *data*; the HTTP request will be a POST instead of a GET when the
120 *data* parameter is provided. *data* should be a buffer in the standard
121 :mimetype:`application/x-www-form-urlencoded` format. The
122 :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
123 returns a string in this format.
124
   *headers* should be a dictionary, and will be treated as if :meth:`add_header`
   was called with each key and value as arguments. This is often used to "spoof"
   the ``User-Agent`` header, which is used by a browser to identify itself ---
   some HTTP servers only allow requests coming from common browsers as opposed
   to scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
   default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling of third-party
   HTTP cookies:
135
136 *origin_req_host* should be the request-host of the origin transaction, as
137 defined by :rfc:`2965`. It defaults to ``cookielib.request_host(self)``. This
138 is the host name or IP address of the original request that was initiated by the
139 user. For example, if the request is for an image in an HTML document, this
140 should be the request-host of the request for the page containing the image.
141
   *unverifiable* should indicate whether the request is unverifiable, as defined
   by :rfc:`2965`. It defaults to ``False``. An unverifiable request is one whose URL
   the user did not have the option to approve. For example, if the request is for
   an image in an HTML document, and the user had no option to approve the
   automatic fetching of the image, this should be ``True``.
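As a rough illustration of how *data* selects the request method, the sketch below builds one request without data and one with urlencoded form data. The URL and field names are placeholders, and the fallback import and the ``.encode('ascii')`` call are assumptions made so the sketch also runs where the module is available as ``urllib.request`` (which expects bytes).

```python
try:
    import urllib2                          # Python 2 names, as documented here
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2        # assumption: fallback for Python 3
    from urllib.parse import urlencode

# Encode form fields in application/x-www-form-urlencoded format.
form_fields = [('name', 'Guido'), ('lang', 'python')]
body = urlencode(form_fields).encode('ascii')

# Without data the request is a GET; with data it becomes a POST.
get_req = urllib2.Request('http://www.example.com/form')
post_req = urllib2.Request('http://www.example.com/form', data=body)

print(get_req.get_method())
print(post_req.get_method())
```

Nothing is sent until the request object is passed to :func:`urlopen` or :meth:`OpenerDirector.open`.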
147
148
149.. class:: OpenerDirector()
150
151 The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
152 together. It manages the chaining of handlers, and recovery from errors.
153
154
155.. class:: BaseHandler()
156
157 This is the base class for all registered handlers --- and handles only the
158 simple mechanics of registration.
159
160
161.. class:: HTTPDefaultErrorHandler()
162
163 A class which defines a default handler for HTTP error responses; all responses
164 are turned into :exc:`HTTPError` exceptions.
165
166
167.. class:: HTTPRedirectHandler()
168
169 A class to handle redirections.
170
171
172.. class:: HTTPCookieProcessor([cookiejar])
173
174 A class to handle HTTP Cookies.
175
176
177.. class:: ProxyHandler([proxies])
178
   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read the
   list of proxies from the environment variables :envvar:`<protocol>_proxy`.
   To disable autodetected proxies, pass an empty dictionary.
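A short sketch of the two ways of passing *proxies*, assuming the placeholder proxy URL shown; nothing here contacts the network. Passing a :class:`ProxyHandler` instance to :func:`build_opener` keeps the default, environment-driven one from being added.

```python
try:
    import urllib2                      # Python 2 name, as documented here
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3

# An empty mapping disables proxy autodetection for this opener.
no_proxy_handler = urllib2.ProxyHandler({})
opener = urllib2.build_opener(no_proxy_handler)

# An explicit mapping routes only the listed schemes through a proxy.
# The proxy URL is a placeholder.
explicit_handler = urllib2.ProxyHandler(
    {'http': 'http://proxy.example.com:3128/'})
proxied_opener = urllib2.build_opener(explicit_handler)
```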
184
185.. class:: HTTPPasswordMgr()
186
187 Keep a database of ``(realm, uri) -> (user, password)`` mappings.
188
189
190.. class:: HTTPPasswordMgrWithDefaultRealm()
191
192 Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
193 ``None`` is considered a catch-all realm, which is searched if no other realm
194 fits.
195
196
197.. class:: AbstractBasicAuthHandler([password_mgr])
198
199 This is a mixin class that helps with HTTP authentication, both to the remote
200 host and to a proxy. *password_mgr*, if given, should be something that is
201 compatible with :class:`HTTPPasswordMgr`; refer to section
202 :ref:`http-password-mgr` for information on the interface that must be
203 supported.
204
205
206.. class:: HTTPBasicAuthHandler([password_mgr])
207
208 Handle authentication with the remote host. *password_mgr*, if given, should be
209 something that is compatible with :class:`HTTPPasswordMgr`; refer to section
210 :ref:`http-password-mgr` for information on the interface that must be
211 supported.
212
213
214.. class:: ProxyBasicAuthHandler([password_mgr])
215
216 Handle authentication with the proxy. *password_mgr*, if given, should be
217 something that is compatible with :class:`HTTPPasswordMgr`; refer to section
218 :ref:`http-password-mgr` for information on the interface that must be
219 supported.
220
221
222.. class:: AbstractDigestAuthHandler([password_mgr])
223
224 This is a mixin class that helps with HTTP authentication, both to the remote
225 host and to a proxy. *password_mgr*, if given, should be something that is
226 compatible with :class:`HTTPPasswordMgr`; refer to section
227 :ref:`http-password-mgr` for information on the interface that must be
228 supported.
229
230
231.. class:: HTTPDigestAuthHandler([password_mgr])
232
233 Handle authentication with the remote host. *password_mgr*, if given, should be
234 something that is compatible with :class:`HTTPPasswordMgr`; refer to section
235 :ref:`http-password-mgr` for information on the interface that must be
236 supported.
237
238
239.. class:: ProxyDigestAuthHandler([password_mgr])
240
241 Handle authentication with the proxy. *password_mgr*, if given, should be
242 something that is compatible with :class:`HTTPPasswordMgr`; refer to section
243 :ref:`http-password-mgr` for information on the interface that must be
244 supported.
245
246
247.. class:: HTTPHandler()
248
249 A class to handle opening of HTTP URLs.
250
251
252.. class:: HTTPSHandler()
253
254 A class to handle opening of HTTPS URLs.
255
256
257.. class:: FileHandler()
258
259 Open local files.
260
261
262.. class:: FTPHandler()
263
264 Open FTP URLs.
265
266
267.. class:: CacheFTPHandler()
268
269 Open FTP URLs, keeping a cache of open FTP connections to minimize delays.
270
271
272.. class:: UnknownHandler()
273
274 A catch-all class to handle unknown URLs.
275
276
277.. _request-objects:
278
279Request Objects
280---------------
281
282The following methods describe all of :class:`Request`'s public interface, and
283so all must be overridden in subclasses.
284
285
286.. method:: Request.add_data(data)
287
288 Set the :class:`Request` data to *data*. This is ignored by all handlers except
289 HTTP handlers --- and there it should be a byte string, and will change the
290 request to be ``POST`` rather than ``GET``.
291
292
293.. method:: Request.get_method()
294
295 Return a string indicating the HTTP request method. This is only meaningful for
296 HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.
297
298
299.. method:: Request.has_data()
300
301 Return whether the instance has a non-\ ``None`` data.
302
303
304.. method:: Request.get_data()
305
306 Return the instance's data.
307
308
309.. method:: Request.add_header(key, val)
310
311 Add another header to the request. Headers are currently ignored by all
312 handlers except HTTP handlers, where they are added to the list of headers sent
313 to the server. Note that there cannot be more than one header with the same
314 name, and later calls will overwrite previous calls in case the *key* collides.
315 Currently, this is no loss of HTTP functionality, since all headers which have
316 meaning when used more than once have a (header-specific) way of gaining the
317 same functionality using only one header.
318
319
320.. method:: Request.add_unredirected_header(key, header)
321
322 Add a header that will not be added to a redirected request.
323
.. method:: Request.has_header(header)
326
327 Return whether the instance has the named header (checks both regular and
328 unredirected).
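The header methods above can be exercised without sending anything. This sketch assumes placeholder header values; it also uses :meth:`get_header`, which exists on :class:`Request` but is not described in this section.

```python
try:
    import urllib2                      # Python 2 name, as documented here
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3

req = urllib2.Request('http://www.example.com/')

# A second add_header() call with the same key replaces the first value.
req.add_header('Referer', 'http://www.python.org/')
req.add_header('Referer', 'http://docs.python.org/')

# This header is sent with the original request only, not with a redirect.
req.add_unredirected_header('Authorization', 'Basic placeholder')

print(req.get_header('Referer'))
print(req.has_header('Authorization'))
```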
329
.. method:: Request.get_full_url()
332
333 Return the URL given in the constructor.
334
335
336.. method:: Request.get_type()
337
338 Return the type of the URL --- also known as the scheme.
339
340
341.. method:: Request.get_host()
342
343 Return the host to which a connection will be made.
344
345
346.. method:: Request.get_selector()
347
348 Return the selector --- the part of the URL that is sent to the server.
349
350
351.. method:: Request.set_proxy(host, type)
352
353 Prepare the request by connecting to a proxy server. The *host* and *type* will
354 replace those of the instance, and the instance's selector will be the original
355 URL given in the constructor.
356
357
358.. method:: Request.get_origin_req_host()
359
360 Return the request-host of the origin transaction, as defined by :rfc:`2965`.
361 See the documentation for the :class:`Request` constructor.
362
363
364.. method:: Request.is_unverifiable()
365
366 Return whether the request is unverifiable, as defined by RFC 2965. See the
367 documentation for the :class:`Request` constructor.
368
369
370.. _opener-director-objects:
371
372OpenerDirector Objects
373----------------------
374
375:class:`OpenerDirector` instances have the following methods:
376
377
378.. method:: OpenerDirector.add_handler(handler)
379
380 *handler* should be an instance of :class:`BaseHandler`. The following methods
381 are searched, and added to the possible chains (note that HTTP errors are a
382 special case).
383
384 * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
385 URLs.
386
387 * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
388 errors with HTTP error code *type*.
389
390 * :meth:`protocol_error` --- signal that the handler knows how to handle errors
391 from (non-\ ``http``) *protocol*.
392
393 * :meth:`protocol_request` --- signal that the handler knows how to pre-process
394 *protocol* requests.
395
396 * :meth:`protocol_response` --- signal that the handler knows how to
397 post-process *protocol* responses.
398
399
400.. method:: OpenerDirector.open(url[, data][, timeout])
401
   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are the
   same as those of :func:`urlopen` (which simply calls the :meth:`open` method on
   the currently installed global :class:`OpenerDirector`). The optional *timeout*
   parameter specifies a timeout in seconds for the connection attempt (if not
   specified, or passed as ``None``, the global default timeout setting will be
   used; this only works for HTTP, HTTPS, FTP and FTPS connections).


.. method:: OpenerDirector.error(proto[, arg[, ...]])
412
413 Handle an error of the given protocol. This will call the registered error
414 handlers for the given protocol with the given arguments (which are protocol
415 specific). The HTTP protocol is a special case which uses the HTTP response
416 code to determine the specific error handler; refer to the :meth:`http_error_\*`
417 methods of the handler classes.
418
419 Return values and exceptions raised are the same as those of :func:`urlopen`.
420
OpenerDirector objects open URLs in three stages. The order in which these
methods are called within each stage is determined by sorting the handler
instances.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.
433
434 In fact, the above algorithm is first tried for methods named
435 :meth:`default_open`. If all such methods return :const:`None`, the algorithm
436 is repeated for methods named like :meth:`protocol_open`. If all such methods
437 return :const:`None`, the algorithm is repeated for methods named
438 :meth:`unknown_open`.
439
440 Note that the implementation of these methods may involve calls of the parent
441 :class:`OpenerDirector` instance's :meth:`.open` and :meth:`.error` methods.
442
443#. Every handler with a method named like :meth:`protocol_response` has that
444 method called to post-process the response.
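The three stages can be observed with a handler for a made-up scheme. In this sketch, ``EchoHandler`` and the ``echo`` scheme are hypothetical; the handler records which stage called it and returns the URL itself as the response body, so no network access is involved.

```python
import io

try:
    import urllib2                      # Python 2 name, as documented here
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3


class EchoHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up ``echo`` URL scheme."""

    def __init__(self):
        self.stages = []

    def echo_request(self, req):
        # Stage 1: pre-process the request.
        self.stages.append('request')
        return req

    def echo_open(self, req):
        # Stage 2: produce a file-like response.
        self.stages.append('open')
        return io.BytesIO(req.get_full_url().encode('ascii'))

    def echo_response(self, req, response):
        # Stage 3: post-process the response.
        self.stages.append('response')
        return response


handler = EchoHandler()
opener = urllib2.OpenerDirector()
opener.add_handler(handler)

response = opener.open('echo://example/hello')
body = response.read()
print(handler.stages)
```

:meth:`add_handler` finds the three methods purely by their names, which is why the method prefix must match the URL scheme.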
445
446
447.. _base-handler-objects:
448
449BaseHandler Objects
450-------------------
451
452:class:`BaseHandler` objects provide a couple of methods that are directly
453useful, and others that are meant to be used by derived classes. These are
454intended for direct use:
455
456
457.. method:: BaseHandler.add_parent(director)
458
459 Add a director as parent.
460
461
462.. method:: BaseHandler.close()
463
464 Remove any parents.
465
466The following members and methods should only be used by classes derived from
467:class:`BaseHandler`.
468
469.. note::
470
471 The convention has been adopted that subclasses defining
472 :meth:`protocol_request` or :meth:`protocol_response` methods are named
473 :class:`\*Processor`; all others are named :class:`\*Handler`.
474
475
476.. attribute:: BaseHandler.parent
477
478 A valid :class:`OpenerDirector`, which can be used to open using a different
479 protocol, or handle errors.
480
481
482.. method:: BaseHandler.default_open(req)
483
484 This method is *not* defined in :class:`BaseHandler`, but subclasses should
485 define it if they want to catch all URLs.
486
487 This method, if implemented, will be called by the parent
488 :class:`OpenerDirector`. It should return a file-like object as described in
489 the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
490 It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
491 example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).
492
493 This method will be called before any protocol-specific open method.
494
495
496.. method:: BaseHandler.protocol_open(req)
497 :noindex:
498
499 This method is *not* defined in :class:`BaseHandler`, but subclasses should
500 define it if they want to handle URLs with the given protocol.
501
502 This method, if defined, will be called by the parent :class:`OpenerDirector`.
503 Return values should be the same as for :meth:`default_open`.
504
505
506.. method:: BaseHandler.unknown_open(req)
507
508 This method is *not* defined in :class:`BaseHandler`, but subclasses should
509 define it if they want to catch all URLs with no specific registered handler to
510 open it.
511
512 This method, if implemented, will be called by the :attr:`parent`
513 :class:`OpenerDirector`. Return values should be the same as for
514 :meth:`default_open`.
515
516
517.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)
518
519 This method is *not* defined in :class:`BaseHandler`, but subclasses should
520 override it if they intend to provide a catch-all for otherwise unhandled HTTP
521 errors. It will be called automatically by the :class:`OpenerDirector` getting
522 the error, and should not normally be called in other circumstances.
523
524 *req* will be a :class:`Request` object, *fp* will be a file-like object with
525 the HTTP error body, *code* will be the three-digit code of the error, *msg*
526 will be the user-visible explanation of the code and *hdrs* will be a mapping
527 object with the headers of the error.
528
529 Return values and exceptions raised should be the same as those of
530 :func:`urlopen`.
531
532
533.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)
534
535 *nnn* should be a three-digit HTTP error code. This method is also not defined
536 in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
537 subclass, when an HTTP error with code *nnn* occurs.
538
539 Subclasses should override this method to handle specific HTTP errors.
540
541 Arguments, return values and exceptions raised should be the same as for
542 :meth:`http_error_default`.
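A minimal sketch of a code-specific error handler, assuming a hypothetical ``NotFoundFallback`` class that substitutes a canned body for 404 responses. The error is dispatched by hand through :meth:`OpenerDirector.error` so that no server is needed; normally the opener makes this call itself.

```python
import io

try:
    import urllib2                      # Python 2 name, as documented here
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3


class NotFoundFallback(urllib2.BaseHandler):
    """Hypothetical handler: replace 404 responses with a canned body."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        return io.BytesIO(b'fallback content')


opener = urllib2.OpenerDirector()
opener.add_handler(NotFoundFallback())
req = urllib2.Request('http://www.example.com/missing')

# Simulate the dispatch that happens when a 404 response arrives.
handled = opener.error('http', req, None, 404, 'Not Found', {})

# No handler is registered for 500 in this bare opener, so nothing handles it.
unhandled = opener.error('http', req, None, 500, 'Server Error', {})
```

In an opener created by :func:`build_opener`, the unhandled case would instead reach :class:`HTTPDefaultErrorHandler` and raise :exc:`HTTPError`.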
543
544
545.. method:: BaseHandler.protocol_request(req)
546 :noindex:
547
548 This method is *not* defined in :class:`BaseHandler`, but subclasses should
549 define it if they want to pre-process requests of the given protocol.
550
551 This method, if defined, will be called by the parent :class:`OpenerDirector`.
552 *req* will be a :class:`Request` object. The return value should be a
553 :class:`Request` object.
554
555
556.. method:: BaseHandler.protocol_response(req, response)
557 :noindex:
558
559 This method is *not* defined in :class:`BaseHandler`, but subclasses should
560 define it if they want to post-process responses of the given protocol.
561
562 This method, if defined, will be called by the parent :class:`OpenerDirector`.
563 *req* will be a :class:`Request` object. *response* will be an object
564 implementing the same interface as the return value of :func:`urlopen`. The
565 return value should implement the same interface as the return value of
566 :func:`urlopen`.
567
568
569.. _http-redirect-handler:
570
571HTTPRedirectHandler Objects
572---------------------------
573
574.. note::
575
576 Some HTTP redirections require action from this module's client code. If this
577 is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
578 precise meanings of the various redirection codes.
579
580
.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)
582
583 Return a :class:`Request` or ``None`` in response to a redirect. This is called
584 by the default implementations of the :meth:`http_error_30\*` methods when a
585 redirection is received from the server. If a redirection should take place,
586 return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
587 redirect. Otherwise, raise :exc:`HTTPError` if no other handler should try to
588 handle this URL, or return ``None`` if you can't but another handler might.
589
590 .. note::
591
592 The default implementation of this method does not strictly follow :rfc:`2616`,
593 which says that 301 and 302 responses to ``POST`` requests must not be
594 automatically redirected without confirmation by the user. In reality, browsers
595 do allow automatic redirection of these responses, changing the POST to a
596 ``GET``, and the default implementation reproduces this behavior.
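The POST-to-GET behaviour described in the note can be checked by calling :meth:`redirect_request` directly. This is a sketch under stated assumptions: the URLs are placeholders, ``NoRedirects`` is a hypothetical subclass, and the call passes the redirect target as a final *newurl* argument, which the implementation expects.

```python
try:
    import urllib2                      # Python 2 name, as documented here
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3

handler = urllib2.HTTPRedirectHandler()
post_req = urllib2.Request('http://www.example.com/form', data=b'field=value')

# A 302 answer to a POST yields a new request without the data, i.e. a GET.
new_req = handler.redirect_request(
    post_req, None, 302, 'Found', {}, 'http://www.example.com/result')


class NoRedirects(urllib2.HTTPRedirectHandler):
    """Hypothetical subclass that declines every redirect."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        # None means "not handled here"; another handler may still act.
        return None


declined = NoRedirects().redirect_request(
    post_req, None, 302, 'Found', {}, 'http://www.example.com/result')
```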
597
598
599.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)
600
601 Redirect to the ``Location:`` URL. This method is called by the parent
602 :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.
603
604
605.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)
606
607 The same as :meth:`http_error_301`, but called for the 'found' response.
608
609
610.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)
611
612 The same as :meth:`http_error_301`, but called for the 'see other' response.
613
614
615.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)
616
617 The same as :meth:`http_error_301`, but called for the 'temporary redirect'
618 response.
619
620
621.. _http-cookie-processor:
622
623HTTPCookieProcessor Objects
624---------------------------
625
:class:`HTTPCookieProcessor` instances have one attribute:

.. attribute:: HTTPCookieProcessor.cookiejar
629
630 The :class:`cookielib.CookieJar` in which cookies are stored.
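One way to keep a handle on the cookies an opener collects is to create the jar yourself and pass it in. A small sketch; the fallback imports are an assumption for environments where the modules are named ``urllib.request`` and ``http.cookiejar``.

```python
try:
    import urllib2                      # Python 2 names, as documented here
    import cookielib
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3
    import http.cookiejar as cookielib

# Create the jar explicitly so it can be inspected or saved later.
jar = cookielib.CookieJar()
processor = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(processor)

# Without an argument, the processor creates its own CookieJar.
default_processor = urllib2.HTTPCookieProcessor()
```

Cookies set by responses fetched through ``opener`` would then accumulate in ``jar``.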
631
632
633.. _proxy-handler:
634
635ProxyHandler Objects
636--------------------
637
638
639.. method:: ProxyHandler.protocol_open(request)
640 :noindex:
641
642 The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
643 *protocol* which has a proxy in the *proxies* dictionary given in the
644 constructor. The method will modify requests to go through the proxy, by
645 calling ``request.set_proxy()``, and call the next handler in the chain to
646 actually execute the protocol.
647
648
649.. _http-password-mgr:
650
651HTTPPasswordMgr Objects
652-----------------------
653
654These methods are available on :class:`HTTPPasswordMgr` and
655:class:`HTTPPasswordMgrWithDefaultRealm` objects.
656
657
658.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)
659
660 *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
661 *passwd* must be strings. This causes ``(user, passwd)`` to be used as
662 authentication tokens when authentication for *realm* and a super-URI of any of
663 the given URIs is given.
664
665
666.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)
667
668 Get user/password for given realm and URI, if any. This method will return
669 ``(None, None)`` if there is no matching user/password.
670
671 For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
672 searched if the given *realm* has no matching user/password.
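The catch-all realm can be tried out without any server. In this sketch the URLs, user name and password are placeholders; it stores credentials under the realm ``None`` and then looks them up for an arbitrary realm name.

```python
try:
    import urllib2                      # Python 2 name, as documented here
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3

mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

# A realm of None makes the credentials apply to any realm below this URI.
mgr.add_password(None, 'http://www.example.com/private/', 'klem', 'secret')

# The stored URI is a super-URI of the requested one, so the lookup matches.
found = mgr.find_user_password(
    'Any Realm', 'http://www.example.com/private/report.html')

# A different host has no matching entry.
missing = mgr.find_user_password('Any Realm', 'http://other.example.org/')

print(found)
print(missing)
```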
673
674
675.. _abstract-basic-auth-handler:
676
677AbstractBasicAuthHandler Objects
678--------------------------------
679
680
681.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
682
683 Handle an authentication request by getting a user/password pair, and re-trying
684 the request. *authreq* should be the name of the header where the information
685 about the realm is included in the request, *host* specifies the URL and path to
686 authenticate for, *req* should be the (failed) :class:`Request` object, and
687 *headers* should be the error headers.
688
689 *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
690 authority component (e.g. ``"http://python.org/"``). In either case, the
691 authority must not contain a userinfo component (so, ``"python.org"`` and
692 ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).
693
694
695.. _http-basic-auth-handler:
696
697HTTPBasicAuthHandler Objects
698----------------------------
699
700
701.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)
702
703 Retry the request with authentication information, if available.
704
705
706.. _proxy-basic-auth-handler:
707
708ProxyBasicAuthHandler Objects
709-----------------------------
710
711
712.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)
713
714 Retry the request with authentication information, if available.
715
716
717.. _abstract-digest-auth-handler:
718
719AbstractDigestAuthHandler Objects
720---------------------------------
721
722
723.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
724
725 *authreq* should be the name of the header where the information about the realm
726 is included in the request, *host* should be the host to authenticate to, *req*
727 should be the (failed) :class:`Request` object, and *headers* should be the
728 error headers.
729
730
731.. _http-digest-auth-handler:
732
733HTTPDigestAuthHandler Objects
734-----------------------------
735
736
737.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)
738
739 Retry the request with authentication information, if available.
740
741
742.. _proxy-digest-auth-handler:
743
744ProxyDigestAuthHandler Objects
745------------------------------
746
747
748.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)
749
750 Retry the request with authentication information, if available.
751
752
753.. _http-handler-objects:
754
755HTTPHandler Objects
756-------------------
757
758
759.. method:: HTTPHandler.http_open(req)
760
761 Send an HTTP request, which can be either GET or POST, depending on
762 ``req.has_data()``.
763
764
765.. _https-handler-objects:
766
767HTTPSHandler Objects
768--------------------
769
770
771.. method:: HTTPSHandler.https_open(req)
772
773 Send an HTTPS request, which can be either GET or POST, depending on
774 ``req.has_data()``.
775
776
777.. _file-handler-objects:
778
779FileHandler Objects
780-------------------
781
782
783.. method:: FileHandler.file_open(req)
784
785 Open the file locally, if there is no host name, or the host name is
786 ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
787 using :attr:`parent`.
788
789
790.. _ftp-handler-objects:
791
792FTPHandler Objects
793------------------
794
795
796.. method:: FTPHandler.ftp_open(req)
797
798 Open the FTP file indicated by *req*. The login is always done with empty
799 username and password.
800
801
802.. _cacheftp-handler-objects:
803
804CacheFTPHandler Objects
805-----------------------
806
807:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
808following additional methods:
809
810
811.. method:: CacheFTPHandler.setTimeout(t)
812
813 Set timeout of connections to *t* seconds.
814
815
816.. method:: CacheFTPHandler.setMaxConns(m)
817
818 Set maximum number of cached connections to *m*.
819
820
821.. _unknown-handler-objects:
822
823UnknownHandler Objects
824----------------------
825
826
827.. method:: UnknownHandler.unknown_open()
828
829 Raise a :exc:`URLError` exception.
830
831
832.. _http-error-processor-objects:
833
834HTTPErrorProcessor Objects
835--------------------------
836
.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For 200 error codes, the response object is returned immediately.

   For non-200 error codes, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`urllib2.HTTPDefaultErrorHandler` will raise an
   :exc:`HTTPError` if no other handler handles the error.
847
848
849.. _urllib2-examples:
850
851Examples
852--------
853
This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print(f.read(100))
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html
862
Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print(f.read())
   Got Data: "This data is passed to stdin of the CGI"
873
The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   # Read the POSTed request body from standard input.
   data = sys.stdin.read()
   # A blank line separates the CGI headers from the response body.
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)

Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   urllib2.urlopen('http://www.example.com/login.html')
894
:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib2
   proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')
912
Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib2
   req = urllib2.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib2.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib2
   opener = urllib2.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
933