:mod:`urllib2` --- extensible library for opening URLs
======================================================

.. module:: urllib2
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


The :mod:`urllib2` module defines functions and classes which help in opening
URLs (mostly HTTP) in a complex world --- basic and digest authentication,
redirections, cookies and more.

The :mod:`urllib2` module defines the following functions:


.. function:: urlopen(url[, data][, timeout])

   Open the URL *url*, which can be either a string or a :class:`Request` object.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   The optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout only works for HTTP, HTTPS, FTP
   and FTPS connections.

   This function returns a file-like object with two additional methods:

   * :meth:`geturl` --- return the URL of the resource retrieved, commonly used to
     determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers, in
     the form of an ``httplib.HTTPMessage`` instance
     (see `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though the
   default installed global :class:`OpenerDirector` uses :class:`UnknownHandler` to
   ensure this never happens).

   .. versionchanged:: 2.6
      *timeout* was added.


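As a rough illustration of the object :func:`urlopen` returns, the sketch below opens a ``file:`` URL so that it runs without network access. The temporary file and its contents are made up for the example, and the ``try``/``except`` import is an assumption added only so the sketch also runs on Python 3, where the same API lives in ``urllib.request``.

```python
import os
import tempfile

try:                                    # Python 2, as documented here
    import urllib2
    from urllib import pathname2url
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2
    from urllib.request import pathname2url

# Create a small local file to stand in for a remote resource.
fd, path = tempfile.mkstemp(suffix='.txt')
os.write(fd, b'hello from urlopen')
os.close(fd)

f = urllib2.urlopen('file:' + pathname2url(path))
try:
    body = f.read()          # ordinary file-like interface
    final_url = f.geturl()   # URL of the resource actually retrieved
    headers = f.info()       # header-like meta-information
finally:
    f.close()
    os.remove(path)
```

For an HTTP URL the same three calls apply; ``geturl()`` is then the usual way to notice that a redirect was followed.
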
.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler`,
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module can be
   imported), :class:`HTTPSHandler` will also be added.

   Beginning in Python 2.3, a :class:`BaseHandler` subclass may also change its
   :attr:`handler_order` member variable to modify its position in the handlers
   list.

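The replacement rule for default handlers can be sketched as follows. ``QuietRedirectHandler`` is a hypothetical subclass, and the ``handlers`` list is an attribute kept by the :class:`OpenerDirector` implementation that is inspected here only to show the outcome; the fallback import is an assumption so the sketch also runs on Python 3.

```python
try:
    import urllib2
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2


class QuietRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical subclass; passing it replaces the default redirect handler."""


# A class (instantiated by build_opener) and an instance may be mixed freely.
opener = urllib2.build_opener(QuietRedirectHandler,
                              urllib2.HTTPCookieProcessor())

# Collect the class names of the handlers that ended up in the chain.
names = sorted(h.__class__.__name__ for h in opener.handlers)
```

Because ``QuietRedirectHandler`` subclasses :class:`HTTPRedirectHandler`, the default redirect handler should be absent from ``names`` while the other defaults, such as :class:`HTTPHandler`, remain.
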
The following exceptions are raised as appropriate:


.. exception:: URLError

   The handlers raise this exception (or derived exceptions) when they run into a
   problem. It is a subclass of :exc:`IOError`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
      URLs).


.. exception:: HTTPError

   Though being an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
   can also function as a non-exceptional file-like return value (the same thing
   that :func:`urlopen` returns). This is useful when handling exotic HTTP
   errors, such as requests for authentication.

   .. attribute:: code

      An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
      This numeric value corresponds to a value found in the dictionary of
      codes as found in :attr:`BaseHTTPServer.BaseHTTPRequestHandler.responses`.



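Since :exc:`HTTPError` is a subclass of :exc:`URLError`, code that distinguishes the two has to test for :exc:`HTTPError` first. The sketch below shows one way to do that offline; ``describe_failure`` is a hypothetical helper, the :exc:`HTTPError` is built by hand purely for demonstration (handlers normally create it), and the fallback import is an assumption for Python 3.

```python
import io

try:
    import urllib2
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2


def describe_failure(exc):
    """Summarise a URLError; HTTPError is tested first because it is a subclass."""
    if isinstance(exc, urllib2.HTTPError):
        return 'HTTP %d' % exc.code
    return 'failed: %s' % (exc.reason,)


# Offline demonstration: an unknown URL scheme makes urlopen raise URLError.
try:
    urllib2.urlopen('nosuchscheme://example.invalid/')
except urllib2.URLError as exc:
    url_summary = describe_failure(exc)

# An HTTPError constructed manually with an empty body.
http_exc = urllib2.HTTPError('http://www.example.com/missing', 404,
                             'Not Found', {}, io.BytesIO(b''))
http_summary = describe_failure(http_exc)
```
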
The following classes are provided:


.. class:: Request(url[, data][, headers][, origin_req_host][, unverifiable])

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   *headers* should be a dictionary, and will be treated as if :meth:`add_header`
   was called with each key and value as arguments. This is often used to "spoof"
   the ``User-Agent`` header, which is used by a browser to identify itself --
   some HTTP servers only allow requests coming from common browsers as opposed
   to scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
   default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling of third-party
   HTTP cookies:

   *origin_req_host* should be the request-host of the origin transaction, as
   defined by :rfc:`2965`. It defaults to ``cookielib.request_host(self)``. This
   is the host name or IP address of the original request that was initiated by the
   user. For example, if the request is for an image in an HTML document, this
   should be the request-host of the request for the page containing the image.

   *unverifiable* should indicate whether the request is unverifiable, as defined
   by :rfc:`2965`. It defaults to ``False``. An unverifiable request is one whose
   URL the user did not have the option to approve. For example, if the request is
   for an image in an HTML document, and the user had no option to approve the
   automatic fetching of the image, this should be true.


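The effect of the *data* and *headers* arguments can be sketched without sending anything. The URL, form fields and the ``ExampleClient/0.1`` agent string are made up, and the fallback import is an assumption so the sketch also runs on Python 3 (where *data* must be bytes, hence the ``encode`` call).

```python
try:
    import urllib2
    from urllib import urlencode
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2
    from urllib.parse import urlencode

# A sequence of 2-tuples keeps the field order predictable.
form = urlencode([('name', 'klem'), ('lang', 'en')]).encode('ascii')

get_req = urllib2.Request('http://www.example.com/search')
post_req = urllib2.Request('http://www.example.com/search', data=form,
                           headers={'User-Agent': 'ExampleClient/0.1'})

# Supplying data switches the request method from GET to POST.
methods = (get_req.get_method(), post_req.get_method())

# Header names are stored via str.capitalize(), so 'User-Agent' is kept
# under the key 'User-agent'.
has_agent = post_req.has_header('User-agent')
```
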
.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor([cookiejar])

   A class to handle HTTP Cookies.


.. class:: ProxyHandler([proxies])

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read the
   list of proxies from the environment variables :envvar:`<protocol>_proxy`.
   To disable autodetected proxies, pass an empty dictionary.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. _request-objects:

Request Objects
---------------

The following methods describe all of :class:`Request`'s public interface, and
so all must be overridden in subclasses.


.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has a non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.

   .. versionadded:: 2.4


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).

   .. versionadded:: 2.4


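A short sketch of the header methods described above. The header values are placeholders, ``headers`` is the dictionary the request object keeps for regular headers and is read here only to show the overwrite, and the fallback import is an assumption for Python 3.

```python
try:
    import urllib2
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2

req = urllib2.Request('http://www.example.com/')

# A second add_header() call with the same key replaces the first value.
req.add_header('Accept', 'text/html')
req.add_header('Accept', 'text/plain')

# This header accompanies the original request only, not a redirected one.
req.add_unredirected_header('Authorization', 'Basic placeholder-credentials')

accept_value = req.headers['Accept']
found = (req.has_header('Accept'),         # regular header
         req.has_header('Authorization'),  # unredirected header
         req.has_header('Referer'))        # never added
```
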
.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by :rfc:`2965`. See the
   documentation for the :class:`Request` constructor.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following methods
   are searched, and added to the possible chains (note that HTTP errors are a
   special case).

   * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
     URLs.

   * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

   * :meth:`protocol_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`protocol_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

   * :meth:`protocol_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url[, data][, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature only works for HTTP, HTTPS,
   FTP and FTPS connections.

   .. versionchanged:: 2.6
      *timeout* was added.


.. method:: OpenerDirector.error(proto[, arg[, ...]])

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

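The HTTP special case of :meth:`OpenerDirector.error` can be sketched offline by registering a handler with an ``http_error_404`` method and dispatching to it by hand. ``NotFoundFallback`` and its canned body are hypothetical, in normal use the error is dispatched by the opener itself rather than by client code, and the fallback import is an assumption for Python 3.

```python
import io

try:
    import urllib2
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2


class NotFoundFallback(urllib2.BaseHandler):
    """Hypothetical handler that substitutes a canned body for 404 responses."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        return io.BytesIO(b'fallback page')


opener = urllib2.OpenerDirector()
opener.add_handler(NotFoundFallback())

req = urllib2.Request('http://www.example.com/missing')
# For HTTP the arguments after the protocol are (req, fp, code, msg, hdrs);
# the code selects the http_error_404 method registered above.
result = opener.error('http', req, io.BytesIO(b''), 404, 'Not Found', {})
body = result.read()
```
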
OpenerDirector objects open URLs in three stages. The order in which the
handler methods are called within each stage is determined by sorting the
handler instances.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`protocol_open`. If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`.open` and :meth:`.error` methods.

#. Every handler with a method named like :meth:`protocol_response` has that
   method called to post-process the response.


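The three stages can be observed with a handler for a made-up ``demo`` scheme that records each call. ``TracingHandler``, the scheme name and the payload are all hypothetical, and the fallback import is an assumption so the sketch also runs on Python 3.

```python
import io

try:
    import urllib2
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2


class TracingHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up ``demo`` scheme that logs each stage."""

    def __init__(self, log):
        self.log = log

    def demo_request(self, req):             # stage 1: pre-process the request
        self.log.append('request')
        return req

    def demo_open(self, req):                # stage 2: produce a response
        self.log.append('open')
        return io.BytesIO(b'demo payload')

    def demo_response(self, req, response):  # stage 3: post-process the response
        self.log.append('response')
        return response


log = []
opener = urllib2.OpenerDirector()
opener.add_handler(TracingHandler(log))
payload = opener.open('demo://example/resource').read()
```

After the call, ``log`` holds the stage names in the order in which the opener invoked them.
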
.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following members and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   On failure it should raise :exc:`URLError`, unless a truly exceptional thing
   happens (for example, :exc:`MemoryError` should not be mapped to
   :exc:`URLError`).

   This method will be called before any protocol-specific open method.


.. method:: BaseHandler.protocol_open(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.


.. method:: BaseHandler.protocol_request(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


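Following the ``*Processor`` naming convention, a pre- and post-processing subclass might look like the sketch below. ``LanguageProcessor`` and the header it adds are hypothetical, and the fallback import is an assumption for Python 3. The request hook is also called directly at the end, so the sketch can be checked without network access.

```python
try:
    import urllib2
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2


class LanguageProcessor(urllib2.BaseHandler):
    """Hypothetical processor that asks for English content on every request."""

    def http_request(self, req):
        # Pre-processing hook: must return a Request (here, the same one).
        if not req.has_header('Accept-language'):
            req.add_header('Accept-language', 'en')
        return req

    https_request = http_request        # reuse the hook for HTTPS

    def http_response(self, req, response):
        # Post-processing hook: must return a response-like object.
        return response

    https_response = http_response


# Normal use: let an opener call the hooks for every HTTP(S) request.
opener = urllib2.build_opener(LanguageProcessor())

# Direct call of the hook, for an offline check.
req = LanguageProcessor().http_request(
    urllib2.Request('http://www.example.com/'))
```
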
.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if you can't but another
   handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the POST to a
      ``GET``, and the default implementation reproduces this behavior.


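A redirect policy can be changed by overriding :meth:`redirect_request`. The sketch below refuses every redirect by raising :exc:`HTTPError`; ``NoRedirectHandler`` is a hypothetical name, and the override accepts a final *newurl* argument (the redirect target) because the shipped implementation passes one. The hook is called directly for an offline check, and the fallback import is an assumption for Python 3.

```python
import io

try:
    import urllib2
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2


class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical policy: never follow a redirect, report it as an error."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        raise urllib2.HTTPError(req.get_full_url(), code, msg, hdrs, fp)


# Passing the subclass replaces the default HTTPRedirectHandler.
opener = urllib2.build_opener(NoRedirectHandler)

# Offline check of the policy by calling the hook directly.
req = urllib2.Request('http://www.example.com/old')
try:
    NoRedirectHandler().redirect_request(
        req, io.BytesIO(b''), 302, 'Found', {}, 'http://www.example.com/new')
except urllib2.HTTPError as exc:
    refused_code = exc.code
```
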
.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` URL. This method is called by the parent
   :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

.. versionadded:: 2.4

:class:`HTTPCookieProcessor` instances have one attribute:


.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`cookielib.CookieJar` in which cookies are stored.


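A minimal sketch of sharing a cookie jar with an opener. Nothing is fetched here, so the jar stays empty; the fallback imports are an assumption so the sketch also runs on Python 3, where :mod:`cookielib` is named ``http.cookiejar``.

```python
try:
    import urllib2
    import cookielib
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2
    import http.cookiejar as cookielib

jar = cookielib.CookieJar()
processor = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(processor)

# Cookies set by servers during opener.open(...) calls accumulate in `jar`;
# no request has been made yet, so it is still empty.
cookie_count = len(jar)
```
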
.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.


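The per-protocol methods can be observed directly on a :class:`ProxyHandler` instance. The proxy URL below is made up, and the fallback import is an assumption for Python 3.

```python
try:
    import urllib2
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2

# Explicit mapping: only HTTP requests go through the (made-up) proxy.
explicit = urllib2.ProxyHandler({'http': 'http://proxy.example.com:3128/'})

# An empty dictionary disables proxies, including autodetected ones.
disabled = urllib2.ProxyHandler({})

has_http_open = hasattr(explicit, 'http_open')   # created for 'http'
has_ftp_open = hasattr(explicit, 'ftp_open')     # no FTP proxy configured

# Passing the handler to build_opener replaces the default ProxyHandler.
opener = urllib2.build_opener(disabled)
```
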
.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.


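The lookup rules, including the catch-all realm, can be sketched as follows. The realms, URIs and credentials are invented for the example, and the fallback import is an assumption for Python 3.

```python
try:
    import urllib2
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2

mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password('PDQ Application', 'http://www.example.com/private/',
                 'klem', 'realm-specific-password')
# A realm of None acts as the catch-all entry.
mgr.add_password(None, 'http://www.example.com/', 'guest',
                 'catch-all-password')

# Matching realm, and the URI lies below the registered one.
exact = mgr.find_user_password(
    'PDQ Application', 'http://www.example.com/private/report.html')
# Unknown realm: the catch-all realm is searched instead.
fallback = mgr.find_user_password(
    'Some Other Realm', 'http://www.example.com/index.html')
# No entry covers this host at all.
unknown = mgr.find_user_password(
    'Some Other Realm', 'http://elsewhere.example.org/')
```
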
.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.


.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. versionadded:: 2.4


.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For a 200 status code, which is not an error, the response object is returned
   immediately.

   For non-200 status codes, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`urllib2.HTTPDefaultErrorHandler` will raise an
   :exc:`HTTPError` if no other handler handles the error.


.. _urllib2-examples:

Examples
--------

This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print f.read(100)
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print f.read()
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print 'Content-type: text/plain\n\nGot Data: "%s"' % data

Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   urllib2.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib2
   proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib2
   req = urllib2.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib2.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib2
   opener = urllib2.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
