:mod:`urllib2` --- extensible library for opening URLs
======================================================

.. module:: urllib2
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


The :mod:`urllib2` module defines functions and classes which help in opening
URLs (mostly HTTP) in a complex world --- basic and digest authentication,
redirections, cookies and more.

The :mod:`urllib2` module defines the following functions:


.. function:: urlopen(url[, data][, timeout])

   Open the URL *url*, which can be either a string or a :class:`Request` object.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   The optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, or passed as
   ``None``, the global default timeout setting will be used). This only
   works for HTTP, HTTPS, FTP and FTPS connections.

   This function returns a file-like object with two additional methods:

   * :meth:`geturl` --- return the URL of the resource retrieved, commonly used to
     determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers, in
     the form of an ``httplib.HTTPMessage`` instance
     (see `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though the
   default installed global :class:`OpenerDirector` uses :class:`UnknownHandler` to
   ensure this never happens).


.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.

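The difference between using an opener directly and installing it can be
sketched offline as follows. ``DemoHandler`` and the ``demo`` URL scheme are
invented for illustration, and the import fallback is only there so that the
sketch also runs on Python 3, where the same classes live in
``urllib.request``.

```python
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # same API under its Python 3 name


class DemoHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up ``demo`` URL scheme."""

    def demo_open(self, req):
        # Return a canned file-like response instead of touching the network.
        return io.BytesIO(b"hello from " + req.get_full_url().encode("ascii"))


opener = urllib2.build_opener(DemoHandler)

# Option 1: use the opener directly, leaving the global default untouched.
direct = opener.open("demo://example/resource").read()

# Option 2: install it, after which plain urlopen() goes through it as well.
urllib2.install_opener(opener)
installed = urllib2.urlopen("demo://example/resource").read()
```

Both calls should return the same bytes; the second one simply reaches the
handler through the installed global opener.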

.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler`,
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module can
   be imported), :class:`HTTPSHandler` will also be added.

   A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
   member variable to modify its position in the handlers list.

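A minimal sketch of the replacement rule, assuming a hypothetical
``QuietRedirectHandler`` subclass. It inspects the opener's ``handlers`` list,
which is an implementation attribute rather than part of the interface
described here, so treat that part as illustrative only.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # same API under its Python 3 name


class QuietRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical subclass; it only exists to show the replacement rule."""
    # A lower handler_order sorts this handler earlier in the chain
    # (BaseHandler uses 500 by default).
    handler_order = 400


# A class is accepted as well as an instance; it is called with no arguments.
opener = urllib2.build_opener(QuietRedirectHandler)

# Only the subclass is installed; the stock HTTPRedirectHandler is skipped.
redirectors = [h for h in opener.handlers
               if isinstance(h, urllib2.HTTPRedirectHandler)]

# The other defaults are still added automatically.
has_http = any(isinstance(h, urllib2.HTTPHandler) for h in opener.handlers)
```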

The following exceptions are raised as appropriate:


.. exception:: URLError

   The handlers raise this exception (or derived exceptions) when they run into a
   problem. It is a subclass of :exc:`IOError`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
      URLs).

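As a rough illustration that needs no network access, the sketch below
triggers a :exc:`URLError` with a deliberately bogus URL scheme
(``nosuchscheme`` is made up) and reads its ``reason`` attribute. The import
fallback only lets the snippet run on Python 3 as well.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # same API under its Python 3 name

try:
    # No handler accepts this scheme, so UnknownHandler raises URLError
    # before any connection is attempted.
    urllib2.urlopen("nosuchscheme://example/")
except urllib2.URLError as exc:
    reason = exc.reason      # here a plain message string
else:
    reason = None
```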

.. exception:: HTTPError

   Though being an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
   can also function as a non-exceptional file-like return value (the same thing
   that :func:`urlopen` returns). This is useful when handling exotic HTTP
   errors, such as requests for authentication.

   .. attribute:: code

      An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
      This numeric value corresponds to a value found in the dictionary of
      codes as found in :attr:`BaseHTTPServer.BaseHTTPRequestHandler.responses`.



The following classes are provided:


.. class:: Request(url[, data][, headers][, origin_req_host][, unverifiable])

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   *headers* should be a dictionary, and will be treated as if :meth:`add_header`
   was called with each key and value as arguments. This is often used to "spoof"
   the ``User-Agent`` header, which is used by a browser to identify itself ---
   some HTTP servers only allow requests coming from common browsers as opposed
   to scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
   default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling of third-party
   HTTP cookies:

   *origin_req_host* should be the request-host of the origin transaction, as
   defined by :rfc:`2965`. It defaults to ``cookielib.request_host(self)``. This
   is the host name or IP address of the original request that was initiated by the
   user. For example, if the request is for an image in an HTML document, this
   should be the request-host of the request for the page containing the image.

   *unverifiable* should indicate whether the request is unverifiable, as defined
   by :rfc:`2965`. It defaults to ``False``. An unverifiable request is one whose
   URL the user did not have the option to approve. For example, if the request is
   for an image in an HTML document, and the user had no option to approve the
   automatic fetching of the image, this should be true.

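The effect of *data* and *headers* can be sketched without sending anything.
The form fields and the ``ExampleClient/1.0`` agent string are placeholders.
The import fallback maps to the Python 3 locations of the same functions so the
snippet runs on either version.

```python
try:
    from urllib import urlencode            # Python 2
    import urllib2
except ImportError:
    from urllib.parse import urlencode      # Python 3 equivalents
    import urllib.request as urllib2

# urlencode() produces the application/x-www-form-urlencoded body
# that *data* expects; the fields themselves are illustrative.
body = urlencode([("q", "python"), ("page", "2")]).encode("ascii")

get_req = urllib2.Request("http://www.example.com/search")
post_req = urllib2.Request(
    "http://www.example.com/search",
    data=body,
    headers={"User-Agent": "ExampleClient/1.0"},   # made-up agent string
)

# Supplying data is what switches the request from GET to POST.
methods = (get_req.get_method(), post_req.get_method())
```

Nothing is transmitted until one of the requests is passed to :func:`urlopen`
or to an opener's :meth:`open` method.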

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor([cookiejar])

   A class to handle HTTP Cookies.


.. class:: ProxyHandler([proxies])

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read the
   list of proxies from the environment variables :envvar:`<protocol>_proxy`.
   To disable autodetected proxies, pass an empty dictionary.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. _request-objects:

Request Objects
---------------

The following methods describe all of :class:`Request`'s public interface, and
so all must be overridden in subclasses.


.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has a non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).

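The header methods can be exercised on a :class:`Request` that is never sent,
as in this sketch. The ``X-Example-Token`` header is made up. ``get_header()``
is a lookup helper that exists on the class but is not listed in this section.
The import fallback is only for running the snippet on Python 3.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # same API under its Python 3 name

req = urllib2.Request("http://www.example.com/")

# A second add_header() call with the same name replaces the first value.
req.add_header("Referer", "http://www.python.org/")
req.add_header("Referer", "http://docs.python.org/")

# Unredirected headers are stored separately, but has_header() sees both kinds.
req.add_unredirected_header("X-Example-Token", "abc123")   # made-up header

# Header names are stored in str.capitalize() form, hence "Referer" and
# "X-example-token" as lookup keys.
referer = req.get_header("Referer")
found = (req.has_header("Referer"), req.has_header("X-example-token"))
```

Because names are normalised with ``str.capitalize()``, a lookup for
``"X-Example-Token"`` with its original capitalisation would not match.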

.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by RFC 2965. See the
   documentation for the :class:`Request` constructor.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following methods
   are searched, and added to the possible chains (note that HTTP errors are a
   special case).

   * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
     URLs.

   * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

   * :meth:`protocol_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`protocol_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

   * :meth:`protocol_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url[, data][, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, or passed as
   ``None``, the global default timeout setting will be used; this only
   works for HTTP, HTTPS, FTP and FTPS connections).


.. method:: OpenerDirector.error(proto[, arg[, ...]])

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`protocol_open`. If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`.open` and :meth:`.error` methods.

#. Every handler with a method named like :meth:`protocol_response` has that
   method called to post-process the response.

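The three stages can be observed with a small offline sketch. ``DemoHandler``
and the ``demo`` scheme are hypothetical, chosen so that none of the stock
handlers take part. The import fallback only allows the snippet to run on
Python 3 too.

```python
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # same API under its Python 3 name

calls = []   # records the order in which the stages run


class DemoHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up ``demo`` scheme."""

    def demo_request(self, req):
        calls.append("request")       # stage 1: pre-process the request
        return req

    def demo_open(self, req):
        calls.append("open")          # stage 2: produce a response
        return io.BytesIO(b"payload")

    def demo_response(self, req, response):
        calls.append("response")      # stage 3: post-process the response
        return response


# A bare OpenerDirector has no default handlers, so only ours is consulted.
opener = urllib2.OpenerDirector()
opener.add_handler(DemoHandler())
body = opener.open("demo://example/resource").read()
```

After the call, ``calls`` holds the stage names in the order they ran.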

.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following members and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.


.. method:: BaseHandler.protocol_open(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.

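A sketch of how an ``http_error_nnn`` method is reached, assuming a
hypothetical ``NotFoundHandler``. To stay offline it calls
:meth:`OpenerDirector.error` by hand with arguments standing in for a real 404
response, which is normally the job of the library itself. The import fallback
is only for running the snippet on Python 3.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # same API under its Python 3 name


class NotFoundHandler(urllib2.BaseHandler):
    """Hypothetical handler that turns a 404 into a fallback value."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        return "fallback for %s (%d %s)" % (req.get_full_url(), code, msg)


opener = urllib2.OpenerDirector()
opener.add_handler(NotFoundHandler())

req = urllib2.Request("http://www.example.com/missing")
# Stand-in for a real 404 response: fp and hdrs are left empty because
# this particular handler ignores them.
result = opener.error("http", req, None, 404, "Not Found", {})
```

A real handler would normally return a file-like response object or raise
:exc:`HTTPError`; the string is used here only to keep the sketch short.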

.. method:: BaseHandler.protocol_request(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect. Otherwise, raise :exc:`HTTPError` if no other handler should try to
   handle this URL, or return ``None`` if you can't but another handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the ``POST`` to a
      ``GET``, and the default implementation reproduces this behavior.

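The ``POST``-to-``GET`` behaviour described in the note can be checked without
a server by calling the stock implementation directly, as sketched here. The
URLs are placeholders. The library's implementation receives the redirect
target as a final positional argument, which the call below supplies. The
import fallback is only for running the snippet on Python 3.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # same API under its Python 3 name

handler = urllib2.HTTPRedirectHandler()
post = urllib2.Request("http://www.example.com/form", data=b"a=1")

# Ask the stock handler what it would do with a 302 answer to a POST.
# fp is unused on this path; the final argument is the redirect target.
follow_up = handler.redirect_request(
    post, None, 302, "Found", {}, "http://www.example.com/done")

summary = (post.get_method(), follow_up.get_method(),
           follow_up.get_full_url())
```

A subclass that wants stricter behaviour could override
:meth:`redirect_request` and raise :exc:`HTTPError` for such cases instead.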

.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` URL. This method is called by the parent
   :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

:class:`HTTPCookieProcessor` instances have one attribute:

.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`cookielib.CookieJar` in which cookies are stored.


.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.


.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.

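A short sketch of the lookup rules, using placeholder realms, URIs and
credentials. The import fallback is only for running the snippet on Python 3.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # same API under its Python 3 name

mgr = urllib2.HTTPPasswordMgr()
# Realm, URI and credentials are placeholders.
mgr.add_password("Example Realm", "http://www.example.com/private/",
                 "klem", "s3cret")

# URIs below the registered one match; other realms do not.
hit = mgr.find_user_password("Example Realm",
                             "http://www.example.com/private/report.html")
miss = mgr.find_user_password("Other Realm",
                              "http://www.example.com/private/report.html")

# With the default-realm variant, a None realm acts as a catch-all.
fallback_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
fallback_mgr.add_password(None, "http://www.example.com/", "guest", "guest")
fallback = fallback_mgr.find_user_password("Any Realm",
                                           "http://www.example.com/docs/")
```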

.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.


.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For responses with a 200 status code, the response object is returned
   immediately.

   For any other status code, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`urllib2.HTTPDefaultErrorHandler` will raise an
   :exc:`HTTPError` if no other handler handles the error.


.. _urllib2-examples:

Examples
--------

This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print(f.read(100))
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print(f.read())
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)

Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   urllib2.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib2
   proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib2
   req = urllib2.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib2.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib2
   opener = urllib2.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
