\section{\module{urllib2} ---
         extensible library for opening URLs}

\declaremodule{standard}{urllib2}
\moduleauthor{Jeremy Hylton}{jhylton@users.sourceforge.net}
\sectionauthor{Moshe Zadka}{moshez@users.sourceforge.net}

\modulesynopsis{An extensible library for opening URLs using a variety of
                protocols}

The \module{urllib2} module defines functions and classes which help
in opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.

The \module{urllib2} module defines the following functions:

\begin{funcdesc}{urlopen}{url\optional{, data}\optional{, timeout}}
Open the URL \var{url}, which can be either a string or a \class{Request}
object.

\var{data} may be a string specifying additional data to send to the
server, or \code{None} if no such data is needed.
Currently HTTP requests are the only ones that use \var{data};
the HTTP request will be a POST instead of a GET when the \var{data}
parameter is provided. \var{data} should be a buffer in the standard
\mimetype{application/x-www-form-urlencoded} format. The
\function{urllib.urlencode()} function takes a mapping or sequence of
2-tuples and returns a string in this format.

The optional \var{timeout} parameter specifies a timeout in seconds for the
connection attempt (if not specified, or passed as \code{None}, the global
default timeout setting will be used). It currently works only for HTTP,
HTTPS, FTP and FTPS connections.

This function returns a file-like object with two additional methods:

\begin{itemize}
  \item \method{geturl()} --- return the URL of the resource retrieved
  \item \method{info()} --- return the meta-information of the page, as
                            a dictionary-like object
\end{itemize}

Raises \exception{URLError} on errors.

Note that \code{None} may be returned if no handler handles the
request (though the default installed global \class{OpenerDirector}
uses \class{UnknownHandler} to ensure this never happens).

\versionchanged[\var{timeout} was added]{2.6}
\end{funcdesc}
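As a minimal sketch of the \var{data} convention described above, the
following builds an encoded body and shows that supplying it switches
the request from GET to POST. The form fields and the URL are made up,
nothing is sent over the network, and the import fallback assumes that
on Python 3 the same names are available from \module{urllib.parse} and
\module{urllib.request}.

```python
try:
    import urllib2                      # Python 2: the module documented here
    from urllib import urlencode
except ImportError:
    # Assumption: on Python 3 the same names live in these modules.
    import urllib.request as urllib2
    from urllib.parse import urlencode

# Hypothetical form fields; a sequence of 2-tuples keeps the order stable.
fields = [("name", "Ada Lovelace"), ("lang", "en")]
body = urlencode(fields)

# Hypothetical endpoint; no connection is made until the request is opened.
url = "http://www.example.com/cgi-bin/register"
plain = urllib2.Request(url)
form = urllib2.Request(url, body.encode("ascii"))

print(body)                 # name=Ada+Lovelace&lang=en
print(plain.get_method())   # GET
print(form.get_method())    # POST
```

Passing \code{form} to \function{urlopen()} would then send the encoded
fields as the request body.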

\begin{funcdesc}{install_opener}{opener}
Install an \class{OpenerDirector} instance as the default global
opener. Installing an opener is only necessary if you want
\function{urlopen()} to use that opener; otherwise, simply call
\method{OpenerDirector.open()} instead of \function{urlopen()}. The
code does not check for a real \class{OpenerDirector}, and any class
with the appropriate interface will work.
\end{funcdesc}

\begin{funcdesc}{build_opener}{\optional{handler, \moreargs}}
Return an \class{OpenerDirector} instance, which chains the
handlers in the order given. \var{handler}s can be either instances
of \class{BaseHandler}, or subclasses of \class{BaseHandler} (in
which case it must be possible to call the constructor without
any parameters). Instances of the following classes will be in
front of the \var{handler}s, unless the \var{handler}s contain
them, instances of them or subclasses of them:
\class{ProxyHandler}, \class{UnknownHandler}, \class{HTTPHandler},
\class{HTTPDefaultErrorHandler}, \class{HTTPRedirectHandler},
\class{FTPHandler}, \class{FileHandler}, \class{HTTPErrorProcessor}.

If the Python installation has SSL support (\function{socket.ssl()}
exists), \class{HTTPSHandler} will also be added.

Beginning in Python 2.3, a \class{BaseHandler} subclass may also
change its \member{handler_order} member variable to modify its
position in the handlers list.
\end{funcdesc}
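The sketch below illustrates passing a handler \emph{class} rather than
an instance, in which case \function{build_opener()} calls the
constructor without arguments. \class{CannedHandler} and the
\code{demo:} URL are hypothetical, the reply is served from memory so no
network access is needed, and the import fallback assumes that Python 3
exposes the same names in \module{urllib.request}.

```python
import io

try:
    import urllib2                      # Python 2: the module documented here
except ImportError:
    # Assumption: on Python 3 the same names live in urllib.request.
    import urllib.request as urllib2


class CannedHandler(urllib2.BaseHandler):
    """Hypothetical handler that answers every URL from memory."""

    def default_open(self, req):
        return io.BytesIO(b"canned reply")


# A class is passed, so build_opener() instantiates CannedHandler itself.
opener = urllib2.build_opener(CannedHandler)
reply = opener.open("demo://host/resource").read()
print(reply)
```

Because the opener is used directly through its \method{open()} method,
\function{install_opener()} is not needed here.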


The following exceptions are raised as appropriate:

\begin{excdesc}{URLError}
The handlers raise this exception (or derived exceptions) when they
run into a problem. It is a subclass of \exception{IOError}.
\end{excdesc}

\begin{excdesc}{HTTPError}
A subclass of \exception{URLError}, it can also function as a
non-exceptional file-like return value (the same thing that
\function{urlopen()} returns). This is useful when handling exotic
HTTP errors, such as requests for authentication.
\end{excdesc}
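Since \exception{HTTPError} derives from \exception{URLError}, code that
distinguishes the two has to test for the more specific class first.
The following sketch shows one way to do that. The \function{describe()}
helper is hypothetical, and the two exceptions are built by hand purely
for illustration (in normal use the handlers raise them); the import
fallback assumes Python 3 exposes the same names in
\module{urllib.request}.

```python
import io

try:
    import urllib2                      # Python 2: the module documented here
except ImportError:
    # Assumption: on Python 3 the same names live in urllib.request.
    import urllib.request as urllib2


def describe(exc):
    """Classify a failure, testing the more specific class first."""
    if isinstance(exc, urllib2.HTTPError):
        # HTTPError doubles as a response: it has a code and a readable body.
        return "HTTP %d: %s" % (exc.code, exc.read().decode("ascii"))
    if isinstance(exc, urllib2.URLError):
        return "URL error: %s" % (exc.reason,)
    raise exc


# Built by hand for illustration only.
not_found = urllib2.HTTPError("http://www.example.com/missing", 404,
                              "Not Found", {}, io.BytesIO(b"no such page"))
refused = urllib2.URLError("connection refused")

first = describe(not_found)
second = describe(refused)
print(first)    # HTTP 404: no such page
print(second)   # URL error: connection refused
```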


The following classes are provided:

\begin{classdesc}{Request}{url\optional{, data}\optional{, headers}
    \optional{, origin_req_host}\optional{, unverifiable}}
This class is an abstraction of a URL request.

\var{url} should be a string containing a valid URL.

\var{data} may be a string specifying additional data to send to the
server, or \code{None} if no such data is needed.
Currently HTTP requests are the only ones that use \var{data};
the HTTP request will be a POST instead of a GET when the \var{data}
parameter is provided. \var{data} should be a buffer in the standard
\mimetype{application/x-www-form-urlencoded} format. The
\function{urllib.urlencode()} function takes a mapping or sequence of
2-tuples and returns a string in this format.

\var{headers} should be a dictionary, and will be treated as if
\method{add_header()} was called with each key and value as arguments.

The final two arguments are only of interest for correct handling of
third-party HTTP cookies:

\var{origin_req_host} should be the request-host of the origin
transaction, as defined by \rfc{2965}. It defaults to
\code{cookielib.request_host(self)}. This is the host name or IP
address of the original request that was initiated by the user. For
example, if the request is for an image in an HTML document, this
should be the request-host of the request for the page containing the
image.

\var{unverifiable} should indicate whether the request is
unverifiable, as defined by \rfc{2965}. It defaults to \code{False}. An
unverifiable request is one whose URL the user did not have the option
to approve. For example, if the request is for an image in an HTML
document, and the user had no option to approve the automatic fetching
of the image, this should be true.
\end{classdesc}

\begin{classdesc}{OpenerDirector}{}
The \class{OpenerDirector} class opens URLs via \class{BaseHandler}s
chained together. It manages the chaining of handlers, and recovery
from errors.
\end{classdesc}

\begin{classdesc}{BaseHandler}{}
This is the base class for all registered handlers --- and handles only
the simple mechanics of registration.
\end{classdesc}

\begin{classdesc}{HTTPDefaultErrorHandler}{}
A class which defines a default handler for HTTP error responses; all
responses are turned into \exception{HTTPError} exceptions.
\end{classdesc}

\begin{classdesc}{HTTPRedirectHandler}{}
A class to handle redirections.
\end{classdesc}

\begin{classdesc}{HTTPCookieProcessor}{\optional{cookiejar}}
A class to handle HTTP Cookies.
\end{classdesc}

\begin{classdesc}{ProxyHandler}{\optional{proxies}}
Cause requests to go through a proxy.
If \var{proxies} is given, it must be a dictionary mapping
protocol names to URLs of proxies.
The default is to read the list of proxies from the environment
variables \envvar{<protocol>_proxy}.
\end{classdesc}
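A short sketch of the \var{proxies} argument follows. The proxy address
is made up, and constructing the handler does not open a connection.
The example only checks which \method{\var{protocol}_open()} methods the
handler ends up with; the import fallback assumes Python 3 exposes the
same class in \module{urllib.request}.

```python
try:
    import urllib2                      # Python 2: the module documented here
except ImportError:
    # Assumption: on Python 3 the same names live in urllib.request.
    import urllib.request as urllib2

# Hypothetical proxy address; an explicit mapping is used instead of the
# environment variables.
handler = urllib2.ProxyHandler({"http": "http://proxy.example.com:3128/"})

has_http = hasattr(handler, "http_open")
has_ftp = hasattr(handler, "ftp_open")
print(has_http)   # True: a proxy was given for "http"
print(has_ftp)    # False: no proxy was given for "ftp"
```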

\begin{classdesc}{HTTPPasswordMgr}{}
Keep a database of
\code{(\var{realm}, \var{uri}) -> (\var{user}, \var{password})}
mappings.
\end{classdesc}

\begin{classdesc}{HTTPPasswordMgrWithDefaultRealm}{}
Keep a database of
\code{(\var{realm}, \var{uri}) -> (\var{user}, \var{password})} mappings.
A realm of \code{None} is considered a catch-all realm, which is searched
if no other realm fits.
\end{classdesc}
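The catch-all behaviour can be sketched as follows, using the
\method{add_password()} and \method{find_user_password()} methods of the
password manager interface. The URLs, realm name and credentials are
made up, and the import fallback assumes Python 3 exposes the same
class in \module{urllib.request}.

```python
try:
    import urllib2                      # Python 2: the module documented here
except ImportError:
    # Assumption: on Python 3 the same names live in urllib.request.
    import urllib.request as urllib2

mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

# A realm of None registers the credentials for any realm under this URI.
mgr.add_password(None, "http://www.example.com/", "ada", "s3cret")

# Hypothetical realm name; it was never registered, so the None realm is used.
found = mgr.find_user_password("Members Area",
                               "http://www.example.com/private/")
# A different host matches nothing, not even the catch-all entry.
missing = mgr.find_user_password("Members Area",
                                 "http://other.example.org/")
print(found)     # ('ada', 's3cret')
print(missing)   # (None, None)
```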

\begin{classdesc}{AbstractBasicAuthHandler}{\optional{password_mgr}}
This is a mixin class that helps with HTTP authentication, both
to the remote host and to a proxy.
\var{password_mgr}, if given, should be something that is compatible
with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
for information on the interface that must be supported.
\end{classdesc}

\begin{classdesc}{HTTPBasicAuthHandler}{\optional{password_mgr}}
Handle authentication with the remote host.
\var{password_mgr}, if given, should be something that is compatible
with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
for information on the interface that must be supported.
\end{classdesc}

\begin{classdesc}{ProxyBasicAuthHandler}{\optional{password_mgr}}
Handle authentication with the proxy.
\var{password_mgr}, if given, should be something that is compatible
with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
for information on the interface that must be supported.
\end{classdesc}

\begin{classdesc}{AbstractDigestAuthHandler}{\optional{password_mgr}}
This is a mixin class that helps with HTTP authentication, both
to the remote host and to a proxy.
\var{password_mgr}, if given, should be something that is compatible
with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
for information on the interface that must be supported.
\end{classdesc}

\begin{classdesc}{HTTPDigestAuthHandler}{\optional{password_mgr}}
Handle authentication with the remote host.
\var{password_mgr}, if given, should be something that is compatible
with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
for information on the interface that must be supported.
\end{classdesc}

\begin{classdesc}{ProxyDigestAuthHandler}{\optional{password_mgr}}
Handle authentication with the proxy.
\var{password_mgr}, if given, should be something that is compatible
with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
for information on the interface that must be supported.
\end{classdesc}

\begin{classdesc}{HTTPHandler}{}
A class to handle opening of HTTP URLs.
\end{classdesc}

\begin{classdesc}{HTTPSHandler}{}
A class to handle opening of HTTPS URLs.
\end{classdesc}

\begin{classdesc}{FileHandler}{}
Open local files.
\end{classdesc}

\begin{classdesc}{FTPHandler}{}
Open FTP URLs.
\end{classdesc}

\begin{classdesc}{CacheFTPHandler}{}
Open FTP URLs, keeping a cache of open FTP connections to minimize
delays.
\end{classdesc}

\begin{classdesc}{UnknownHandler}{}
A catch-all class to handle unknown URLs.
\end{classdesc}


\subsection{Request Objects \label{request-objects}}

The following methods describe all of \class{Request}'s public interface,
and so all must be overridden in subclasses.

\begin{methoddesc}[Request]{add_data}{data}
Set the \class{Request} data to \var{data}. This is ignored by all
handlers except HTTP handlers --- and there it should be a byte
string, and will change the request to be \code{POST} rather than
\code{GET}.
\end{methoddesc}

\begin{methoddesc}[Request]{get_method}{}
Return a string indicating the HTTP request method. This is only
meaningful for HTTP requests, and currently always returns
\code{'GET'} or \code{'POST'}.
\end{methoddesc}

\begin{methoddesc}[Request]{has_data}{}
Return whether the instance has non-\code{None} data.
\end{methoddesc}

\begin{methoddesc}[Request]{get_data}{}
Return the instance's data.
\end{methoddesc}

\begin{methoddesc}[Request]{add_header}{key, val}
Add another header to the request. Headers are currently ignored by
all handlers except HTTP handlers, where they are added to the list
of headers sent to the server. Note that there cannot be more than
one header with the same name, and later calls will overwrite
previous calls in case the \var{key} collides. Currently, this is
no loss of HTTP functionality, since all headers which have meaning
when used more than once have a (header-specific) way of gaining the
same functionality using only one header.
\end{methoddesc}

\begin{methoddesc}[Request]{add_unredirected_header}{key, header}
Add a header that will not be added to a redirected request.
\versionadded{2.4}
\end{methoddesc}

\begin{methoddesc}[Request]{has_header}{header}
Return whether the instance has the named header (checks both regular
and unredirected).
\versionadded{2.4}
\end{methoddesc}
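The header methods above can be exercised without any network access.
In this sketch the URL and header values are made up, and
\method{get_header()} is a \class{Request} method that this section does
not describe; it is used here only to read a value back. The import
fallback assumes Python 3 exposes the same class in
\module{urllib.request}.

```python
try:
    import urllib2                      # Python 2: the module documented here
except ImportError:
    # Assumption: on Python 3 the same names live in urllib.request.
    import urllib.request as urllib2

# Hypothetical URL and header values.
req = urllib2.Request("http://www.example.com/",
                      headers={"Accept": "text/html"})

# A second call with the same key replaces the earlier value.
req.add_header("Accept", "application/json")

# This header would be left out of any redirected request.
req.add_unredirected_header("Referer", "http://www.example.com/start")

accept = req.get_header("Accept")
has_referer = req.has_header("Referer")   # unredirected headers count too
has_cookie = req.has_header("Cookie")
print(accept)        # application/json
print(has_referer)   # True
print(has_cookie)    # False
```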

\begin{methoddesc}[Request]{get_full_url}{}
Return the URL given in the constructor.
\end{methoddesc}

\begin{methoddesc}[Request]{get_type}{}
Return the type of the URL --- also known as the scheme.
\end{methoddesc}

\begin{methoddesc}[Request]{get_host}{}
Return the host to which a connection will be made.
\end{methoddesc}

\begin{methoddesc}[Request]{get_selector}{}
Return the selector --- the part of the URL that is sent to
the server.
\end{methoddesc}

\begin{methoddesc}[Request]{set_proxy}{host, type}
Prepare the request by connecting to a proxy server. The \var{host}
and \var{type} will replace those of the instance, and the instance's
selector will be the original URL given in the constructor.
\end{methoddesc}

\begin{methoddesc}[Request]{get_origin_req_host}{}
Return the request-host of the origin transaction, as defined by
\rfc{2965}. See the documentation for the \class{Request}
constructor.
\end{methoddesc}

\begin{methoddesc}[Request]{is_unverifiable}{}
Return whether the request is unverifiable, as defined by \rfc{2965}.
See the documentation for the \class{Request} constructor.
\end{methoddesc}


\subsection{OpenerDirector Objects \label{opener-director-objects}}

\class{OpenerDirector} instances have the following methods:

\begin{methoddesc}[OpenerDirector]{add_handler}{handler}
\var{handler} should be an instance of \class{BaseHandler}. The
following methods are searched, and added to the possible chains (note
that HTTP errors are a special case).

\begin{itemize}
  \item \method{\var{protocol}_open()} ---
    signal that the handler knows how to open \var{protocol} URLs.
  \item \method{http_error_\var{type}()} ---
    signal that the handler knows how to handle HTTP errors with HTTP
    error code \var{type}.
  \item \method{\var{protocol}_error()} ---
    signal that the handler knows how to handle errors from
    (non-\code{http}) \var{protocol}.
  \item \method{\var{protocol}_request()} ---
    signal that the handler knows how to pre-process \var{protocol}
    requests.
  \item \method{\var{protocol}_response()} ---
    signal that the handler knows how to post-process \var{protocol}
    responses.
\end{itemize}
\end{methoddesc}

\begin{methoddesc}[OpenerDirector]{open}{url\optional{, data}\optional{, timeout}}
Open the given \var{url} (which can be a request object or a string),
optionally passing the given \var{data}.
Arguments, return values and exceptions raised are the same as those
of \function{urlopen()} (which simply calls the \method{open()} method
on the currently installed global \class{OpenerDirector}). The optional
\var{timeout} parameter specifies a timeout in seconds for the connection
attempt (if not specified, or passed as \code{None}, the global default
timeout setting will be used). It currently works only for HTTP, HTTPS,
FTP and FTPS connections.

\versionchanged[\var{timeout} was added]{2.6}
\end{methoddesc}

\begin{methoddesc}[OpenerDirector]{error}{proto\optional{,
                                          arg\optional{, \moreargs}}}
Handle an error of the given protocol. This will call the registered
error handlers for the given protocol with the given arguments (which
are protocol specific). The HTTP protocol is a special case which
uses the HTTP response code to determine the specific error handler;
refer to the \method{http_error_*()} methods of the handler classes.

Return values and exceptions raised are the same as those
of \function{urlopen()}.
\end{methoddesc}

\class{OpenerDirector} objects open URLs in three stages. Within each
stage, the order in which the handler methods are called is determined
by sorting the handler instances.

\begin{enumerate}
  \item Every handler with a method named like
        \method{\var{protocol}_request()} has that method called to
        pre-process the request.

  \item Handlers with a method named like
        \method{\var{protocol}_open()} are called to handle the request.
        This stage ends when a handler either returns a
        non-\constant{None} value (i.e.\ a response), or raises an exception
        (usually \exception{URLError}). Exceptions are allowed to propagate.

        In fact, the above algorithm is first tried for methods named
        \method{default_open()}. If all such methods return
        \constant{None}, the algorithm is repeated for methods named like
        \method{\var{protocol}_open()}. If all such methods return
        \constant{None}, the algorithm is repeated for methods named
        \method{unknown_open()}.

        Note that the implementation of these methods may involve calls of
        the parent \class{OpenerDirector} instance's \method{open()} and
        \method{error()} methods.

  \item Every handler with a method named like
        \method{\var{protocol}_response()} has that method called to
        post-process the response.

\end{enumerate}
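The three stages can be observed with a handler that records each call.
This is only a sketch: \class{TraceHandler} and the \code{demo:}
protocol are hypothetical, the response is an in-memory buffer rather
than a real connection, and the import fallback assumes Python 3
exposes the same names in \module{urllib.request}.

```python
import io

try:
    import urllib2                      # Python 2: the module documented here
except ImportError:
    # Assumption: on Python 3 the same names live in urllib.request.
    import urllib.request as urllib2


class TraceHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up "demo" protocol."""

    def __init__(self):
        self.calls = []

    def demo_request(self, req):            # stage 1: pre-process
        self.calls.append("request")
        return req

    def demo_open(self, req):               # stage 2: produce a response
        self.calls.append("open")
        return io.BytesIO(b"hello")

    def demo_response(self, req, response):  # stage 3: post-process
        self.calls.append("response")
        return response


handler = TraceHandler()
opener = urllib2.OpenerDirector()
opener.add_handler(handler)

body = opener.open("demo://example/resource").read()
print(handler.calls)   # ['request', 'open', 'response']
```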

\subsection{BaseHandler Objects \label{base-handler-objects}}

\class{BaseHandler} objects provide a couple of methods that are
directly useful, and others that are meant to be used by derived
classes. These are intended for direct use:

\begin{methoddesc}[BaseHandler]{add_parent}{director}
Add a director as parent.
\end{methoddesc}

\begin{methoddesc}[BaseHandler]{close}{}
Remove any parents.
\end{methoddesc}

The following members and methods should only be used by classes
derived from \class{BaseHandler}. \note{The convention has been
adopted that subclasses defining \method{\var{protocol}_request()} or
\method{\var{protocol}_response()} methods are named
\class{*Processor}; all others are named \class{*Handler}.}


\begin{memberdesc}[BaseHandler]{parent}
A valid \class{OpenerDirector}, which can be used to open using a
different protocol, or handle errors.
\end{memberdesc}

\begin{methoddesc}[BaseHandler]{default_open}{req}
This method is \emph{not} defined in \class{BaseHandler}, but
subclasses should define it if they want to catch all URLs.

This method, if implemented, will be called by the parent
\class{OpenerDirector}. It should return a file-like object as
described in the return value of the \method{open()} of
\class{OpenerDirector}, or \code{None}. It should raise
\exception{URLError}, unless a truly exceptional thing happens (for
example, \exception{MemoryError} should not be mapped to
\exception{URLError}).

This method will be called before any protocol-specific open method.
\end{methoddesc}

\begin{methoddescni}[BaseHandler]{\var{protocol}_open}{req}
This method is \emph{not} defined in \class{BaseHandler}, but
subclasses should define it if they want to handle URLs with the given
protocol.

This method, if defined, will be called by the parent
\class{OpenerDirector}. Return values should be the same as for
\method{default_open()}.
\end{methoddescni}

\begin{methoddesc}[BaseHandler]{unknown_open}{req}
This method is \emph{not} defined in \class{BaseHandler}, but
subclasses should define it if they want to catch all URLs with no
specific registered handler to open it.

This method, if implemented, will be called by the \member{parent}
\class{OpenerDirector}. Return values should be the same as for
\method{default_open()}.
\end{methoddesc}
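To illustrate how \method{default_open()} and \method{unknown_open()}
cooperate, the sketch below serves some URLs from a dictionary and lets
everything else fall through to a handler of last resort. Both handler
classes and the \code{demo:} URLs are hypothetical, and the import
fallback assumes Python 3 exposes the same names in
\module{urllib.request}.

```python
import io

try:
    import urllib2                      # Python 2: the module documented here
except ImportError:
    # Assumption: on Python 3 the same names live in urllib.request.
    import urllib.request as urllib2


class MemoryHandler(urllib2.BaseHandler):
    """Hypothetical handler: answer known URLs from a dictionary."""

    def __init__(self, pages):
        self.pages = pages

    def default_open(self, req):
        body = self.pages.get(req.get_full_url())
        if body is None:
            return None                 # decline, so later methods are tried
        return io.BytesIO(body)


class RefuseHandler(urllib2.BaseHandler):
    """Hypothetical last resort: nobody else could open the URL."""

    def unknown_open(self, req):
        raise urllib2.URLError("no handler for " + req.get_full_url())


opener = urllib2.OpenerDirector()
opener.add_handler(MemoryHandler({"demo://site/index": b"front page"}))
opener.add_handler(RefuseHandler())

hit = opener.open("demo://site/index").read()
try:
    opener.open("demo://site/missing")
    miss = None
except urllib2.URLError as exc:
    miss = exc.reason

print(hit)
print(miss)   # no handler for demo://site/missing
```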

\begin{methoddesc}[BaseHandler]{http_error_default}{req, fp, code, msg, hdrs}
This method is \emph{not} defined in \class{BaseHandler}, but
subclasses should override it if they intend to provide a catch-all
for otherwise unhandled HTTP errors. It will be called automatically
by the \class{OpenerDirector} getting the error, and should not
normally be called in other circumstances.

\var{req} will be a \class{Request} object, \var{fp} will be a
file-like object with the HTTP error body, \var{code} will be the
three-digit code of the error, \var{msg} will be the user-visible
explanation of the code and \var{hdrs} will be a mapping object with
the headers of the error.

Return values and exceptions raised should be the same as those
of \function{urlopen()}.
\end{methoddesc}

\begin{methoddesc}[BaseHandler]{http_error_\var{nnn}}{req, fp, code, msg, hdrs}
\var{nnn} should be a three-digit HTTP error code. This method is
also not defined in \class{BaseHandler}, but will be called, if it
exists, on an instance of a subclass, when an HTTP error with code
\var{nnn} occurs.

Subclasses should override this method to handle specific HTTP
errors.

Arguments, return values and exceptions raised should be the same as
for \method{http_error_default()}.
\end{methoddesc}
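The dispatch between \method{http_error_\var{nnn}()} and
\method{http_error_default()} can be sketched by calling the
director's \method{error()} method by hand, which is normally done by
the HTTP handlers themselves. \class{ErrorPolicy}, the URL and the
error bodies are hypothetical, and the import fallback assumes Python 3
exposes the same names in \module{urllib.request}.

```python
import io

try:
    import urllib2                      # Python 2: the module documented here
except ImportError:
    # Assumption: on Python 3 the same names live in urllib.request.
    import urllib.request as urllib2


class ErrorPolicy(urllib2.BaseHandler):
    """Hypothetical policy: tolerate 404, fail on everything else."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        # Returning a response object means the error counts as handled.
        return io.BytesIO(b"placeholder page")

    def http_error_default(self, req, fp, code, msg, hdrs):
        raise urllib2.HTTPError(req.get_full_url(), code, msg, hdrs, fp)


opener = urllib2.OpenerDirector()
opener.add_handler(ErrorPolicy())
req = urllib2.Request("http://www.example.com/page")

# Simulated errors; no request is actually sent.
body = opener.error("http", req, io.BytesIO(b""), 404, "Not Found", {}).read()
try:
    opener.error("http", req, io.BytesIO(b"oops"), 500, "Server Error", {})
    code = None
except urllib2.HTTPError as exc:
    code = exc.code

print(body)
print(code)   # 500
```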

\begin{methoddescni}[BaseHandler]{\var{protocol}_request}{req}
This method is \emph{not} defined in \class{BaseHandler}, but
subclasses should define it if they want to pre-process requests of
the given protocol.

This method, if defined, will be called by the parent
\class{OpenerDirector}. \var{req} will be a \class{Request} object.
The return value should be a \class{Request} object.
\end{methoddescni}

\begin{methoddescni}[BaseHandler]{\var{protocol}_response}{req, response}
This method is \emph{not} defined in \class{BaseHandler}, but
subclasses should define it if they want to post-process responses of
the given protocol.

This method, if defined, will be called by the parent
\class{OpenerDirector}. \var{req} will be a \class{Request} object.
\var{response} will be an object implementing the same interface as
the return value of \function{urlopen()}. The return value should
implement the same interface as the return value of
\function{urlopen()}.
\end{methoddescni}

\subsection{HTTPRedirectHandler Objects \label{http-redirect-handler}}

\note{Some HTTP redirections require action from this module's client
      code. If this is the case, \exception{HTTPError} is raised. See
      \rfc{2616} for details of the precise meanings of the various
      redirection codes.}

\begin{methoddesc}[HTTPRedirectHandler]{redirect_request}{req,
                                                  fp, code, msg, hdrs}
Return a \class{Request} or \code{None} in response to a redirect.
This is called by the default implementations of the
\method{http_error_30*()} methods when a redirection is received from
the server. If a redirection should take place, return a new
\class{Request} to allow \method{http_error_30*()} to perform the
redirect. Otherwise, raise \exception{HTTPError} if no other handler
should try to handle this URL, or return \code{None} if you can't but
another handler might.

\begin{notice}
 The default implementation of this method does not strictly
 follow \rfc{2616}, which says that 301 and 302 responses to \code{POST}
 requests must not be automatically redirected without confirmation by
 the user. In reality, browsers do allow automatic redirection of
 these responses, changing the \code{POST} to a \code{GET}, and the default
 implementation reproduces this behavior.
\end{notice}
\end{methoddesc}


\begin{methoddesc}[HTTPRedirectHandler]{http_error_301}{req,
                                                  fp, code, msg, hdrs}
Redirect to the \code{Location:} URL. This method is called by
the parent \class{OpenerDirector} when getting an HTTP
`moved permanently' response.
\end{methoddesc}

\begin{methoddesc}[HTTPRedirectHandler]{http_error_302}{req,
                                                  fp, code, msg, hdrs}
The same as \method{http_error_301()}, but called for the
`found' response.
\end{methoddesc}

\begin{methoddesc}[HTTPRedirectHandler]{http_error_303}{req,
                                                  fp, code, msg, hdrs}
The same as \method{http_error_301()}, but called for the
`see other' response.
\end{methoddesc}

\begin{methoddesc}[HTTPRedirectHandler]{http_error_307}{req,
                                                  fp, code, msg, hdrs}
The same as \method{http_error_301()}, but called for the
`temporary redirect' response.
\end{methoddesc}

Martin v. Löwis162f0812003-07-12 07:33:32 +0000588
Martin v. Löwis2a6ba902004-05-31 18:22:40 +0000589\subsection{HTTPCookieProcessor Objects \label{http-cookie-processor}}
590
Andrew M. Kuchlingd54a0ae2005-12-04 20:25:23 +0000591\versionadded{2.4}
592
Martin v. Löwis2a6ba902004-05-31 18:22:40 +0000593\class{HTTPCookieProcessor} instances have one attribute:
594
Guido van Rossumd8faa362007-04-27 19:54:29 +0000595\begin{memberdesc}[HTTPCookieProcessor]{cookiejar}
Martin v. Löwis2a6ba902004-05-31 18:22:40 +0000596The \class{cookielib.CookieJar} in which cookies are stored.
597\end{memberdesc}
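
As a minimal sketch of how the \member{cookiejar} attribute might be used,
the following passes an explicit \class{CookieJar} to the processor so that
the caller keeps a reference to the stored cookies.  The variable names are
illustrative, and the fallback imports are an assumption that only lets the
sketch run on Python 3 as well.

\begin{verbatim}
try:
    import urllib2, cookielib           # Python 2, as documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 equivalents
    import http.cookiejar as cookielib

# Share one jar between the processor and the calling code.
jar = cookielib.CookieJar()
processor = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(processor)

# The processor exposes the very same jar through its attribute.
same_jar = processor.cookiejar is jar
\end{verbatim}

Cookies set by responses to later \code{opener.open()} calls should then be
visible by iterating over \code{jar}.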


\subsection{ProxyHandler Objects \label{proxy-handler}}

\begin{methoddescni}[ProxyHandler]{\var{protocol}_open}{request}
The \class{ProxyHandler} will have a method
\method{\var{protocol}_open()} for every \var{protocol} which has a
proxy in the \var{proxies} dictionary given in the constructor.  The
method will modify requests to go through the proxy, by calling
\code{request.set_proxy()}, and call the next handler in the chain to
actually execute the protocol.
\end{methoddescni}
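
The per-protocol methods described above can be observed without any
network access.  In this small sketch the proxy URL is a placeholder, and
the fallback import is an assumption that only lets the code run on
Python 3 too.

\begin{verbatim}
try:
    import urllib2                      # Python 2, as documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 equivalent

# Only the 'http' protocol is given a proxy (placeholder URL).
proxy_handler = urllib2.ProxyHandler(
    {'http': 'http://proxy.example.com:3128/'})

has_http_open = hasattr(proxy_handler, 'http_open')   # True
has_ftp_open = hasattr(proxy_handler, 'ftp_open')     # False
\end{verbatim}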


\subsection{HTTPPasswordMgr Objects \label{http-password-mgr}}

These methods are available on \class{HTTPPasswordMgr} and
\class{HTTPPasswordMgrWithDefaultRealm} objects.

\begin{methoddesc}[HTTPPasswordMgr]{add_password}{realm, uri, user, passwd}
\var{uri} can be either a single URI, or a sequence of URIs.  \var{realm},
\var{user} and \var{passwd} must be strings.  This causes
\code{(\var{user}, \var{passwd})} to be used as authentication tokens
when authentication for \var{realm} and a super-URI of any of the
given URIs is given.
\end{methoddesc}

\begin{methoddesc}[HTTPPasswordMgr]{find_user_password}{realm, authuri}
Get user/password for given realm and URI, if any.  This method will
return \code{(None, None)} if there is no matching user/password.

For \class{HTTPPasswordMgrWithDefaultRealm} objects, the realm
\code{None} will be searched if the given \var{realm} has no matching
user/password.
\end{methoddesc}
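
The lookup rules above, including the fallback to the \code{None} realm,
can be sketched as follows.  The realm names, URLs and credentials are
made-up placeholders, and the fallback import is an assumption that only
lets the code run on Python 3 too.

\begin{verbatim}
try:
    import urllib2                      # Python 2, as documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 equivalent

mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# Credentials for one specific realm below /app/ ...
mgr.add_password('PDQ Application', 'http://www.example.com/app/',
                 'klem', 'secret')
# ... and default credentials (realm None) for the whole site.
mgr.add_password(None, 'http://www.example.com/', 'guest', 'guest')

exact = mgr.find_user_password(
    'PDQ Application', 'http://www.example.com/app/page.html')
fallback = mgr.find_user_password(
    'Other Realm', 'http://www.example.com/index.html')
missing = mgr.find_user_password(
    'Other Realm', 'http://other.example.org/')
\end{verbatim}

Here \code{exact} is \code{('klem', 'secret')}, \code{fallback} is
\code{('guest', 'guest')} because the unknown realm falls back to the
\code{None} realm, and \code{missing} is \code{(None, None)}.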


\subsection{AbstractBasicAuthHandler Objects
            \label{abstract-basic-auth-handler}}

\begin{methoddesc}[AbstractBasicAuthHandler]{http_error_auth_reqed}
                                            {authreq, host, req, headers}
Handle an authentication request by getting a user/password pair, and
re-trying the request.  \var{authreq} should be the name of the header
where the information about the realm is included in the request,
\var{host} specifies the URL and path to authenticate for, \var{req}
should be the (failed) \class{Request} object, and \var{headers}
should be the error headers.

\var{host} is either an authority (e.g. \code{"python.org"}) or a URL
containing an authority component (e.g. \code{"http://python.org/"}).
In either case, the authority must not contain a userinfo component
(so, \code{"python.org"} and \code{"python.org:80"} are fine,
\code{"joe:password@python.org"} is not).
\end{methoddesc}


\subsection{HTTPBasicAuthHandler Objects
            \label{http-basic-auth-handler}}

\begin{methoddesc}[HTTPBasicAuthHandler]{http_error_401}{req, fp, code,
                                                        msg, hdrs}
Retry the request with authentication information, if available.
\end{methoddesc}


\subsection{ProxyBasicAuthHandler Objects
            \label{proxy-basic-auth-handler}}

\begin{methoddesc}[ProxyBasicAuthHandler]{http_error_407}{req, fp, code,
                                                         msg, hdrs}
Retry the request with authentication information, if available.
\end{methoddesc}


\subsection{AbstractDigestAuthHandler Objects
            \label{abstract-digest-auth-handler}}

\begin{methoddesc}[AbstractDigestAuthHandler]{http_error_auth_reqed}
                                            {authreq, host, req, headers}
\var{authreq} should be the name of the header where the information about
the realm is included in the request, \var{host} should be the host to
authenticate to, \var{req} should be the (failed) \class{Request}
object, and \var{headers} should be the error headers.
\end{methoddesc}


\subsection{HTTPDigestAuthHandler Objects
            \label{http-digest-auth-handler}}

\begin{methoddesc}[HTTPDigestAuthHandler]{http_error_401}{req, fp, code,
                                                         msg, hdrs}
Retry the request with authentication information, if available.
\end{methoddesc}


\subsection{ProxyDigestAuthHandler Objects
            \label{proxy-digest-auth-handler}}

\begin{methoddesc}[ProxyDigestAuthHandler]{http_error_407}{req, fp, code,
                                                          msg, hdrs}
Retry the request with authentication information, if available.
\end{methoddesc}


\subsection{HTTPHandler Objects \label{http-handler-objects}}

\begin{methoddesc}[HTTPHandler]{http_open}{req}
Send an HTTP request, which can be either GET or POST, depending on
\code{\var{req}.has_data()}.
\end{methoddesc}


\subsection{HTTPSHandler Objects \label{https-handler-objects}}

\begin{methoddesc}[HTTPSHandler]{https_open}{req}
Send an HTTPS request, which can be either GET or POST, depending on
\code{\var{req}.has_data()}.
\end{methoddesc}


\subsection{FileHandler Objects \label{file-handler-objects}}

\begin{methoddesc}[FileHandler]{file_open}{req}
Open the file locally, if there is no host name, or
the host name is \code{'localhost'}.  Change the
protocol to \code{ftp} otherwise, and retry opening
it using \member{parent}.
\end{methoddesc}


\subsection{FTPHandler Objects \label{ftp-handler-objects}}

\begin{methoddesc}[FTPHandler]{ftp_open}{req}
Open the FTP file indicated by \var{req}.
The login is always done with empty username and password.
\end{methoddesc}


\subsection{CacheFTPHandler Objects \label{cacheftp-handler-objects}}

\class{CacheFTPHandler} objects are \class{FTPHandler} objects with
the following additional methods:

\begin{methoddesc}[CacheFTPHandler]{setTimeout}{t}
Set timeout of connections to \var{t} seconds.
\end{methoddesc}

\begin{methoddesc}[CacheFTPHandler]{setMaxConns}{m}
Set maximum number of cached connections to \var{m}.
\end{methoddesc}
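
A short sketch of configuring a \class{CacheFTPHandler} before building an
opener around it.  The timeout and connection limit are arbitrary example
values, and the fallback import is an assumption that only lets the code
run on Python 3 too.

\begin{verbatim}
try:
    import urllib2                      # Python 2, as documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 equivalent

ftp_handler = urllib2.CacheFTPHandler()
ftp_handler.setTimeout(30)     # connection timeout of 30 seconds
ftp_handler.setMaxConns(4)     # keep at most four cached connections

# The configured handler takes the place of the default FTPHandler.
opener = urllib2.build_opener(ftp_handler)
\end{verbatim}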


\subsection{UnknownHandler Objects \label{unknown-handler-objects}}

\begin{methoddesc}[UnknownHandler]{unknown_open}{}
Raise a \exception{URLError} exception.
\end{methoddesc}


\subsection{HTTPErrorProcessor Objects \label{http-error-processor-objects}}

\versionadded{2.4}

\begin{methoddesc}[HTTPErrorProcessor]{http_response}{request, response}
Process HTTP error responses.

For responses with a 200 status code, the response object is returned
immediately.

For any other status code, this simply passes the job on to the
\method{\var{protocol}_error_\var{code}()} handler methods, via
\method{OpenerDirector.error()}.  Eventually,
\class{urllib2.HTTPDefaultErrorHandler} will raise an
\exception{HTTPError} if no other handler handles the error.
\end{methoddesc}


\subsection{Examples \label{urllib2-examples}}

This example gets the python.org main page and displays the first 100
bytes of it:

\begin{verbatim}
>>> import urllib2
>>> f = urllib2.urlopen('http://www.python.org/')
>>> print f.read(100)
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<?xml-stylesheet href="./css/ht2html
\end{verbatim}

Here we are sending a data-stream to the stdin of a CGI and reading
the data it returns to us.  Note that this example will only work when the
Python installation supports SSL.

\begin{verbatim}
>>> import urllib2
>>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
...                       data='This data is passed to stdin of the CGI')
>>> f = urllib2.urlopen(req)
>>> print f.read()
Got Data: "This data is passed to stdin of the CGI"
\end{verbatim}

The code for the sample CGI used in the above example is:

\begin{verbatim}
#!/usr/bin/env python
import sys
data = sys.stdin.read()
print 'Content-type: text/plain\n\nGot Data: "%s"' % data
\end{verbatim}


Use of Basic HTTP Authentication:

\begin{verbatim}
import urllib2
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib2.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib2.install_opener(opener)
urllib2.urlopen('http://www.example.com/login.html')
\end{verbatim}

\function{build_opener()} provides many handlers by default, including a
\class{ProxyHandler}.  By default, \class{ProxyHandler} uses the
environment variables named \code{<scheme>_proxy}, where \code{<scheme>}
is the URL scheme involved.  For example, the \envvar{http_proxy}
environment variable is read to obtain the HTTP proxy's URL.

This example replaces the default \class{ProxyHandler} with one that uses
programmatically supplied proxy URLs, and adds proxy authorization support
with \class{ProxyBasicAuthHandler}.

\begin{verbatim}
import urllib2
proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
# This time, rather than install the OpenerDirector, we use it directly:
opener.open('http://www.example.com/login.html')
\end{verbatim}


Adding HTTP headers:

Use the \var{headers} argument to the \class{Request} constructor, or:

\begin{verbatim}
import urllib2
req = urllib2.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib2.urlopen(req)
\end{verbatim}
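
The alternative mentioned above, passing \var{headers} to the
\class{Request} constructor, might look like this.  The sketch stops short
of opening the URL so that it runs without network access, and the fallback
import is an assumption that only lets the code run on Python 3 too.

\begin{verbatim}
try:
    import urllib2                      # Python 2, as documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 equivalent

# Supply the header up front instead of calling add_header() afterwards.
req = urllib2.Request('http://www.example.com/',
                      headers={'Referer': 'http://www.python.org/'})
referer = req.get_header('Referer')
\end{verbatim}

Passing \code{req} to \function{urlopen()} would then send the request with
the \mailheader{Referer} header set.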

\class{OpenerDirector} automatically adds a \mailheader{User-Agent}
header to every \class{Request}.  To change this:

\begin{verbatim}
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open('http://www.example.com/')
\end{verbatim}

Also, remember that a few standard headers
(\mailheader{Content-Length}, \mailheader{Content-Type} and
\mailheader{Host}) are added when the \class{Request} is passed to
\function{urlopen()} (or \method{OpenerDirector.open()}).
872\function{urlopen()} (or \method{OpenerDirector.open()}).