| \section{\module{urllib2} --- |
| extensible library for opening URLs} |
| |
| \declaremodule{standard}{urllib2} |
| \moduleauthor{Jeremy Hylton}{jhylton@users.sourceforge.net} |
| \sectionauthor{Moshe Zadka}{moshez@users.sourceforge.net} |
| |
| \modulesynopsis{An extensible library for opening URLs using a variety of |
| protocols} |
| |
| The \module{urllib2} module defines functions and classes which help |
| in opening URLs (mostly HTTP) in a complex world --- basic and digest |
| authentication, redirections, cookies and more. |
| |
| The \module{urllib2} module defines the following functions: |
| |
| \begin{funcdesc}{urlopen}{url\optional{, data}} |
| Open the URL \var{url}, which can be either a string or a \class{Request} |
| object. |
| |
| \var{data} should be a string, which specifies additional data to |
| send to the server. In HTTP requests, which are the only ones that |
| support \var{data}, it should be a buffer in the format of |
| \mimetype{application/x-www-form-urlencoded}, for example one returned |
| from \function{urllib.urlencode()}. |
| |
| This function returns a file-like object with two additional methods: |
| |
| \begin{itemize} |
| \item \method{geturl()} --- return the URL of the resource retrieved |
| \item \method{info()} --- return the meta-information of the page, as |
| a dictionary-like object |
| \end{itemize} |
| |
| Raises \exception{URLError} on errors. |
| |
| Note that \code{None} may be returned if no handler handles the |
| request (though the default installed global \class{OpenerDirector} |
| uses \class{UnknownHandler} to ensure this never happens). |
| \end{funcdesc} |
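As a minimal sketch of the return value described above, the following opens a temporary local file through a \code{file:} URL, so it needs no network access. The scratch file and the import fallback to \module{urllib.request} (where Python 3 keeps these names) are assumptions made for illustration only.

```python
import os
import tempfile

try:
    import urllib2                      # the module documented here
    from urllib import pathname2url
except ImportError:
    # Assumption: on Python 3 the same names live in urllib.request.
    import urllib.request as urllib2
    from urllib.request import pathname2url

# Create a small scratch file to fetch (hypothetical content).
fd, path = tempfile.mkstemp(suffix=".txt")
os.write(fd, b"hello from a local file\n")
os.close(fd)

f = urllib2.urlopen("file:" + pathname2url(path))
try:
    data = f.read()                   # behaves like an ordinary file object
    final_url = f.geturl()            # URL of the resource actually retrieved
    meta = f.info()                   # dictionary-like meta-information
    length = meta["Content-length"]   # header value, as a string
finally:
    f.close()
    os.remove(path)
```

The same two extra methods are available whatever protocol was used to fetch the resource.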
| |
| \begin{funcdesc}{install_opener}{opener} |
| Install an \class{OpenerDirector} instance as the default global |
| opener. Installing an opener is only necessary if you want |
| \function{urlopen()} to use that opener; otherwise, simply call |
| \method{OpenerDirector.open()} instead of \function{urlopen()}. The |
| code does not check for a real |
| \class{OpenerDirector}, and any class with the appropriate interface |
| will work. |
| \end{funcdesc} |
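The difference between using an opener directly and installing it can be sketched as follows. The \code{greet:} scheme and \class{GreetHandler} are hypothetical names invented for this example, and the import fallback is an assumption so that the sketch also runs on Python 3.

```python
import io

try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback


class GreetHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up ``greet:`` scheme."""

    def greet_open(self, req):
        # A real handler would return an object that also offers
        # geturl() and info(); a plain byte stream keeps the sketch short.
        return io.BytesIO(b"hello")


opener = urllib2.build_opener(GreetHandler)

# Using the opener directly requires no installation.
direct = opener.open("greet://example/").read()

# After installation, urlopen() delegates to the same opener.
urllib2.install_opener(opener)
via_urlopen = urllib2.urlopen("greet://example/").read()
```

Note that installation changes module-wide state, so every later call to \function{urlopen()} in the process is affected.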
| |
| \begin{funcdesc}{build_opener}{\optional{handler, \moreargs}} |
| Return an \class{OpenerDirector} instance, which chains the |
| handlers in the order given. \var{handler}s can be either instances |
| of \class{BaseHandler}, or subclasses of \class{BaseHandler} (in |
| which case it must be possible to call the constructor without |
| any parameters). Instances of the following classes will be in |
| front of the \var{handler}s, unless the \var{handler}s contain |
| them, instances of them or subclasses of them: |
| \class{ProxyHandler}, \class{UnknownHandler}, \class{HTTPHandler}, |
| \class{HTTPDefaultErrorHandler}, \class{HTTPRedirectHandler}, |
| \class{FTPHandler}, \class{FileHandler}, \class{HTTPErrorProcessor}. |
| |
| If the Python installation has SSL support (\function{socket.ssl()} |
| exists), \class{HTTPSHandler} will also be added. |
| |
| Beginning in Python 2.3, a \class{BaseHandler} subclass may also |
| change its \member{handler_order} member variable to modify its |
| position in the handlers list. |
| \end{funcdesc} |
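A short sketch of the replacement rule described above: passing a subclass of a default handler suppresses the default instance of that class. \class{LoggingRedirectHandler} is a hypothetical name, and the \member{handlers} list inspected here is an implementation detail of \class{OpenerDirector} that this section does not document, so treat it as an assumption.

```python
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback


class LoggingRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical subclass; a real one would override redirect_request()."""


# A class (not an instance) is accepted because its constructor
# can be called without parameters.
opener = urllib2.build_opener(LoggingRedirectHandler)

# Inspect the chain (undocumented attribute, used only for illustration).
redirectors = [h for h in opener.handlers
               if isinstance(h, urllib2.HTTPRedirectHandler)]
http_handlers = [h for h in opener.handlers
                 if isinstance(h, urllib2.HTTPHandler)]
```

Only the subclass instance handles redirects, while the other default handlers, such as \class{HTTPHandler}, are still present.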
| |
| |
| The following exceptions are raised as appropriate: |
| |
| \begin{excdesc}{URLError} |
| The handlers raise this exception (or derived exceptions) when they |
| run into a problem. It is a subclass of \exception{IOError}. |
| \end{excdesc} |
| |
| \begin{excdesc}{HTTPError} |
| A subclass of \exception{URLError}, it can also function as a |
| non-exceptional file-like return value (the same thing that |
| \function{urlopen()} returns). This is useful when handling exotic |
| HTTP errors, such as requests for authentication. |
| \end{excdesc} |
| |
| \begin{excdesc}{GopherError} |
| A subclass of \exception{URLError}, this is the error raised by the |
| Gopher handler. |
| \end{excdesc} |
| |
| |
| The following classes are provided: |
| |
| \begin{classdesc}{Request}{url\optional{, data}\optional{, headers} |
| \optional{, origin_req_host}\optional{, unverifiable}} |
| This class is an abstraction of a URL request. |
| |
| \var{url} should be a string which is a valid URL. For a description |
| of \var{data} see the \method{add_data()} description. |
| \var{headers} should be a dictionary, and will be treated as if |
| \method{add_header()} was called with each key and value as arguments. |
| |
| The final two arguments are only of interest for correct handling of |
| third-party HTTP cookies: |
| |
| \var{origin_req_host} should be the request-host of the origin |
| transaction, as defined by \rfc{2965}. It defaults to |
| \code{cookielib.request_host(self)}. This is the host name or IP |
| address of the original request that was initiated by the user. For |
| example, if the request is for an image in an HTML document, this |
| should be the request-host of the request for the page containing the |
| image. |
| |
| \var{unverifiable} should indicate whether the request is |
| unverifiable, as defined by \rfc{2965}. It defaults to \code{False}. An |
| unverifiable request is one whose URL the user did not have the option |
| to approve. For example, if the request is for an image in an HTML |
| document, and the user had no option to approve the automatic fetching |
| of the image, this should be \code{True}. |
| \end{classdesc} |
| |
| \begin{classdesc}{OpenerDirector}{} |
| The \class{OpenerDirector} class opens URLs via \class{BaseHandler}s |
| chained together. It manages the chaining of handlers, and recovery |
| from errors. |
| \end{classdesc} |
| |
| \begin{classdesc}{BaseHandler}{} |
| This is the base class for all registered handlers --- and handles only |
| the simple mechanics of registration. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPDefaultErrorHandler}{} |
| A class which defines a default handler for HTTP error responses; all |
| responses are turned into \exception{HTTPError} exceptions. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPRedirectHandler}{} |
| A class to handle redirections. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPCookieProcessor}{\optional{cookiejar}} |
| A class to handle HTTP Cookies. |
| \end{classdesc} |
| |
| \begin{classdesc}{ProxyHandler}{\optional{proxies}} |
| Cause requests to go through a proxy. |
| If \var{proxies} is given, it must be a dictionary mapping |
| protocol names to URLs of proxies. |
| The default is to read the list of proxies from the environment |
| variables \envvar{<protocol>_proxy}. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPPasswordMgr}{} |
| Keep a database of |
| \code{(\var{realm}, \var{uri}) -> (\var{user}, \var{password})} |
| mappings. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPPasswordMgrWithDefaultRealm}{} |
| Keep a database of |
| \code{(\var{realm}, \var{uri}) -> (\var{user}, \var{password})} mappings. |
| A realm of \code{None} is considered a catch-all realm, which is searched |
| if no other realm fits. |
| \end{classdesc} |
| |
| \begin{classdesc}{AbstractBasicAuthHandler}{\optional{password_mgr}} |
| This is a mixin class that helps with HTTP authentication, both |
| to the remote host and to a proxy. |
| \var{password_mgr}, if given, should be something that is compatible |
| with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr} |
| for information on the interface that must be supported. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPBasicAuthHandler}{\optional{password_mgr}} |
| Handle authentication with the remote host. |
| \var{password_mgr}, if given, should be something that is compatible |
| with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr} |
| for information on the interface that must be supported. |
| \end{classdesc} |
| |
| \begin{classdesc}{ProxyBasicAuthHandler}{\optional{password_mgr}} |
| Handle authentication with the proxy. |
| \var{password_mgr}, if given, should be something that is compatible |
| with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr} |
| for information on the interface that must be supported. |
| \end{classdesc} |
| |
| \begin{classdesc}{AbstractDigestAuthHandler}{\optional{password_mgr}} |
| This is a mixin class that helps with HTTP authentication, both |
| to the remote host and to a proxy. |
| \var{password_mgr}, if given, should be something that is compatible |
| with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr} |
| for information on the interface that must be supported. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPDigestAuthHandler}{\optional{password_mgr}} |
| Handle authentication with the remote host. |
| \var{password_mgr}, if given, should be something that is compatible |
| with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr} |
| for information on the interface that must be supported. |
| \end{classdesc} |
| |
| \begin{classdesc}{ProxyDigestAuthHandler}{\optional{password_mgr}} |
| Handle authentication with the proxy. |
| \var{password_mgr}, if given, should be something that is compatible |
| with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr} |
| for information on the interface that must be supported. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPHandler}{} |
| A class to handle opening of HTTP URLs. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPSHandler}{} |
| A class to handle opening of HTTPS URLs. |
| \end{classdesc} |
| |
| \begin{classdesc}{FileHandler}{} |
| Open local files. |
| \end{classdesc} |
| |
| \begin{classdesc}{FTPHandler}{} |
| Open FTP URLs. |
| \end{classdesc} |
| |
| \begin{classdesc}{CacheFTPHandler}{} |
| Open FTP URLs, keeping a cache of open FTP connections to minimize |
| delays. |
| \end{classdesc} |
| |
| \begin{classdesc}{GopherHandler}{} |
| Open gopher URLs. |
| \end{classdesc} |
| |
| \begin{classdesc}{UnknownHandler}{} |
| A catch-all class to handle unknown URLs. |
| \end{classdesc} |
| |
| |
| \subsection{Request Objects \label{request-objects}} |
| |
| The following methods describe all of \class{Request}'s public interface, |
| and so all must be overridden in subclasses. |
| |
| \begin{methoddesc}[Request]{add_data}{data} |
| Set the \class{Request} data to \var{data}. This is ignored by all |
| handlers except HTTP handlers --- and there it should be a byte |
| string, and will change the request to be \code{POST} rather than |
| \code{GET}. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{get_method}{} |
| Return a string indicating the HTTP request method. This is only |
| meaningful for HTTP requests, and currently always returns |
| \code{'GET'} or \code{'POST'}. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{has_data}{} |
| Return whether the instance has a non-\code{None} data. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{get_data}{} |
| Return the instance's data. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{add_header}{key, val} |
| Add another header to the request. Headers are currently ignored by |
| all handlers except HTTP handlers, where they are added to the list |
| of headers sent to the server. Note that there cannot be more than |
| one header with the same name, and later calls will overwrite |
| previous calls in case the \var{key} collides. Currently, this is |
| no loss of HTTP functionality, since all headers which have meaning |
| when used more than once have a (header-specific) way of gaining the |
| same functionality using only one header. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{add_unredirected_header}{key, header} |
| Add a header that will not be added to a redirected request. |
| \versionadded{2.4} |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{has_header}{header} |
| Return whether the instance has the named header (checks both regular |
| and unredirected). |
| \versionadded{2.4} |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{get_full_url}{} |
| Return the URL given in the constructor. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{get_type}{} |
| Return the type of the URL --- also known as the scheme. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{get_host}{} |
| Return the host to which a connection will be made. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{get_selector}{} |
| Return the selector --- the part of the URL that is sent to |
| the server. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{set_proxy}{host, type} |
| Prepare the request by connecting to a proxy server. The \var{host} |
| and \var{type} will replace those of the instance, and the instance's |
| selector will be the original URL given in the constructor. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{get_origin_req_host}{} |
| Return the request-host of the origin transaction, as defined by |
| \rfc{2965}. See the documentation for the \class{Request} |
| constructor. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{is_unverifiable}{} |
| Return whether the request is unverifiable, as defined by \rfc{2965}. |
| See the documentation for the \class{Request} constructor. |
| \end{methoddesc} |
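The following sketch exercises a few of the \class{Request} methods described above without contacting any server. The URLs and header values are placeholders, and the import fallback is an assumption so that the sketch also runs on Python 3.

```python
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback

get_req = urllib2.Request("http://www.example.com/")
post_req = urllib2.Request("http://www.example.com/form",
                           data=b"spam=1&eggs=2")

# Supplying data switches the HTTP method from GET to POST.
methods = (get_req.get_method(), post_req.get_method())

get_req.add_header("Referer", "http://www.python.org/")
get_req.add_unredirected_header("Cookie", "session=abc")

has_referer = get_req.has_header("Referer")
has_cookie = get_req.has_header("Cookie")   # unredirected headers count too
has_accept = get_req.has_header("Accept")   # never added
full_url = get_req.get_full_url()
```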
| |
| |
| \subsection{OpenerDirector Objects \label{opener-director-objects}} |
| |
| \class{OpenerDirector} instances have the following methods: |
| |
| \begin{methoddesc}[OpenerDirector]{add_handler}{handler} |
| \var{handler} should be an instance of \class{BaseHandler}. The |
| following methods are searched, and added to the possible chains (note |
| that HTTP errors are a special case). |
| |
| \begin{itemize} |
| \item \method{\var{protocol}_open()} --- |
| signal that the handler knows how to open \var{protocol} URLs. |
| \item \method{http_error_\var{type}()} --- |
| signal that the handler knows how to handle HTTP errors with HTTP |
| error code \var{type}. |
| \item \method{\var{protocol}_error()} --- |
| signal that the handler knows how to handle errors from |
| (non-\code{http}) \var{protocol}. |
| \item \method{\var{protocol}_request()} --- |
| signal that the handler knows how to pre-process \var{protocol} |
| requests. |
| \item \method{\var{protocol}_response()} --- |
| signal that the handler knows how to post-process \var{protocol} |
| responses. |
| \end{itemize} |
| \end{methoddesc} |
| |
| \begin{methoddesc}[OpenerDirector]{open}{url\optional{, data}} |
| Open the given \var{url} (which can be a request object or a string), |
| optionally passing the given \var{data}. |
| Arguments, return values and exceptions raised are the same as those |
| of \function{urlopen()} (which simply calls the \method{open()} method |
| on the currently installed global \class{OpenerDirector}). |
| \end{methoddesc} |
| |
| \begin{methoddesc}[OpenerDirector]{error}{proto\optional{, |
| arg\optional{, \moreargs}}} |
| Handle an error of the given protocol. This will call the registered |
| error handlers for the given protocol with the given arguments (which |
| are protocol specific). The HTTP protocol is a special case which |
| uses the HTTP response code to determine the specific error handler; |
| refer to the \method{http_error_*()} methods of the handler classes. |
| |
| Return values and exceptions raised are the same as those |
| of \function{urlopen()}. |
| \end{methoddesc} |
| |
| \class{OpenerDirector} objects open URLs in three stages. The order in |
| which handler methods are called within each stage is determined by |
| sorting the handler instances. |
| |
| \begin{enumerate} |
| \item Every handler with a method named like |
| \method{\var{protocol}_request()} has that method called to |
| pre-process the request. |
| |
| \item Handlers with a method named like |
| \method{\var{protocol}_open()} are called to handle the request. |
| This stage ends when a handler either returns a |
| non-\constant{None} value (i.e., a response), or raises an exception |
| (usually \exception{URLError}). Exceptions are allowed to propagate. |
| |
| In fact, the above algorithm is first tried for methods named |
| \method{default_open}. If all such methods return |
| \constant{None}, the algorithm is repeated for methods named like |
| \method{\var{protocol}_open()}. If all such methods return |
| \constant{None}, the algorithm is repeated for methods named |
| \method{unknown_open()}. |
| |
| Note that the implementation of these methods may involve calls of |
| the parent \class{OpenerDirector} instance's \method{.open()} and |
| \method{.error()} methods. |
| |
| \item Every handler with a method named like |
| \method{\var{protocol}_response()} has that method called to |
| post-process the response. |
| |
| \end{enumerate} |
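The three stages can be observed with a small sketch that records each call. The \code{echo:} scheme, \class{TraceProcessor} and \class{EchoHandler} are hypothetical names invented for this example, and the import fallback is an assumption so that the sketch also runs on Python 3.

```python
import io

try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback

calls = []  # records the order in which the stages run


class TraceProcessor(urllib2.BaseHandler):
    """Hypothetical processor for a made-up ``echo:`` scheme."""

    def echo_request(self, req):
        calls.append("request")       # stage 1: pre-process the request
        return req

    def echo_response(self, req, response):
        calls.append("response")      # stage 3: post-process the response
        return response


class EchoHandler(urllib2.BaseHandler):
    """Hypothetical handler that answers with the requested URL."""

    def echo_open(self, req):
        calls.append("open")          # stage 2: produce a response
        return io.BytesIO(b"echo: " + req.get_full_url().encode("ascii"))


opener = urllib2.build_opener(TraceProcessor, EchoHandler)
body = opener.open("echo://example/hello").read()
```

Because \method{echo_open()} returns a non-\constant{None} value, the second stage ends there and no \method{unknown_open()} method is consulted.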
| |
| \subsection{BaseHandler Objects \label{base-handler-objects}} |
| |
| \class{BaseHandler} objects provide a couple of methods that are |
| directly useful, and others that are meant to be used by derived |
| classes. These are intended for direct use: |
| |
| \begin{methoddesc}[BaseHandler]{add_parent}{director} |
| Add a director as parent. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[BaseHandler]{close}{} |
| Remove any parents. |
| \end{methoddesc} |
| |
| The following members and methods should only be used by classes |
| derived from \class{BaseHandler}. \note{The convention has been |
| adopted that subclasses defining \method{\var{protocol}_request()} or |
| \method{\var{protocol}_response()} methods are named |
| \class{*Processor}; all others are named \class{*Handler}.} |
| |
| |
| \begin{memberdesc}[BaseHandler]{parent} |
| A valid \class{OpenerDirector}, which can be used to open using a |
| different protocol, or handle errors. |
| \end{memberdesc} |
| |
| \begin{methoddesc}[BaseHandler]{default_open}{req} |
| This method is \emph{not} defined in \class{BaseHandler}, but |
| subclasses should define it if they want to catch all URLs. |
| |
| This method, if implemented, will be called by the parent |
| \class{OpenerDirector}. It should return a file-like object as |
| described in the return value of the \method{open()} of |
| \class{OpenerDirector}, or \code{None}. It should raise |
| \exception{URLError}, unless a truly exceptional thing happens (for |
| example, \exception{MemoryError} should not be mapped to |
| \exception{URLError}). |
| |
| This method will be called before any protocol-specific open method. |
| \end{methoddesc} |
| |
| \begin{methoddescni}[BaseHandler]{\var{protocol}_open}{req} |
| This method is \emph{not} defined in \class{BaseHandler}, but |
| subclasses should define it if they want to handle URLs with the given |
| protocol. |
| |
| This method, if defined, will be called by the parent |
| \class{OpenerDirector}. Return values should be the same as for |
| \method{default_open()}. |
| \end{methoddescni} |
| |
| \begin{methoddesc}[BaseHandler]{unknown_open}{req} |
| This method is \emph{not} defined in \class{BaseHandler}, but |
| subclasses should define it if they want to catch all URLs with no |
| specific registered handler to open them. |
| |
| This method, if implemented, will be called by the \member{parent} |
| \class{OpenerDirector}. Return values should be the same as for |
| \method{default_open()}. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[BaseHandler]{http_error_default}{req, fp, code, msg, hdrs} |
| This method is \emph{not} defined in \class{BaseHandler}, but |
| subclasses should override it if they intend to provide a catch-all |
| for otherwise unhandled HTTP errors. It will be called automatically |
| by the \class{OpenerDirector} getting the error, and should not |
| normally be called in other circumstances. |
| |
| \var{req} will be a \class{Request} object, \var{fp} will be a |
| file-like object with the HTTP error body, \var{code} will be the |
| three-digit code of the error, \var{msg} will be the user-visible |
| explanation of the code and \var{hdrs} will be a mapping object with |
| the headers of the error. |
| |
| Return values and exceptions raised should be the same as those |
| of \function{urlopen()}. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[BaseHandler]{http_error_\var{nnn}}{req, fp, code, msg, hdrs} |
| \var{nnn} should be a three-digit HTTP error code. This method is |
| also not defined in \class{BaseHandler}, but will be called, if it |
| exists, on an instance of a subclass, when an HTTP error with code |
| \var{nnn} occurs. |
| |
| Subclasses should override this method to handle specific HTTP |
| errors. |
| |
| Arguments, return values and exceptions raised should be the same as |
| for \method{http_error_default()}. |
| \end{methoddesc} |
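As a sketch of how the \method{http_error_\var{nnn}()} methods and the default fallback interact, the following drives \method{OpenerDirector.error()} by hand instead of contacting a server. \class{NotFoundHandler}, the URL and the empty header mapping are hypothetical, and the import fallback is an assumption so that the sketch also runs on Python 3.

```python
import io
import sys

try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback


class NotFoundHandler(urllib2.BaseHandler):
    """Hypothetical handler that substitutes a page for 404 errors."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        return io.BytesIO(b"fallback page")


opener = urllib2.build_opener(NotFoundHandler)
req = urllib2.Request("http://www.example.com/missing")

# A 404 is routed to http_error_404() of the handler above.
handled_body = opener.error("http", req, None, 404, "Not Found", {}).read()

# A 500 has no specific handler, so the default error handler raises.
try:
    opener.error("http", req, None, 500, "Server Error", {})
    raised_code = None
except urllib2.HTTPError:
    # sys.exc_info() avoids exception-binding syntax that differs
    # between Python versions.
    raised_code = sys.exc_info()[1].code
```

In normal use these calls are made by the \class{OpenerDirector} itself when a response arrives; calling \method{error()} directly is done here only to keep the sketch offline.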
| |
| \begin{methoddescni}[BaseHandler]{\var{protocol}_request}{req} |
| This method is \emph{not} defined in \class{BaseHandler}, but |
| subclasses should define it if they want to pre-process requests of |
| the given protocol. |
| |
| This method, if defined, will be called by the parent |
| \class{OpenerDirector}. \var{req} will be a \class{Request} object. |
| The return value should be a \class{Request} object. |
| \end{methoddescni} |
| |
| \begin{methoddescni}[BaseHandler]{\var{protocol}_response}{req, response} |
| This method is \emph{not} defined in \class{BaseHandler}, but |
| subclasses should define it if they want to post-process responses of |
| the given protocol. |
| |
| This method, if defined, will be called by the parent |
| \class{OpenerDirector}. \var{req} will be a \class{Request} object. |
| \var{response} will be an object implementing the same interface as |
| the return value of \function{urlopen()}. The return value should |
| implement the same interface as the return value of |
| \function{urlopen()}. |
| \end{methoddescni} |
| |
| \subsection{HTTPRedirectHandler Objects \label{http-redirect-handler}} |
| |
| \note{Some HTTP redirections require action from this module's client |
| code. If this is the case, \exception{HTTPError} is raised. See |
| \rfc{2616} for details of the precise meanings of the various |
| redirection codes.} |
| |
| \begin{methoddesc}[HTTPRedirectHandler]{redirect_request}{req, |
| fp, code, msg, hdrs, newurl} |
| Return a \class{Request} or \code{None} in response to a redirect. |
| This is called by the default implementations of the |
| \method{http_error_30*()} methods when a redirection is received from |
| the server. If a redirection should take place, return a new |
| \class{Request} to allow \method{http_error_30*()} to perform the |
| redirect. Otherwise, raise \exception{HTTPError} if no other handler |
| should try to handle this URL, or return \code{None} if this handler |
| cannot handle it but another handler might. |
| |
| \begin{notice} |
| The default implementation of this method does not strictly |
| follow \rfc{2616}, which says that 301 and 302 responses to \code{POST} |
| requests must not be automatically redirected without confirmation by |
| the user. In reality, browsers do allow automatic redirection of |
| these responses, changing the \code{POST} to a \code{GET}, and the default |
| implementation reproduces this behavior. |
| \end{notice} |
| \end{methoddesc} |
| |
| |
| \begin{methoddesc}[HTTPRedirectHandler]{http_error_301}{req, |
| fp, code, msg, hdrs} |
| Redirect to the \code{Location:} URL. This method is called by |
| the parent \class{OpenerDirector} when getting an HTTP |
| `moved permanently' response. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[HTTPRedirectHandler]{http_error_302}{req, |
| fp, code, msg, hdrs} |
| The same as \method{http_error_301()}, but called for the |
| `found' response. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[HTTPRedirectHandler]{http_error_303}{req, |
| fp, code, msg, hdrs} |
| The same as \method{http_error_301()}, but called for the |
| `see other' response. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[HTTPRedirectHandler]{http_error_307}{req, |
| fp, code, msg, hdrs} |
| The same as \method{http_error_301()}, but called for the |
| `temporary redirect' response. |
| \end{methoddesc} |
| |
| |
| \subsection{HTTPCookieProcessor Objects \label{http-cookie-processor}} |
| |
| \versionadded{2.4} |
| |
| \class{HTTPCookieProcessor} instances have one attribute: |
| |
| \begin{memberdesc}{cookiejar} |
| The \class{cookielib.CookieJar} in which cookies are stored. |
| \end{memberdesc} |
| |
| |
| \subsection{ProxyHandler Objects \label{proxy-handler}} |
| |
| \begin{methoddescni}[ProxyHandler]{\var{protocol}_open}{request} |
| The \class{ProxyHandler} will have a method |
| \method{\var{protocol}_open()} for every \var{protocol} which has a |
| proxy in the \var{proxies} dictionary given in the constructor. The |
| method will modify requests to go through the proxy, by calling |
| \code{request.set_proxy()}, and call the next handler in the chain to |
| actually execute the protocol. |
| \end{methoddescni} |
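The per-protocol methods described above can be checked with a brief sketch. The proxy URL is a placeholder, and the import fallback is an assumption so that the sketch also runs on Python 3.

```python
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback

# An explicit dictionary is used, so the environment is not consulted.
handler = urllib2.ProxyHandler({"http": "http://proxy.example.com:3128/"})

# One <protocol>_open() method exists per key in the dictionary.
has_http_open = hasattr(handler, "http_open")
has_ftp_open = hasattr(handler, "ftp_open")
```

Protocols without an entry in \var{proxies} get no such method, so requests for them go straight to the regular handlers.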
| |
| |
| \subsection{HTTPPasswordMgr Objects \label{http-password-mgr}} |
| |
| These methods are available on \class{HTTPPasswordMgr} and |
| \class{HTTPPasswordMgrWithDefaultRealm} objects. |
| |
| \begin{methoddesc}[HTTPPasswordMgr]{add_password}{realm, uri, user, passwd} |
| \var{uri} can be either a single URI, or a sequence of URIs. \var{realm}, |
| \var{user} and \var{passwd} must be strings. This causes |
| \code{(\var{user}, \var{passwd})} to be used as authentication tokens |
| when authentication is requested for \var{realm} and for a URI that is |
| at or below any of the given URIs. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[HTTPPasswordMgr]{find_user_password}{realm, authuri} |
| Get user/password for given realm and URI, if any. This method will |
| return \code{(None, None)} if there is no matching user/password. |
| |
| For \class{HTTPPasswordMgrWithDefaultRealm} objects, the realm |
| \code{None} will be searched if the given \var{realm} has no matching |
| user/password. |
| \end{methoddesc} |
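A minimal sketch of the lookup rules, using placeholder realms, URLs and credentials. The import fallback is an assumption so that the sketch also runs on Python 3.

```python
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback

# With a default realm: None acts as a catch-all realm.
mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, "http://www.example.com/", "joe", "secret")

match = mgr.find_user_password(
    "Members Only", "http://www.example.com/private/page.html")
other_host = mgr.find_user_password(
    "Members Only", "http://other.example.org/")

# Without a default realm: the realm has to match exactly.
strict = urllib2.HTTPPasswordMgr()
strict.add_password("Members Only", "http://www.example.com/",
                    "joe", "secret")
wrong_realm = strict.find_user_password("Staff", "http://www.example.com/")
```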
| |
| |
| \subsection{AbstractBasicAuthHandler Objects |
| \label{abstract-basic-auth-handler}} |
| |
| \begin{methoddesc}[AbstractBasicAuthHandler]{http_error_auth_reqed} |
| {authreq, host, req, headers} |
| Handle an authentication request by getting a user/password pair, and |
| re-trying the request. \var{authreq} should be the name of the header |
| where the information about the realm is included in the request, |
| \var{host} specifies the URL and path to authenticate for, \var{req} |
| should be the (failed) \class{Request} object, and \var{headers} |
| should be the error headers. |
| |
| \var{host} is either an authority (e.g. \code{"python.org"}) or a URL |
| containing an authority component (e.g. \code{"http://python.org/"}). |
| In either case, the authority must not contain a userinfo component |
| (so, \code{"python.org"} and \code{"python.org:80"} are fine, |
| \code{"joe:password@python.org"} is not). |
| \end{methoddesc} |
| |
| |
| \subsection{HTTPBasicAuthHandler Objects |
| \label{http-basic-auth-handler}} |
| |
| \begin{methoddesc}[HTTPBasicAuthHandler]{http_error_401}{req, fp, code, |
| msg, hdrs} |
| Retry the request with authentication information, if available. |
| \end{methoddesc} |
| |
| |
| \subsection{ProxyBasicAuthHandler Objects |
| \label{proxy-basic-auth-handler}} |
| |
| \begin{methoddesc}[ProxyBasicAuthHandler]{http_error_407}{req, fp, code, |
| msg, hdrs} |
| Retry the request with authentication information, if available. |
| \end{methoddesc} |
| |
| |
| \subsection{AbstractDigestAuthHandler Objects |
| \label{abstract-digest-auth-handler}} |
| |
| \begin{methoddesc}[AbstractDigestAuthHandler]{http_error_auth_reqed} |
| {authreq, host, req, headers} |
| \var{authreq} should be the name of the header where the information about |
| the realm is included in the request, \var{host} should be the host to |
| authenticate to, \var{req} should be the (failed) \class{Request} |
| object, and \var{headers} should be the error headers. |
| \end{methoddesc} |
| |
| |
| \subsection{HTTPDigestAuthHandler Objects |
| \label{http-digest-auth-handler}} |
| |
| \begin{methoddesc}[HTTPDigestAuthHandler]{http_error_401}{req, fp, code, |
| msg, hdrs} |
| Retry the request with authentication information, if available. |
| \end{methoddesc} |
| |
| |
| \subsection{ProxyDigestAuthHandler Objects |
| \label{proxy-digest-auth-handler}} |
| |
| \begin{methoddesc}[ProxyDigestAuthHandler]{http_error_407}{req, fp, code, |
| msg, hdrs} |
| Retry the request with authentication information, if available. |
| \end{methoddesc} |
| |
| |
| \subsection{HTTPHandler Objects \label{http-handler-objects}} |
| |
| \begin{methoddesc}[HTTPHandler]{http_open}{req} |
| Send an HTTP request, which can be either GET or POST, depending on |
| \code{\var{req}.has_data()}. |
| \end{methoddesc} |
| |
| |
| \subsection{HTTPSHandler Objects \label{https-handler-objects}} |
| |
| \begin{methoddesc}[HTTPSHandler]{https_open}{req} |
| Send an HTTPS request, which can be either GET or POST, depending on |
| \code{\var{req}.has_data()}. |
| \end{methoddesc} |
| |
| |
| \subsection{FileHandler Objects \label{file-handler-objects}} |
| |
| \begin{methoddesc}[FileHandler]{file_open}{req} |
| Open the file locally, if there is no host name, or |
| the host name is \code{'localhost'}. Change the |
| protocol to \code{ftp} otherwise, and retry opening |
| it using \member{parent}. |
| \end{methoddesc} |
| |
| |
| \subsection{FTPHandler Objects \label{ftp-handler-objects}} |
| |
| \begin{methoddesc}[FTPHandler]{ftp_open}{req} |
| Open the FTP file indicated by \var{req}. |
| The login is always done with empty username and password. |
| \end{methoddesc} |
| |
| |
| \subsection{CacheFTPHandler Objects \label{cacheftp-handler-objects}} |
| |
| \class{CacheFTPHandler} objects are \class{FTPHandler} objects with |
| the following additional methods: |
| |
| \begin{methoddesc}[CacheFTPHandler]{setTimeout}{t} |
| Set timeout of connections to \var{t} seconds. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[CacheFTPHandler]{setMaxConns}{m} |
| Set maximum number of cached connections to \var{m}. |
| \end{methoddesc} |
| |
| |
| \subsection{GopherHandler Objects \label{gopher-handler}} |
| |
| \begin{methoddesc}[GopherHandler]{gopher_open}{req} |
| Open the gopher resource indicated by \var{req}. |
| \end{methoddesc} |
| |
| |
| \subsection{UnknownHandler Objects \label{unknown-handler-objects}} |
| |
| \begin{methoddesc}[UnknownHandler]{unknown_open}{} |
| Raise a \exception{URLError} exception. |
| \end{methoddesc} |
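As a brief sketch, opening a URL whose scheme no handler recognizes ends in \class{UnknownHandler}, which raises \exception{URLError}. The \code{nosuch:} scheme is made up for this example, and the import fallback is an assumption so that the sketch also runs on Python 3.

```python
import sys

try:
    import urllib2
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback

opener = urllib2.build_opener()   # default handlers only

try:
    opener.open("nosuch://example/")
    error = None
except urllib2.URLError:
    # sys.exc_info() avoids exception-binding syntax that differs
    # between Python versions.
    error = sys.exc_info()[1]
```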
| |
| |
| \subsection{HTTPErrorProcessor Objects \label{http-error-processor-objects}} |
| |
| \versionadded{2.4} |
| |
| \begin{methoddesc}[HTTPErrorProcessor]{http_response}{request, response} |
| Process HTTP error responses. |
| |
| For responses with a 200 status code, the response object is returned |
| immediately. |
| |
| For any other status code, this simply passes the job on to the |
| \method{\var{protocol}_error_\var{code}()} handler methods, via |
| \method{OpenerDirector.error()}. Eventually, |
| \class{urllib2.HTTPDefaultErrorHandler} will raise an |
| \exception{HTTPError} if no other handler handles the error. |
| \end{methoddesc} |
| |
| |
| \subsection{Examples \label{urllib2-examples}} |
| |
| This example gets the python.org main page and displays the first 100 |
| bytes of it: |
| |
| \begin{verbatim} |
| >>> import urllib2 |
| >>> f = urllib2.urlopen('http://www.python.org/') |
| >>> print f.read(100) |
| <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
| <?xml-stylesheet href="./css/ht2html |
| \end{verbatim} |
| |
| Here we are sending a data-stream to the stdin of a CGI and reading |
| the data it returns to us. Note that this example will only work when the |
| Python installation supports SSL. |
| |
| \begin{verbatim} |
| >>> import urllib2 |
| >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi', |
| ... data='This data is passed to stdin of the CGI') |
| >>> f = urllib2.urlopen(req) |
| >>> print f.read() |
| Got Data: "This data is passed to stdin of the CGI" |
| \end{verbatim} |
| |
| The code for the sample CGI used in the above example is: |
| |
| \begin{verbatim} |
| #!/usr/bin/env python |
| import sys |
| data = sys.stdin.read() |
| print 'Content-type: text/plain\n\nGot Data: "%s"' % data |
| \end{verbatim} |
| |
| |
| Use of Basic HTTP Authentication: |
| |
| \begin{verbatim} |
| import urllib2 |
| # Create an OpenerDirector with support for Basic HTTP Authentication... |
| auth_handler = urllib2.HTTPBasicAuthHandler() |
| auth_handler.add_password('realm', 'host', 'username', 'password') |
| opener = urllib2.build_opener(auth_handler) |
| # ...and install it globally so it can be used with urlopen. |
| urllib2.install_opener(opener) |
| urllib2.urlopen('http://www.example.com/login.html') |
| \end{verbatim} |
| |
| \function{build_opener()} provides many handlers by default, including a |
| \class{ProxyHandler}. By default, \class{ProxyHandler} uses the |
| environment variables named \code{<scheme>_proxy}, where \code{<scheme>} |
| is the URL scheme involved. For example, the \envvar{http_proxy} |
| environment variable is read to obtain the HTTP proxy's URL. |
| |
| This example replaces the default \class{ProxyHandler} with one that uses |
| programmatically-supplied proxy URLs, and adds proxy authorization support |
| with \class{ProxyBasicAuthHandler}. |
| |
| \begin{verbatim} |
| import urllib2 |
| |
| proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'}) |
| proxy_auth_handler = urllib2.ProxyBasicAuthHandler() |
| proxy_auth_handler.add_password('realm', 'host', 'username', 'password') |
| |
| opener = urllib2.build_opener(proxy_handler, proxy_auth_handler) |
| # This time, rather than install the OpenerDirector, we use it directly: |
| opener.open('http://www.example.com/login.html') |
| \end{verbatim} |
| |
| |
| Adding HTTP headers: |
| |
| Use the \var{headers} argument to the \class{Request} constructor, or: |
| |
| \begin{verbatim} |
| import urllib2 |
| req = urllib2.Request('http://www.example.com/') |
| req.add_header('Referer', 'http://www.python.org/') |
| r = urllib2.urlopen(req) |
| \end{verbatim} |
| |
| \class{OpenerDirector} automatically adds a \mailheader{User-Agent} |
| header to every \class{Request}. To change this: |
| |
| \begin{verbatim} |
| import urllib2 |
| opener = urllib2.build_opener() |
| opener.addheaders = [('User-agent', 'Mozilla/5.0')] |
| opener.open('http://www.example.com/') |
| \end{verbatim} |
| |
| Also, remember that a few standard headers |
| (\mailheader{Content-Length}, \mailheader{Content-Type} and |
| \mailheader{Host}) are added when the \class{Request} is passed to |
| \function{urlopen()} (or \method{OpenerDirector.open()}). |