| \section{\module{urllib2} --- |
| extensible library for opening URLs} |
| |
| \declaremodule{standard}{urllib2} |
| \moduleauthor{Jeremy Hylton}{jhylton@users.sourceforge.net} |
| \sectionauthor{Moshe Zadka}{moshez@users.sourceforge.net} |
| |
| \modulesynopsis{An extensible library for opening URLs using a variety of |
| protocols} |
| |
| The \module{urllib2} module defines functions and classes which help |
| in opening URLs (mostly HTTP) in a complex world --- basic and digest |
| authentication, redirections and more. |
| |
| The \module{urllib2} module defines the following functions: |
| |
| \begin{funcdesc}{urlopen}{url\optional{, data}} |
| Open the url \var{url}, which can either a string or a \class{Request} |
| object (currently the code checks that it really is a \class{Request} |
| instance, or an instance of a subclass of \class{Request}). |
| |
| \var{data} should be a string, which specifies additional data to |
| send to the server. In HTTP requests, which are the only ones that |
| support \var{data}, it should be a buffer in the format of |
| \mimetype{application/x-www-form-urlencoded}, for example one returned |
| from \function{urllib.urlencode()}. |
| |
| This function returns a file-like object with two additional methods: |
| |
| \begin{itemize} |
| \item \method{geturl()} --- return the URL of the resource retrieved |
| \item \method{info()} --- return the meta-information of the page, as |
| a dictionary-like object |
| \end{itemize} |
| |
| Raises \exception{URLError} on errors. |
| \end{funcdesc} |
| |
| \begin{funcdesc}{install_opener}{opener} |
| Install a \class{OpenerDirector} instance as the default opener. |
| The code does not check for a real \class{OpenerDirector}, and any |
| class with the appropriate interface will work. |
| \end{funcdesc} |
| |
| \begin{funcdesc}{build_opener}{\optional{handler, \moreargs}} |
| Return an \class{OpenerDirector} instance, which chains the |
| handlers in the order given. \var{handler}s can be either instances |
| of \class{BaseHandler}, or subclasses of \class{BaseHandler} (in |
| which case it must be possible to call the constructor without |
| any parameters. Instances of the following classes will be in |
| the front of the \var{handler}s, unless the \var{handler}s contain |
| them, instances of them or subclasses of them: |
| |
| \code{ProxyHandler, UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler, |
| HTTPRedirectHandler, FTPHandler, FileHandler} |
| |
| If the Python installation has SSL support (\function{socket.ssl()} |
| exists), \class{HTTPSHandler} will also be added. |
| \end{funcdesc} |
| |
| |
| The following exceptions are raised as appropriate: |
| |
| \begin{excdesc}{URLError} |
| The error handlers raise when they run into a problem. It is a |
| subclass of \exception{IOError}. |
| \end{excdesc} |
| |
| \begin{excdesc}{HTTPError} |
| A subclass of \exception{URLError}, it can also function as a |
| non-exceptional file-like return value (the same thing that |
| \function{urlopen()} returns). This is useful when handling exotic |
| HTTP errors, such as requests for authentication. |
| \end{excdesc} |
| |
| \begin{excdesc}{GopherError} |
| A subclass of \exception{URLError}, this is the error raised by the |
| Gopher handler. |
| \end{excdesc} |
| |
| |
| The following classes are provided: |
| |
| \begin{classdesc}{Request}{url\optional{, data\optional{, headers}}} |
| This class is an abstraction of a URL request. |
| |
| \var{url} should be a string which is a valid URL. For descrtion |
| of \var{data} see the \method{add_data()} description. |
| \var{headers} should be a dictionary, and will be treated as if |
| \method{add_header()} was called with each key and value as arguments. |
| \end{classdesc} |
| |
| \begin{classdesc}{OpenerDirector}{} |
| The \class{OpenerDirector} class opens URLs via \class{BaseHandler}s |
| chained together. It manages the chaining of handlers, and recovery |
| from errors. |
| \end{classdesc} |
| |
| \begin{classdesc}{BaseHandler}{} |
| This is the base class for all registered handlers --- and handles only |
| the simple mechanics of registration. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPDefaultErrorHandler}{} |
| A class which defines a default handler for HTTP error responses; all |
| responses are turned into \exception{HTTPError} exceptions. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPRedirectHandler}{} |
| A class to handle redirections. |
| \end{classdesc} |
| |
| \begin{classdesc}{ProxyHandler}{\optional{proxies}} |
| Cause requests to go through a proxy. |
| If \var{proxies} is given, it must be a dictionary mapping |
| protocol names to URLs of proxies. |
| The default is to read the list of proxies from the environment |
| variables \var{protocol}_proxy. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPPasswordMgr}{} |
| Keep a database of |
| \code{(\var{realm}, \var{uri}) -> (\var{user}, \var{password})} |
| mappings. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPPasswordMgrWithDefaultRealm}{} |
| Keep a database of |
| \code{(\var{realm}, \var{uri}) -> (\var{user}, \var{password})} mappings. |
| A realm of \code{None} is considered a catch-all realm, which is searched |
| if no other realm fits. |
| \end{classdesc} |
| |
| \begin{classdesc}{AbstractBasicAuthHandler}{\optional{password_mgr}} |
| This is a mixin class that helps with HTTP authentication, both |
| to the remote host and to a proxy. |
| |
| \var{password_mgr} should be something that is compatible with |
| \class{HTTPPasswordMgr} --- supplies the documented interface above. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPBasicAuthHandler}{\optional{password_mgr}} |
| Handle authentication with the remote host. |
| Valid \var{password_mgr}, if given, are the same as for |
| \class{AbstractBasicAuthHandler}. |
| \end{classdesc} |
| |
| \begin{classdesc}{ProxyBasicAuthHandler}{\optional{password_mgr}} |
| Handle authentication with the proxy. |
| Valid \var{password_mgr}, if given, are the same as for |
| \class{AbstractBasicAuthHandler}. |
| \end{classdesc} |
| |
| \begin{classdesc}{AbstractDigestAuthHandler}{\optional{password_mgr}} |
| This is a mixin class, that helps with HTTP authentication, both |
| to the remote host and to a proxy. |
| |
| \var{password_mgr} should be something that is compatible with |
| \class{HTTPPasswordMgr} --- supplies the documented interface above. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPDigestAuthHandler}{\optional{password_mgr}} |
| Handle authentication with the remote host. |
| Valid \var{password_mgr}, if given, are the same as for |
| \class{AbstractBasicAuthHandler}. |
| \end{classdesc} |
| |
| \begin{classdesc}{ProxyDigestAuthHandler}{\optional{password_mgr}} |
| Handle authentication with the proxy. |
| \var{password_mgr}, if given, shoudl be the same as for |
| the constructor of \class{AbstractDigestAuthHandler}. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPHandler}{} |
| A class to handle opening of HTTP URLs. |
| \end{classdesc} |
| |
| \begin{classdesc}{HTTPSHandler}{} |
| A class to handle opening of HTTPS URLs. |
| \end{classdesc} |
| |
| \begin{classdesc}{FileHandler}{} |
| Open local files. |
| \end{classdesc} |
| |
| \begin{classdesc}{FTPHandler}{} |
| Open FTP URLs. |
| \end{classdesc} |
| |
| \begin{classdesc}{CacheFTPHandler}{} |
| Open FTP URLs, keeping a cache of open FTP connections to minimize |
| delays. |
| \end{classdesc} |
| |
| \begin{classdesc}{GopherHandler}{} |
| Open gopher URLs. |
| \end{classdesc} |
| |
| \begin{classdesc}{UnknownHandler}{} |
| A catch-all class to handle unknown URLs. |
| \end{classdesc} |
| |
| |
| \subsection{Request Objects \label{request-objects}} |
| |
| The following methods describe all of \class{Request}'s public interface, |
| and so all must be overridden in subclasses. |
| |
| \begin{methoddesc}[Request]{add_data}{data} |
| Set the \class{Request} data to \var{data} is ignored |
| by all handlers except HTTP handlers --- and there it should be an |
| \mimetype{application/x-www-form-encoded} buffer, and will change the |
| request to be \code{POST} rather then \code{GET}. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{has_data}{data} |
| Return whether the instance has a non-\code{None} data. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{get_data}{data} |
| Return the instance's data. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{add_header}{key, val} |
| Add another header to the request. Headers are currently ignored by |
| all handlers except HTTP handlers, where they are added to the list |
| of headers sent to the server. Note that there cannot be more then |
| one header with the same name, and later calls will overwrite |
| previous calls in case the \var{key} collides. Currently, this is |
| no loss of HTTP functionality, since all headers which have meaning |
| when used more then once have a (header-specific) way of gaining the |
| same functionality using only one header. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{get_full_url}{} |
| Return the URL given in the constructor. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{get_type}{} |
| Return the type of the URL --- also known as the scheme. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{get_host}{} |
| Return the host to which connection will be made. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{get_selector}{} |
| Return the selector --- the part of the URL that is sent to |
| the server. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Request]{set_proxy}{host, type} |
| Make the request by connecting to a proxy server. The \var{host} and |
| \var{type} will replace those of the instance, and the instance's |
| selector will be the original URL given in the constructor. |
| \end{methoddesc} |
| |
| |
| \subsection{OpenerDirector Objects \label{opener-director-objects}} |
| |
| \class{OpenerDirector} instances have the following methods: |
| |
| \begin{methoddesc}[OpenerDirector]{add_handler}{handler} |
| \var{handler} should be an instance of \class{BaseHandler}. The |
| following methods are searched, and added to the possible chains. |
| |
| \begin{itemize} |
| \item \method{\var{protocol}_open()} --- |
| signal that the handler knows how to open \var{protocol} URLs. |
| \item \method{\var{protocol}_error_\var{type}()} --- |
| signal that the handler knows how to handle \var{type} errors from |
| \var{protocol}. |
| \end{itemize} |
| \end{methoddesc} |
| |
| \begin{methoddesc}[OpenerDirector]{close}{} |
| Explicitly break cycles, and delete all the handlers. |
| Because the \class{OpenerDirector} needs to know the registered handlers, |
| and a handler needs to know who the \class{OpenerDirector} who called |
| it is, there is a reference cycles. Even though recent versions of Python |
| have cycle-collection, it is sometimes preferable to explicitly break |
| the cycles. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[OpenerDirector]{open}{url\optional{, data}} |
| Open the given \var{url}. (which can be a request object or a string), |
| optionally passing the given \var{data}. |
| Arguments, return values and exceptions raised are the same as those |
| of \function{urlopen()} (which simply calls the \method{open()} method |
| on the default installed \class{OpenerDirector}. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[OpenerDirector]{error}{proto\optional{, |
| arg\optional{, \moreargs}}} |
| Handle an error in a given protocol. The HTTP protocol is special cased to |
| use the code as the error. This will call the registered error handlers |
| for the given protocol with the given arguments (which are protocol specific). |
| |
| Return values and exceptions raised are the same as those |
| of \function{urlopen()}. |
| \end{methoddesc} |
| |
| |
| \subsection{BaseHandler Objects \label{base-handler-objects}} |
| |
| \class{BaseHandler} objects provide a couple of methods that are |
| directly useful, and others that are meant to be used by derived |
| classes. These are intended for direct use: |
| |
| \begin{methoddesc}[BaseHandler]{add_parent}{director} |
| Add a director as parent. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[BaseHandler]{close}{} |
| Remove any parents. |
| \end{methoddesc} |
| |
| The following members and methods should be used only be classes |
| derived from \class{BaseHandler}: |
| |
| \begin{memberdesc}[BaseHandler]{parent} |
| A valid \class{OpenerDirector}, which can be used to open using a |
| different protocol, or handle errors. |
| \end{memberdesc} |
| |
| \begin{methoddesc}[BaseHandler]{default_open}{req} |
| This method is \emph{not} defined in \class{BaseHandler}, but |
| subclasses should define it if they want to catch all URLs. |
| |
| This method, if exists, will be called by the \member{parent} |
| \class{OpenerDirector}. It should return a file-like object as |
| described in the return value of the \method{open()} of |
| \class{OpenerDirector} or \code{None}. It should raise |
| \exception{URLError}, unless a truly exceptional thing happens (for |
| example, \exception{MemoryError} should not be mapped to |
| \exception{URLError}. |
| |
| This method will be called before any protocol-specific open method. |
| \end{methoddesc} |
| |
| \begin{methoddescni}[BaseHandler]{\var{protocol}_open}{req} |
| This method is \emph{not} defined in \class{BaseHandler}, but |
| subclasses should define it if they want to handle URLs with the given |
| protocol. |
| |
| This method, if defined, will be called by the \member{parent} |
| \class{OpenerDirector}. Return values should be the same as for |
| \method{default_open()}. |
| \end{methoddescni} |
| |
| \begin{methoddesc}[BaseHandler]{unknown_open}{req} |
| This method is \var{not} defined in \class{BaseHandler}, but |
| subclasses should define it if they want to catch all URLs with no |
| specific registerd handler to open it. |
| |
| This method, if exists, will be called by the \member{parent} |
| \class{OpenerDirector}. Return values should be the same as for |
| \method{default_open()}. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[BaseHandler]{http_error_default}{req, fp, code, msg, hdrs} |
| This method is \emph{not} defined in \class{BaseHandler}, but |
| subclasses should override it if they intend to provide a catch-all |
| for otherwise unhandled HTTP errors. It will be called automatically |
| by the \class{OpenerDirector} getting the error, and should not |
| normally be called in other circumstances. |
| |
| \var{req} will be a \class{Request} object, \var{fp} will be a |
| file-like object with the HTTP error body, \var{code} will be the |
| three-digit code of the error, \var{msg} will be the user-visible |
| explanation of the code and \var{hdrs} will be a mapping object with |
| the headers of the error. |
| |
| Return values and exceptions raised should be the same as those |
| of \function{urlopen()}. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[BaseHandler]{http_error_\var{nnn}}{req, fp, code, msg, hdrs} |
| \var{nnn} should be a three-digit HTTP error code. This method is |
| also not defined in \class{BaseHandler}, but will be called, if it |
| exists, on an instance of a subclass, when an HTTP error with code |
| \var{nnn} occurs. |
| |
| Subclasses should override this method to handle specific HTTP |
| errors. |
| |
| Arguments, return values and exceptions raised should be the same as |
| for \method{http_error_default()}. |
| \end{methoddesc} |
| |
| |
| \subsection{HTTPRedirectHandler Objects \label{http-redirect-handler}} |
| |
| \note{303 redirection is not supported by this version of |
| \module{urllib2}.} |
| |
| \begin{methoddesc}[HTTPRedirectHandler]{http_error_301}{req, |
| fp, code, msg, hdrs} |
| Redirect to the \code{Location:} URL. This method is called by |
| the parent \class{OpenerDirector} when getting an HTTP |
| permanent-redirect response. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[HTTPRedirectHandler]{http_error_302}{req, |
| fp, code, msg, hdrs} |
| The same as \method{http_error_301()}, but called for the |
| temporary-redirect response. |
| \end{methoddesc} |
| |
| |
| \subsection{ProxyHandler Objects \label{proxy-handler}} |
| |
| \begin{methoddescni}[ProxyHandler]{\var{protocol}_open}{request} |
| The \class{ProxyHandler} will have a method |
| \method{\var{protocol}_open()} for every \var{protocol} which has a |
| proxy in the \var{proxies} dictionary given in the constructor. The |
| method will modify requests to go through the proxy, by calling |
| \code{request.set_proxy()}, and call the next handler in the chain to |
| actually execute the protocol. |
| \end{methoddescni} |
| |
| |
| \subsection{HTTPPasswordMgr Objects \label{http-password-mgr}} |
| |
| These methods are available on \class{HTTPPasswordMgr} and |
| \class{HTTPPasswordMgrWithDefaultRealm} objects. |
| |
| \begin{methoddesc}[HTTPPasswordMgr]{add_password}{realm, uri, user, passwd} |
| \var{uri} can be either a single URI, or a sequene of URIs. \var{realm}, |
| \var{user} and \var{passwd} must be strings. This causes |
| \code{(\var{user}, \var{passwd})} to be used as authentication tokens |
| when authentication for \var{realm} and a super-URI of any of the |
| given URIs is given. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[HTTPPasswordMgr]{find_user_password}{realm, authuri} |
| Get user/password for given realm and URI, if any. This method will |
| return \code{(None, None)} if there is no matching user/password. |
| |
| For \class{HTTPPasswordMgrWithDefaultRealm} objects, the realm |
| \code{None} will be searched if the given \var{realm} has no matching |
| user/password. |
| \end{methoddesc} |
| |
| |
| \subsection{AbstractBasicAuthHandler Objects |
| \label{abstract-basic-auth-handler}} |
| |
| \begin{methoddesc}[AbstractBasicAuthHandler]{handle_authentication_request} |
| {authreq, host, req, headers} |
| Handle an authentication request by getting user/password pair, and retrying. |
| \var{authreq} should be the name of the header where the information about |
| the realm, \var{host} is the host to authenticate too, \var{req} should be the |
| (failed) \class{Request} object, and \var{headers} should be the error headers. |
| \end{methoddesc} |
| |
| |
| \subsection{HTTPBasicAuthHandler Objects |
| \label{http-basic-auth-handler}} |
| |
| \begin{methoddesc}[HTTPBasicAuthHandler]{http_error_401}{req, fp, code, |
| msg, hdrs} |
| Retry the request with authentication info, if available. |
| \end{methoddesc} |
| |
| |
| \subsection{ProxyBasicAuthHandler Objects |
| \label{proxy-basic-auth-handler}} |
| |
| \begin{methoddesc}[ProxyBasicAuthHandler]{http_error_407}{req, fp, code, |
| msg, hdrs} |
| Retry the request with authentication info, if available. |
| \end{methoddesc} |
| |
| |
| \subsection{AbstractDigestAuthHandler Objects |
| \label{abstract-digest-auth-handler}} |
| |
| \begin{methoddesc}[AbstractDigestAuthHandler]{handle_authentication_request} |
| {authreq, host, req, headers} |
| \var{authreq} should be the name of the header where the information about |
| the realm, \var{host} should be the host to authenticate too, \var{req} |
| should be the (failed) \class{Request} object, and \var{headers} should be the |
| error headers. |
| \end{methoddesc} |
| |
| |
| \subsection{HTTPDigestAuthHandler Objects |
| \label{http-digest-auth-handler}} |
| |
| \begin{methoddesc}[HTTPDigestAuthHandler]{http_error_401}{req, fp, code, |
| msg, hdrs} |
| Retry the request with authentication info, if available. |
| \end{methoddesc} |
| |
| |
| \subsection{ProxyDigestAuthHandler Objects |
| \label{proxy-digest-auth-handler}} |
| |
| \begin{methoddesc}[ProxyDigestAuthHandler]{http_error_407}{req, fp, code, |
| msg, hdrs} |
| Retry the request with authentication information, if available. |
| \end{methoddesc} |
| |
| |
| \subsection{HTTPHandler Objects \label{http-handler-objects}} |
| |
| \begin{methoddesc}[HTTPHandler]{http_open}{req} |
| Send an HTTP request, whcih can be either GET or POST, depending on |
| \code{\var{req}.has_data()}. |
| \end{methoddesc} |
| |
| |
| \subsection{HTTPSHandler Objects \label{https-handler-objects}} |
| |
| \begin{methoddesc}[HTTPSHandler]{https_open}{req} |
| Send an HTTPS request, which can be either GET or POST, depending on |
| \code{\var{req}.has_data()}. |
| \end{methoddesc} |
| |
| |
| \subsection{FileHandler Objects \label{file-handler-objects}} |
| |
| \begin{methoddesc}[FileHandler]{file_open}{req} |
| Open the file locally, if there is no host name, or |
| the host name is \code{'localhost'}. Change the |
| protocol to \code{ftp} otherwise, and retry opening |
| it using \member{parent}. |
| \end{methoddesc} |
| |
| |
| \subsection{FTPHandler Objects \label{ftp-handler-objects}} |
| |
| \begin{methoddesc}[FTPHandler]{ftp_open}{req} |
| Open the FTP file indicated by \var{req}. |
| The login is always done with empty username and password. |
| \end{methoddesc} |
| |
| |
| \subsection{CacheFTPHandler Objects \label{cacheftp-handler-objects}} |
| |
| \class{CacheFTPHandler} objects are \class{FTPHandler} objects with |
| the following additional methods: |
| |
| \begin{methoddesc}[CacheFTPHandler]{setTimeout}{t} |
| Set timeout of connections to \var{t} seconds. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[CacheFTPHandler]{setMaxConns}{m} |
| Set maximum number of cached connections to \var{m}. |
| \end{methoddesc} |
| |
| |
| \subsection{GopherHandler Objects \label{gopher-handler}} |
| |
| \begin{methoddesc}[GopherHandler]{gopher_open}{req} |
| Open the gopher resource indicated by \var{req}. |
| \end{methoddesc} |
| |
| |
| \subsection{UnknownHandler Objects \label{unknown-handler-objects}} |
| |
| \begin{methoddesc}[UnknownHandler]{unknown_open}{} |
| Raise a \exception{URLError} exception. |
| \end{methoddesc} |