Blame - Doc/lib/liburlparse.tex - platform/external/python/cpython2

Fred Drake	295da24	1998-08-10 19:42:37 +0000	[diff] [blame]	1	\section{\module{urlparse} ---
Fred Drake	0308ff8	2000-08-25 17:29:35 +0000	[diff] [blame]	2	Parse URLs into components}
Fred Drake	b91e934	1998-07-23 17:59:49 +0000	[diff] [blame]	3	\declaremodule{standard}{urlparse}
				4
Fred Drake	72d157e	1998-08-06 21:23:17 +0000	[diff] [blame]	5	\modulesynopsis{Parse URLs into components.}
Fred Drake	b91e934	1998-07-23 17:59:49 +0000	[diff] [blame]	6
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	7	\index{WWW}
Fred Drake	8ee679f	2001-07-14 02:50:55 +0000	[diff] [blame]	8	\index{World Wide Web}
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	9	\index{URL}
				10	\indexii{URL}{parsing}
				11	\indexii{relative}{URL}
				12
Guido van Rossum	8675115	1995-02-28 17:14:32 +0000	[diff] [blame]	13
Fred Drake	0308ff8	2000-08-25 17:29:35 +0000	[diff] [blame]	14	This module defines a standard interface to break Uniform Resource
				15	Locator (URL) strings into components (addressing scheme, network
				16	location, path, etc.), to combine the components back into a URL
				17	string, and to convert a ``relative URL'' to an absolute URL given a
				18	``base URL.''
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	19
Fred Drake	d1cc9c2	1998-01-21 04:55:02 +0000	[diff] [blame]	20	The module has been designed to match the Internet RFC on Relative
				21	Uniform Resource Locators (and its development uncovered a bug in an earlier
Georg Brandl	1de3700	2006-01-20 21:17:01 +0000	[diff] [blame]	22	draft!). It supports the following URL schemes:
				23	\code{file}, \code{ftp}, \code{gopher}, \code{hdl}, \code{http},
				24	\code{https}, \code{imap}, \code{mailto}, \code{mms}, \code{news},
				25	\code{nntp}, \code{prospero}, \code{rsync}, \code{rtsp}, \code{rtspu},
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	26	\code{sftp}, \code{shttp}, \code{sip}, \code{sips}, \code{snews}, \code{svn},
Georg Brandl	1de3700	2006-01-20 21:17:01 +0000	[diff] [blame]	27	\code{svn+ssh}, \code{telnet}, \code{wais}.
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	28
				29	\versionadded[Support for the \code{sftp} and \code{sips} schemes]{2.5}
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	30
Georg Brandl	1de3700	2006-01-20 21:17:01 +0000	[diff] [blame]	31	The \module{urlparse} module defines the following functions:
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	32
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	33	\begin{funcdesc}{urlparse}{urlstring\optional{,
				34	default_scheme\optional{, allow_fragments}}}
				35	Parse a URL into six components, returning a 6-tuple. This
				36	corresponds to the general structure of a URL:
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	37	\code{\var{scheme}://\var{netloc}/\var{path};\var{parameters}?\var{query}\#\var{fragment}}.
				38	Each tuple item is a string, possibly empty.
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	39	The components are not broken up into smaller parts (for example, the network
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	40	location is a single string), and \% escapes are not expanded.
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	41	The delimiters as shown above are not part of the result,
Guido van Rossum	470be14	1995-03-17 16:07:09 +0000	[diff] [blame]	42	except for a leading slash in the \var{path} component, which is
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	43	retained if present. For example:
Guido van Rossum	96628a9	1995-04-10 11:34:00 +0000	[diff] [blame]	44
Fred Drake	1947991	1998-02-13 06:58:54 +0000	[diff] [blame]	45	\begin{verbatim}
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	46	>>> from urlparse import urlparse
				47	>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
				48	>>> o
Guido van Rossum	96628a9	1995-04-10 11:34:00 +0000	[diff] [blame]	49	('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', '')
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	50	>>> o.scheme
				51	'http'
				52	>>> o.port
				53	80
				54	>>> o.geturl()
				55	'http://www.cwi.nl:80/%7Eguido/Python.html'
Fred Drake	1947991	1998-02-13 06:58:54 +0000	[diff] [blame]	56	\end{verbatim}
Fred Drake	45ca333	2000-08-24 04:58:25 +0000	[diff] [blame]	57
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	58	If the \var{default_scheme} argument is specified, it gives the
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	59	default addressing scheme, to be used only if the URL does not
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	60	specify one. The default value for this argument is the empty string.
				61
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	62	If the \var{allow_fragments} argument is false, fragment identifiers
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	63	are not allowed, even if the URL's addressing scheme normally does
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	64	support them. The default value for this argument is \constant{True}.
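
The effect of the two optional arguments can be sketched as follows.
This is only an illustration: the example URLs are made up, the
arguments are passed positionally so that no keyword names are assumed,
and the fallback import from \module{urllib.parse} is an assumption
added so the sketch also runs on Python 3, where this module was renamed.

```python
try:
    from urlparse import urlparse          # Python 2 module documented here
except ImportError:
    from urllib.parse import urlparse      # assumption: Python 3 fallback

# The default scheme is used only when the URL does not name one.
with_default = urlparse('//www.cwi.nl/%7Eguido/Python.html', 'http')
explicit = urlparse('ftp://www.cwi.nl/pub/', 'http')

# With allow_fragments false, '#intro' is not split off as a fragment;
# it stays attached to the preceding component (here, the path).
no_frag = urlparse('http://www.cwi.nl/doc/index.html#intro', '', False)

print(with_default[0])   # scheme taken from the default
print(explicit[0])       # scheme taken from the URL itself
print(no_frag[2])        # path, still containing '#intro'
print(repr(no_frag[5]))  # fragment component is empty
```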
				65
				66	The return value is actually an instance of a subclass of
				67	\pytype{tuple}. This class has the following additional read-only
				68	convenience attributes:
				69
				70	\begin{tableiv}{l|c|l|c}{member}{Attribute}{Index}{Value}{Value if not present}
				71	\lineiv{scheme} {0} {URL scheme specifier} {empty string}
				72	\lineiv{netloc} {1} {Network location part} {empty string}
				73	\lineiv{path} {2} {Hierarchical path} {empty string}
				74	\lineiv{params} {3} {Parameters for last path element} {empty string}
				75	\lineiv{query} {4} {Query component} {empty string}
				76	\lineiv{fragment}{5} {Fragment identifier} {empty string}
				77	\lineiv{username}{ } {User name} {\constant{None}}
				78	\lineiv{password}{ } {Password} {\constant{None}}
				79	\lineiv{hostname}{ } {Host name (lower case)} {\constant{None}}
				80	\lineiv{port} { } {Port number as integer, if present} {\constant{None}}
				81	\end{tableiv}
				82
				83	See section~\ref{urlparse-result-object}, ``Results of
				84	\function{urlparse()} and \function{urlsplit()},'' for more
				85	information on the result object.
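
A short sketch of the attributes that have no tuple index
(\member{username}, \member{password}, \member{hostname} and
\member{port}) may help. The URL and its credentials are invented for
the example, and the fallback import from \module{urllib.parse} is an
assumption so that the code also runs on Python 3.

```python
try:
    from urlparse import urlparse          # Python 2 module documented here
except ImportError:
    from urllib.parse import urlparse      # assumption: Python 3 fallback

# Hypothetical URL carrying user information and an explicit port.
o = urlparse('http://guido:secret@WWW.CWI.NL:8080/%7Eguido/Python.html')

print(o.netloc)     # the network location is kept as one string
print(o.username)   # user name part of the network location
print(o.password)   # password part of the network location
print(o.hostname)   # host name, converted to lower case
print(o.port)       # port number as an integer

# When a part is absent, these attributes are None rather than ''.
bare = urlparse('http://www.cwi.nl/')
print(bare.username, bare.port)
```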
				86
				87	\versionchanged[Added attributes to return value]{2.5}
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	88	\end{funcdesc}
				89
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	90	\begin{funcdesc}{urlunparse}{parts}
				91	Construct a URL from a tuple as returned by \code{urlparse()}.
				92	The \var{parts} argument can be any six-item iterable.
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	93	This may result in a slightly different, but equivalent URL, if the
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	94	URL that was parsed originally had unnecessary delimiters (for example,
				95	a ? with an empty query; the RFC states that these are equivalent).
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	96	\end{funcdesc}
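
As a rough illustration of \function{urlunparse()}, the sketch below
builds a URL from a plain six-item list and then round-trips a URL that
ends in a bare \code{?}. The URLs are examples only, and the fallback
import from \module{urllib.parse} is an assumption for Python 3.

```python
try:
    from urlparse import urlparse, urlunparse      # Python 2
except ImportError:
    from urllib.parse import urlparse, urlunparse  # assumption: Python 3

# Any six-item iterable works; here a list in the order
# (scheme, netloc, path, params, query, fragment).
from_list = urlunparse(['http', 'www.cwi.nl', '/%7Eguido/Python.html',
                        '', '', ''])
print(from_list)

# A URL with an unnecessary delimiter: '?' followed by an empty query.
parsed = urlparse('http://www.cwi.nl/%7Eguido/Python.html?')
rebuilt = urlunparse(parsed)

# The rebuilt string may differ from the input, but it parses back
# into the same six components.
reparsed = urlparse(rebuilt)
print(tuple(reparsed) == tuple(parsed))
```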
				97
Fred Drake	5545219	2001-11-16 03:22:15 +0000	[diff] [blame]	98	\begin{funcdesc}{urlsplit}{urlstring\optional{,
				99	default_scheme\optional{, allow_fragments}}}
				100	This is similar to \function{urlparse()}, but does not split the
				101	params from the URL. Use it instead of
				102	\function{urlparse()} when the more recent URL syntax, which allows
				103	parameters on each segment of the \var{path} portion of
Walter Dörwald	ff9ca5e	2005-08-31 11:03:12 +0000	[diff] [blame]	104	the URL (see \rfc{2396}), is wanted. A separate function is needed to
				105	split the path segments from their parameters. This function returns a
				106	5-tuple: (addressing scheme, network location, path, query, fragment
Fred Drake	5545219	2001-11-16 03:22:15 +0000	[diff] [blame]	107	identifier).
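
The difference from \function{urlparse()} can be seen by running both
functions on the same URL. This is a sketch with an invented URL; the
fallback import from \module{urllib.parse} is an assumption for Python 3.

```python
try:
    from urlparse import urlparse, urlsplit      # Python 2
except ImportError:
    from urllib.parse import urlparse, urlsplit  # assumption: Python 3

# Hypothetical URL with parameters on the last path segment.
url = 'http://www.cwi.nl/%7Eguido/Python.html;type=a?lang=en#top'

six = urlparse(url)    # params are split out of the path
five = urlsplit(url)   # params are left attached to the path

print(len(six), six[2], six[3])   # 6 items: path and params are separate
print(len(five), five[2])         # 5 items: path still contains ';type=a'
```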
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	108
				109	The return value is actually an instance of a subclass of
				110	\pytype{tuple}. This class has the following additional read-only
				111	convenience attributes:
				112
				113	\begin{tableiv}{l|c|l|c}{member}{Attribute}{Index}{Value}{Value if not present}
				114	\lineiv{scheme} {0} {URL scheme specifier} {empty string}
				115	\lineiv{netloc} {1} {Network location part} {empty string}
				116	\lineiv{path} {2} {Hierarchical path} {empty string}
				117	\lineiv{query} {3} {Query component} {empty string}
				118	\lineiv{fragment} {4} {Fragment identifier} {empty string}
				119	\lineiv{username} { } {User name} {\constant{None}}
				120	\lineiv{password} { } {Password} {\constant{None}}
				121	\lineiv{hostname} { } {Host name (lower case)} {\constant{None}}
				122	\lineiv{port} { } {Port number as integer, if present} {\constant{None}}
				123	\end{tableiv}
				124
				125	See section~\ref{urlparse-result-object}, ``Results of
				126	\function{urlparse()} and \function{urlsplit()},'' for more
				127	information on the result object.
				128
Fred Drake	5545219	2001-11-16 03:22:15 +0000	[diff] [blame]	129	\versionadded{2.2}
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	130	\versionchanged[Added attributes to return value]{2.5}
Fred Drake	5545219	2001-11-16 03:22:15 +0000	[diff] [blame]	131	\end{funcdesc}
				132
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	133	\begin{funcdesc}{urlunsplit}{parts}
Fred Drake	5545219	2001-11-16 03:22:15 +0000	[diff] [blame]	134	Combine the elements of a tuple as returned by \function{urlsplit()}
				135	into a complete URL as a string.
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	136	The \var{parts} argument can be any five-item iterable.
				137	This may result in a slightly different, but equivalent URL, if the
				138	URL that was parsed originally had unnecessary delimiters (for example,
				139	a ? with an empty query; the RFC states that these are equivalent).
Fred Drake	5545219	2001-11-16 03:22:15 +0000	[diff] [blame]	140	\versionadded{2.2}
				141	\end{funcdesc}
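
One common use of \function{urlunsplit()} is to change a single
component of a URL. The sketch below assumes an example URL and
replaces only its query; the fallback import from
\module{urllib.parse} is an assumption for Python 3.

```python
try:
    from urlparse import urlsplit, urlunsplit      # Python 2
except ImportError:
    from urllib.parse import urlsplit, urlunsplit  # assumption: Python 3

# Copy the five components into a list so that one can be replaced.
parts = list(urlsplit('http://www.python.org/doc/?q=1#top'))
parts[3] = 'q=2'   # index 3 is the query component

# Any five-item iterable is accepted, so the list can be passed directly.
changed = urlunsplit(parts)
print(changed)     # 'http://www.python.org/doc/?q=2#top'
```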
				142
Fred Drake	cce1090	1998-03-17 06:33:25 +0000	[diff] [blame]	143	\begin{funcdesc}{urljoin}{base, url\optional{, allow_fragments}}
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	144	Construct a full (``absolute'') URL by combining a ``base URL''
				145	(\var{base}) with a ``relative URL'' (\var{url}). Informally, this
				146	uses components of the base URL, in particular the addressing scheme,
				147	the network location and (part of) the path, to provide missing
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	148	components in the relative URL. For example:
Guido van Rossum	96628a9	1995-04-10 11:34:00 +0000	[diff] [blame]	149
Fred Drake	1947991	1998-02-13 06:58:54 +0000	[diff] [blame]	150	\begin{verbatim}
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	151	>>> from urlparse import urljoin
				152	>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
Guido van Rossum	96628a9	1995-04-10 11:34:00 +0000	[diff] [blame]	153	'http://www.cwi.nl/%7Eguido/FAQ.html'
Fred Drake	1947991	1998-02-13 06:58:54 +0000	[diff] [blame]	154	\end{verbatim}
Fred Drake	0308ff8	2000-08-25 17:29:35 +0000	[diff] [blame]	155
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	156	The \var{allow_fragments} argument has the same meaning and default as
				157	for \function{urlparse()}.
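
A few further cases, sketched here with invented relative URLs, show
which parts of the base URL are kept. The fallback import from
\module{urllib.parse} is an assumption for Python 3.

```python
try:
    from urlparse import urljoin       # Python 2
except ImportError:
    from urllib.parse import urljoin   # assumption: Python 3

base = 'http://www.cwi.nl/%7Eguido/Python.html'

# '..' steps up one directory from the base path.
parent = urljoin(base, '../index.html')

# A leading slash replaces the whole base path.
rooted = urljoin(base, '/doc/FAQ.html')

# A leading '//' replaces the network location but keeps the scheme.
other_host = urljoin(base, '//www.python.org/doc/')

# A URL that is already absolute is returned as given.
absolute = urljoin(base, 'http://www.python.org/')

for result in (parent, rooted, other_host, absolute):
    print(result)
```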
Guido van Rossum	a12ef94	1995-02-27 17:53:25 +0000	[diff] [blame]	158	\end{funcdesc}
Fred Drake	45ca333	2000-08-24 04:58:25 +0000	[diff] [blame]	159
Fred Drake	98ef20d	2002-10-16 20:07:54 +0000	[diff] [blame]	160	\begin{funcdesc}{urldefrag}{url}
				161	If \var{url} contains a fragment identifier, returns a modified
				162	version of \var{url} with no fragment identifier, and the fragment
				163	identifier as a separate string. If there is no fragment identifier
				164	in \var{url}, returns \var{url} unmodified and an empty string.
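
Both cases can be sketched briefly. The URLs are examples, and the
fallback import from \module{urllib.parse} is an assumption for Python 3.

```python
try:
    from urlparse import urldefrag       # Python 2
except ImportError:
    from urllib.parse import urldefrag   # assumption: Python 3

# The result unpacks into the URL without its fragment, and the fragment.
url, fragment = urldefrag('http://www.python.org/doc/#intro')
print(url)        # 'http://www.python.org/doc/'
print(fragment)   # 'intro'

# Without a fragment, the URL comes back unchanged with an empty string.
same_url, empty = urldefrag('http://www.python.org/doc/')
print(same_url, repr(empty))
```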
				165	\end{funcdesc}
				166
Fred Drake	45ca333	2000-08-24 04:58:25 +0000	[diff] [blame]	167
				168	\begin{seealso}
				169	\seerfc{1738}{Uniform Resource Locators (URL)}{
				170	This specifies the formal syntax and semantics of absolute
				171	URLs.}
				172	\seerfc{1808}{Relative Uniform Resource Locators}{
				173	This Request For Comments includes the rules for joining an
Fred Drake	5f2c1d2	2002-10-17 19:23:43 +0000	[diff] [blame]	174	absolute and a relative URL, including a fair number of
Fred Drake	45ca333	2000-08-24 04:58:25 +0000	[diff] [blame]	175	``Abnormal Examples'' which govern the treatment of border
				176	cases.}
Fred Drake	0308ff8	2000-08-25 17:29:35 +0000	[diff] [blame]	177	\seerfc{2396}{Uniform Resource Identifiers (URI): Generic Syntax}{
				178	Document describing the generic syntactic requirements for
				179	both Uniform Resource Names (URNs) and Uniform Resource
				180	Locators (URLs).}
Fred Drake	45ca333	2000-08-24 04:58:25 +0000	[diff] [blame]	181	\end{seealso}
Thomas Wouters	49fd7fa	2006-04-21 10:40:58 +0000	[diff] [blame^]	182
				183
				184	\subsection{Results of \function{urlparse()} and \function{urlsplit()}
				185	\label{urlparse-result-object}}
				186
				187	The result objects from the \function{urlparse()} and
				188	\function{urlsplit()} functions are subclasses of the \pytype{tuple}
				189	type. These subclasses add the attributes described in those
				190	functions, as well as provide an additional method:
				191
				192	\begin{methoddesc}[ParseResult]{geturl}{}
				193	Return the re-combined version of the original URL as a string.
				194	This may differ from the original URL in that the scheme will always
				195	be normalized to lower case and empty components may be dropped.
				196	Specifically, empty parameters, queries, and fragment identifiers
				197	will be removed.
				198
				199	The result of this method is a fixpoint if passed back through the
				200	original parsing function:
				201
				202	\begin{verbatim}
				203	>>> import urlparse
				204	>>> url = 'HTTP://www.Python.org/doc/#'
				205
				206	>>> r1 = urlparse.urlsplit(url)
				207	>>> r1.geturl()
				208	'http://www.Python.org/doc/'
				209
				210	>>> r2 = urlparse.urlsplit(r1.geturl())
				211	>>> r2.geturl()
				212	'http://www.Python.org/doc/'
				213	\end{verbatim}
				214
				215	\versionadded{2.5}
				216	\end{methoddesc}
				217
				218	The following classes provide the implementations of the parse results:
				219
				220	\begin{classdesc*}{BaseResult}
				221	Base class for the concrete result classes. This provides most of
				222	the attribute definitions. It does not provide a \method{geturl()}
				223	method. It is derived from \class{tuple}, but does not override the
				224	\method{__init__()} or \method{__new__()} methods.
				225	\end{classdesc*}
				226
				227
				228	\begin{classdesc}{ParseResult}{scheme, netloc, path, params, query, fragment}
				229	Concrete class for \function{urlparse()} results. The
				230	\method{__new__()} method is overridden to support checking that the
				231	right number of arguments is passed.
				232	\end{classdesc}
				233
				234
				235	\begin{classdesc}{SplitResult}{scheme, netloc, path, query, fragment}
				236	Concrete class for \function{urlsplit()} results. The
				237	\method{__new__()} method is overridden to support checking that the
				238	right number of arguments is passed.
				239	\end{classdesc}
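
The concrete classes can also be instantiated directly. The following
is a minimal sketch, assuming the positional argument order shown in
the class signatures above; the component values are invented, and the
fallback import from \module{urllib.parse} is an assumption for Python 3.

```python
try:
    from urlparse import ParseResult, SplitResult      # Python 2.5 and later
except ImportError:
    from urllib.parse import ParseResult, SplitResult  # assumption: Python 3

# Five components for SplitResult, six for ParseResult.
split = SplitResult('http', 'www.python.org', '/doc/', '', 'top')
parsed = ParseResult('http', 'www.python.org', '/doc/', '', '', 'top')

print(isinstance(split, tuple))   # the results are tuple subclasses
print(split[1] == split.netloc)   # index and attribute give the same item
print(split.geturl())             # 'http://www.python.org/doc/#top'
print(parsed.geturl())            # 'http://www.python.org/doc/#top'

# Passing six items to the five-item class is rejected.
try:
    SplitResult('http', 'www.python.org', '/doc/', '', '', 'top')
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```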