\section{Standard Module \sectcode{urlparse}}
\label{module-urlparse}
\stmodindex{urlparse}
\index{WWW}
\index{World-Wide Web}
\index{URL}
\indexii{URL}{parsing}
\indexii{relative}{URL}

\renewcommand{\indexsubitem}{(in module urlparse)}

This module defines a standard interface to break URL strings up into
components (addressing scheme, network location, path, etc.), to combine
the components back into a URL string, and to convert a ``relative
URL'' to an absolute URL given a ``base URL''.

The module has been designed to match the Internet RFC on Relative
Uniform Resource Locators (and its development uncovered a bug in an
earlier draft!).  Refer to RFC 1808\index{RFC!1808} for details on relative
URLs and RFC 1738\index{RFC!1738} for information on basic URL
syntax.

The module defines the following functions:

\begin{funcdesc}{urlparse}{urlstring\optional{, default_scheme\optional{, allow_fragments}}}
Parse a URL into 6 components, returning a 6-tuple: (addressing
scheme, network location, path, parameters, query, fragment
identifier).  This corresponds to the general structure of a URL:
\code{\var{scheme}://\var{netloc}/\var{path};\var{parameters}?\var{query}\#\var{fragment}}.
Each tuple item is a string, possibly empty.
The components are not broken up into smaller parts (e.g. the network
location is a single string), and \% escapes are not expanded.
The delimiters shown above are not part of the tuple items,
except for a leading slash in the \var{path} component, which is
retained if present.

Example:

\bcode\begin{verbatim}
urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
\end{verbatim}\ecode
%
yields the tuple

\bcode\begin{verbatim}
('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', '')
\end{verbatim}\ecode
%
If the \var{default_scheme} argument is specified, it gives the
default addressing scheme, to be used only if the URL string does not
specify one.  The default value for this argument is the empty string.

If the \var{allow_fragments} argument is zero, fragment identifiers
are not recognized, even if the URL's addressing scheme normally
supports them.  The default value for this argument is \code{1}.
\end{funcdesc}
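
The effect of the optional arguments can be tried out with a short sketch.
It assumes a current Python 3 interpreter, where the same function is
provided by \code{urllib.parse} rather than by a module named
\code{urlparse}; the parameter, query and fragment strings in the sample
URLs are made up for illustration.

```python
from urllib.parse import urlparse  # assumption: Python 3 home of urlparse()

# All six components present; the result converts to a plain 6-tuple.
full = tuple(urlparse(
    'http://www.cwi.nl:80/%7Eguido/Python.html;type=a?lang=en#intro'))

# The default scheme is applied only when the URL string has none.
defaulted = tuple(urlparse('//www.cwi.nl/index.html', 'http'))
kept = tuple(urlparse('ftp://ftp.cwi.nl/pub/', 'http'))

# With allow_fragments false, '#intro' is not split off as a fragment.
no_frag = tuple(urlparse('http://www.cwi.nl/doc.html#intro', '', False))

for parts in (full, defaulted, kept, no_frag):
    print(parts)
```

In this sketch the last tuple has an empty fragment item, and the text
after the \# stays in the \var{path} item.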

\begin{funcdesc}{urlunparse}{tuple}
Construct a URL string from a tuple as returned by \code{urlparse()}.
This may result in a slightly different but equivalent URL if the
URL that was originally parsed had redundant delimiters, e.g. a ? with
an empty query (the draft states that these are equivalent).
\end{funcdesc}
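
A round trip illustrates the ``slightly different, but equivalent''
case.  This is a minimal sketch that assumes a current Python 3
interpreter, where the functions are imported from \code{urllib.parse};
the variable names are illustrative only.

```python
from urllib.parse import urlparse, urlunparse  # assumption: Python 3 names

# A tuple in the six-component layout is joined back into a URL string.
parts = ('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', '')
rebuilt = urlunparse(parts)

# A trailing '?' with an empty query is a redundant delimiter, so the
# round trip is expected to return an equivalent URL without it.
with_empty_query = 'http://www.cwi.nl/%7Eguido/Python.html?'
round_trip = urlunparse(urlparse(with_empty_query))

print(rebuilt)
print(round_trip)
```

Code that compares URLs as strings may therefore want to compare the
unparsed forms rather than the original input.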

\begin{funcdesc}{urljoin}{base\, url\optional{\, allow_fragments}}
Construct a full (``absolute'') URL by combining a ``base URL''
(\var{base}) with a ``relative URL'' (\var{url}).  Informally, this
uses components of the base URL, in particular the addressing scheme,
the network location and (part of) the path, to provide missing
components in the relative URL.

Example:

\bcode\begin{verbatim}
urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
\end{verbatim}\ecode
%
yields the string

\bcode\begin{verbatim}
'http://www.cwi.nl/%7Eguido/FAQ.html'
\end{verbatim}\ecode
%
The \var{allow_fragments} argument has the same meaning as for
\code{urlparse()}.
\end{funcdesc}
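
Which components are taken from the base URL depends on the form of the
relative URL.  The following sketch shows a few common forms; it assumes
a current Python 3 interpreter, where \code{urljoin()} is imported from
\code{urllib.parse}, and the relative URLs are made-up examples.

```python
from urllib.parse import urljoin  # assumption: Python 3 home of urljoin()

base = 'http://www.cwi.nl/%7Eguido/Python.html'

# A bare file name replaces the last segment of the base path.
sibling = urljoin(base, 'FAQ.html')

# '..' climbs one directory level before the new segment is appended.
parent = urljoin(base, '../index.html')

# A leading slash replaces the whole path but keeps scheme and location.
rooted = urljoin(base, '/doc/')

# A URL that already has its own scheme and location is returned as given.
absolute = urljoin(base, 'ftp://ftp.cwi.nl/pub/')

for result in (sibling, parent, rooted, absolute):
    print(result)
```

The more the relative URL specifies, the less is borrowed from
\var{base}; a complete URL borrows nothing.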