\section{Built-in module \sectcode{urlparse}}
\stmodindex{urlparse}
\index{WWW}
\indexii{World-Wide}{Web}
\index{URL}
\indexii{URL}{parsing}
\indexii{relative}{URL}

\renewcommand{\indexsubitem}{(in module urlparse)}

This module defines a standard interface to break URL strings up into
components (addressing scheme, network location, path, etc.), to combine
the components back into a URL string, and to convert a ``relative
URL'' to an absolute URL given a ``base URL''.

The module has been designed to match the current Internet draft on
Relative Uniform Resource Locators (and discovered a bug in an earlier
draft!).

It defines the following functions:

\begin{funcdesc}{urlparse}{urlstring\optional{\,
default_scheme\optional{\, allow_fragments}}}
Parse a URL into 6 components, returning a 6-tuple: (addressing
scheme, network location, path, parameters, query, fragment
identifier). This corresponds to the general structure of a URL:
\code{\var{scheme}://\var{netloc}/\var{path};\var{parameters}?\var{query}\#\var{fragment}}.
Each tuple item is a string, possibly empty.
The components are not broken up into smaller parts (e.g. the network
location is a single string), and \% escapes are not expanded.
The delimiters as shown above are not part of the tuple items, {\em
except} for a leading slash in the \var{path} component, which is
kept if present.

Example:
\code{urlparse('http://www.cwi.nl:80/\%7eguido/Python.html')}
yields the tuple
\code{('http', 'www.cwi.nl:80', '/\%7eguido/Python.html', '', '', '')}.

If the \var{default_scheme} argument is specified, it gives the
default addressing scheme, to be used only if the URL string does not
specify one. The default value for this argument is the empty string.

If the \var{allow_fragments} argument is zero, fragment identifiers
are not allowed, even if the URL's addressing scheme normally does
support them. The default value for this argument is \code{1}.
\end{funcdesc}
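The behaviour of \code{urlparse} described above can be tried out with
a short sketch. The URLs with parameters, query and fragment are
hypothetical, chosen only to fill all six components. The import
fallback is an assumption for interpreters that provide the same
function under another module name and is not part of this module.

```python
# Sketch only: the import fallback is an assumption, not part of this manual.
try:
    from urlparse import urlparse          # module name documented here
except ImportError:
    from urllib.parse import urlparse      # where Python 3 keeps the same function

# Hypothetical URL that exercises all six components.
url = 'http://www.cwi.nl:80/%7eguido/Python.html;v=1?lang=en#intro'
scheme, netloc, path, params, query, fragment = urlparse(url)

# Delimiters are dropped, except the leading slash of the path;
# the %7e escape is left unexpanded.
print(scheme, netloc, path, params, query, fragment)

# default_scheme is used only when the URL string names no scheme itself.
no_scheme = urlparse('//www.cwi.nl/%7eguido/Python.html', 'http')
with_scheme = urlparse('ftp://ftp.cwi.nl/pub/README', 'http')
print(no_scheme[0], with_scheme[0])

# With allow_fragments set to zero, no fragment identifier is split off.
unsplit = urlparse('http://www.cwi.nl/%7eguido/Python.html#intro', '', 0)
print(repr(unsplit[5]))
```

Unpacking the result into six names keeps the order of the tuple
items visible at the call site.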

\begin{funcdesc}{urlunparse}{tuple}
Construct a URL string from a tuple as returned by \code{urlparse}.
This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had redundant delimiters, e.g. a ? with
an empty query (the draft states that these are equivalent).
\end{funcdesc}
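A minimal round-trip sketch for \code{urlunparse} follows. The URL
with the trailing ? is a hypothetical input, and the import fallback
is again an assumption for interpreters that keep these functions
under another module name. The sketch only checks that the rebuilt
URL parses to the same components and does not rely on the exact
rebuilt string.

```python
# Sketch only: the import fallback is an assumption, not part of this manual.
try:
    from urlparse import urlparse, urlunparse
except ImportError:
    from urllib.parse import urlparse, urlunparse

# A URL without redundant delimiters survives the round trip unchanged.
url = 'http://www.cwi.nl:80/%7eguido/Python.html'
assert urlunparse(urlparse(url)) == url

# A plain 6-tuple in the documented order is accepted as well.
built = urlunparse(('http', 'www.cwi.nl:80', '/%7eguido/Python.html', '', '', ''))
assert built == url

# Hypothetical input with a redundant delimiter: a ? with an empty query.
redundant = 'http://www.cwi.nl/%7eguido/Python.html?'
rebuilt = urlunparse(urlparse(redundant))

# The rebuilt string may differ from the input, but it is equivalent:
# parsing it again gives the same six components.
assert urlparse(rebuilt) == urlparse(redundant)
print(rebuilt)
```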

\begin{funcdesc}{urljoin}{base\, url\optional{\, allow_fragments}}
Construct a full (``absolute'') URL by combining a ``base URL''
(\var{base}) with a ``relative URL'' (\var{url}). Informally, this
uses components of the base URL, in particular the addressing scheme,
the network location and (part of) the path, to provide missing
components in the relative URL.

Example:
\code{urljoin('http://www.cwi.nl/\%7eguido/Python.html',}
\code{'FAQ.html')} yields the string
\code{'http://www.cwi.nl/\%7eguido/FAQ.html'}.

The \var{allow_fragments} argument has the same meaning as for
\code{urlparse}.
\end{funcdesc}
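The way \code{urljoin} fills in missing components can be sketched
with a few relative URLs against one base. Only the first case comes
from the example above; the other relative URLs are hypothetical, and
the import fallback is an assumption for interpreters that keep the
function under another module name.

```python
# Sketch only: the import fallback is an assumption, not part of this manual.
try:
    from urlparse import urljoin
except ImportError:
    from urllib.parse import urljoin

base = 'http://www.cwi.nl/%7eguido/Python.html'

# A bare file name replaces the last segment of the base path.
sibling = urljoin(base, 'FAQ.html')

# A path with a leading slash replaces the whole base path.
rooted = urljoin(base, '/index.html')

# '..' climbs one directory relative to the base path.
parent = urljoin(base, '../icons/logo.gif')

# A URL that already has a scheme and network location borrows
# nothing from the base.
absolute = urljoin(base, 'http://www.python.org/')

for result in (sibling, rooted, parent, absolute):
    print(result)
```

In each case the scheme and network location come from the base
unless the second argument supplies its own.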