\section{Standard Module \module{urlparse}}
\declaremodule{standard}{urlparse}

\modulesynopsis{Parse URLs into components.}

\index{WWW}
\index{World-Wide Web}
\index{URL}
\indexii{URL}{parsing}
\indexii{relative}{URL}

This module defines a standard interface to break URL strings up into
components (addressing scheme, network location, path, etc.), to combine
the components back into a URL string, and to convert a ``relative
URL'' to an absolute URL given a ``base URL.''

The module has been designed to match the Internet RFC on Relative
Uniform Resource Locators (and discovered a bug in an earlier
draft!).  Refer to \rfc{1808} for details on relative
URLs and \rfc{1738} for information on basic URL syntax.

The module defines the following functions:

\begin{funcdesc}{urlparse}{urlstring\optional{, default_scheme\optional{, allow_fragments}}}
Parse a URL into six components, returning a 6-tuple: (addressing
scheme, network location, path, parameters, query, fragment
identifier).  This corresponds to the general structure of a URL:
\code{\var{scheme}://\var{netloc}/\var{path};\var{parameters}?\var{query}\#\var{fragment}}.
Each tuple item is a string, possibly empty.
The components are not broken up into smaller parts (e.g.\ the network
location is a single string), and \% escapes are not expanded.
The delimiters shown above are not part of the tuple items,
except for a leading slash in the \var{path} component, which is
retained if present.

Example:

\begin{verbatim}
urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
\end{verbatim}
%
yields the tuple

\begin{verbatim}
('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', '')
\end{verbatim}
%
If the \var{default_scheme} argument is specified, it gives the
default addressing scheme, to be used only if the URL string does not
specify one.  The default value for this argument is the empty string.

If the \var{allow_fragments} argument is zero, fragment identifiers
are not allowed, even if the URL's addressing scheme normally
supports them.  The default value for this argument is \code{1}.
\end{funcdesc}
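As an illustration of the optional arguments, the following sketch
parses a URL that uses all six components and then exercises
\var{default_scheme} and \var{allow_fragments}.  The example URLs and
the helper name \code{as_tuple} are made up for this illustration, and
the fallback import is an assumption that lets the same code run on
newer Python versions, where these functions are provided by
\module{urllib.parse}.

```python
try:
    from urlparse import urlparse        # the module described here
except ImportError:
    from urllib.parse import urlparse    # assumed location on newer Pythons


def as_tuple(url, default_scheme='', allow_fragments=1):
    # Hypothetical helper: force a plain 6-tuple so results compare uniformly.
    return tuple(urlparse(url, default_scheme, allow_fragments))


# All six components present; delimiters are dropped, the leading slash is kept.
full = as_tuple('http://www.cwi.nl:80/%7Eguido/Python.html;type=a?lang=en#intro')

# No scheme in the URL string, so the default scheme is used.
defaulted = as_tuple('//www.cwi.nl/%7Eguido/Python.html', 'http')

# The URL's own scheme takes precedence over the default.
explicit = as_tuple('ftp://ftp.cwi.nl/pub/README', 'http')

# With allow_fragments set to zero, the fragment item stays empty.
no_fragment = as_tuple('http://www.cwi.nl/doc.html#intro', '', 0)

for result in (full, defaulted, explicit, no_fragment):
    print(result)
```

Both optional arguments are passed by position here, in the order
given in the signature above.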

\begin{funcdesc}{urlunparse}{tuple}
Construct a URL string from a tuple as returned by \code{urlparse()}.
This may result in a slightly different, but equivalent, URL if the
URL that was parsed originally had redundant delimiters, e.g.\ a
\code{?} with an empty query (the draft states that these are
equivalent).
\end{funcdesc}
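The round trip described above can be sketched as follows.  The
example URLs and the helper name \code{round_trip} are illustrative
only, and the fallback import is an assumption for newer Python
versions, where the functions are provided by \module{urllib.parse}.
The second check compares parsed tuples rather than strings, since the
rebuilt URL is only described as equivalent to the original, not
identical to it.

```python
try:
    from urlparse import urlparse, urlunparse        # the module described here
except ImportError:
    from urllib.parse import urlparse, urlunparse    # assumed newer location


def round_trip(url):
    # Hypothetical helper: parse a URL and immediately rebuild it.
    return urlunparse(urlparse(url))


plain = 'http://www.cwi.nl:80/%7Eguido/Python.html'
redundant = 'http://www.cwi.nl/%7Eguido/Python.html?'   # "?" with an empty query

rebuilt_plain = round_trip(plain)
rebuilt_redundant = round_trip(redundant)

# A URL without redundant delimiters should come back unchanged.
print(rebuilt_plain == plain)

# The rebuilt string may differ from the input, but parses to the same tuple.
print(tuple(urlparse(rebuilt_redundant)) == tuple(urlparse(redundant)))
```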

\begin{funcdesc}{urljoin}{base, url\optional{, allow_fragments}}
Construct a full (``absolute'') URL by combining a ``base URL''
(\var{base}) with a ``relative URL'' (\var{url}).  Informally, this
uses components of the base URL, in particular the addressing scheme,
the network location and (part of) the path, to provide missing
components in the relative URL.

Example:

\begin{verbatim}
urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
\end{verbatim}
%
yields the string

\begin{verbatim}
'http://www.cwi.nl/%7Eguido/FAQ.html'
\end{verbatim}
%
The \var{allow_fragments} argument has the same meaning as for
\code{urlparse()}.
\end{funcdesc}
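To show which parts of the base URL are reused, the sketch below joins
several kinds of relative URL against the same base.  The relative
URLs are invented for the illustration, and the fallback import is an
assumption for newer Python versions, where \code{urljoin()} is
provided by \module{urllib.parse}.

```python
try:
    from urlparse import urljoin        # the module described here
except ImportError:
    from urllib.parse import urljoin    # assumed location on newer Pythons

base = 'http://www.cwi.nl/%7Eguido/Python.html'

# Each relative URL borrows whatever it is missing from the base URL.
relative_urls = [
    'FAQ.html',                  # sibling document: replaces the last path segment
    '../index.html',             # one directory up from the base document
    '/images/logo.gif',          # absolute path: only scheme and location are reused
    'http://www.python.org/',    # already absolute: nothing is taken from the base
]

joined = [urljoin(base, rel) for rel in relative_urls]
for rel, result in zip(relative_urls, joined):
    print('%-26s -> %s' % (rel, result))
```

The less the relative URL specifies, the more of the base URL is
carried over into the result.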