:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators (and discovered a bug in an earlier draft!). It supports the
following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``, ``prospero``,
``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.

The :mod:`urllib.parse` module defines the following functions:

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple. This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   ``%`` escapes are not expanded. The delimiters as shown above are not part of
   the result, except for a leading slash in the *path* component, which is
   retained if present. For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse`
   recognizes a netloc only if it is properly introduced by ``'//'``. Otherwise
   the input is presumed to be a relative URL and thus to start with a path
   component, so the network location cannot be told apart from the path and is
   classified as part of the path:

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')  # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')  # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')  # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   If the *scheme* argument is specified, it gives the default addressing
   scheme, to be used only if the URL does not specify one. The default value for
   this argument is the empty string.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized, even if the URL's addressing scheme normally does support them;
   the ``#`` and any text after it remain part of the preceding component. The
   default value for this argument is :const:`True`.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

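The following sketch exercises the attributes listed in the table above on a single made-up URL (the host, credentials, port and path are illustrative only), and also shows the effect of the *scheme* and *allow_fragments* arguments. The output comments assume a current Python 3 interpreter.

```python
from urllib.parse import urlparse

# Made-up URL with credentials, a mixed-case host, a port, params, query and fragment.
parts = urlparse('https://User:secret@Example.COM:8443/a/b;type=x?q=1#top')

print(parts.netloc)                    # User:secret@Example.COM:8443 (one string)
print(parts.path)                      # /a/b
print(parts.params)                    # type=x (parameters of the last path element)
print(parts.username, parts.password)  # User secret
print(parts.hostname)                  # example.com (lower-cased)
print(parts.port)                      # 8443 (an int)

# *scheme* is only a default: it is used when the URL itself has no scheme.
default_used = urlparse('//example.com/x', scheme='http').scheme      # 'http'
default_ignored = urlparse('ftp://example.com/x', scheme='http').scheme  # 'ftp'

# With allow_fragments=False the '#top' part is not split off the query.
no_frag = urlparse('http://example.com/x?q=1#top', allow_fragments=False)
print(no_frag.query, repr(no_frag.fragment))  # q=1#top ''
```

Note that :attr:`hostname` is lower-cased while :attr:`netloc` keeps the spelling found in the URL.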

.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False)

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true
   value indicates that blanks should be retained as blank strings. The default
   false value indicates that blank values are to be ignored and treated as if
   they were not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   Use the :func:`urllib.parse.urlencode` function (with the *doseq* parameter
   set to true) to convert such dictionaries into query strings.


.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False)

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true
   value indicates that blanks should be retained as blank strings. The default
   false value indicates that blank values are to be ignored and treated as if
   they were not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs
   into query strings.

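In contrast to :func:`parse_qs`, the list returned by :func:`parse_qsl` keeps the original order and repetitions of the fields. This sketch, using a made-up query string, also shows the round trip back through :func:`urlencode`.

```python
from urllib.parse import parse_qsl, urlencode

# Pairs keep their original order, including repeated names.
pairs = parse_qsl('lang=py&name=Guido&lang=c')
print(pairs)             # [('lang', 'py'), ('name', 'Guido'), ('lang', 'c')]

# urlencode() turns the list of pairs back into an equivalent query string.
print(urlencode(pairs))  # lang=py&name=Guido&lang=c
```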

.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by :func:`urlparse`. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).


.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

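The difference between the two functions shows up when several path segments carry parameters. The sketch below uses a made-up URL; ``split_segment_params`` is a hypothetical helper written for illustration (it is not part of :mod:`urllib.parse`) and stands in for the "separate function" mentioned above.

```python
from urllib.parse import urlparse, urlsplit

url = 'http://example.com/a;x=1/b;y=2?q=1'

# urlparse() only splits parameters off the *last* path segment ...
print(urlparse(url).path, urlparse(url).params)  # /a;x=1/b y=2
# ... while urlsplit() leaves the whole path intact.
print(urlsplit(url).path)                        # /a;x=1/b;y=2


def split_segment_params(path):
    """Hypothetical helper (not in urllib.parse): split every '/'-separated
    segment of *path* into a (segment, params) pair at the first ';'."""
    return [(seg, params)
            for seg, _, params in (s.partition(';') for s in path.split('/'))]


print(split_segment_params(urlsplit(url).path))
# [('', ''), ('a', 'x=1'), ('b', 'y=2')]
```

The helper splits only at the first ``;`` of each segment; a real application may need a different rule for segments with several parameters.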

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a
   ``?`` with an empty query; the RFC states that these are equivalent).

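A common use of the split/unsplit pair is to change one component and reassemble the rest. This sketch, on a made-up URL, unpacks the five items, replaces the query, drops the fragment, and passes a plain tuple back in.

```python
from urllib.parse import urlsplit, urlunsplit

scheme, netloc, path, query, fragment = urlsplit(
    'https://example.com/search?q=python#results')

# Swap the query, drop the fragment, and reassemble from a plain 5-tuple.
new_url = urlunsplit((scheme, netloc, path, 'q=rst', ''))
print(new_url)  # https://example.com/search?q=rst
```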

.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.

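The preprocessing suggested in the note can be sketched as follows. ``join_relative_only`` is a hypothetical helper name used only for this illustration; it discards any scheme and network location in *url* before handing it to :func:`urljoin`.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

base = 'http://www.cwi.nl/%7Eguido/Python.html'

# Ordinary relative references resolve against the base path.
print(urljoin(base, 'FAQ.html'))       # http://www.cwi.nl/%7Eguido/FAQ.html
print(urljoin(base, '../index.html'))  # http://www.cwi.nl/index.html


def join_relative_only(base, url):
    """Hypothetical helper: resolve *url* against *base* while ignoring any
    scheme or network location that *url* carries."""
    _, _, path, query, fragment = urlsplit(url)
    return urljoin(base, urlunsplit(('', '', path, query, fragment)))


print(urljoin(base, '//www.python.org/%7Eguido'))             # http://www.python.org/%7Eguido
print(join_relative_only(base, '//www.python.org/%7Eguido'))  # http://www.cwi.nl/%7Eguido
```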

.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

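Because the result is a pair, it can be unpacked directly. A short sketch with made-up URLs, covering both the fragment and the no-fragment case:

```python
from urllib.parse import urldefrag

url, fragment = urldefrag('http://example.com/doc.html#section-2')
print(url, fragment)    # http://example.com/doc.html section-2

# Without a fragment, the URL comes back unchanged together with ''.
url2, fragment2 = urldefrag('http://example.com/doc.html')
print(url2, repr(fragment2))  # http://example.com/doc.html ''
```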

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted; its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes`.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :exc:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :exc:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.

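A few more calls, sketched on made-up strings, show how *safe*, *encoding* and :class:`bytes` input change the result (escapes are written with upper-case hex digits):

```python
from urllib.parse import quote

print(quote('/El Niño/'))                 # /El%20Ni%C3%B1o/
print(quote('/El Niño/', safe=''))        # %2FEl%20Ni%C3%B1o%2F ('/' quoted too)
print(quote('a/b?c=d', safe='/?='))       # a/b?c=d
print(quote('café', encoding='latin-1'))  # caf%E9
print(quote(b'caf\xe9'))                  # caf%E9 (bytes are quoted as they are)

# encoding/errors are not allowed together with bytes input.
try:
    quote(b'caf\xe9', encoding='latin-1')
except TypeError:
    print('TypeError for bytes with encoding')
```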

.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. It also does not have *safe* default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.

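The sketch below, on a made-up string, contrasts :func:`quote` and :func:`quote_plus`, and shows that putting ``'+'`` into *safe* leaves literal plus signs that can no longer be told apart from encoded spaces.

```python
from urllib.parse import quote, quote_plus

text = 'a+b c&d'
print(quote(text))                    # a%2Bb%20c%26d
print(quote_plus(text))               # a%2Bb+c%26d

# With '+' marked safe, the literal '+' and the encoded space look the same.
print(quote_plus('a+b c', safe='+'))  # a+b+c
```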

.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.

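Because no encoding step takes place, bytes that are not valid UTF-8 are escaped octet by octet. A small sketch comparing it with :func:`quote` on the same text:

```python
from urllib.parse import quote, quote_from_bytes

data = 'café'.encode('latin-1')    # b'caf\xe9', not valid UTF-8
print(quote_from_bytes(data))      # caf%E9

# quote() on the str encodes it first (UTF-8 by default): different escapes.
print(quote('café'))               # caf%C3%A9

# The *safe* default is '/', as for quote().
print(quote_from_bytes(b'/a b/'))  # /a%20b/
```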

.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.

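The *encoding* and *errors* parameters matter when the escapes do not form valid UTF-8. A short sketch on made-up strings; the placeholder used by the default ``'replace'`` handler is U+FFFD:

```python
from urllib.parse import unquote

print(unquote('caf%C3%A9'))                   # café (UTF-8 by default)
print(unquote('caf%E9', encoding='latin-1'))  # café

# With the default errors='replace', the invalid UTF-8 byte becomes U+FFFD.
print(unquote('caf%E9') == 'caf\ufffd')       # True

# errors='strict' raises instead of substituting.
try:
    unquote('caf%E9', errors='strict')
except UnicodeDecodeError:
    print('not valid UTF-8')
```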

.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.

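:func:`unquote_plus` is the inverse of :func:`quote_plus`. The sketch below, on a made-up string, shows the round trip and why plain :func:`unquote` is not enough for form-encoded values.

```python
from urllib.parse import quote_plus, unquote, unquote_plus

original = 'a+b c&d=é'
encoded = quote_plus(original)
print(encoded)                # a%2Bb+c%26d%3D%C3%A9
print(unquote_plus(encoded))  # a+b c&d=é

# unquote() leaves '+' alone, so the space is not restored.
print(unquote(encoded))       # a+b+c&d=é
```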

.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields
   ``b'a&\xef'``.

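A brief sketch of both accepted input types, including an unescaped non-ASCII character that is encoded to UTF-8 first:

```python
from urllib.parse import unquote_to_bytes

print(unquote_to_bytes('a%26%EF'))  # b'a&\xef'
print(unquote_to_bytes('é%20'))     # b'\xc3\xa9 ' (the literal 'é' becomes UTF-8)
print(unquote_to_bytes(b'%41%42'))  # b'AB' (bytes input works as well)
```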

.. function:: urlencode(query, doseq=False, safe='', encoding=None, errors=None)

   Convert a mapping object or a sequence of two-element tuples, whose keys and
   values may be either :class:`str` or :class:`bytes` objects, to a
   "percent-encoded" string, suitable to pass to :func:`urllib.request.urlopen`
   as the optional *data* argument (after encoding it to :class:`bytes`, since
   *data* must be bytes; the string contains only ASCII characters). This is
   useful to pass a dictionary of form fields to a ``POST`` request. The
   resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using :func:`quote_plus`
   above. When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element in itself can be a sequence and in that case, if
   the optional parameter *doseq* evaluates to true, individual ``key=value``
   pairs separated by ``'&'`` are generated for each element of the value
   sequence for the key. The order of parameters in the encoded string will
   match the order of parameter tuples in the sequence. This module provides
   the functions :func:`parse_qs` and :func:`parse_qsl`, which are used to parse
   query strings into Python data structures.

   When the keys and values of *query* are :class:`str` objects, the *safe*,
   *encoding* and *errors* parameters are passed down to :func:`quote_plus` for
   encoding.

   .. versionchanged:: 3.2
      The *query* parameter supports bytes and string objects.

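The sketch below, using made-up form fields, shows the effect of *doseq*, the round trip through :func:`parse_qs`, and a :class:`bytes` value being escaped as it is. A sequence of pairs is used so that the output order is fixed; the dictionary printout assumes Python 3.7 or later.

```python
from urllib.parse import parse_qs, urlencode

params = [('q', 'El Niño'), ('tag', ['a', 'b'])]

# With doseq=True each element of a sequence value becomes its own key=value pair.
encoded = urlencode(params, doseq=True)
print(encoded)            # q=El+Ni%C3%B1o&tag=a&tag=b

# parse_qs() reverses the encoding.
print(parse_qs(encoded))  # {'q': ['El Niño'], 'tag': ['a', 'b']}

# bytes values are percent-encoded octet by octet, with no text encoding step.
print(urlencode({'raw': b'caf\xe9'}))  # raw=caf%E9

# For a POST body, the (ASCII-only) string still has to be turned into bytes.
body = encoded.encode('ascii')
```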

.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD 66). Any changes to the
      :mod:`urllib.parse` module should conform to this. Certain deviations could
      be observed, which are mostly for backward compatibility purposes and for
      certain de-facto parsing requirements as commonly observed in major
      browsers.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.


.. _urlparse-result-object:

Results of :func:`urlparse` and :func:`urlsplit`
------------------------------------------------

The result objects from the :func:`urlparse` and :func:`urlsplit` functions are
subclasses of the :class:`tuple` type. These subclasses add the attributes
described in those functions, as well as provide an additional method:

.. method:: ParseResult.geturl()

   Return the re-combined version of the original URL as a string. This may differ
   from the original URL in that the scheme will always be normalized to lower case
   and empty components may be dropped. Specifically, empty parameters, queries,
   and fragment identifiers will be removed.

   The result of this method is a fixpoint if passed back through the original
   parsing function:

      >>> import urllib.parse
      >>> url = 'HTTP://www.Python.org/doc/#'

      >>> r1 = urllib.parse.urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'

      >>> r2 = urllib.parse.urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the parse results:

.. class:: BaseResult

   Base class for the concrete result classes. This provides most of the
   attribute definitions. It does not provide a :meth:`geturl` method. It is
   derived from :class:`tuple`, but does not override the :meth:`__init__` or
   :meth:`__new__` methods.


.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results. The :meth:`__new__` method is
   overridden to support checking that the right number of arguments is passed.


.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results. The :meth:`__new__` method is
   overridden to support checking that the right number of arguments is passed.

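The concrete result classes can also be instantiated directly, for example to build a URL from components. This sketch uses made-up components; note that constructing a result directly does not lower-case the scheme (that normalization is done by the parsing functions), so a lower-case scheme is passed here.

```python
from urllib.parse import ParseResult, SplitResult, urlsplit

# Build a result object directly from its five components.
r = SplitResult('http', 'www.python.org', '/doc/', 'q=1', '')
print(r.geturl())                 # http://www.python.org/doc/?q=1
print(r.hostname)                 # www.python.org

# Results are tuples, so equality is item by item.
print(r == urlsplit(r.geturl()))  # True

# Passing the wrong number of components is rejected.
try:
    ParseResult('http', 'www.python.org')
except TypeError:
    print('TypeError: ParseResult needs six components')
```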