:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.

**Source code:** :source:`Lib/urllib/parse.py`

.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path, etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``, ``ws``, ``wss``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-item :term:`named tuple`. This
   corresponds to the general structure of a URL:
   ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   ``%`` escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present. For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse` recognizes
   a netloc only if it is properly introduced by ``//``. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   The *scheme* argument gives the default addressing scheme, to be
   used only if the URL does not specify one. It should be the same type
   (text or bytes) as *urlstring*, except that the default value ``''`` is
   always allowed, and is automatically converted to ``b''`` if appropriate.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized. Instead, they are parsed as part of the path, parameters
   or query component, and :attr:`fragment` is set to the empty string in
   the return value.

   The return value is a :term:`named tuple`, which means that its items can
   be accessed by index or as named attributes, which are:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | *scheme* parameter   |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.

   As is the case with all named tuples, the subclass has a few additional methods
   and attributes that are particularly useful. One such method is :meth:`_replace`.
   The :meth:`_replace` method returns a new :class:`ParseResult` object in which
   the specified fields are replaced with new values.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      >>> u
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> u._replace(scheme='http')
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')


   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`. Previously, a whitelist of
      schemes that support fragments existed.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.

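The following sketch illustrates how the *scheme* and *allow_fragments*
arguments of :func:`urlparse` affect the result, and that an out-of-range port
is only reported when the :attr:`port` attribute is read. The URLs are
illustrative, and the commented results assume Python 3.6 or later.

```python
from urllib.parse import urlparse

# The URL has no scheme of its own, so the *scheme* argument is used.
r = urlparse('//www.cwi.nl:80/%7Eguido/Python.html', scheme='https')
print(r.scheme)    # https
print(r.hostname)  # www.cwi.nl
print(r.port)      # 80

# With allow_fragments=False the '#intro' text stays in the path.
r2 = urlparse('http://www.cwi.nl/doc#intro', allow_fragments=False)
print(r2.path)            # /doc#intro
print(repr(r2.fragment))  # ''

# Parsing succeeds; the invalid port is detected when .port is read.
bad = urlparse('http://www.cwi.nl:99999/')
try:
    bad.port
    port_error = None
except ValueError as exc:
    port_error = exc
    print('invalid port:', exc)
```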

.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised if more than *max_num_fields*
   fields are read.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.


   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.

   .. versionchanged:: 3.8
      Added *max_num_fields* parameter.

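As a minimal sketch of the :func:`parse_qs` options described above (the query
string is made up for illustration, and *max_num_fields* requires Python 3.8 or
later):

```python
from urllib.parse import parse_qs, urlencode

qs = 'name=Guido&lang=python&lang=c&empty='

# Repeated names are collected into one list; blank values are dropped.
default = parse_qs(qs)
print(default)  # {'name': ['Guido'], 'lang': ['python', 'c']}

# keep_blank_values=True retains 'empty' with a blank string.
with_blanks = parse_qs(qs, keep_blank_values=True)
print(with_blanks['empty'])  # ['']

# The string has four fields, so a limit of two raises ValueError.
try:
    parse_qs(qs, max_num_fields=2)
    limit_error = None
except ValueError as exc:
    limit_error = exc
    print('too many fields:', exc)

# doseq=True expands each list back into repeated name=value pairs.
round_trip = urlencode(default, doseq=True)
print(round_trip)  # name=Guido&lang=python&lang=c
```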

.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised if more than *max_num_fields*
   fields are read.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.

   .. versionchanged:: 3.8
      Added *max_num_fields* parameter.

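A short sketch of :func:`parse_qsl`, showing that the order of the pairs is
preserved and how the *encoding* parameter changes the decoding of
percent-escapes. The query strings are illustrative only.

```python
from urllib.parse import parse_qsl, urlencode

# Pairs come back in order; '+' and '%20' both decode to a space.
pairs = parse_qsl('lang=python&lang=c&q=a%20b+c')
print(pairs)  # [('lang', 'python'), ('lang', 'c'), ('q', 'a b c')]

# %E9 is not valid UTF-8 on its own, but it is 'e with acute' in Latin-1.
latin = parse_qsl('name=%E9', encoding='latin-1')
print(latin)  # [('name', 'é')]

# urlencode() converts the list of pairs back into a query string.
encoded = urlencode(pairs)
print(encoded)  # lang=python&lang=c&q=a+b+c
```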

.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by :func:`urlparse`. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).

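For illustration, the sketch below rebuilds a URL that had an unnecessary
``?`` delimiter and assembles another one from a plain list. The URLs are made
up for the example.

```python
from urllib.parse import urlparse, urlunparse

# A trailing '?' with an empty query is dropped when the URL is rebuilt.
parts = urlparse('http://www.cwi.nl/%7Eguido/Python.html?')
rebuilt = urlunparse(parts)
print(rebuilt)  # http://www.cwi.nl/%7Eguido/Python.html

# Any six-item iterable works, in the order
# (scheme, netloc, path, params, query, fragment).
built = urlunparse(['https', 'www.python.org', '/doc/', '', 'q=1', 'top'])
print(built)    # https://www.python.org/doc/?q=1#top
```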

.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-item
   :term:`named tuple`::

      (addressing scheme, network location, path, query, fragment identifier).

   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.

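To make the difference from :func:`urlparse` concrete, the sketch below parses
a URL that carries parameters on more than one path segment. The helper
``split_segment_params`` is hypothetical (it is not part of this module) and
shows one possible way to separate segments from their parameters.

```python
from urllib.parse import urlparse, urlsplit

url = 'http://www.cwi.nl/a;x=1/b;y=2?q=1#top'

# urlparse() only splits off the parameters of the last path segment.
parsed = urlparse(url)
print(parsed.path)    # /a;x=1/b
print(parsed.params)  # y=2

# urlsplit() leaves all parameters in the path.
split = urlsplit(url)
print(split.path)     # /a;x=1/b;y=2


def split_segment_params(path):
    """Hypothetical helper: return (segment, params) for each path segment."""
    result = []
    for segment in path.split('/'):
        if not segment:
            continue  # skip the empty string before the leading slash
        name, _, params = segment.partition(';')
        result.append((name, params))
    return result


segments = split_segment_params(split.path)
print(segments)  # [('a', 'x=1'), ('b', 'y=2')]
```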

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).

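A common use of :func:`urlunsplit` is to change one component of a URL and put
the pieces back together. The sketch below replaces the query and drops the
fragment; the URL and query values are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit

scheme, netloc, path, query, fragment = urlsplit(
    'http://www.cwi.nl/search?q=python#results')

# Reassemble with a new query and an empty fragment.
new_url = urlunsplit((scheme, netloc, path, 'q=urllib', ''))
print(new_url)  # http://www.cwi.nl/search?q=urllib
```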

.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.


   .. versionchanged:: 3.5

      Behaviour updated to match the semantics defined in :rfc:`3986`.

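The preprocessing suggested in the note for :func:`urljoin` can be sketched as
follows. The helper name ``join_same_host`` is hypothetical, and the results
assume the :rfc:`3986` semantics used since Python 3.5.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

base = 'http://www.cwi.nl/%7Eguido/Python.html'

# Ordinary relative references are resolved against the base path.
print(urljoin(base, '../index.html'))  # http://www.cwi.nl/index.html
print(urljoin(base, '/doc/'))          # http://www.cwi.nl/doc/


def join_same_host(base, url):
    """Hypothetical helper: join *url* to *base*, ignoring its scheme and host."""
    parts = urlsplit(url)
    # Blank out scheme and netloc so only path, query and fragment remain.
    relative = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
    return urljoin(base, relative)


joined = join_same_host(base, '//www.python.org/%7Eguido')
print(joined)  # http://www.cwi.nl/%7Eguido
```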

.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

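A brief sketch of :func:`urldefrag`, showing both attribute access and tuple
unpacking on the result. The URLs are illustrative.

```python
from urllib.parse import urldefrag

result = urldefrag('http://www.cwi.nl/%7Eguido/Python.html#section-2')
print(result.url)       # http://www.cwi.nl/%7Eguido/Python.html
print(result.fragment)  # section-2

# The result is a named tuple, so it also unpacks like a 2-tuple.
url, fragment = result

# Without a fragment, the URL is returned unmodified with an empty string.
plain = urldefrag('http://www.cwi.nl/')
print(plain.url, repr(plain.fragment))  # http://www.cwi.nl/ ''
```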

.. function:: unwrap(url)

   Extract the URL from a wrapped URL (that is, a string formatted as
   ``<URL:scheme://host/path>``, ``<scheme://host/path>``, ``URL:scheme://host/path``
   or ``scheme://host/path``). If *url* is not a wrapped URL, it is returned
   without changes.

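The wrapped forms accepted by :func:`unwrap` can be tried out as below. This is
a small sketch with a made-up URL; it assumes a Python version in which
:func:`unwrap` is available from :mod:`urllib.parse`.

```python
from urllib.parse import unwrap

wrapped_forms = [
    '<URL:http://www.cwi.nl/doc>',
    '<http://www.cwi.nl/doc>',
    'URL:http://www.cwi.nl/doc',
    'http://www.cwi.nl/doc',
]

# Every form unwraps to the same plain URL.
unwrapped = [unwrap(form) for form in wrapped_forms]
for form, plain in zip(wrapped_forms, unwrapped):
    print(form, '->', plain)
```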

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing functions.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.

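The rules above can be illustrated with :func:`urlsplit`; this is a minimal
sketch using made-up URLs.

```python
from urllib.parse import urlsplit

text = urlsplit('http://www.cwi.nl/%7Eguido/')
raw = urlsplit(b'http://www.cwi.nl/%7Eguido/')

# bytes in, bytes out: the result is a SplitResultBytes instance.
print(type(raw).__name__)  # SplitResultBytes
print(raw.netloc)          # b'www.cwi.nl'

# decode() and encode() convert between the two result types.
print(raw.decode() == text)  # True
print(text.encode() == raw)  # True

# Mixing bytes with a non-empty str argument raises TypeError.
try:
    urlsplit(b'http://www.cwi.nl/', scheme='ftp')
    mix_error = None
except TypeError as exc:
    mix_error = exc

# Non-ASCII byte values raise UnicodeDecodeError.
try:
    urlsplit('http://www.cwi.nl/\u00e9'.encode('utf-8'))
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc
```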

.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


| 459 | The following classes provide the implementations of the structured parse |
| 460 | results when operating on :class:`str` objects: |
| 461 | |
| 462 | .. class:: DefragResult(url, fragment) |
| 463 | |
| 464 | Concrete class for :func:`urldefrag` results containing :class:`str` |
| 465 | data. The :meth:`encode` method returns a :class:`DefragResultBytes` |
| 466 | instance. |
| 467 | |
| 468 | .. versionadded:: 3.2 |
| 469 | |
| 470 | .. class:: ParseResult(scheme, netloc, path, params, query, fragment) |
| 471 | |
| 472 | Concrete class for :func:`urlparse` results containing :class:`str` |
| 473 | data. The :meth:`encode` method returns a :class:`ParseResultBytes` |
| 474 | instance. |
| 475 | |
| 476 | .. class:: SplitResult(scheme, netloc, path, query, fragment) |
| 477 | |
| 478 | Concrete class for :func:`urlsplit` results containing :class:`str` |
| 479 | data. The :meth:`encode` method returns a :class:`SplitResultBytes` |
| 480 | instance. |
| 481 | |

The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2

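As an illustration of how the :class:`str` and :class:`bytes` result classes
relate, the following minimal sketch converts a :class:`SplitResult` to its
bytes counterpart and back. The URL and the variable names are chosen only for
this example, and ``'ascii'`` is passed explicitly for clarity.

```python
from urllib.parse import SplitResult, SplitResultBytes, urlsplit

# Parsing a str URL yields a SplitResult whose fields are str objects.
text_result = urlsplit('http://www.python.org/doc/?q=1#top')

# encode() converts every field to bytes and returns the bytes counterpart.
bytes_result = text_result.encode('ascii')

# decode() goes the other way and restores the str-based result.
round_trip = bytes_result.decode('ascii')

print(type(bytes_result).__name__, bytes_result.netloc)
print(round_trip == text_result)
```

Because both kinds of result are tuple subclasses, the round-tripped object
compares equal to the original.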

URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-~'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted. Its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes`.

   .. versionchanged:: 3.7
      Moved from :rfc:`2396` to :rfc:`3986` for quoting URL strings. "~" is now
      included in the set of unreserved characters.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.

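The effect of the *safe*, *encoding*, and :class:`bytes` handling of
:func:`quote` can be seen in a short sketch such as the following. The sample
path and the variable names are illustrative only.

```python
from urllib.parse import quote

path = '/El Niño/'

# With the default safe='/', slashes are left alone.
default_quoted = quote(path)

# An empty safe string means slashes are escaped as well.
fully_quoted = quote(path, safe='')

# A different encoding changes the bytes that get percent-escaped.
latin1_quoted = quote('ñ', encoding='latin-1')

# bytes input is quoted as-is, with no text encoding step.
from_bytes = quote(b'/El Ni\xc3\xb1o/')

# Supplying encoding together with bytes input is rejected.
try:
    quote(b'abc', encoding='utf-8')
except TypeError:
    bytes_with_encoding_rejected = True
else:
    bytes_with_encoding_rejected = False

print(default_quoted, fully_quoted, latin1_quoted, from_bytes)
```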

.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces with plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. Unlike :func:`quote`, *safe* defaults to ``''`` rather than ``'/'``,
   so slashes are escaped as well.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.

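To make the difference from :func:`quote` concrete, the sketch below quotes
the same value three ways. The sample value and variable names are
assumptions made for this example.

```python
from urllib.parse import quote, quote_plus

value = 'a b+c/d'

# Spaces become '+', while literal '+' and '/' are percent-escaped.
plus_quoted = quote_plus(value)

# Listing '+' in safe leaves literal plus signs untouched.
plus_kept = quote_plus(value, safe='+')

# quote() uses %20 for spaces and keeps '/' by default.
path_quoted = quote(value)

print(plus_quoted, plus_kept, path_quoted)
```

Note that with ``safe='+'`` a literal plus sign and an encoded space look the
same in the output, so the original value can no longer be recovered
unambiguously.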

.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes with their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* may be either a :class:`str` or a :class:`bytes`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.

   .. versionchanged:: 3.9
      The *string* parameter supports bytes and str objects (previously only
      str).

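The following sketch shows how the *encoding* and *errors* parameters of
:func:`unquote` affect the result for a byte that is not valid UTF-8. The
inputs and variable names are chosen only for illustration.

```python
from urllib.parse import unquote

# %F1 is not valid UTF-8 on its own, so the default errors='replace'
# substitutes the Unicode replacement character.
replaced = unquote('%F1')

# Decoding the same escape as Latin-1 yields a real character.
latin1 = unquote('%F1', encoding='latin-1')

# With errors='strict' the invalid sequence raises instead.
try:
    unquote('%F1', errors='strict')
except UnicodeDecodeError:
    strict_failed = True
else:
    strict_failed = False

# A '%' that is not followed by two hex digits is left unchanged.
untouched = unquote('100%zz')

print(repr(replaced), latin1, strict_failed, untouched)
```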

.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs with spaces, as required
   for unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.


.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes with their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.

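As a rough illustration, :func:`quote_from_bytes` and :func:`unquote_to_bytes`
can be used together to round-trip arbitrary binary data through a URL
component. The sample data and variable names below are assumptions for this
example.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

raw = b'a&\xef'

# Percent-escape the raw bytes, then recover them unchanged.
quoted = quote_from_bytes(raw)
restored = unquote_to_bytes(quoted)

# For str input, unescaped non-ASCII characters are encoded as UTF-8.
from_text = unquote_to_bytes('ñ%20')

# bytes input is accepted as well.
from_bytes = unquote_to_bytes(b'a%26')

print(quoted, restored, from_text, from_bytes)
```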

.. function:: urlencode(query, doseq=False, safe='', encoding=None, \
                        errors=None, quote_via=quote_plus)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a percent-encoded ASCII
   text string. If the resultant string is to be used as the *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, then
   it should be encoded to bytes, otherwise it would result in a
   :exc:`TypeError`.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using the *quote_via*
   function. By default, :func:`quote_plus` is used to quote the values, which
   means spaces are quoted as a ``'+'`` character and ``'/'`` characters are
   encoded as ``%2F``, which follows the standard for GET requests
   (``application/x-www-form-urlencoded``). An alternate function that can be
   passed as *quote_via* is :func:`quote`, which will encode spaces as ``%20``.
   Because :func:`urlencode` passes its own *safe* value (``''`` by default) to
   *quote_via*, ``'/'`` characters are left unencoded only if *safe* includes
   ``'/'``. For maximum control of what is quoted, use ``quote`` and specify a
   value for *safe*.

   When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element in itself can be a sequence and in that case, if
   the optional parameter *doseq* evaluates to ``True``, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   The *safe*, *encoding*, and *errors* parameters are passed down to
   *quote_via* (the *encoding* and *errors* parameters are only passed
   when a query element is a :class:`str`).

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urlencode` function can be used for generating the query string of a
   URL or the data for a POST request.

   .. versionchanged:: 3.2
      The *query* parameter supports bytes and string objects.

   .. versionadded:: 3.5
      The *quote_via* parameter.

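The interaction of *quote_via*, *safe*, and *doseq* in :func:`urlencode` is
perhaps easiest to see side by side. The following is a minimal sketch; the
parameter names and values in ``params`` and ``tags`` are made up for this
example.

```python
from urllib.parse import parse_qs, quote, urlencode

params = {'q': 'El Niño', 'path': '/a b'}

# Default: quote_plus is used, so spaces become '+' and '/' becomes %2F.
form_style = urlencode(params)

# quote encodes spaces as %20; '/' is still escaped because safe=''.
percent_style = urlencode(params, quote_via=quote)

# Passing safe='/' keeps slashes unescaped.
slash_kept = urlencode(params, quote_via=quote, safe='/')

# doseq=True expands a sequence value into one pair per element.
tags = [('tag', ['a', 'b'])]
expanded = urlencode(tags, doseq=True)

# parse_qs reverses the encoding into a dict of lists.
parsed = parse_qs(expanded)

# For use as POST data, the str result must be encoded to bytes.
post_data = form_style.encode('ascii')

print(form_style, percent_style, slash_kept, expanded, parsed)
```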

.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the urllib.parse
      module should conform to this. Certain deviations could be observed, which
      are mostly for backward compatibility purposes and for certain de-facto
      parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform
      Resource Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern
      the treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.